We were wrong about HTTP & WebSockets — here’s what we learned

Sharon Grossman
Feb 18 · 5 min read

At Soluto, we’ve created AnywhereExpert, which at its core is a high-scale rich messaging platform serving around 80 million customers worldwide. As the number of users grew, we noticed an increase in the number of messages that were simply not reaching their destination. To tackle this issue, we embarked on a mission.

What’s in it for me?

  • Are you planning to build a reliable & scalable messaging system?

Let’s Go! 👻

The Goal — Reliability

The only symptom of a failed message delivery was the number of times the Resend button popped up: the Yellow (!) Indicator.

An indication to the customer, that a message has failed to deliver

It became the goal & symbol of our task: to reduce the number of Yellow (!) indicators to 0.

We wanted to know when a message fails to deliver, and why.

WebSockets for the win, or are they? 🙈

Send & Receive data using a single socket

The application emits a Socket event to the server, and the server in return emits a Socket event to our Socket room. Classic.
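
To make the "classic" flow concrete, here is a minimal in-memory sketch of it. It is not the Socket.IO API and not our production code: `RoomRegistry`, `onClientMessage` and the `"message"` event name are hypothetical stand-ins that only illustrate a server handler re-emitting an incoming socket event to everyone in a room.

```typescript
// In-memory stand-in for socket rooms. Illustration only, not Socket.IO.
type Listener = (eventType: string, data: unknown) => void;

class RoomRegistry {
  private rooms = new Map<string, Set<Listener>>();

  // A client "joins" a room by registering a listener for it.
  join(roomId: string, listener: Listener): void {
    const members = this.rooms.get(roomId) ?? new Set<Listener>();
    members.add(listener);
    this.rooms.set(roomId, members);
  }

  // Emit to every member of the room; returns how many listeners were reached.
  emitToRoom(roomId: string, eventType: string, data: unknown): number {
    const members = this.rooms.get(roomId);
    if (!members) return 0;
    members.forEach((listener) => listener(eventType, data));
    return members.size;
  }
}

// Hypothetical server-side handler for an event sent by a client over its socket.
function onClientMessage(registry: RoomRegistry, roomId: string, text: string): number {
  return registry.emitToRoom(roomId, "message", { text });
}

const registry = new RoomRegistry();
const received: string[] = [];

// One client listening on the room.
registry.join("room-1", (_eventType, data) => {
  received.push((data as { text: string }).text);
});

// Another client sends a message; the server fans it out to the room.
const reached = onClientMessage(registry, "room-1", "hello");
```

Note that in this shape the socket is both the way a message enters the system and the way it leaves it, which is exactly the coupling the rest of this post questions.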

Back to our case: we could not find the root cause of the errors, either in our service, which emits the WebSocket events, or in the custom client that our Frontend applications use. Did we have subpar tooling for WebSockets compared to HTTP-based services? Perhaps.

But

WebSockets are a side effect.

Wow! That’s a bold statement, isn’t it? But what happens when you have multiple clients, external APIs, tons of flows & events, bots? Do they all send messages via WebSockets?

This debate ultimately led us to use an event-driven pattern we’ve implemented in other critical services — tackle the side effects of creating a message, and ensure our platform is flexible and scalable enough to add components and flows.
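
One way to read "WebSockets are a side effect" is sketched below: creating a message is the primary action, and real-time delivery is just one subscriber reacting to a "created" event. Everything here is a simplified assumption for illustration. `MessageBus` is an in-memory stand-in for a broker, and `createMessage`, the `Source` values and the audit log are hypothetical names, not our actual services.

```typescript
// Hypothetical shapes; the real event schema is not shown in this post.
type Source = "frontend" | "bot" | "external-api";

interface MessageCreated {
  id: string; // conversation / room id
  text: string;
  source: Source;
}

type Handler = (event: MessageCreated) => void;

// In-memory stand-in for a message broker.
class MessageBus {
  private handlers: Handler[] = [];

  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }

  publish(event: MessageCreated): void {
    for (const handler of this.handlers) handler(event);
  }
}

const bus = new MessageBus();
const store: MessageCreated[] = []; // stand-in for the database
const socketEmits: string[] = [];   // what a socket layer would emit
const auditLog: string[] = [];      // a second, unrelated side effect

// Side effect 1: real-time delivery. It only reacts to events.
bus.subscribe((e) => socketEmits.push(`${e.id}:${e.text}`));
// Side effect 2: another flow, added without touching createMessage.
bus.subscribe((e) => auditLog.push(`${e.source} wrote to ${e.id}`));

// The single entry point every client uses, regardless of transport.
function createMessage(id: string, text: string, source: Source): MessageCreated {
  const event: MessageCreated = { id, text, source };
  store.push(event);  // persist first
  bus.publish(event); // then announce; delivery happens downstream
  return event;
}

createMessage("room-1", "hi from the app", "frontend");
createMessage("room-1", "automated reply", "bot");
```

The point of the sketch is that a bot or an external API never needs a socket to create a message, and new flows attach as subscribers rather than as changes to the creation path.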

The Pattern — HTTP, WebSockets & friends 👪

  • CRUD Operations are done via simple REST API, using HTTP.
The event-driven pattern for real-time flows

What?! Use HTTP for real-time operations?!

HTTP vs WebSockets — I wish it was a decisive battle, but this argument made it difficult for us to convince other developers of the validity of our solution. WebSockets are arguably better in real-time performance when it comes to web applications, so we’ve had our concerns about using HTTP in some parts of our architecture. Perhaps the better solution was to improve our tooling and code quality around WebSockets, and not turn away from it.

So why? Because honestly, in real life it looks like this

Image for post
  • One way or another, we wanted to separate our HTTP and WebSockets traffic, in order to scale the services correctly.

How about an example? ✍🏻

  • Frontend Application uses a generated HTTP client to create a message; meanwhile it handles optimistic rendering and further UI changes
await Api.create(message);
  • api receives the data, saves it in a database and produces to Kafka
// Persist the message first, then publish a "created" event to Kafka.
public async create(
  id: string,
  payload: Payload
): Promise<Data> {
  const data = await createAndSave(id, payload);
  await produceToKafka({data, eventType: EventType.Created});
  return data;
}
  • live-api consumes an event, and emits to a socket room
switch (topic) {
  case 'event': {
    // Braces scope the const declaration to this case.
    const {eventType, id, data} = payload as EventPayload;
    emitToRoom(eventType, id, data);
    break;
  }
}
  • live-api receives an event on a socket room, and emits a response with data to that room
socket.to(id).emit(eventType, data);
  • Frontend app gets incoming data & handles UI changes and — Voila!
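
On the client, the first and last steps above imply a small state machine: the message is rendered right away, marked as delivered when the socket event comes back, and flagged for resend if the HTTP call fails. The following is a rough sketch of that idea under stated assumptions, not our actual frontend code. `MessageList`, the status names, the injected `httpCreate` function and the idea of matching on a client-generated `id` are all hypothetical.

```typescript
type Status = "pending" | "delivered" | "failed";

interface UiMessage {
  id: string; // assumed to be generated on the client
  text: string;
  status: Status;
}

class MessageList {
  private messages = new Map<string, UiMessage>();

  // Optimistic render: show the message before the server confirms it.
  addOptimistic(id: string, text: string): void {
    this.messages.set(id, { id, text, status: "pending" });
  }

  // Called when the socket event for this message arrives.
  confirm(id: string): void {
    const message = this.messages.get(id);
    if (message) message.status = "delivered";
  }

  // Called when the HTTP request rejects; the UI would show the resend indicator.
  // A message that was already confirmed over the socket is left untouched.
  fail(id: string): void {
    const message = this.messages.get(id);
    if (message && message.status === "pending") message.status = "failed";
  }

  statusOf(id: string): Status | undefined {
    return this.messages.get(id)?.status;
  }
}

// Sends through an injected HTTP call, so the sketch does not depend on a real client.
async function send(
  list: MessageList,
  id: string,
  text: string,
  httpCreate: (id: string, text: string) => Promise<void>
): Promise<void> {
  list.addOptimistic(id, text);
  try {
    await httpCreate(id, text);
  } catch {
    list.fail(id);
  }
}
```

In this sketch, a socket listener would call `list.confirm(id)` for each incoming event, and `fail` only touches messages that are still pending, so a late HTTP error cannot overwrite a message that already arrived over the socket.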

The Results 🙌🏻

We were surprised to see the number of undelivered messages (Yellow (!) Indicators) drop by a huge margin, almost to zero. The experience itself remained smooth, and customers reported a more stable and even faster platform.

What about the failed messages now?

Tracking and fixing errors became a much smaller task, as the code was simpler, easier to monitor and log, and less coupled.

Number of messages that failed to deliver (Yellow ! Indicator)

What did we learn from this journey? 🤔

  • Keep your APIs lean. Leverage event-driven patterns to create reactive and decoupled pieces of code that make sense.

As a side note, our tech stack in this case is: Node.js, React + MobX, Socket.IO + Redis, Kafka, MongoDB, on top of Kubernetes clusters.

Soluto by asurion

Engineering. Product. UX. Culture.
