We were wrong about HTTP & WebSockets — here’s what we learned

Sharon Grossman
Feb 18

At Soluto, we’ve created AnywhereExpert, which at its core is a high-scale, rich messaging platform serving around 80 million customers worldwide. As our user base grew, we noticed a growing number of messages that simply never reached their destination. To tackle this issue, we embarked on a mission.

What’s in it for me?

  • Are you planning to build a reliable & scalable messaging system?

Let’s Go! 👻

The Goal — Reliability

The only visible symptom of a failed delivery was the Resend button popping up, shown as the Yellow (!) Indicator.

An indication to the customer that a message failed to deliver

It became the goal and symbol of our task: reduce the number of Yellow (!) Indicators to 0.

We wanted to know when a message fails to deliver, and why.

WebSockets for the win, or are they? 🙈

Send & Receive data using a single socket

The application emits a Socket event to the server, and the server in return emits a Socket event to our Socket room. Classic.
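Here is a minimal in-memory sketch of that original round-trip, assuming a single server process. It deliberately does not use Socket.IO: `RoomServer`, `join` and `onClientEvent` are hypothetical stand-ins that only mimic the shape of the flow, and acknowledgements are left out on purpose. What it shows is that, in this shape, the sender never gets an answer back, so a broadcast that goes nowhere fails silently.

```typescript
// Hypothetical in-memory stand-in for a WebSocket server with rooms.
// It mimics the shape of the flow only; this is NOT the Socket.IO API.
type Listener = (event: string, data: unknown) => void;

class RoomServer {
  private rooms = new Map<string, Set<Listener>>();

  // A client joins a room and registers a listener for broadcasts to it.
  join(room: string, listener: Listener): void {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set<Listener>());
    this.rooms.get(room)!.add(listener);
  }

  // The server handles a client event by broadcasting it to the room.
  // Returns nothing: the sender cannot tell whether anyone received it.
  onClientEvent(room: string, event: string, data: unknown): void {
    const listeners = this.rooms.get(room);
    if (!listeners) return; // no such room: dropped, and nobody is told
    listeners.forEach((listener) => listener(event, data));
  }
}

// Usage: two participants share one conversation room.
const server = new RoomServer();
const inboxA: string[] = [];
const inboxB: string[] = [];
server.join("conversation-1", (event, data) => inboxA.push(`${event}:${String(data)}`));
server.join("conversation-1", (event, data) => inboxB.push(`${event}:${String(data)}`));

server.onClientEvent("conversation-1", "message", "hello"); // delivered to both
server.onClientEvent("conversation-2", "message", "lost"); // silently dropped
```

Socket.IO itself does offer acknowledgement callbacks on `emit`, which can close part of this gap; the sketch omits them only to show the bare pattern.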

Back to our case: we could not find the root cause of the errors, either in our service that emits the WebSocket events or in the custom client our frontend applications use. Did we have subpar tooling for WebSockets compared to HTTP-based services? Perhaps.

But

WebSockets are a side effect.

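Wow! That’s a bold statement, isn’t it? But think about it: what happens when you have multiple clients, external APIs, tons of flows and events, and bots? Do they all send messages via WebSockets? Creating a message is the core operation; pushing it to connected clients in real time is just one of its consequences.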

This debate ultimately led us to an event-driven pattern we had already implemented in other critical services: handle the side effects of creating a message separately from the message itself, and keep our platform flexible and scalable enough to add new components and flows.
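A minimal sketch of the idea, under loudly stated assumptions: the synchronous, in-process `EventBus` below stands in for Kafka (where consumers would run as separate services), and `createMessage`, `ChatEvent` and the subscribers are hypothetical names, not Soluto's code. Creating a message does one thing, persisting it and announcing a `Created` event; every side effect, the real-time push included, subscribes on its own, and a failing subscriber is recorded without undoing the write.

```typescript
// Hypothetical sketch: a synchronous in-process bus standing in for Kafka.
enum EventType {
  Created = "Created",
}

interface ChatEvent {
  eventType: EventType;
  id: string; // conversation (room) id
  data: { text: string };
}

type Handler = (event: ChatEvent) => void;

class EventBus {
  private handlers: Handler[] = [];
  readonly failures: string[] = [];

  subscribe(handler: Handler): void {
    this.handlers.push(handler);
  }

  // Each handler runs in isolation: a failing side effect is recorded,
  // but it neither stops the other handlers nor undoes the original write.
  publish(event: ChatEvent): void {
    this.handlers.forEach((handler, index) => {
      try {
        handler(event);
      } catch (err) {
        this.failures.push(`handler ${index}: ${(err as Error).message}`);
      }
    });
  }
}

const store: ChatEvent["data"][] = []; // stands in for the database
const bus = new EventBus();

// Side effects subscribe independently of the code that creates messages.
const pushedToRoom: string[] = [];
bus.subscribe((e) => {
  pushedToRoom.push(`${e.id}:${e.data.text}`); // real-time push to the room
});
bus.subscribe(() => {
  throw new Error("bot service unavailable"); // a flaky downstream consumer
});

// The core operation: persist first, then announce what happened.
function createMessage(id: string, text: string): void {
  const data = { text };
  store.push(data);
  bus.publish({ eventType: EventType.Created, id, data });
}

createMessage("conversation-1", "hi there");
```

Adding a new client or flow, a bot for example, then means adding a subscriber rather than touching `createMessage`.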

The Pattern — HTTP, WebSockets & friends 👪

  • CRUD operations are done via a simple REST API over HTTP.
The event-driven pattern for real-time flows
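As an illustration of the CRUD side, here is a sketch of how a message resource might map onto HTTP verbs and paths. The `/messages` routes, the `MessageInput` fields and the `messagesApi` object are all assumptions for illustration; the sketch only builds request descriptors and leaves the actual transport (for example `fetch`) to a generated client.

```typescript
// Hypothetical REST contract for messages; paths and field names are assumptions.
type HttpMethod = "GET" | "POST" | "PUT" | "DELETE";

interface HttpRequest {
  method: HttpMethod;
  path: string;
  body?: string; // JSON-encoded payload, when the verb carries one
}

interface MessageInput {
  conversationId: string;
  text: string;
}

// Each CRUD operation maps onto one verb plus a resource path, which keeps the
// API easy to generate a client for, log and monitor.
const messagesApi = {
  create: (input: MessageInput): HttpRequest => ({
    method: "POST",
    path: "/messages",
    body: JSON.stringify(input),
  }),
  read: (id: string): HttpRequest => ({
    method: "GET",
    path: `/messages/${encodeURIComponent(id)}`,
  }),
  update: (id: string, input: MessageInput): HttpRequest => ({
    method: "PUT",
    path: `/messages/${encodeURIComponent(id)}`,
    body: JSON.stringify(input),
  }),
  remove: (id: string): HttpRequest => ({
    method: "DELETE",
    path: `/messages/${encodeURIComponent(id)}`,
  }),
};

const req = messagesApi.create({ conversationId: "conversation-1", text: "hello" });
console.log(req.method, req.path, req.body);
// prints: POST /messages {"conversationId":"conversation-1","text":"hello"}
```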

What?! Use HTTP for real-time operations?!

HTTP vs WebSockets: I wish it were a decisive battle, but this very argument made it difficult for us to convince other developers that our solution was valid. WebSockets arguably offer better real-time performance in web applications, since one persistent connection avoids per-request overhead, so we had our own concerns about using HTTP in parts of our architecture. Perhaps the better solution was to improve our tooling and code quality around WebSockets rather than turn away from them.

So why? Because honestly, in real life it looks like this:

  • One way or another, we wanted to separate our HTTP and WebSocket traffic so we could scale each service independently.

How about an example? ✍🏻

  • The frontend application uses a generated HTTP client to create a message, while handling optimistic rendering and further UI changes
```typescript
// Generated HTTP client: the promise settles when the api responds
await Api.create(message);
```
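  • api receives the data, saves it to the database, and produces an event to Kafka
```typescript
// api: persist the message first, then publish a Created event to Kafka
public async create(
  id: string,
  payload: Payload
): Promise<Data> {
  const data = await createAndSave(id, payload); // write to the database
  await produceToKafka({data, eventType: EventType.Created}); // announce the side effect
  return data; // the HTTP response tells the client the write succeeded
}
```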
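  • live-api consumes the event and emits it to a socket room
```typescript
// live-api: consume Kafka records and forward each event to its socket room
switch (topic) {
  case 'event': {
    // braces scope the const declarations to this case
    const {eventType, id, data} = payload as EventPayload;
    emitToRoom(eventType, id, data);
    break;
  }
}
```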
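  • live-api receives an event for a socket room and emits it, with its data, to that room
```typescript
// Socket.IO: broadcast to the clients in room `id` (socket.to() excludes this socket itself)
socket.to(id).emit(eventType, data);
```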
  • The frontend app receives the incoming data and handles the UI changes. Voilà!
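The steps above can be tied together in a single in-memory sketch. Everything here is a hypothetical stand-in: `KafkaLike` replaces a Kafka topic, the `rooms` record replaces Socket.IO rooms, and the HTTP call is simulated synchronously, whereas the real `Api.create` is asynchronous. It shows the two outcomes the frontend can observe: a successful create is persisted, produced and fanned out to the room, while a failed HTTP call is known to the sender right away and can drive the Yellow (!) Indicator (here, the `"failed"` status).

```typescript
// Hypothetical end-to-end stand-ins: no real Kafka, Socket.IO or HTTP involved.
type Status = "pending" | "sent" | "failed";

interface ChatMessage {
  id: string; // client-generated id, used to reconcile the optimistic render
  conversationId: string; // doubles as the socket room id
  text: string;
}

interface LogRecord {
  topic: string;
  payload: { eventType: "Created"; id: string; data: ChatMessage };
}

// Stand-in for a Kafka topic: each produced record is pushed to every consumer.
class KafkaLike {
  private consumers: ((record: LogRecord) => void)[] = [];
  consume(fn: (record: LogRecord) => void): void {
    this.consumers.push(fn);
  }
  produce(record: LogRecord): void {
    this.consumers.forEach((fn) => fn(record));
  }
}

// Stand-in for socket rooms: room id -> listeners in that room.
type RoomListener = (eventType: string, data: ChatMessage) => void;
const rooms: Record<string, RoomListener[]> = {};
function emitToRoom(eventType: string, roomId: string, data: ChatMessage): void {
  (rooms[roomId] || []).forEach((listener) => listener(eventType, data));
}

const kafka = new KafkaLike();
const database: ChatMessage[] = [];

// api: save, then produce. Because the save comes first, a failed save would
// mean no event at all and an error for the HTTP caller.
function apiCreate(message: ChatMessage): ChatMessage {
  database.push(message);
  kafka.produce({
    topic: "event",
    payload: { eventType: "Created", id: message.conversationId, data: message },
  });
  return message;
}

// live-api: consume records and fan them out to the matching room.
kafka.consume((record) => {
  switch (record.topic) {
    case "event": {
      const { eventType, id, data } = record.payload;
      emitToRoom(eventType, id, data);
      break;
    }
  }
});

// Frontend: render optimistically, then settle the status from the HTTP result.
const ui: Record<string, Status> = {};
const received: string[] = [];
rooms["conversation-1"] = [(eventType, data) => received.push(`${eventType}:${data.text}`)];

function send(message: ChatMessage, create: (m: ChatMessage) => ChatMessage): void {
  ui[message.id] = "pending"; // optimistic rendering
  try {
    create(message);
    ui[message.id] = "sent";
  } catch {
    ui[message.id] = "failed"; // would surface as the Yellow (!) Indicator
  }
}

send({ id: "m1", conversationId: "conversation-1", text: "hello" }, apiCreate);
send({ id: "m2", conversationId: "conversation-1", text: "oops" }, () => {
  throw new Error("503 from api"); // simulated HTTP failure
});
```

The design point the sketch tries to capture is that the write path and the real-time path are separate: the HTTP response answers "was my message saved?", and the room event only handles delivering it to connected clients.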

The Results 🙌🏻

We were surprised to see the number of undelivered messages (Yellow (!) Indicators) drop by a huge margin, almost to zero. The experience remained smooth, and customers reported a much more stable, and even faster, platform.

What about the failed messages now?

Tracking and fixing the remaining errors became a much smaller task, since the code was simpler, less coupled, and easier to monitor and log.

Number of messages that failed to deliver (Yellow (!) Indicator)

What did we learn from this journey? 🤔

  • Keep your APIs lean. Leverage event-driven patterns to create reactive and decoupled pieces of code that make sense.

As a side note, our tech stack in this case is Node.js, React + MobX, Socket.IO + Redis, Kafka, and MongoDB, running on Kubernetes clusters.

Soluto by asurion

Engineering. Product. UX. Culture.