Soluto by asurion

We were wrong about HTTP & WebSockets — here’s what we learned

At Soluto, we’ve created AnywhereExpert, which at its core is a high-scale rich messaging platform serving around 80 million customers worldwide. As the number of users grew, we noticed an increasing number of messages that simply never reached their destination. To tackle this issue, we embarked on a mission.

What’s in it for me?

  • Are you planning to build a reliable & scalable messaging system?
  • Curious to discover new event-driven implementations of a large-scale real-time platform?
  • Interested in the journey of a codebase serving millions of customers?

Let’s Go! 👻

The Goal — Reliability

The only symptom of a failed message delivery was the number of times the Resend button popped up: the Yellow (!) Indicator.

An indication to the customer that a message has failed to deliver

It became the goal and symbol of our task: reduce the number of Yellow (!) indicators to 0.

We wanted to know when a message fails to deliver, and why.

WebSockets for the win, or are they? 🙈

Send & Receive data using a single socket

The application emits a Socket event to the server, and the server in return emits a Socket event to our Socket room. Classic.
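The classic flow above can be modeled in a few lines. This is a hypothetical in-memory sketch, not Socket.IO's API. `RoomServer`, `join`, and `onClientEmit` are illustrative names, and a socket is reduced to a plain callback. The sketch shows how one socket event carries the new message to the server and also drives the broadcast to everyone in the room.

```typescript
// Hypothetical in-memory model of the original single-socket flow; NOT Socket.IO's API.
// A "socket" is reduced to a callback that receives (eventName, data).
type SocketCallback = (eventName: string, data: unknown) => void;

class RoomServer {
  // roomId -> sockets currently joined to that room
  private rooms: Record<string, SocketCallback[]> = {};

  join(roomId: string, socket: SocketCallback): void {
    (this.rooms[roomId] = this.rooms[roomId] || []).push(socket);
  }

  // The client emits one socket event; the server handles it and re-emits it to the room.
  // Returns how many sockets were notified.
  onClientEmit(roomId: string, eventName: string, data: unknown): number {
    const members = this.rooms[roomId] || [];
    members.forEach((socket) => socket(eventName, data));
    return members.length;
  }
}

// Usage: an expert and a customer share one conversation room.
const roomServer = new RoomServer();
const deliveries: string[] = [];
roomServer.join("conversation-1", (e, d) => deliveries.push(`expert got ${e}: ${String(d)}`));
roomServer.join("conversation-1", (e, d) => deliveries.push(`customer got ${e}: ${String(d)}`));
const notified = roomServer.onClientEmit("conversation-1", "message", "hello");
```

In this shape, writing the message and notifying the room are a single operation. The pattern described later in the article splits those two responsibilities apart.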

Back to our case: we could not find the root cause of the errors, either in our service that emits the WebSocket events or in the custom client our frontend applications use. Did we have subpar tooling for WebSockets compared to our HTTP-based services? Perhaps.


WebSockets are a side effect.

Wow! That’s a bold statement, isn’t it? But what happens when you have multiple clients, external APIs, tons of flows and events, and bots? Do they all send messages via WebSockets?

This debate ultimately led us to an event-driven pattern we had already implemented in other critical services. The idea is to handle the side effects of creating a message as events, and to make sure our platform is flexible and scalable enough to add new components and flows.

The Pattern — HTTP, WebSockets & friends 👪

  • CRUD operations are done via a simple REST API over HTTP.
  • The socket server listens to events and emits data to the connected sockets.
  • Use Kafka for robust, event-based communication between those services.
  • Scale each service correctly for its own traffic, and reduce code mess and dependencies.
  • Trigger as many side effects as you like by listening to the message event anywhere in your flow.
  • Let’s name them for simplicity: api (HTTP) and live-api (WebSockets).

The event-driven pattern for real-time flows
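The pattern above can be sketched in memory. This is a minimal illustration, not the article's code. `EventBus` is a hypothetical stand-in for Kafka, and `createMessage`, the `message-events` topic, and the bot subscriber are made-up names. The sketch shows the overall shape: the HTTP handler only persists and publishes, and every side effect is just another subscriber, including the emission to the socket room.

```typescript
// Hypothetical sketch: an in-memory EventBus standing in for Kafka.
// Real Kafka adds partitions, offsets, consumer groups and durable storage; none of that is modeled here.
type ChatEvent = { eventType: "Created"; id: string; roomId: string; text: string };
type Handler = (evt: ChatEvent) => void;

class EventBus {
  // topic -> handlers subscribed to it
  private handlers: Record<string, Handler[]> = {};

  subscribe(topic: string, handler: Handler): void {
    (this.handlers[topic] = this.handlers[topic] || []).push(handler);
  }

  publish(topic: string, evt: ChatEvent): void {
    (this.handlers[topic] || []).forEach((handler) => handler(evt));
  }
}

const bus = new EventBus();
const savedMessages: ChatEvent[] = []; // stands in for the database
const sideEffects: string[] = [];

// "api" (HTTP): persist, publish, and return the saved data as the response body.
function createMessage(id: string, roomId: string, text: string): ChatEvent {
  const saved: ChatEvent = { eventType: "Created", id, roomId, text };
  savedMessages.push(saved);
  bus.publish("message-events", saved);
  return saved;
}

// "live-api" (WebSockets): one consumer emits the event to the socket room.
bus.subscribe("message-events", (evt) => sideEffects.push(`emit ${evt.eventType} to room ${evt.roomId}`));
// Any other side effect (hypothetical: a bot reply) is just another subscriber.
bus.subscribe("message-events", (evt) => sideEffects.push(`notify bot about ${evt.id}`));

const httpResponse = createMessage("m1", "conversation-1", "hello");
```

Adding a new side effect means adding a subscriber. `createMessage` itself does not change, which is the decoupling the pattern is after.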

What?! Use HTTP for real-time operations?!

HTTP vs. WebSockets: I wish it were a decisive battle, but this argument made it difficult for us to convince other developers of the validity of our solution. WebSockets arguably offer better real-time performance for web applications, so we had our own concerns about using HTTP in some parts of our architecture. Perhaps the better solution was to improve our tooling and code quality around WebSockets rather than turn away from them.

So why did we do it? Honestly, in real life it looked like this:

  • One way or another, we wanted to separate our HTTP and WebSocket traffic in order to scale each service correctly.
  • The effort of refactoring and improving our tooling around WebSockets was far greater than that of implementing the new architecture.
  • With this pattern, we created an event-driven flow of data that starts with HTTP’s reliability for CRUD operations and ends with WebSockets’ fast emission back to the client.

How about an example? ✍🏻

  • The frontend application uses a generated HTTP client to create a message; meanwhile, it handles optimistic rendering and further UI changes
await Api.create(message);
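  • api receives the data, saves it in a database and produces an event to Kafka
public async create(
  id: string,
  payload: Payload
): Promise<Data> {
  // Persist first, so the HTTP response reflects what was actually saved
  const data = await createAndSave(id, payload);
  // Then publish a Created event; live-api (and any other consumer) reacts to it
  await produceToKafka({data, eventType: EventType.Created});
  return data;
}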
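  • live-api consumes an event and emits it to a socket room
switch (topic) {
  case 'event': {
    // The event payload carries the event type, the id of the room to emit to, and the data
    const {eventType, id, data} = payload as EventPayload;
    emitToRoom(eventType, id, data);
    break;
  }
}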
  • live-api receives an event on a socket room and emits a response with the data to that room.
  • The frontend app receives the incoming data and handles the UI changes. Voilà!
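The frontend side of this flow combines optimistic rendering with reconciliation, and it can be sketched as a small store. This is an assumption about how such a client might track state, not the article's React + MobX code. `MessageStore`, the `pending`/`sent`/`failed` states, and the id-based merge are hypothetical. The sketch also assumes the client generates the message id before calling the API. With that in place, the HTTP result decides whether a message shows as sent or as failed (where a Resend button would appear). A socket event for a message the client already rendered is merged rather than shown twice.

```typescript
// Hypothetical client-side store (the real app used React + MobX, which is not modeled here).
// Assumes the client generates the message id up front, so HTTP and socket results can be matched.
type DeliveryState = "pending" | "sent" | "failed";
type ChatMessage = { id: string; text: string; state: DeliveryState };

class MessageStore {
  messages: ChatMessage[] = [];

  private byId(id: string): ChatMessage | undefined {
    return this.messages.find((m) => m.id === id);
  }

  // Optimistic rendering: show the message immediately, before the HTTP call resolves.
  addOptimistic(id: string, text: string): void {
    this.messages.push({ id, text, state: "pending" });
  }

  // Outcome of the HTTP create call. Only pending messages change, so a message
  // already confirmed by the socket event is never downgraded to "failed".
  resolveHttp(id: string, ok: boolean): void {
    const msg = this.byId(id);
    if (msg && msg.state === "pending") msg.state = ok ? "sent" : "failed";
  }

  // Incoming socket event: merge by id instead of rendering the sender's own message twice.
  receiveFromSocket(id: string, text: string): void {
    const msg = this.byId(id);
    if (msg) {
      msg.state = "sent";
    } else {
      this.messages.push({ id, text, state: "sent" });
    }
  }
}

const store = new MessageStore();
store.addOptimistic("m1", "hello");
store.receiveFromSocket("m1", "hello"); // the socket event may arrive before the HTTP response
store.resolveHttp("m1", true);
store.addOptimistic("m2", "are you there?");
store.resolveHttp("m2", false); // explicit HTTP failure: this is where a Resend button would appear
```

Guarding `resolveHttp` on `pending` is a deliberate design choice. It covers the case where the server saved and published the message but the HTTP response was lost. In that case the socket event is the more trustworthy signal.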

The Results 🙌🏻

We were surprised to see the number of undelivered messages (Yellow (!) Indicators) drop by a huge margin, almost to zero. The experience remained smooth, and customers reported a much more stable, and even faster, platform.

What about the failed messages now?

Tracking and fixing errors became a much smaller task, as the code was simpler, less coupled, and easier to monitor and log.

Number of messages that failed to deliver (Yellow (!) Indicator)

What did we learn from this journey? 🤔

  • Keep your APIs lean. Leverage event-driven patterns to create reactive and decoupled pieces of code that make sense.
  • Paradigms are bound to be broken. Even though WebSockets excel at real-time performance, we found that using them only where they are needed, and replacing some parts of them with HTTP, actually helped our platform.
  • Separating WebSockets and HTTP into different services gave us the ability to scale correctly and to find errors in each moving part of the app.
  • Trying to fix a broken thing can be a far more complex mission than building on existing patterns that already work.
  • Bets: in an agile environment, it’s important to take them on and to consider both success and failure.
  • Improving your weak spots, as an engineering team, and as a company, is a part of growth and success.

As a side note, our tech stack in this case is: Node.js, React + MobX, Socket.IO + Redis, Kafka and MongoDB, on top of Kubernetes clusters.




Sharon Grossman


Senior Software Engineer @ Fabric
