Highly Available Websockets on Google Cloud

A few weeks ago, @devongovett, who still works at Storify, contacted me to share an update on the way Storify load-balances websockets. Two years ago, we wrote an article explaining how we used source IP affinity to load-balance socket.io with HAProxy.

But balancing on source IP affinity can cause issues: socket.io’s polling transport breaks on some corporate networks that route outbound traffic through multiple external IP addresses. In that case, requests associated with a given session id can land on different processes, which breaks the way socket.io buffers messages for that session.

The answer is to switch to application-layer persistence using a session cookie.

Let me take you through an example.

Chat example

I’m going to use a slightly modified version of the socket.io chat example, split into two servers.

One server will be responsible for serving the single HTML page. The other server will take care of broadcasting the messages to all chat users.

In a production context, the HTML page should be delivered by a CDN. But for the demo, I want to show how we could load-balance both HTTP and Websocket requests.

To pass messages between websocket servers, we will use a Redis adapter. When user A, connected to the instance-websocket-1 server, sends a message, it can be broadcast to users connected to instance-websocket-2.
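Wiring up the adapter takes one line; here is a sketch assuming the socket.io and socket.io-redis npm packages and a Redis server running on localhost:

```javascript
// Sketch: broadcast across processes through Redis (assumes the
// socket.io and socket.io-redis npm packages and a local Redis server).
var io = require('socket.io')(3000);
var redisAdapter = require('socket.io-redis');

// Every broadcast is now published through Redis, so a message sent
// from instance-websocket-1 also reaches sockets on instance-websocket-2.
io.adapter(redisAdapter({ host: 'localhost', port: 6379 }));

io.on('connection', function (socket) {
  socket.on('chat message', function (msg) {
    io.emit('chat message', msg); // reaches users on every instance
  });
});
```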

Load balancing the traffic

Full Network Diagram

Google Cloud Platform offers a TCP Load Balancer that will be used as the public entry point of our network. It will load-balance the traffic across multiple HAProxy instances.

HAProxy will be responsible for redirecting traffic to the desired backend server (frontend or websocket), and to make sure the socket.io requests from the same user always go to the same process. This is critical if the client doesn’t support the WebSocket protocol and therefore falls back to polling transport.

HAProxy configuration

First, HAProxy will listen to all incoming traffic on port 80, and redirect it to the websocket backend based on the subdomain (“ws.”), or to the HTTP backend otherwise.

frontend public
    bind *:80
    maxconn 10000
    acl is_websocket hdr_end(host) -i ws.node-example.com
    use_backend ws if is_websocket
    default_backend www

The HTTP backend configuration is pretty straightforward. It uses a “roundrobin” strategy to load-balance the traffic. Let’s not forget to define the HTTP check URL so HAProxy knows when one of the servers is failing. I find it useful to add a query string for debugging purposes.

backend www
    timeout check 5000
    option httpchk GET /status?haproxy=1
    balance roundrobin
    # replace the addresses below with your own frontend instances
    server www1 10.0.0.1:80 maxconn 1000 weight 10 check inter 10000 rise 1 fall 3
    server www2 10.0.0.2:80 maxconn 1000 weight 10 check inter 10000 rise 1 fall 3

The magic lives in the websocket backend configuration. If the WebSocket protocol is supported, there is no issue, since a single TCP connection is used. But when the client falls back to the polling transport, many connections are made to the backend. On the first request, HAProxy sets a cookie identifying which server was used, then uses that cookie to route subsequent requests to the same server.

backend ws
    timeout check 5000
    option httpchk GET /status?haproxy=1
    balance roundrobin
    cookie HAPROXY_WS_COOKIE insert indirect nocache
    # replace the addresses below with your own websocket instances
    server ws1 10.0.0.3:80 maxconn 1000 weight 10 check inter 10000 rise 1 fall 3 cookie ws1
    server ws2 10.0.0.4:80 maxconn 1000 weight 10 check inter 10000 rise 1 fall 3 cookie ws2

Google Cloud Load-Balancer configuration

The TCP Load-Balancer is responsible for balancing the incoming public traffic on port 80 to the HAProxy instances.
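The same setup can be scripted with the gcloud CLI. A sketch with hypothetical resource names, assuming two HAProxy instances already exist in us-central1:

```shell
# Health check so the load balancer drops dead HAProxy instances
gcloud compute http-health-checks create haproxy-check --port 80

# Pool of HAProxy instances with the health check attached
gcloud compute target-pools create haproxy-pool \
    --region us-central1 --http-health-check haproxy-check
gcloud compute target-pools add-instances haproxy-pool \
    --zone us-central1-a --instances instance-haproxy-1,instance-haproxy-2

# Public forwarding rule: TCP traffic on port 80 goes to the pool
gcloud compute forwarding-rules create public-lb \
    --region us-central1 --port-range 80 --target-pool haproxy-pool
```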

Google Cloud Load-Balancer configuration


I added some information to the chat page to make it easy to check the full system:

  • “Html server” shows which HTTP backend served the page.
  • “Websocket server” shows which websocket backend is listening to the user’s messages.
  • When the page connects/disconnects from the websocket backend, a colored message appears.
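The connect/disconnect banner can be driven directly by socket.io’s client events. A browser-side sketch, where appendStatus is a hypothetical helper that renders the colored message:

```javascript
// Browser-side sketch (assumes the socket.io client script is loaded).
var socket = io('http://ws.node-example.com');

socket.on('connect', function () {
  appendStatus('connected to websocket server', 'green'); // hypothetical helper
});
socket.on('disconnect', function () {
  appendStatus('disconnected from websocket server', 'red');
});
```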

Sticky session

By refreshing the page, we can see that the HTML is served by either backend server, while the websocket always connects to the same (original) server.

Frontend load-balancing vs websocket cookie session

Dead websocket backend

If a websocket backend fails, the user reconnects to another instance. In this example, our user was connected to instance-websocket-2; when that instance died, the user was reconnected to instance-websocket-1.

Reconnection to a healthy websocket backend

Dead HAProxy instance

When an HAProxy instance dies, the Google Cloud Load-Balancer redirects all the traffic to the remaining HAProxy instances.

Dead HAProxy instance


This whole stack gives us a robust way to scale our service, which uses socket.io for communication.

HAProxy lets us fine-tune how traffic is routed to each microservice, and provides the session-cookie persistence socket.io needs.