In this article, we will use the “ws” module to illustrate some points, but the principles remain the same regardless of the library used. I wrote this article as a checklist of things to keep in mind before releasing a project that uses WebSocket and Node.js.
In this article:
- Broken connections
- Refresh connections
- Automatic reconnection
Why? Because production stability matters
There are plenty of great articles that teach you how to implement and use WebSockets. They usually show examples of very basic applications using real-time communication.
But what happens in real life? When you release an API to production, you must take into account more than whether your code works. You monitor your production environment, you use CI, a logging system (I really hope you do!), etc.
The same applies to WebSockets: a few important points deserve your attention.
JWT based authentication system
JWT is a powerful tool for securing an API, and it works for WebSocket-based systems too. It’s as simple as:
- Client-side: add a token in your headers
- Server-side: before each connection, check the token inside headers.
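The two steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes HS256-signed tokens sent in an `Authorization: Bearer <token>` header, and the function names and `JWT_SECRET` are hypothetical. The signature check is written with Node's built-in `crypto` module only to keep the sketch dependency-free; in a real project a maintained JWT library would normally do this job.

```javascript
const crypto = require('node:crypto');

// Verify an HS256-signed JWT. Returns the claims, or null if the token is
// malformed, has an invalid signature, or is expired.
function verifyJwtHs256(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;

  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${header}.${payload}`)
    .digest();
  const received = Buffer.from(signature, 'base64url');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (received.length !== expected.length) return null;
  if (!crypto.timingSafeEqual(received, expected)) return null;

  try {
    const { alg } = JSON.parse(Buffer.from(header, 'base64url').toString('utf8'));
    if (alg !== 'HS256') return null;
    const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
    if (typeof claims.exp === 'number' && claims.exp <= nowSeconds) return null;
    return claims;
  } catch {
    return null; // header or payload was not valid JSON
  }
}

// Read the token from the headers of the HTTP upgrade request.
function authenticateRequest(request, secret) {
  const header = request.headers['authorization'] || '';
  const match = /^Bearer (.+)$/.exec(header);
  return match ? verifyJwtHs256(match[1], secret) : null;
}

// Possible wiring with a ws server created with { noServer: true }
// (commented out so the sketch has no external dependency):
// server.on('upgrade', (request, socket, head) => {
//   const claims = authenticateRequest(request, JWT_SECRET);
//   if (!claims) {
//     socket.write('HTTP/1.1 401 Unauthorized\r\n\r\n');
//     socket.destroy();
//     return;
//   }
//   wss.handleUpgrade(request, socket, head, (ws) => {
//     wss.emit('connection', ws, request, claims);
//   });
// });
```

Rejecting the request during the HTTP upgrade means an unauthenticated client never gets an open WebSocket at all. Note that the browser `WebSocket` constructor does not accept custom headers, so sending the token in a header applies to clients such as the "ws" module, which takes a `headers` option.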
Prefer WSS over WS
Like HTTP, the WS protocol has a secure version, called WSS. On your server, configure SSL/TLS and a dedicated port, as you would for HTTPS. On the client side, just use wss:// instead of ws://.
const ws = new WebSocket('wss://myendpoint.com/')
Based on the CPU? Memory?
Most of the time, a WS connection stays idle once it is established. Apart from the messages carrying your business logic and the keep-alive protocol (ping/pong frames), a WebSocket doesn’t use a lot of resources.
This is why an autoscaling policy based on memory and/or CPU is not always the best idea, and why automatically scaling a WebSocket application is not really easy.
Scaling based on open connections
Even the number of requests per server is not a good indicator, because your connections are stateful and users do not reconnect very often. A better approach is to scale on the number of open connections per server. If you use AWS, for example, you can get this value from CloudWatch.
Another thing to keep in mind is the tuning of your instances. The best way to handle a lot of persistent connections is to raise some limits in your operating system and/or your application. For Node.js on a Linux-based OS, you can refer to this great article: https://blog.jayway.com/2015/04/13/600k-concurrent-websocket-connections-on-aws-using-node-js/
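To scale on open connections, each server first has to report that number. Here is a minimal sketch, assuming a ws-style server object whose `clients` property is a Set of sockets; the `/metrics` path, the function names and the port in the usage line are hypothetical, and how the value reaches your autoscaler (CloudWatch or anything else) is left out.

```javascript
const http = require('node:http');

const OPEN = 1; // WebSocket readyState value for an open connection

// Count the sockets that are currently open on this server instance.
function countOpenConnections(wss) {
  let open = 0;
  for (const client of wss.clients) {
    if (client.readyState === OPEN) open += 1;
  }
  return open;
}

// Small HTTP endpoint that a metrics agent or an autoscaler could poll.
function createMetricsServer(wss) {
  return http.createServer((req, res) => {
    if (req.url !== '/metrics') {
      res.writeHead(404);
      res.end();
      return;
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ openConnections: countOpenConnections(wss) }));
  });
}

// Usage (hypothetical port): createMetricsServer(wss).listen(9100);
```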
Message broker and Pub/Sub mechanism
When working with WebSockets, you build an event-based system. The best solution to scale your backend with such a system is to use a message broker. It will allow you to work with a powerful messaging pattern called Pub/Sub. Many technologies support this pattern, such as Redis, RabbitMQ or Kafka. The good news is that most of them are available as managed services from cloud providers and can scale automatically.
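To make the pattern concrete, here is a minimal sketch of how Pub/Sub lets several server instances share messages. The `InMemoryBroker` class is only an in-process stand-in introduced for illustration; in production its `publish`/`subscribe` role would be played by a Redis, RabbitMQ or Kafka client. The function names and the channel name are hypothetical as well.

```javascript
const { EventEmitter } = require('node:events');

// In-process stand-in for a real broker client (assumption for illustration).
class InMemoryBroker extends EventEmitter {
  publish(channel, message) {
    this.emit(channel, message);
  }

  subscribe(channel, handler) {
    this.on(channel, handler);
  }
}

// Each server instance subscribes to the channel and forwards every message
// to the sockets it holds locally.
function bridgeBrokerToSockets(broker, wss, channel) {
  broker.subscribe(channel, (message) => {
    for (const client of wss.clients) {
      if (client.readyState === 1) client.send(message); // 1 === OPEN
    }
  });
}

// Incoming socket messages go to the broker instead of being broadcast
// locally, so clients connected to other instances receive them too.
function publishFromSocket(broker, ws, channel) {
  ws.on('message', (data) => broker.publish(channel, data.toString()));
}
```

With this split, no instance needs to know about the others: each one only talks to the broker, which is what makes adding or removing instances straightforward.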
Broken connections
A common issue when you work with WebSocket is broken connections. This happens when one of the endpoints (client or server) stops responding or is no longer reachable. To handle it, we need logic on both the server side and the client side to close the connection gracefully. The idea is very simple: create a kind of heartbeat function that periodically checks whether a connection is still alive, and closes it if it is not.
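Here is a minimal server-side sketch of such a heartbeat, assuming a "ws" server: it relies on the `ping()` and `terminate()` methods and the `pong` event that "ws" sockets provide. The function names, the `isAlive` flag and the 30-second interval are illustrative choices.

```javascript
// Mark a new connection as alive and refresh the flag on every pong.
function trackLiveness(ws) {
  ws.isAlive = true;
  ws.on('pong', () => {
    ws.isAlive = true;
  });
}

// One heartbeat round: drop the sockets that did not answer the previous
// ping, then ping the remaining ones.
function sweep(wss) {
  for (const ws of wss.clients) {
    if (ws.isAlive === false) {
      ws.terminate(); // no pong since the last round: consider it broken
      continue;
    }
    ws.isAlive = false;
    ws.ping();
  }
}

// Run a round every `intervalMs` until the server is closed.
function startHeartbeat(wss, intervalMs = 30000) {
  wss.on('connection', trackLiveness);
  const timer = setInterval(() => sweep(wss), intervalMs);
  wss.on('close', () => clearInterval(timer));
  return timer;
}

// Usage: startHeartbeat(wss);
```

A socket is only terminated after missing a full round, so a client has one whole interval to answer before it is considered dead.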
Automatic reconnection
Some libraries, like ws, do not always provide a mechanism to automatically reconnect to the server. You will probably need this feature if your backend unexpectedly restarts (which can happen after each deployment ;)). Here we just need to automatically reconnect the client after it catches a close event.
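A minimal client-side sketch of this idea follows. It assumes a socket object exposing `addEventListener`, which both the browser `WebSocket` and the "ws" client do. The exponential backoff, the function names and the injectable `schedule` option are additions made for this illustration; they are not part of any library.

```javascript
// Exponential backoff: 1s, 2s, 4s, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Open a socket with `createSocket` and reopen it whenever it closes.
// `schedule` defaults to setTimeout and can be replaced in tests.
function connectWithRetry(createSocket, { onSocket = () => {}, schedule = setTimeout } = {}) {
  let attempt = 0;
  let stopped = false;

  function connect() {
    const socket = createSocket();
    onSocket(socket); // let the caller attach its own message handlers
    socket.addEventListener('open', () => {
      attempt = 0; // a successful connection resets the backoff
    });
    // A failed connection fires 'error' and then 'close'; the retry itself
    // is handled in the 'close' listener below.
    socket.addEventListener('error', () => {});
    socket.addEventListener('close', () => {
      if (stopped) return;
      schedule(connect, backoffDelay(attempt));
      attempt += 1;
    });
  }

  connect();
  return { stop: () => { stopped = true; } };
}

// Usage:
// connectWithRetry(() => new WebSocket('wss://myendpoint.com/'));
```

Increasing the delay between attempts avoids having every client hammer the backend at the same moment while it is still restarting.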
Refresh connections
If your connections are long-lived, which is probably the case if you use WebSocket, then you should refresh them (basically close them and open new ones). This is useful when you do some work at connection time and want the client to stay up to date with the server, or when you want to switch to a new token because the current one expires soon.
There is no general advice on how to do it because it depends heavily on your business logic. A good starting point is to refresh the connection every hour, then adjust this interval if necessary.
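As one possible illustration of the hourly interval, the server can record when each connection was opened and close the ones that are too old, leaving it to the client to open a new connection. This is only a sketch under assumptions: the `connectedAt` property, the function names and the close code 4000 (taken from the 4000-4999 range reserved for private use) are hypothetical, and your own logic may call for something quite different.

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000;
const REFRESH_CODE = 4000; // hypothetical application-level close code

// Remember when the connection was opened.
function stampConnection(ws, now = Date.now()) {
  ws.connectedAt = now;
}

// Close every connection older than maxAgeMs and return how many were closed.
function closeExpiredConnections(wss, maxAgeMs = ONE_HOUR_MS, now = Date.now()) {
  let closed = 0;
  for (const ws of wss.clients) {
    if (now - ws.connectedAt >= maxAgeMs) {
      ws.close(REFRESH_CODE, 'refresh');
      closed += 1;
    }
  }
  return closed;
}

// Usage:
// wss.on('connection', (ws) => stampConnection(ws));
// setInterval(() => closeExpiredConnections(wss), 60 * 1000);
```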
Most APM solutions don’t support WebSocket monitoring; they focus mainly on classical request/response traffic over HTTP and are well designed for APIs and web servers.
Anyway, even if your APM doesn’t support WebSocket instrumentation, you can use custom transactions and/or custom attributes to do it.
A basic example with elastic APM:
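The snippet below is a minimal sketch of such custom transactions, not the exact code of the application described here. It assumes `apm` is an already started `elastic-apm-node` agent passed in by the caller and uses its `startTransaction(name, type)`, `captureError()` and `transaction.end()` calls. The helper names, the `handleMessage` callback and the convention of naming a transaction after a `type` field in a JSON message are hypothetical.

```javascript
// Derive a transaction name from the message (assumes JSON with a `type` field).
function transactionName(data) {
  try {
    const { type } = JSON.parse(data.toString());
    return typeof type === 'string' ? type : 'unknown';
  } catch {
    return 'unknown';
  }
}

// Wrap connections and messages in custom APM transactions.
// `apm` is a started elastic-apm-node agent; `handleMessage` is your own logic.
function instrumentWebSocketServer(wss, apm, handleMessage) {
  wss.on('connection', (ws) => {
    const connection = apm.startTransaction('connection', 'Connection');
    // ... per-connection setup would go here ...
    if (connection) connection.end(); // startTransaction returns null if the agent is not started

    ws.on('message', async (data) => {
      const transaction = apm.startTransaction(transactionName(data), 'Message');
      try {
        await handleMessage(ws, data);
      } catch (err) {
        apm.captureError(err);
      } finally {
        if (transaction) transaction.end();
      }
    });
  });
}

// Usage, with the agent started at the very top of the entry file:
// const apm = require('elastic-apm-node').start();
// instrumentWebSocketServer(wss, apm, (ws, data) => { /* business logic */ });
```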
Now you can see your events in Kibana:
On the left side, there is the transaction type list, where we find our two events, “Message” and “Connection” (see code snippet). At the bottom, we can see all the messages grouped by name. If you click on one of them, you get access to the transaction details.
In the transaction details, notice the “PUBLISH” span, which corresponds to a call to Redis (used in my app); it appears because this module is natively instrumented by the Elastic agent.
One last thing: if you want feedback from a large application using WebSockets in production, I highly recommend watching this amazing talk by Susheel Aroskar.