SocketCluster Design Patterns for Chat

Published in Tech Renaissance · 7 min read · Feb 16, 2017

Note that this article was written based on an earlier version of SocketCluster. Although the coding patterns are still relevant, the code shown in the snippets may be different depending on your SC version.

SocketCluster was designed to make building scalable real-time systems simpler. If you don’t adhere to the right design patterns, however, some of these scalability benefits can be lost completely. To help you make the most of SC, I've compiled a list of patterns and anti-patterns. This blog post was written with chat systems in mind, but it’s also relevant to other kinds of real-time systems built with SC.

Subscribe to a channel, then request a snapshot

This is a common pattern and is particularly useful in SC. Often, you want to show a list of the previous n messages on your front end and make the list update in real-time as new messages arrive. In SC, you can do this on the client side by first subscribing to a channel (e.g. socket.subscribe('room-a')) and then requesting a fresh snapshot of your room whenever the ‘subscribe’ event triggers. So for example:

var roomAChannel = socket.subscribe('room-a');

// Update the UI with new messages.
roomAChannel.watch(addNewMessage);

roomAChannel.on('subscribe', function () {
  // Fetch a full snapshot once we are hooked up to the
  // real-time stream.
  socket.emit('getRoomAMessages', {count: 10}, renderMessageLog);
});

The reason why this pattern works particularly well in SC is that it accounts for network failures (disconnection/reconnection). If your client socket becomes disconnected, all subscribed channels are temporarily moved to a ‘pending’ state. Later, when the socket auto-reconnects, the ‘subscribe’ event triggers again and a fresh snapshot is fetched. This means that any messages which were missed while the client was offline will show up on the front end, provided they fall within the requested snapshot size.
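Because the client subscribes first and fetches the snapshot second, the same message can arrive twice: once through the channel and once inside the snapshot. Here is a minimal sketch of how addNewMessage and renderMessageLog could share one de-duplicated log. The createMessageLog helper and the message shape ({id, text, sentAt}) are assumptions made for illustration; they are not part of SC.

```javascript
// Hypothetical client-side log that merges live messages and snapshots.
// Assumes every message carries a unique `id` and a numeric `sentAt`.
function createMessageLog(maxSize) {
  var seenIds = {};
  var ordered = [];

  function addNewMessage(message) {
    if (seenIds[message.id]) {
      return false; // Already received via the channel or a snapshot.
    }
    seenIds[message.id] = true;
    ordered.push(message);
    ordered.sort(function (a, b) {
      return a.sentAt - b.sentAt;
    });
    // Keep only the newest maxSize messages.
    while (ordered.length > maxSize) {
      var dropped = ordered.shift();
      delete seenIds[dropped.id];
    }
    return true;
  }

  // Same (err, data) shape as a socket.emit response callback.
  function mergeSnapshot(err, messages) {
    if (err) {
      return; // Keep whatever is already in the log.
    }
    messages.forEach(addNewMessage);
  }

  function getMessages() {
    return ordered.slice();
  }

  return {
    addNewMessage: addNewMessage,
    mergeSnapshot: mergeSnapshot,
    getMessages: getMessages
  };
}

// Possible wiring, following the snippet above:
// var log = createMessageLog(10);
// roomAChannel.watch(log.addNewMessage);
// roomAChannel.on('subscribe', function () {
//   socket.emit('getRoomAMessages', {count: 10}, log.mergeSnapshot);
// });
```

Rendering is left out on purpose; the UI would simply redraw from log.getMessages() after each change.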

Note that if you have written a custom MIDDLEWARE_SUBSCRIBE function on the back end which requires authentication via JWT (See tutorial), then you should also specify a waitForAuth: true option on the front end when subscribing to the affected channel(s).

So, instead of:

var roomAChannel = socket.subscribe('room-a')

You would use:

var roomAChannel = socket.subscribe('room-a', {waitForAuth: true})

The waitForAuth option will prevent the channel from trying to subscribe itself until the socket has been authenticated. Once the socket becomes authenticated, all client-side channels marked with waitForAuth will initiate their subscriptions with the server.
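For context, the kind of back-end MIDDLEWARE_SUBSCRIBE function which makes waitForAuth necessary might look roughly like the sketch below. The function name requireAuthToSubscribe is made up for this example, and it assumes that your SC version exposes req.socket.getAuthToken() and req.channel to subscribe middleware.

```javascript
// Hypothetical middleware: reject subscriptions from sockets which
// don't have a valid JWT attached to them.
function requireAuthToSubscribe(req, next) {
  // Assumption: getAuthToken() returns the decoded token data,
  // or null if the socket is not authenticated.
  var authToken = req.socket.getAuthToken();
  if (authToken) {
    next(); // Allow the subscription.
  } else {
    // Passing an error to next() blocks the subscription.
    next(new Error('You must be logged in to subscribe to ' + req.channel));
  }
}

// Registration in worker.js would look something like:
// scServer.addMiddleware(scServer.MIDDLEWARE_SUBSCRIBE, requireAuthToSubscribe);
```

Without waitForAuth, a client which subscribes before logging in would be rejected by a function like this one and would have to retry the subscription manually.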

You can do a lot with Middleware

The idea of middleware in SC was borrowed from the Express framework. Unlike Express, however, SC has several different kinds of ‘middleware lines’, and each one allows you to control a separate aspect of the eventing and pub/sub flow. There is a separate middleware line for each of the emit, subscribe, publish in (inbound), publish out (outbound), handshake and authenticate actions.

While it’s well known that middleware functions can be used to block requests/actions (e.g. for authorization), in SC they can also be used to transform messages on the fly. For example, you could implement a MIDDLEWARE_PUBLISH_IN function which finds swear words in a message and converts them to asterisks (****). Alternatively, you could register a similar function as MIDDLEWARE_PUBLISH_OUT so that it only affects outbound messages to subscribers whose age is under 14 (for example). Middleware makes all kinds of advanced data/message transformations like this really simple.

Here is a simple example which will replace the word ‘hello’ with ‘hi’ in all messages published by clients (where req.data is a string):

scServer.addMiddleware(scServer.MIDDLEWARE_PUBLISH_IN,
  function (req, next) {
    if (typeof req.data == 'string') {
      // You can modify req.data to whatever you want 
      // and this is what will end up being published after
      // passing through this middleware function.
      req.data = req.data.replace(/hello/g, 'hi');
    }
    next();
  }
);
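The age-based outbound variant mentioned earlier could be sketched along the same lines. This is only an illustration: the word list is a placeholder, the age field on the JWT is an assumption about what your login code stores in the token, and req.socket.getAuthToken() is assumed to be available in your SC version.

```javascript
// Placeholder word list; a real system would use a proper filter.
var BLOCKED_WORDS = /\b(darn|heck)\b/gi;

// Hypothetical MIDDLEWARE_PUBLISH_OUT function. It runs once for each
// subscriber socket which is about to receive the message.
function censorForMinors(req, next) {
  var authToken = req.socket.getAuthToken();
  // Assumption: the token contains a numeric `age` field.
  // Sockets without a token are left alone here; a stricter policy
  // might want to treat them as minors instead.
  var isMinor = Boolean(authToken) &&
    typeof authToken.age === 'number' && authToken.age < 14;

  if (isMinor && typeof req.data === 'string') {
    req.data = req.data.replace(BLOCKED_WORDS, function (word) {
      // Replace each letter of the matched word with an asterisk.
      return new Array(word.length + 1).join('*');
    });
  }
  next();
}

// Registration in worker.js would look something like:
// scServer.addMiddleware(scServer.MIDDLEWARE_PUBLISH_OUT, censorForMinors);
```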

Channels in SC are free; use as many as you like

I often get asked questions like “What is the maximum number of channels that SC can handle?” or “How do I ‘create’ a channel in SC?”. A lot of developers who are new to SC seem to think that channels are a special, very expensive resource that needs to be managed (created and destroyed) carefully. While this may be true for some back-end message queuing systems (like RabbitMQ, Kafka, NSQ, …), it is not the case for SC. In fact, channels in SC are practically free: when idle, they consume no CPU and very little memory (typically less than a couple of hundred bytes each). This means that each worker process can potentially handle hundreds of thousands of unique active channels at any given time. It’s perfectly fine to have unique channels for each client/user who is connected to your system. In fact, each client can be subscribed to up to 1000 unique channels by default.

To answer the question “How do I create a channel in SC?”: you don’t. SC manages the life-cycle of channels for you. When a client subscribes to a channel name which doesn't already exist on the back end, SC automatically creates the relevant channel, and it automatically destroys it once all clients have unsubscribed from it (or disconnected). You can use middleware on the back end to control who is allowed to subscribe to which channel, and you can read the JWTs (JSON Web Tokens) attached to each socket to make fine-grained decisions.
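If it helps to picture this life-cycle, here is a tiny conceptual model of it. To be clear, this is not SC's actual implementation; createChannelRegistry is a made-up name and the code only illustrates the idea that a channel is nothing more than the set of sockets currently subscribed to a name.

```javascript
// Conceptual model only: a channel exists for exactly as long as
// at least one socket is subscribed to its name.
function createChannelRegistry() {
  var channels = {}; // channel name -> {socketId: true}

  return {
    subscribe: function (channelName, socketId) {
      if (!channels[channelName]) {
        channels[channelName] = {}; // Implicit creation on first use.
      }
      channels[channelName][socketId] = true;
    },
    unsubscribe: function (channelName, socketId) {
      var subscribers = channels[channelName];
      if (!subscribers) {
        return;
      }
      delete subscribers[socketId];
      if (Object.keys(subscribers).length === 0) {
        delete channels[channelName]; // Implicit cleanup when empty.
      }
    },
    channelExists: function (channelName) {
      return Object.prototype.hasOwnProperty.call(channels, channelName);
    }
  };
}
```

Seen this way, an idle channel costs little more than a key and a small set of references, which is why having a lot of them is not a problem.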

Each user can have their own private channel(s) named after them

This follows on from the previous point; because channels are practically free, it’s OK to have a lot of them. A useful pattern is to let each user have at least one private channel named after them. That way, any other user (or a server) who wants to send them a message/data can simply publish directly to that user’s private channel. So for example, if you have a user with the username ‘alice123’, you could just make the relevant client subscribe to the channel ‘private/user/alice123’ (or similar) and allow other users to publish to it. This approach has several benefits:

It is highly scalable. SC channels automatically scale across multiple worker processes and can easily be configured to scale across multiple hosts too (see SCC). So even if your system ends up with 1000 SC hosts and you have 2 users connected to 2 different hosts on opposite sides of the globe, the 2 users can still easily communicate with each other by simply publishing to each other’s private channels. You don’t need to know which process or host a user is connected to; SC will automatically (and efficiently) route each message to the correct one.
Private user channels can be used to let users acknowledge receipt of a specific message which another user published to another channel. For example, you could just add the sender’s username and a message ID as part of the message JSON, then when another user receives that message, they could publish an ack message (containing the original message ID for example) back to the sender’s private channel. Each user could have a dedicated ack channel in the format ‘private/user/ack/alice123’ or similar. You can come up with your own conventions.
Unlike with socket.emit, with pub/sub you don’t need to do complex server-side lookups to figure out which sockets belong to which user or group. The client-side subscription itself tells SC where the message needs to end up and SC takes care of the rest.
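The acknowledgement idea from the list above could be sketched like this. The helper names and the message fields (id, sender, text, messageId, receivedBy) are just one possible convention invented for this example; only the channel name formats come from the text.

```javascript
// Channel naming conventions (pick whatever suits your app).
function privateChannelName(username) {
  return 'private/user/' + username;
}

function ackChannelName(username) {
  return 'private/user/ack/' + username;
}

// Hypothetical message shape which carries what a receiver
// needs in order to acknowledge it.
function buildMessage(id, sender, text) {
  return {id: id, sender: sender, text: text};
}

// Work out where an ack for a received message should be published
// and what it should contain.
function buildAck(message, receiverUsername) {
  return {
    channel: ackChannelName(message.sender),
    data: {messageId: message.id, receivedBy: receiverUsername}
  };
}

// Possible client-side wiring for the user 'bob42':
// socket.subscribe('room-a').watch(function (message) {
//   var ack = buildAck(message, 'bob42');
//   socket.publish(ack.channel, ack.data);
// });
```

The sender would subscribe to their own ack channel and tick off message IDs as the acks come in.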

You can use middleware functions (and JWTs attached to sockets on the back end) to make sure that users are only allowed to subscribe to their own private channel (other users can publish but not read).
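As a rough sketch, such a subscribe middleware function might look like the following. It assumes that the JWT contains a username field, that usernames never contain a ‘/’ character, and that req.socket.getAuthToken() is available in your SC version; restrictPrivateChannels is a made-up name.

```javascript
var PRIVATE_PREFIX = 'private/user/';

// Hypothetical MIDDLEWARE_SUBSCRIBE function: only the owner of a
// private channel is allowed to subscribe to it.
function restrictPrivateChannels(req, next) {
  if (req.channel.indexOf(PRIVATE_PREFIX) !== 0) {
    next(); // Not a private channel, nothing to check here.
    return;
  }
  // The owner is the last path segment, so this covers both
  // 'private/user/alice123' and 'private/user/ack/alice123'.
  var owner = req.channel.split('/').pop();
  var authToken = req.socket.getAuthToken();

  if (authToken && authToken.username === owner) {
    next();
  } else {
    next(new Error('Not allowed to subscribe to ' + req.channel));
  }
}

// Registration in worker.js would look something like:
// scServer.addMiddleware(scServer.MIDDLEWARE_SUBSCRIBE, restrictPrivateChannels);
```

Publishing is deliberately left unrestricted in this sketch; you could add a similar MIDDLEWARE_PUBLISH_IN function if only certain users should be allowed to write to a private channel.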

It’s OK to give JWTs a short expiry date and renew them often

When dealing with JWTs, it’s often tempting to use long expiry dates of 1 month or more. While that’s OK for a lot of medium-security systems, it’s not ideal for high-security systems like online banking. In these scenarios, one approach is to hand out tokens with very short expiries of around 10 minutes. If you don’t want to force users to log in again every 10 minutes, you can auto-renew (re-issue) the token every 6 minutes or so while the user is still online (just call socket.setAuthToken(tokenData) on an interval from the back end). That way, the real 10-minute expiry countdown only begins once the user goes offline (since the token stops being re-issued). This approach also means that you can easily block/ban any user within 10 minutes without having to store/manage any additional authorization flag in your DB or Redis. This 10-minute delay may be fine for a lot of systems (it depends on how urgently the ban needs to take effect).
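A minimal sketch of the renewal loop is shown below. The startTokenRenewal helper and its shouldRenew callback are made up for this example (shouldRenew is where you would decide to stop renewing for a banned user). The expiresIn option and the server-side ‘disconnect’ event are assumptions which may be named differently in your SC version, so check them against your docs.

```javascript
var TOKEN_EXPIRY_SECONDS = 10 * 60;      // Tokens live for 10 minutes.
var RENEW_INTERVAL_MS = 6 * 60 * 1000;   // Re-issue every 6 minutes.

// Hypothetical helper: keeps re-issuing a short-lived token for as
// long as the socket stays connected and shouldRenew() returns true.
function startTokenRenewal(socket, tokenData, shouldRenew) {
  function renew() {
    if (shouldRenew(tokenData)) {
      // Assumption: the second argument accepts an expiry in seconds.
      socket.setAuthToken(tokenData, {expiresIn: TOKEN_EXPIRY_SECONDS});
    } else {
      // Stop renewing; the current token will expire on its own.
      clearInterval(timer);
    }
  }

  var timer = setInterval(renew, RENEW_INTERVAL_MS);

  // Once the user goes offline, the real expiry countdown begins.
  socket.on('disconnect', function () {
    clearInterval(timer);
  });

  renew(); // Issue the first token straight away.
  return timer;
}

// Possible usage after a successful login:
// startTokenRenewal(socket, {username: 'alice123'}, function () {
//   return true; // Return false here to let a user's token lapse.
// });
```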

Avoid keeping global state in your worker process(es)

Sometimes, you might be tempted to have variables in your worker.js file to hold global data. A common example of this is an array or hash map (object) which is meant to contain the state of ALL connected users in the system (across all worker processes). While this is fine if you just have 1 worker process running on 1 server, this approach doesn’t scale beyond that. As soon as you add a second worker process to your SC instance, you will find that each worker only ends up holding a subset (in this case, roughly half) of the complete user list (some users will be connected to worker 1 and others will be connected to worker 2). Trying to synchronize this list across workers so that each worker can have access to the entire list of users (in its own memory) undermines scalability and is a very bad idea (you just end up duplicating the same workload across multiple workers).

The lesson here is that, in any distributed system like SC, you need to get used to not having access to the full set of data in one place at the same time. Global state which needs to be accessible from any/all worker(s) should be stored in an external key-value store or database and retrieved using queries/commands on an as-needed basis.
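To make this concrete, here is a small sketch of worker-side presence tracking which goes through a shared store instead of a local variable. Everything here is illustrative: createStore is an in-memory stand-in for an external store such as Redis (a real client would be asynchronous), and createPresence and the ‘online/’ key prefix are made-up names.

```javascript
// In-memory stand-in for an external key-value store. In production
// this would be a client for Redis or a database, shared by all
// workers and hosts, and its methods would be asynchronous.
function createStore() {
  var data = {};
  return {
    set: function (key, value) { data[key] = value; },
    get: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    del: function (key) { delete data[key]; }
  };
}

// Each worker gets its own presence helper but they all talk to the
// same store, so no worker needs the full user list in its memory.
function createPresence(store, workerId) {
  return {
    userConnected: function (username) {
      store.set('online/' + username, workerId);
    },
    userDisconnected: function (username) {
      store.del('online/' + username);
    },
    isOnline: function (username) {
      return store.get('online/' + username) !== null;
    }
  };
}
```

With this shape, worker 2 can find out whether a user who is connected to worker 1 is online by querying the store on demand, rather than by keeping a synchronized copy of the whole list.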