Remove the spinners on your page with WebSocket, Scala Play & Akka and Redis

Do you hate seeing a spinner after clicking a button on a page? Well, at least you know your command has been accepted. But what if the spinner is still there after 1 second, 2 seconds, 3 seconds…? Maybe on the 5th second the whole page reloads and shows the latest information, but until then you can’t do anything else. Or, more enjoyable still, the spinner stays there forever… until you click your browser’s reload button, or press ⌘R or ^R, of course...

In this post, I will walk through how to build a Notification Service (NS from now on) to help with this situation. NS is built on the WebSocket protocol, using the Scala Play framework and Akka. Redis (both as a Key/Value store and for Pub/Sub) helps scale out the service.

The problem, when is it done?

Say you have a page that lists a table of records and each row has a button that you can click on to trigger some action on the record.

The action is time-consuming, so you decide not to block the user from doing anything else on the page. Here comes the spinner.

You hope the user can keep updating other records while the spinners magically disappear once the action is done, and the corresponding rows show the updated information.

The real problem is how the page knows when the action is done so it can fetch the latest information for the affected records. Of course, the backend service knows when it finishes processing the action request. It needs to tell the page, “Hey, I’m done updating the record with id: ZAyUW”…

The solution, WebSocket and Pub/Sub

The WebSocket protocol enables interaction between a browser and a web server with lower overhead than alternatives such as HTTP polling, which makes real-time data transfer to and from the server practical. In this way, a two-way (bi-directional) ongoing conversation can take place between a browser and the server.

With WebSocket, the conversations between the browsers and the service now look like this.

Browser A: “Hey NS, tell me once record I2FE9w has been updated”
Browser B: “Hey NS, I also want to know once record I2FE9w is updated”
Browser C: “Hey NS, let me know when record 6Y8uI0 is updated”

NS talking to Browser A and B: “Record I2FE9w has been updated”
NS talking to Browser C: “Record 6Y8uI0 has been updated”

This is the Publish/Subscribe pattern: it ensures NS sends a message to every interested subscriber (browser) when an event occurs. In this example, the most basic implementation is a Map. The key of the map is the event, e.g. “user 23, record I2FE9w updated”, and the value is a list of interested browsers. What can you use to represent a browser? Since WebSocket gives you a long-lived connection between each browser and NS, one connection stands for one browser. Hence the value of the map can be a list of WebSocket connection (object) references. Once record I2FE9w has been updated, NS looks up the map, finds the subscribers’ WebSocket connections and sends the event through them.
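To make the map idea concrete, here is a minimal single-instance sketch in plain Scala. The Connection class is a hypothetical stand-in for a real WebSocket connection reference (it only records what was sent to it), and the names Registry, subscribe, remove and publish are made up for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a WebSocket connection: something NS can push text to.
final class Connection(val id: String) {
  val received: mutable.Buffer[String] = mutable.Buffer.empty
  def send(message: String): Unit = received += message
}

// Event key -> connections interested in that event.
final class Registry {
  private val subscribers = mutable.Map.empty[String, Set[Connection]]

  def subscribe(event: String, conn: Connection): Unit =
    subscribers.update(event, subscribers.getOrElse(event, Set.empty) + conn)

  // Drop a closed connection from every event it was subscribed to.
  def remove(conn: Connection): Unit =
    subscribers.keys.toList.foreach { event =>
      val remaining = subscribers(event) - conn
      if (remaining.isEmpty) subscribers.remove(event)
      else subscribers.update(event, remaining)
    }

  // Push the event to every subscriber and report how many were notified.
  def publish(event: String): Int = {
    val targets = subscribers.getOrElse(event, Set.empty)
    targets.foreach(_.send(event))
    targets.size
  }
}
```

With browsers A and B subscribed to I2FE9w and browser C subscribed to 6Y8uI0, publishing I2FE9w reaches A and B only. Note that the registry lives in the memory of one process, which is exactly what stops working once a second instance appears.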

This works well if you have a single instance of NS running. Imagine people are really happy about this new feature and, thanks to word of mouth, you are getting more and more users on board. A single instance of NS is now overloaded. Don’t panic, let’s scale it out by adding more instances. You immediately realise the map implementation won’t work anymore: what if the subscription for browser A is stored on instance-1 but the relevant events are sent to instance-2? Let’s use a centralised Key-Value store, like Redis. But wait, WebSocket connection references worked before because they are meaningful within a single NS instance. Are they still meaningful after being moved to Redis? It doesn’t seem so, unless the NS instance address is stored alongside each connection reference. If the connection reference used to look like conn-1, it should now be conn-1@instance-3, for example. And somehow the event has to be routed to the instance that owns the connection, in this case instance-3. The complexity increases dramatically compared to the previous implementation.
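To get a feel for that extra complexity, here is a small sketch of what an instance-qualified reference could look like. ConnRef, parse and groupByInstance are hypothetical names, and the conn@instance encoding simply follows the example above; the actual forwarding between instances is left out.

```scala
// "conn-1@instance-3": a connection id qualified with the NS instance that owns it.
final case class ConnRef(connId: String, instance: String) {
  def encode: String = s"$connId@$instance"
}

object ConnRef {
  // Parse the encoded form; None unless there is exactly one non-empty part on each side of '@'.
  def parse(s: String): Option[ConnRef] = s.split("@", -1) match {
    case Array(conn, inst) if conn.nonEmpty && inst.nonEmpty => Some(ConnRef(conn, inst))
    case _                                                   => None
  }

  // Group subscribers by owning instance so an event can be forwarded once per instance.
  def groupByInstance(refs: Seq[ConnRef]): Map[String, Seq[String]] =
    refs.groupBy(_.instance).map { case (inst, rs) => inst -> rs.map(_.connId) }
}
```

Even with this in place, every instance would still need a way to receive forwarded events from its peers and to clean up stale references when an instance goes away.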

Let’s take a step back. Say you ship the service on AWS ECS (EC2 Container Service), one of whose advantages is that the Application Load Balancer supports WebSocket natively. Are there any other AWS services you can leverage to solve this problem? SNS+SQS looks close to what you want. Each server instance still has its own pub/sub registry (the map implementation), and you create one SQS queue per instance/container, subscribed to a single SNS topic that carries all events. In this way, every instance receives every event, looks it up in its registry and pushes the event to a browser only if the key is found in the map. You know this is not a typical use of the fanout pattern; it’s more like broadcasting, not to mention the complexity of creating and deleting SQS queues as ECS auto-scales.

So isn’t there a simpler solution? The answer is, still, Redis. Don’t forget Redis has a Pub/Sub implementation. How would it help? The browser establishes a WebSocket connection with NS, and for each of those NS establishes a connection to Redis and subscribes to a Redis channel: the event. The key is the one-to-one mapping between a WebSocket connection and a Redis connection. In this way, events are published to the Redis channel and travel all the way to the browser over the WebSocket connection. How to define the channel is really up to your domain model. In this example, the event payload probably carries a user id and a record id, and you will use the user id as the Redis channel.
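As a rough illustration of channel-per-user routing, the sketch below uses an in-memory PubSub class as a stand-in for Redis Pub/Sub, so it runs without a Redis server. The RecordUpdated payload and its field names are assumptions made for this example, not the service’s actual schema.

```scala
import scala.collection.mutable

// Hypothetical event payload; the field names are assumptions for illustration.
final case class RecordUpdated(userId: String, recordId: String)

// In-memory stand-in for Redis Pub/Sub: channel -> callbacks of current subscribers.
final class PubSub {
  private val channels = mutable.Map.empty[String, List[RecordUpdated => Unit]]

  // One call per WebSocket connection, mirroring one Redis subscription per connection.
  def subscribe(channel: String)(onEvent: RecordUpdated => Unit): Unit =
    channels.update(channel, onEvent :: channels.getOrElse(channel, Nil))

  // The user id is the channel. Like Redis PUBLISH, this returns the number of
  // subscribers that received the message.
  def publish(event: RecordUpdated): Int = {
    val subs = channels.getOrElse(event.userId, Nil)
    subs.foreach(_(event))
    subs.size
  }
}
```

Because the subscription lives in Redis rather than in an NS instance’s memory, it no longer matters which instance holds the WebSocket connection or which one receives the publish request.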

The details, Scala Play & Akka

Sounds like a working solution, but how would you design NS? What APIs will it expose? “We need some security for the service.” You are thinking of some kind of login process. NS will probably delegate authentication to the backend service, as NS itself doesn’t belong to any particular domain.

NS will issue its own token if authentication is successful. With this DIY handshake, the browser can now use NS’s token to establish a WebSocket connection.
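A minimal sketch of that handshake could look like this. TokenService is a hypothetical name, the backend check is passed in as a plain function, and tokens are kept in memory here, so a multi-instance deployment would need a shared store (Redis as a Key/Value store would be one option) for any instance to validate them.

```scala
import java.util.UUID
import scala.collection.concurrent.TrieMap

// Issues and resolves ns_tokens. The backend credential check is injected as a function.
final class TokenService(backendAuthenticates: (String, String) => Boolean) {
  private val tokens = TrieMap.empty[String, String] // ns_token -> user id

  // Delegate the credential check to the backend, then issue NS's own token.
  def authenticate(userId: String, credential: String): Option[String] =
    if (backendAuthenticates(userId, credential)) {
      val token = UUID.randomUUID().toString
      tokens.put(token, userId)
      Some(token)
    } else None

  // Resolve a token back to its user, if the token is known.
  def userFor(token: String): Option[String] = tokens.get(token)
}
```

A real service would also want tokens to expire; that is left out to keep the sketch short.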

It’s clearer at this point that you will expose at least 3 APIs.
- POST /authenticate — called by browser to authenticate user, return ns_token
- GET /connect/:ns_token — called by browser to establish WebSocket connection
- POST /publish — called by backend to publish events

The rest will be conversations over the WebSocket connection.
- Browser sends message to NS to register subscription
- NS sends event to subscribing browsers

Sequence diagram for NS

You are passionate about Functional Programming these days, so you will give Scala a go for this service. The Play framework looks like a good fit for working with WebSocket in the Scala world. After reading the documentation, you understand that you need to provide Akka Actor Props to the Play framework.

Any messages received from the client will be sent to the actor, and any messages sent to the actor supplied by Play will be sent to the client.

Here, “the actor” is the one created from the Props you provide: it receives every message the browser sends. “The actor supplied by Play” is a second Actor that Play creates to represent the client at the other end of the connection; Play hands its reference to your Props, and anything your actor sends to it goes out to the browser. There is one of each per WebSocket connection. To implement authentication, you also want a way to reject a WebSocket connection.
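Play’s WebSocket.acceptOrResult expects a function that returns a Future of an Either, where a Left carries the HTTP result used to reject the handshake and a Right carries the flow that handles the connection. The sketch below mirrors only that decision with plain Scala types so it runs without Play; Rejected, Accepted and Connect.decide are hypothetical names, and the token lookup is passed in as a function.

```scala
// Plain-Scala stand-ins for the two outcomes of a connect attempt.
final case class Rejected(status: Int, reason: String)
final case class Accepted(userId: String)

object Connect {
  // Left rejects the handshake, Right accepts it for the resolved user.
  def decide(nsToken: String, userFor: String => Option[String]): Either[Rejected, Accepted] =
    userFor(nsToken) match {
      case Some(userId) => Right(Accepted(userId))
      case None         => Left(Rejected(403, "unknown ns_token"))
    }
}
```

In the real controller, the Left side would be a Play result such as Forbidden, and the Right side would be the actor-backed flow for the accepted connection.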

How about a Scala Redis client? You find a library written by Debasish Ghosh. You’ve read his book, Functional and Reactive Domain Modeling, and quite enjoyed it, so you will no doubt give that library a try. It turns out to be not bad: it uses Akka Actors to handle Redis Pub/Sub, so it works with Play WebSocket seamlessly. In the Actor you provide to Play, when you receive the subscription message from the browser, you send a message to the Redis subscribe Actor to subscribe to the Redis channel. Once the WebSocket connection is closed, you send a message to the same Actor to unsubscribe from the Redis channel.
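The subscribe-on-message, unsubscribe-on-close lifecycle can be sketched without Akka as follows. Everything here is a stand-in: FakeRedisSubscriber plays the role of the Redis subscribe Actor, ConnectionHandler plays the role of the Actor you provide to Play, and the Msg protocol is hypothetical. In a real Play actor, the close would typically be handled in postStop, since Play stops the actor when the WebSocket closes.

```scala
import scala.collection.mutable

// Stand-in for the Redis subscribe actor: tracks which channels are subscribed.
final class FakeRedisSubscriber {
  val active: mutable.Set[String] = mutable.Set.empty
  def subscribe(channel: String): Unit = active += channel
  def unsubscribe(channel: String): Unit = active -= channel
}

// Hypothetical messages the per-connection handler understands.
sealed trait Msg
final case class Subscribe(channel: String) extends Msg
final case class Event(channel: String, body: String) extends Msg
case object Closed extends Msg

// One handler per WebSocket connection, mirroring one actor per connection.
// `out` pushes text to the browser.
final class ConnectionHandler(redis: FakeRedisSubscriber, out: String => Unit) {
  private var channels = Set.empty[String]

  def receive(msg: Msg): Unit = msg match {
    case Subscribe(ch) =>
      redis.subscribe(ch)
      channels += ch
    case Event(ch, body) =>
      // Forward only events for channels this connection asked for.
      if (channels(ch)) out(body)
    case Closed =>
      channels.foreach(redis.unsubscribe)
      channels = Set.empty
  }
}
```

The sketch assumes each handler owns its own subscriber, in line with the one Redis connection per WebSocket connection design; sharing one subscriber between handlers would need reference counting before unsubscribing.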

RedisPubActor.scala
RedisSubActor.scala
WebSocketActor.scala
JsonUtils.scala
Models.scala
websocket_client.js

The whole solution can be deployed on AWS: use ALB and ECS to ship the service, with ElastiCache providing Redis Pub/Sub.

Gotchas

ulimit

Since WebSocket connections are long-lived and each one holds an open file descriptor, you might run into a situation where no more connections can be established because of the ulimit settings. You can either raise the nofile limit via the ulimits setting in the ECS task definition or scale out the ECS service on CloudWatch metrics, especially the number of connections.

Redis maximum number of clients

Since the service is designed to have one Redis connection per WebSocket connection, Redis’s maximum number of clients (maxclients) must be high enough to cover the total number of WebSocket connections you expect across all NS instances. A corresponding ElastiCache parameter can be found.

Closing notes

I hope you enjoyed reading; it is truly an interesting service to build. I haven’t covered every single detail, of course. Hopefully the sequence diagram gives you a full picture of how the service works. If you have any questions, please comment below.