Powering real-time experiences for mobile and web with Socket.IO

Thắng Đỗ
Published in Altitude
Apr 9, 2021

Why?

Our goal is to make the web and app experiences real-time. For example:

Example: a guest's pre-check-in in the Android app generates a task in Altitude Cloud.

How do we show a new task in real time? There are two options: push or pull (polling). In most cases polling is a poor fit, especially for long-lived sessions with many clients, because every client keeps sending requests even when nothing has changed. So we need a push solution.

Our requirements

  • Support multiple platforms: web, native Android and iOS.
  • Each property is a space, and a client must be authenticated (using the Altitude JWT) to access a space. A space can have many channels (notifications, tasks, alerts and more), and a client can choose which channels to subscribe to.
  • A central point to manage incoming clients. Other services that want to send notifications to clients do so through this central point via a simple private API (e.g. an HTTP request).
  • The channels to subscribe to are dynamic.
  • A client can subscribe to several channels at the same time, or leave a channel.
  • The server can actively disconnect a client for security reasons.
  • Scalability: support load balancing.
  • Other services can broadcast to all clients in a topic, and the data to broadcast is customizable. This requirement makes the pub/sub paradigm a good fit.

Four options to make a client real-time.

Do we need another push?

What's the problem with FCM (Firebase Cloud Messaging)?

  • Throttling: non-collapsible messages are rate-limited.
  • Not really real-time: if you send 1,000 messages to a single device (and assume they are all sent successfully), there's a chance that FCM (formerly GCM) will throttle them so that only a few actually get through, or that each message is delivered but not simultaneously.
  • A hosted, third-party solution.

Other options, such as Pusher, are also hosted third-party services, less customizable, and not free.

We need a self-hosted solution: more control, more customization, and future-proof.

High level design

We define each property as a Space, and each business component within a property (Task, Maintenance…) as a Channel.

A client that wants to listen for changes in Task or Maintenance must first access the Space with authorization. Once access succeeds, the client can join channels and listen for notifications.

Based on these requirements and our experience, Socket.IO was the best fit: a Space becomes a Namespace and a Channel becomes a Room.

What is Socket.IO (a quick overview)

Socket.IO is a library that enables real-time, bidirectional communication between the client and the server.

Communication is event-based

Figure: client-side events.

Figure: server-side events.
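The event-based model can be illustrated without a running server. The sketch below is a minimal in-process stand-in that assumes Node's built-in EventEmitter: the hypothetical Endpoint class links the two sides of a connection, each registering handlers with on(event, handler) and sending with emit(event, payload), the same shape as socket.on and socket.emit in Socket.IO. The event names join_room and joined_room are the ones used later in this post.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical in-process stand-in for one connection: each side listens on its
// own emitter and emits onto the peer's, like socket.on / socket.emit.
class Endpoint {
  private readonly inbox = new EventEmitter();
  private peer: Endpoint | undefined;

  // Link two endpoints so that emit() on one reaches the on() handlers of the other.
  connect(peer: Endpoint): void {
    this.peer = peer;
    peer.peer = this;
  }

  // Register a handler for a named event sent by the peer.
  on(event: string, handler: (payload: unknown) => void): void {
    this.inbox.on(event, handler);
  }

  // Send a named event with a payload to the peer (delivered synchronously here).
  emit(event: string, payload: unknown): void {
    if (!this.peer) throw new Error("not connected");
    this.peer.inbox.emit(event, payload);
  }
}

// Usage: the client asks to join a room and the server acknowledges.
const client = new Endpoint();
const server = new Endpoint();
client.connect(server);

const log: string[] = [];
server.on("join_room", (room) => {
  log.push(`server received join_room ${String(room)}`);
  server.emit("joined_room", room);
});
client.on("joined_room", (room) => {
  log.push(`client received joined_room ${String(room)}`);
});
client.emit("join_room", "task");
```

In a real deployment the transport is the Socket.IO connection itself; the sketch only shows the request/acknowledgement pattern of named events.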

Design core components

Authorization uses the Altitude JWT

We don't want to create a separate authentication, because users would have to log in twice: once for Altitude and once for the Direct Socket service. The best way is to reuse the Altitude JWT and verify it.
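Verifying the JWT on the socket server can be sketched with Node's built-in crypto module. This is a minimal sketch under assumptions not stated above: the Altitude JWT is signed with HS256 and a shared secret, and it carries a hypothetical properties claim listing the spaces it grants. In production, a maintained JWT library is a safer choice than hand-rolled parsing.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Claims assumed to be in the Altitude JWT. `properties` is a hypothetical
// claim name for the spaces (properties) the token grants access to.
interface Claims {
  sub: string;
  exp: number; // expiry, seconds since the Unix epoch
  properties: string[];
}

// Verify an HS256-signed JWT (assumed algorithm) and return its claims,
// or null if the format, algorithm, signature or expiry check fails.
function verifyJwt(token: string, secret: string, nowSec: number): Claims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, sigB64] = parts as [string, string, string];
  try {
    const header = JSON.parse(Buffer.from(headerB64, "base64url").toString("utf8"));
    if (header?.alg !== "HS256") return null; // reject "none" and other algorithms
    const expected = createHmac("sha256", secret).update(`${headerB64}.${payloadB64}`).digest();
    const given = Buffer.from(sigB64, "base64url");
    // timingSafeEqual throws on different lengths, so compare lengths first.
    if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
    const claims = JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8"));
    if (typeof claims?.sub !== "string" || !Array.isArray(claims.properties)) return null;
    if (typeof claims.exp !== "number" || claims.exp <= nowSec) return null; // expired
    return claims as Claims;
  } catch {
    return null; // malformed JSON in the header or payload
  }
}
```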

Socket.IO doesn't provide a built-in authentication mechanism. Its suggestion is to verify the token in a middleware (the same concept as Express middleware), but a middleware runs only once per connection, whereas our JWT is short-lived and must be renewed several times during a connection.

So we created a custom authentication flow based on a socket event:

  • The client connects to a space (property) on the server.
  • The connection succeeds.
  • Once the client is connected, the server sets a timeout and waits for the client to send an authenticate event.
  • If the client sends authenticate with its JWT before the timeout, the server verifies the JWT against the space the client wants to connect to.
  • If the JWT is invalid, the space isn't in the client's JWT, or the timeout fires, the server disconnects the client with reason unauthorized.
  • Otherwise, the server sends an authorized event to the client and they start exchanging messages.
  • Every time the server sends a message to the client, it checks that the client's JWT hasn't expired yet. If it has, the server disconnects the client with reason unauthorized.
  • Every time the client receives a new token, it must emit authenticate with the new JWT so the server can check it again.
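The steps above can be modeled as a small per-connection state machine. This is a sketch, not the service's actual code: the verifier is injected (any JWT check will do), time is passed in explicitly instead of using real timers, properties is a hypothetical claim name for the spaces a token grants, and the 10-second authentication timeout is an assumed value. In a Socket.IO server, authenticate would be called from the handler for the client's authenticate event, send before each emit, and a transition to disconnected would end with socket.disconnect(true).

```typescript
interface Claims {
  sub: string;
  exp: number; // expiry, seconds since the Unix epoch
  properties: string[]; // hypothetical claim listing the spaces this token grants
}

type Verifier = (token: string) => Claims | null;
type State = "pending" | "authorized" | "disconnected";

// One connection to a space. Times are in seconds and passed in explicitly.
class SpaceConnection {
  state: State = "pending";
  reason: string | undefined;
  readonly outbox: string[] = []; // events sent to the client, for illustration
  private claims: Claims | null = null;
  private readonly space: string;
  private readonly verify: Verifier;
  private readonly deadline: number;

  constructor(space: string, verify: Verifier, connectedAt: number, authTimeoutSec = 10) {
    this.space = space;
    this.verify = verify;
    this.deadline = connectedAt + authTimeoutSec;
  }

  // Client emitted `authenticate`: first login or a renewed token.
  authenticate(token: string, now: number): void {
    if (this.state === "disconnected") return;
    if (this.state === "pending" && now > this.deadline) {
      this.disconnect("unauthorized"); // too late
      return;
    }
    const claims = this.verify(token);
    if (!claims || claims.exp <= now || !claims.properties.includes(this.space)) {
      this.disconnect("unauthorized"); // invalid token or space not granted
      return;
    }
    this.claims = claims;
    if (this.state === "pending") {
      this.state = "authorized";
      this.outbox.push("authorized");
    }
  }

  // Called periodically: drop clients that never authenticated in time.
  tick(now: number): void {
    if (this.state === "pending" && now > this.deadline) this.disconnect("unauthorized");
  }

  // Every outgoing message first re-checks that the stored JWT is still valid.
  send(event: string, now: number): boolean {
    if (this.state !== "authorized" || this.claims === null) return false;
    if (this.claims.exp <= now) {
      this.disconnect("unauthorized");
      return false;
    }
    this.outbox.push(event);
    return true;
  }

  private disconnect(reason: string): void {
    this.state = "disconnected";
    this.reason = reason;
    this.claims = null;
  }
}
```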

Client wants to join or leave a channel

When connected to a space, a client can join any channel in the space to listen for notifications in that channel. If the channel doesn't exist, the server creates it and adds the client. The client must ask the server before joining a channel by sending a join_room event, and wait until it receives joined_room.

We could add an authorization step when a client asks to join a channel, but I feel it is unnecessary and complicated. A client authenticated to access a space can listen to all notifications in its channels, so notifications should not contain too much sensitive data.

A client can ask the server to leave a channel by sending a leave_room event and waiting until it receives leaved_room.
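Conceptually, the server keeps a map from each room to the clients in it. Socket.IO already does this bookkeeping through socket.join(room) and socket.leave(room) (a room is created on first join and removed when empty), so the hypothetical SpaceRooms class below only illustrates the behavior described above, including the joined_room and leaved_room acknowledgements and a client being in several rooms at once.

```typescript
// Hypothetical in-memory view of the rooms (channels) of one space:
// room name -> ids of the clients that joined it.
class SpaceRooms {
  private readonly rooms = new Map<string, Set<string>>();

  // Handle join_room: create the room on first use, add the client and
  // return the acknowledgement event to send back.
  join(clientId: string, room: string): string {
    let members = this.rooms.get(room);
    if (members === undefined) {
      members = new Set<string>();
      this.rooms.set(room, members);
    }
    members.add(clientId);
    return "joined_room";
  }

  // Handle leave_room; drop the room once nobody is left in it.
  leave(clientId: string, room: string): string {
    const members = this.rooms.get(room);
    if (members !== undefined) {
      members.delete(clientId);
      if (members.size === 0) this.rooms.delete(room);
    }
    return "leaved_room";
  }

  // Clients that should receive a notification sent to this room.
  members(room: string): string[] {
    return [...(this.rooms.get(room) ?? [])];
  }

  // All rooms a client is currently in (a client may join several).
  roomsOf(clientId: string): string[] {
    return [...this.rooms].filter(([, ids]) => ids.has(clientId)).map(([name]) => name);
  }
}
```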

Join multiple channels in a space

A client can join several channels in a space by sending more join_room events to the server, for example to listen to task and maintenance notifications at the same time.

Note that it is advisable to leave a channel as soon as it is no longer needed, to avoid receiving too many notifications. For example, when you switch from the Task tab to another tab, leave the task channel.
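On the client, one way to follow this advice is to derive the channels from the current tab and compute which join_room and leave_room events to send when the tab changes. The tab names and the TAB_CHANNELS mapping are hypothetical; the sketch only returns the events and leaves the actual emit to the socket client.

```typescript
// Hypothetical mapping from UI tabs to the channels each tab needs.
const TAB_CHANNELS: Record<string, string[]> = {
  tasks: ["task"],
  maintenance: ["maintenance"],
  dashboard: ["task", "maintenance"],
};

type RoomEvent = ["join_room" | "leave_room", string];

// Keeps only the channels the current tab needs and returns the events to emit.
class ChannelSubscriptions {
  private readonly joined = new Set<string>();

  switchTab(tab: string): RoomEvent[] {
    const wanted = new Set(TAB_CHANNELS[tab] ?? []);
    const events: RoomEvent[] = [];
    // Leave first, so the client stops receiving notifications it no longer needs.
    for (const channel of this.joined) {
      if (!wanted.has(channel)) events.push(["leave_room", channel]);
    }
    for (const channel of wanted) {
      if (!this.joined.has(channel)) events.push(["join_room", channel]);
    }
    // Track the intended set; a real client might wait for joined_room / leaved_room.
    this.joined.clear();
    for (const channel of wanted) this.joined.add(channel);
    return events;
  }
}
```

Switching from the dashboard to the tasks tab, for example, yields a single leave_room for maintenance, while task stays joined.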

Other services want to broadcast to clients

For example, in the video above, how can the web app receive a task notification when a guest completes pre-check-in?

Here is our flow:

The key question is what data the Task service should send to the Direct Socket service so that it can broadcast to all clients.

Here is the request:

When it receives this request, the Direct Socket service finds space 5ca199fdda8ba619695b5728 on the server, then broadcasts a notification with the request's type and data to all clients in channel task.
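A minimal sketch of the receiving side, assuming a JSON body with a space (property id), a channel, a notification type and free-form data; these field names, the notification event name and the deliver callback are hypothetical. It validates the body and fans the notification out to the clients in that channel on the current instance.

```typescript
// Assumed request body; field names are illustrative, not a documented API.
interface BroadcastRequest {
  space: string; // property id, e.g. "5ca199fdda8ba619695b5728"
  channel: string; // e.g. "task"
  type: string; // notification type chosen by the sending service
  data?: unknown; // free-form payload forwarded as-is
}

// Spaces held by this instance: space -> channel -> client ids.
type SpaceIndex = Map<string, Map<string, Set<string>>>;
type Deliver = (clientId: string, event: string, payload: unknown) => void;

// Validate the body, then send the notification to every client in the channel.
// Returns the number of recipients, or null for a malformed request.
function handleBroadcast(body: string, spaces: SpaceIndex, deliver: Deliver): number | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const req = parsed as Partial<BroadcastRequest>;
  if (typeof req.space !== "string" || typeof req.channel !== "string" || typeof req.type !== "string") {
    return null;
  }
  const members = spaces.get(req.space)?.get(req.channel) ?? new Set<string>();
  for (const clientId of members) {
    deliver(clientId, "notification", { type: req.type, data: req.data });
  }
  return members.size;
}
```

If the instance that receives the request doesn't hold the space, the handler finds no clients, which is exactly the situation the load-balancing section below deals with.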

Send 1-to-1 notifications

What if we want to send a private notification to a single user? For example, an admin sends a message to a staff member about a delivery. We don't want other staff to be able to join that channel and listen; we want a separate channel for each user that nobody else can access.

As noted above, authentication happens only once, at the space level, so we can't create a private channel inside a shared space. Instead, we need a private space and a different way to verify the JWT.

Each JWT has a sub field, the subject of the token. We can rely on it to create a private space with the format:

{serverUrl}/notification-subject/{subjectId}

So for a client to connect to this space, the sub in its JWT must match the subjectId in the space.
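The access rule for subject spaces can be written as a small check that runs after the JWT itself has been verified. This is a sketch: it assumes the space is identified by its namespace path (for example /notification-subject/u1) and delegates ordinary property spaces to whatever property check is already in place. On the Socket.IO side, io.of() also accepts a regular expression, which is one way to accept any /notification-subject/{subjectId} namespace.

```typescript
// Only the verified subject matters for subject spaces.
interface VerifiedClaims {
  sub: string;
}

const SUBJECT_PREFIX = "/notification-subject/";

// Decide whether a client with already-verified claims may use a namespace.
// Subject spaces are private: the subjectId in the path must equal the JWT's sub.
// Other (property) spaces fall back to the existing property check.
function canAccessSpace(
  namespace: string,
  claims: VerifiedClaims,
  propertyCheck: (namespace: string) => boolean,
): boolean {
  if (namespace.startsWith(SUBJECT_PREFIX)) {
    const subjectId = namespace.slice(SUBJECT_PREFIX.length);
    return subjectId.length > 0 && subjectId === claims.sub;
  }
  return propertyCheck(namespace);
}
```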

Scale design: support load balancing

The first step to scale this service is to increase the number of servers and put a load balancer in front of them.

However, we can run into the following situation:

  • We have 2 instances: instance A holds spaces Park Hotel and Beta Hotel, instance B holds space Grand Hotel.
  • The Task service wants to send a notification to channel Task of space Grand Hotel.
  • The request goes through the LB, which routes it to instance A because it is the least busy.
  • Instance A doesn't hold space Grand Hotel, so it cannot send the notification to Web 3.

In this situation, instance A needs to inform the other servers that a notification must be sent on space Grand Hotel. Instance B receives the message, sees that it holds space Grand Hotel, and handles it.

To communicate between socket servers, we use Redis, which supports the pub/sub paradigm.

Here is the detailed design of this flow:

  1. Server 1 starts a timeout and puts the HTTP request in its queue, then publishes an event FA1 via Redis to find the server holding space property A.
  2. Servers 2 and 3 receive FA1 and check whether they hold space property A.
  3. Server 2 holds space property A, so it reads the data from FA1 and broadcasts it to the clients that joined this space.
  4. After broadcasting successfully, Server 2 publishes an event FA2 to Redis to report success.
  5. Servers 1 and 3 receive FA2 and check whether the HTTP request referenced in FA2 is in their queue.
  6. Server 1 knows it should handle this request, so it reads the data from FA2 and returns it in the HTTP response.
  7. If the timeout is reached and Server 1 hasn't received FA2 to dequeue the request, it returns an error to the calling service.
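The seven steps can be simulated in-process to show why both the request queue and the timeout are needed. In this sketch the Bus class is a synchronous stand-in for Redis pub/sub (real Redis delivery is asynchronous), SocketInstance is a hypothetical server holding a map of space to channel to clients, and the 5-second timeout and the 504 status are assumed values.

```typescript
// Messages exchanged between instances (the FA1 / FA2 events above).
type Message =
  | { kind: "FA1"; requestId: string; space: string; channel: string; payload: string }
  | { kind: "FA2"; requestId: string; delivered: number };

type Reply = { status: number; delivered?: number };

// Synchronous in-memory stand-in for a Redis pub/sub channel shared by all instances.
class Bus {
  private readonly subscribers: Array<(m: Message) => void> = [];

  subscribe(fn: (m: Message) => void): void {
    this.subscribers.push(fn);
  }

  publish(m: Message): void {
    for (const fn of this.subscribers) fn(m);
  }
}

class SocketInstance {
  readonly sent: string[] = []; // local deliveries as "clientId:payload"
  private readonly queue = new Map<string, { reply: (r: Reply) => void; deadline: number }>();
  private readonly bus: Bus;
  private readonly spaces: Map<string, Map<string, string[]>>; // space -> channel -> clients

  constructor(bus: Bus, spaces: Map<string, Map<string, string[]>>) {
    this.bus = bus;
    this.spaces = spaces;
    bus.subscribe((m) => this.onMessage(m));
  }

  // Step 1: queue the HTTP request with a deadline, then publish FA1.
  receiveHttp(id: string, space: string, channel: string, payload: string,
              now: number, reply: (r: Reply) => void, timeoutSec = 5): void {
    this.queue.set(id, { reply, deadline: now + timeoutSec });
    this.bus.publish({ kind: "FA1", requestId: id, space, channel, payload });
  }

  // Step 7: fail queued requests whose FA2 did not arrive before the deadline.
  expire(now: number): void {
    for (const [id, pending] of this.queue) {
      if (now >= pending.deadline) {
        this.queue.delete(id);
        pending.reply({ status: 504 });
      }
    }
  }

  private onMessage(m: Message): void {
    if (m.kind === "FA1") {
      // Steps 2-4: only the instance holding the space broadcasts and reports back.
      const channels = this.spaces.get(m.space);
      if (channels === undefined) return;
      const clients = channels.get(m.channel) ?? [];
      for (const c of clients) this.sent.push(`${c}:${m.payload}`);
      this.bus.publish({ kind: "FA2", requestId: m.requestId, delivered: clients.length });
    } else {
      // Steps 5-6: only the instance that queued the request answers it.
      const pending = this.queue.get(m.requestId);
      if (pending === undefined) return;
      this.queue.delete(m.requestId);
      pending.reply({ status: 200, delivered: m.delivered });
    }
  }
}
```

With three instances where only one holds Grand Hotel, a request that lands on another instance is still answered through FA2, and a request for a space that nobody holds fails once its deadline passes.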

Security

Encrypt data

Socket.IO uses WebSocket as a transport when possible. The connection starts with an HTTP request.

If the client supports WebSocket, the connection is then upgraded to a WebSocket; otherwise it continues with HTTP long-polling.

Like HTTPS, WebSocket has a secure protocol, wss://, which is encrypted with TLS and therefore protects against man-in-the-middle attacks.

Fortunately, when we initialize the connection over HTTPS, Socket.IO establishes the WebSocket over wss without any additional code.

What happens if the JWT expires and cannot be refreshed

After authentication, the client's JWT is kept with the socket connection. When the server wants to broadcast to the clients in a channel, it checks that their JWTs haven't expired. If a JWT has expired and hasn't been renewed yet, that client is disconnected with reason unauthorized; the client then starts reauthorizing with the authorization server and reconnects to the Direct Socket service.

Another option is to create a queue job that checks JWTs and disconnects a client as soon as possible if it hasn't renewed its token yet.
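That periodic job could look like the sweep below. It is a sketch that assumes each connection records the expiry of its current JWT (updated whenever the client re-authenticates) and that disconnect is a callback supplied by the server; the Connection shape is hypothetical.

```typescript
// Hypothetical record kept per authenticated connection.
interface Connection {
  id: string;
  space: string;
  exp: number; // expiry of the client's current JWT, seconds since the epoch
}

// One pass of the periodic job: disconnect every connection whose JWT expired
// and was not renewed. Returns the ids that were dropped.
function sweepExpired(
  conns: Map<string, Connection>,
  nowSec: number,
  disconnect: (id: string, reason: string) => void,
): string[] {
  const dropped: string[] = [];
  for (const [id, conn] of conns) {
    if (conn.exp <= nowSec) {
      conns.delete(id);
      disconnect(id, "unauthorized");
      dropped.push(id);
    }
  }
  return dropped;
}
```

The sweep could be scheduled with setInterval; the interval trades how quickly expired clients are dropped against how much work each pass does.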

Ongoing

Enhanced load balancing: currently, several property spaces run on one instance, and when the number of properties increases, we add more instances. However, we can't run one property space across multiple instances. This could become a problem if a property has many socket connections, or if a server disconnects and messages are lost. We need to improve this!

What is Altitude?

Altitude is an all-in-one smart hotel service, enabling hotels to connect with guests like never before and empowering staff to provide more personalised services. Find out more at www.altitudehq.com
