How to Build Messaging Capability for Live Video Streams at Scale!

Umang Goel · 7 min read · Nov 13, 2021

Problem Statement:

Live video streams are a very effective medium for sharing events and communicating. This could be video conferencing over Zoom-like systems, Facebook Live videos, or even the live telecast of a sporting event. To make these streams more engaging and interactive, we need the ability to send messages over them: comments and likes on Facebook Live, chat messages shared with the group on a video call, reactions on how a person is feeling while watching a sporting event, and so on.

Sharing thoughts over a stream comes with multiple challenges, one of which is building a system that can handle highly variable load at scale. A real-world example is a live sporting event like the India vs. New Zealand WTC final. That match had a viewership of about 25 million, the highest to date for any live event. In general the load will not be that high, but it can spike for such events, so the system should be built to handle these unexpected situations.

In this article we will discuss how to build comment and like functionality for such high-scale live-stream systems. We will not discuss how to optimise the live streaming of video at scale.

Some of the common requirements:

  1. All users should be able to comment as and when they want.
  2. Comments should be delivered to the entire audience of that stream in real time.
  3. The system should be decoupled from the live video stream flow, as messaging is not on the critical path.
  4. Message delivery is real time, so a user who joins at time T1 will only see comments posted after that timestamp.
  5. Messages should be delivered across different device types.
  6. Messages need to be delivered to all viewers irrespective of their region.
  7. The system should be highly available and scalable.

Let's start.

At a high level, the system looks like this:

High Level Design

As seen in the above diagram, Alice and Bob are both interacting over the same live stream. Alice wants to comment, so she sends the comment to the messaging service, and the messaging service in turn delivers the comment to all the users currently viewing that live stream, in this case Bob.

Assumptions

  1. We are focusing on the messaging service, so we will assume that security concerns like auth, SSL, rate limiting, etc. are already in place.
  2. User information management is already in place.
  3. We are not focusing on the UI that will be used for viewing the messages.
  4. A user can be part of only one stream at a time.

Now let's dive deep into the design and understand the different components of the system:

Component Diagram

Component descriptions

Messaging Service

  1. These are the API servers that receive the comment/like requests from users for a particular stream.
  2. Once a request is received from a user, the messaging service performs some basic validations and then pushes the request to the nearest dispatcher queue, i.e. the one in the same region in which this component is running. A minimal sketch of such an endpoint is shown below.
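
To make this concrete, here is a minimal sketch of such an API server in Python. It assumes Flask for the HTTP layer and an SQS queue as the dispatcher input; the endpoint path, payload fields, and environment variable names are illustrative, not part of the original design.

```python
import json
import os
import time

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
REGION = os.environ.get("AWS_REGION", "us-east-1")
DISPATCHER_QUEUE_URL = os.environ["DISPATCHER_QUEUE_URL"]  # nearest (same-region) dispatcher queue
sqs = boto3.client("sqs", region_name=REGION)


@app.route("/streams/<stream_id>/messages", methods=["POST"])
def post_message(stream_id):
    body = request.get_json(silent=True) or {}
    # Basic validation: message type and a non-empty payload for comments.
    if body.get("type") not in ("comment", "like"):
        return jsonify(error="type must be 'comment' or 'like'"), 400
    if body["type"] == "comment" and not body.get("text", "").strip():
        return jsonify(error="comment text is required"), 400

    event = {
        "stream_id": stream_id,
        "user_id": body.get("user_id"),   # auth is assumed to be handled upstream
        "type": body["type"],
        "text": body.get("text"),
        "timestamp": int(time.time() * 1000),
    }
    # Push to the nearest dispatcher queue; the dispatcher handles fan-out.
    sqs.send_message(QueueUrl=DISPATCHER_QUEUE_URL, MessageBody=json.dumps(event))
    return jsonify(status="accepted"), 202
```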

Dispatcher:

The dispatcher reads messages from the queue and takes the following actions:

  1. Read the stream ID from the request and find all the supervisors that have active connections for that stream.
  2. Push the message to all the supervisors found in step 1.
  3. Since supervisors for a stream can also be in other regions, the dispatcher fans out the messages it receives to the dispatcher queues in all other regions.
  4. As the load on SQS increases, the number of dispatchers can be scaled based on the number of unread messages in SQS, thus achieving independent scaling. A sketch of this loop follows the workflow diagram below.
Dispatcher workflow
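
A rough sketch of the dispatcher loop, assuming boto3/SQS and purely illustrative queue URLs and cache names. The `forwarded` flag used to stop cross-region forwarding loops is an added assumption, not something specified above.

```python
import json

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
LOCAL_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/dispatcher"  # placeholder
OTHER_REGION_QUEUE_URLS = []      # dispatcher queues in other regions (in practice, one SQS client per region)
supervisor_queues_by_stream = {}  # cache: stream_id -> set of supervisor SQS queue URLs


def dispatch_forever():
    while True:
        resp = sqs.receive_message(
            QueueUrl=LOCAL_QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            event = json.loads(msg["Body"])
            stream_id = event["stream_id"]

            # Steps 1-2: push to every supervisor that has active connections for this stream.
            for supervisor_queue in supervisor_queues_by_stream.get(stream_id, set()):
                sqs.send_message(QueueUrl=supervisor_queue, MessageBody=json.dumps(event))

            # Step 3: fan out to dispatcher queues in other regions, but only for
            # locally produced events so a forwarded event is not forwarded again.
            if not event.get("forwarded"):
                forwarded = dict(event, forwarded=True)
                for queue_url in OTHER_REGION_QUEUE_URLS:
                    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(forwarded))

            sqs.delete_message(QueueUrl=LOCAL_QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```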

Supervisor

The supervisor reads messages from SQS and takes the following actions:

  1. Find all the users listening to the live stream for which the message is intended.
  2. Push the message to all the users found in step 1.
  3. The number of supervisors can be scaled if the number of concurrent connections increases.
  4. When a user starts streaming the video, a WebSocket connection is made with the supervisor. A WebSocket connection is tied to a server, so there is a limit on the number of concurrent connections one server can handle. To scale this, supervisors are kept behind load balancers. A sketch of the supervisor's read-and-push path follows the workflow diagram below.
Supervisor workflow
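
The supervisor's read-and-push path could look roughly like this. It assumes each supervisor consumes its own SQS queue and that the connection objects expose an async `send()` coroutine (as the Python `websockets` library's connections do); all names and URLs are placeholders.

```python
import asyncio
import json
from collections import defaultdict

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
SUPERVISOR_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/supervisor-1"  # placeholder

# Cache: stream_id -> {user_id -> WebSocket connection}
connections_by_stream = defaultdict(dict)


async def fan_out(event):
    """Steps 1-2: find users listening to the stream and push the message to each."""
    listeners = connections_by_stream.get(event["stream_id"], {})
    await asyncio.gather(
        *(ws.send(json.dumps(event)) for ws in listeners.values()),
        return_exceptions=True,  # one dead connection must not block the rest
    )


async def consume_forever():
    loop = asyncio.get_running_loop()
    while True:
        # boto3 is blocking, so run the long poll in a thread pool.
        resp = await loop.run_in_executor(
            None,
            lambda: sqs.receive_message(
                QueueUrl=SUPERVISOR_QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
            ),
        )
        for msg in resp.get("Messages", []):
            await fan_out(json.loads(msg["Body"]))
            sqs.delete_message(QueueUrl=SUPERVISOR_QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```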

Events flow

  1. Alice and Bob both are watching stream1.
  2. When a user starts streaming a video, a persistent connection (the WebSocket described above) is also opened with the supervisor.
  3. The supervisor stores this data (the stream-id to user-id association) in its cache.
  4. Each user keeps sending a heartbeat to the supervisor to let it know the connection is still alive. If a heartbeat is not received for a specified period of time, the connection is considered stale and removed from the supervisor's cache.
  5. After the connection is established, the supervisor calls the dispatcher to register the stream it has started listening to.
  6. The dispatcher stores this data (the stream-id to supervisor-id association) in its cache.
  7. Now Alice, who is connected to stream1, sends a like or comment for that stream through an API call to the messaging service.
  8. The messaging service pushes this event to the nearest dispatcher.
  9. The dispatcher checks its cache to find all the supervisor nodes that have users listening to stream1 and pushes the message to those supervisor nodes.
  10. After receiving the message, the supervisor node checks its cache to find all the users listening to stream1 and pushes the message through the WebSocket connection to all listeners of stream1. A sketch of the registration and heartbeat bookkeeping is shown below.
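
The registration and heartbeat bookkeeping from steps 2-6 might be sketched as follows; the dispatcher registration endpoint, supervisor ID, and heartbeat timeout are assumptions made for illustration.

```python
import time

import requests

HEARTBEAT_TIMEOUT_SECONDS = 30                               # assumed timeout
DISPATCHER_REGISTER_URL = "https://dispatcher.example.com/register"  # placeholder
SUPERVISOR_ID = "supervisor-1"

# Supervisor cache: stream_id -> {user_id: last_heartbeat_epoch_seconds}
listeners = {}


def on_connect(stream_id, user_id):
    """Called when a viewer's WebSocket connection is established (steps 2-3, 5)."""
    first_listener_for_stream = stream_id not in listeners
    listeners.setdefault(stream_id, {})[user_id] = time.time()
    if first_listener_for_stream:
        # Steps 5-6: tell the dispatcher this supervisor now serves this stream.
        requests.post(
            DISPATCHER_REGISTER_URL,
            json={"stream_id": stream_id, "supervisor_id": SUPERVISOR_ID},
            timeout=2,
        )


def on_heartbeat(stream_id, user_id):
    """Step 4: refresh the liveness timestamp for this connection."""
    if stream_id in listeners and user_id in listeners[stream_id]:
        listeners[stream_id][user_id] = time.time()


def evict_stale_connections():
    """Step 4: drop connections whose heartbeat has not been seen recently."""
    now = time.time()
    for stream_id in list(listeners):
        for user_id, last_seen in list(listeners[stream_id].items()):
            if now - last_seen > HEARTBEAT_TIMEOUT_SECONDS:
                del listeners[stream_id][user_id]
        if not listeners[stream_id]:
            del listeners[stream_id]
```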

Logging and monitoring:

  1. All the application and event logs will be pushed to the ELK stack.
  2. Metrics can be pushed to an APM system like Hypertrace, Datadog, etc.

This enables the application team to set up dashboards and alerts, keeping them aware of system health.

Auto-Scaling:

In the above system, all the components are loosely coupled and can easily be scaled independently:

  1. If the number of API calls increases, the messaging service instances can be scaled to handle the load.
  2. If the number of messages in the dispatcher SQS queue increases, the number of dispatcher nodes can be scaled to handle more messages and thus maintain the quality of service. A sketch of queue-depth based scaling follows this list.
  3. If the number of concurrent user connections increases, supervisor nodes can be scaled to handle more connections.
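
As an illustration of scaling on a workload signal rather than CPU, a desired dispatcher replica count could be derived from the SQS backlog roughly like this; the thresholds and queue URL are placeholders, while ApproximateNumberOfMessages is a standard SQS queue attribute.

```python
import math

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/dispatcher"  # placeholder
MESSAGES_PER_DISPATCHER = 1000   # assumed backlog one dispatcher should own
MIN_REPLICAS, MAX_REPLICAS = 2, 50


def desired_dispatcher_count():
    attrs = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL, AttributeNames=["ApproximateNumberOfMessages"]
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
    desired = math.ceil(backlog / MESSAGES_PER_DISPATCHER)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, desired))
```

The resulting count would then drive whatever scaling mechanism is in use, for example a Kubernetes HPA external metric or an autoscaling-group desired-capacity update.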

Note: Although autoscaling is useful for handling traffic, it has various limitations: insufficient-capacity errors, new servers taking time to spin up (usually slower than the rate at which traffic increases), autoscaling-group limits, and so on. So in order to handle heavy loads we should not rely only on autoscaling.

Efficient management of long-lived connections

The number of live connections can be very large and might increase at a brisk pace, so there has to be some strategy in place to maintain these connections effectively; the system handling so many connections cannot be scaled indefinitely.

Reasons for a large number of connections:

  1. Actual traffic is very high.
  2. Old connections are not terminated properly, as clients do not always shut down gracefully.
  3. There are a large number of idle connections that have not been used for a long time.

Strategies to manage a large number of concurrent long-lived connections:

  1. Use async I/O for managing connections so that the number of connections managed by a single server can be increased. This optimises the resource utilisation of the server but increases the complexity of the code.
  2. Assign a TTL to every connection, so that it is killed after a specified time and, if needed, the client can establish the connection again. NOTE: the TTL chosen should include some randomness so that connections do not all expire together and cause a thundering-herd problem.
  3. Usually the server doesn't close connections from its end, so periodically the server can send a close signal to the client; the client listens for it, closes the current connection, and establishes a new one only if it is still needed. A sketch of points 2 and 3 is shown after this list.
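
A small sketch of strategies 2 and 3, with an assumed base TTL, jitter window, and "reconnect" message format; it again assumes a connection object with async `send()` and `close()`, as in the Python `websockets` library.

```python
import asyncio
import json
import random

BASE_TTL_SECONDS = 15 * 60   # assumed base lifetime of a connection
JITTER_SECONDS = 5 * 60      # random spread so connections do not expire together


def connection_ttl():
    """Randomise each connection's lifetime to avoid a thundering herd of reconnects."""
    return BASE_TTL_SECONDS + random.uniform(0, JITTER_SECONDS)


async def manage_connection(ws):
    """Ask the client to reconnect (only if it still needs the stream) once the TTL expires."""
    await asyncio.sleep(connection_ttl())
    try:
        # The client is expected to close this connection and open a new one
        # only if it is still watching a stream.
        await ws.send(json.dumps({"type": "reconnect"}))
    finally:
        await ws.close()
```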

Characteristics of service

  1. Logging and Monitoring using Kibana.
  2. As streams are used between the various components, throttling of load can be done at various points.
  3. As the service is deployed on the cloud using EKS, horizontal scaling can be done. Also, since the components are loosely coupled, each component can be scaled independently of the others. Scaling criteria: throughput, CPU usage, memory usage.

Note: Autoscaling configuration is an important factor in building a scalable system, but the choice of scaling criteria matters: in most cases CPU and memory usage are not the right criteria. Scaling can instead be based on the number of concurrent users, the number of messages flowing, etc.

Please share any feedback, clarifications, or improvements in the comments section. If you would like to discuss a particular design topic, add it in the comments as well.

Happy learning…

References:

https://www.youtube.com/watch?v=yqc3PPmHvrA
