Scaling OpenVidu

Rodrigo Botti
Nexa Digital
Jun 1, 2020

Using AWS & OpenVidu Pro — Part 1

Camera in the foreground (focused) filming a woman in front of a bookshelf (photo by StockSnap)

Introduction

With the advent of the global pandemic and social isolation recommendations, there was a push for companies to develop digital solutions that enable people to perform their tasks remotely. More and more video conferencing and video chat platforms and services are becoming popular and a part of people's lives.

Here at Nexa Digital — owned by DASA, the biggest diagnostic medicine company in Latin America — we create digital solutions for the health sector. Most recently, we were tasked with incorporating a telemedicine solution into our telehealth platform Livia Saude, enabling medical professionals to conduct medical appointments remotely by video, i.e. doctors video conferencing with their patients.

In order to do that, we knew we would need a video conferencing/streaming technology. After some research (which I won't go into here), we decided to use OpenVidu.

"OpenVidu is a platform to facilitate the addition of video calls in your web or mobile application. It provides a complete stack of technologies very easy to integrate in your application. Our main goal is to allow developers to add real-time communications to their apps very fast and with low impact in their code." — OpenVidu official docs

Note: at the time of writing this article, OpenVidu is at version 2.14.0. When we started developing our telemedicine solution it was at version 2.12.0 and we haven't updated it since.

For more details on how OpenVidu works and how it can be used you can check the official docs — or this pretty cool summary.

Architecture

First of all, OpenVidu offers a "premium" paid version called OpenVidu Pro. This is the version we have deployed and that our telemedicine solution is using. We chose to use the paid version for two main reasons:

  • it offers detailed monitoring of the video call sessions, which is great for troubleshooting and analysis
  • it offers a means of scaling its media nodes — the server nodes responsible for streaming the video content

OpenVidu Cluster Architecture from OpenVidu Pro Scalability page

Deployment

We deployed the OpenVidu Pro Cluster to our AWS account following their guidelines. It uses a parameterized CloudFormation Stack that takes in some configuration values — at Nexa we are very familiar with CloudFormation, so this was a great fit for how we do things when it comes to infrastructure.

Auto Scaling Problem

If you read OpenVidu Pro's feature list you will notice that scalability is a manual effort and that elasticity is still a work in progress.

Manual Scaling in OpenVidu Inspector from OpenVidu Scalability page

"So what gives? Are you telling me I read up until this point just for you to tell me easily found information on how to press a button at a web control panel ?! Where's the automation?" — an understandably frustrated reader.

Calm down fellow reader. I am going to show you how you can automate this process. I was just giving you some context. Just keep on reading it, it will pay off.

One thing I omitted up until this point is that OpenVidu Pro has a REST API, and it will play a major role in automating the scaling process: we are able to launch and drop media nodes using this API.
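
To make that concrete, here is a minimal sketch, in Node.js (the language our Lambdas are written in), of the media node calls we rely on. The endpoint paths, response shapes, module and environment variable names are assumptions based on our own deployment; check the REST API reference for your OpenVidu Pro version. What is certain is that OpenVidu authenticates REST calls with HTTP Basic auth using the OPENVIDUAPP user and your OPENVIDU_SECRET.

```javascript
// mediaNodes.js: hypothetical helper module around the OpenVidu Pro REST API.
// Endpoint paths and response shapes are illustrative; check the REST API
// reference for your OpenVidu Pro version (we run the 2.12.x line).
const axios = require('axios')

const client = axios.create({
  baseURL: process.env.OPENVIDU_URL, // e.g. https://openvidu.example.com
  // OpenVidu authenticates every REST call with HTTP Basic auth
  auth: { username: 'OPENVIDUAPP', password: process.env.OPENVIDU_SECRET }
})

// Asks OpenVidu Server Pro to launch one more media node (it starts the instance for us)
const launchMediaNode = () =>
  client.post('/pro/media-nodes').then(res => res.data)

// Lists the media nodes currently registered in the cluster
// (collection responses wrap the results in a 'content' array in the versions we've used)
const listMediaNodes = () =>
  client.get('/pro/media-nodes').then(res => res.data.content)

// Drops the media node with the given id
const dropMediaNode = id =>
  client.delete(`/pro/media-nodes/${id}`).then(res => res.data)

module.exports = { launchMediaNode, listMediaNodes, dropMediaNode }
```

The Lambda sketches later in the article assume this module sits next to them in the same service.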

Solution

Context

Let’s just list some things we know that will help us decide on a final solution for our auto scaling mechanism.

  • OpenVidu Pro has a REST API that is able to launch and drop media nodes giving us the ability to scale nodes arbitrarily
  • All video sessions will always have only two connected users: the patient and the doctor
  • Video streaming is a CPU-intensive process, i.e. CPU usage in the media nodes increases as the load — connected users, which translates to sessions created — increases
  • OpenVidu Server — the master node — balances the sessions uniformly across its media nodes, distributing the load
  • Our telemedicine solution has fixed “office hours” from 8am to 10pm — this is a business rule

From that we have:

  • We need to scale the media nodes according to their CPU usage
  • CPU usage will be uniform across all media nodes, i.e. the average CPU usage will be practically the same as the CPU usage on any given node

Finally we have what we need to formulate a solution.

Upscaling

Upscaling high-level architecture diagram

CloudWatch Alarm that observes the average CPU usage of the media nodes, aggregated by their AMI (the ImageId metric dimension)

  • Since the usage is uniform across the nodes, by observing the average usage of the nodes we are indirectly observing the usage in each node
  • It is easier than monitoring each instance individually — that would require an alarm per instance, or polling telemetry data from the launched nodes, which are dynamic
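
For illustration only, here is roughly what that alarm configuration looks like when expressed with the AWS SDK for Node.js. In our setup the alarm is actually declared in the CloudFormation template covered in Part 2; the alarm name, threshold, evaluation periods, AMI id and topic ARN below are placeholders.

```javascript
// createAlarm.js: illustrative sketch of the CloudWatch alarm configuration.
// In our setup this is declared in CloudFormation (see Part 2); the names,
// AMI id, topic ARN and threshold below are placeholder values.
const AWS = require('aws-sdk')
const cloudwatch = new AWS.CloudWatch()

cloudwatch
  .putMetricAlarm({
    AlarmName: 'openvidu-media-nodes-high-cpu',
    Namespace: 'AWS/EC2',
    MetricName: 'CPUUtilization',
    // Aggregating by ImageId: one alarm covers every media node launched
    // from the media node AMI, instead of one alarm per instance
    Dimensions: [{ Name: 'ImageId', Value: 'ami-00000000000000000' }],
    Statistic: 'Average',
    Period: 60, // 1-minute datapoints require detailed monitoring (see the note below)
    EvaluationPeriods: 3,
    Threshold: 70, // % CPU; tune to your workload
    ComparisonOperator: 'GreaterThanThreshold',
    AlarmActions: ['arn:aws:sns:us-east-1:123456789012:openvidu-upscale']
  })
  .promise()
  .then(() => console.log('alarm created'))
  .catch(console.error)
```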

SNS Topic as the alarm event destination

  • By using a topic we can have multiple consumers

Lambda subscribing to the topic

  • When the alarm fires, it calls the OpenVidu Pro REST API to launch a new media node (a sketch of this handler follows the note below)

Note: to be able to use CloudWatch Alarms to monitor by AMI, the instances need to be launched with detailed CloudWatch monitoring enabled. In order to do that we need to modify the launch script that is used by the OpenVidu Server master node when launching media nodes. More on that in Part 2.
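
With those pieces in place, the upscale Lambda itself stays small. The sketch below assumes the hypothetical mediaNodes.js helper shown earlier and the usual shape of a CloudWatch alarm notification delivered through SNS; treat it as a sketch rather than our exact handler.

```javascript
// upscale.js: sketch of the Lambda subscribed to the SNS topic
const { launchMediaNode } = require('./mediaNodes') // hypothetical helper from earlier

module.exports.handler = async event => {
  // SNS delivers the CloudWatch alarm notification as a JSON string in Message
  const alarm = JSON.parse(event.Records[0].Sns.Message)

  // Only react when the alarm enters the ALARM state
  if (alarm.NewStateValue !== 'ALARM') return

  // Ask OpenVidu Server Pro for one more media node
  await launchMediaNode()
  console.log('Requested a new media node')
}
```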

Downscaling

Given that "office hours" are fixed we can trigger a downscale to a single node after hours.
It's not the ideal solution when it comes to cost management — during the day it can leave us paying for idle and/or underutilized media nodes — but it is by far the easiest one to implement.

Downscaling high-level architecture diagram

CloudWatch Event triggered periodically

Lambda triggered by the aforementioned event

  • It calls the OpenVidu Pro REST API to drop every media node but one, bringing the cluster back to its minimum size (see the sketch below)
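
Here is a sketch of that scheduled handler, again assuming the hypothetical mediaNodes.js helper from earlier. The schedule itself would live in serverless.yml as a CloudWatch Events cron rule; keep in mind that those cron expressions are evaluated in UTC, so if your office hours are in Brasília time, 10pm local is 1am UTC.

```javascript
// downscale.js: sketch of the scheduled Lambda that shrinks the cluster back
// to a single media node after office hours. In serverless.yml this would be
// wired to a schedule such as cron(0 1 * * ? *), i.e. 10pm BRT expressed in UTC.
const { listMediaNodes, dropMediaNode } = require('./mediaNodes') // hypothetical helper from earlier

module.exports.handler = async () => {
  const nodes = await listMediaNodes()

  // Keep a single node and drop the rest; field names follow the media node
  // representation of your OpenVidu Pro version
  const toDrop = nodes.slice(1)
  await Promise.all(toDrop.map(node => dropMediaNode(node.id)))

  console.log(`Dropped ${toDrop.length} media node(s)`)
}
```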

The Code

In Part 2 we'll take a look at some of the source code of the autoscaling solution:

  • CloudFormation template: creates the CloudWatch Alarm — we will comment on the alarm configuration — the SNS Topic, and the permission that allows the topic to invoke the Lambda
  • Media Node launch script: modified to enable detailed monitoring
  • Lambdas written in Node.js using the Serverless Framework: the core of the source code, plus the serverless.yml template that creates the Lambdas' subscriptions

Conclusion

OpenVidu is a video conferencing platform that is very easy to use and deploy. Unfortunately, it doesn't come with elasticity out of the box, which can be a problem when you need to scale automatically to serve a varying number of users.
Fortunately, it comes with the building blocks for creating your own auto scaling solution. We learned how to leverage them, combined with some of AWS's services and features, to create our own custom solution.

If you are using OpenVidu in production, I hope this helps shed some light on how to achieve the elasticity your business needs.

Thank you for reading!
Hope to see you in Part 2 where we'll see some of the code that makes all of this possible.

Lastly, I would like to thank Letícia Tiveron, Joilson Cisne, Adrian Shiokawa and Thaissa Candella for the support and for proofreading and commenting on the article before its publication.
