Scalability in video-conferencing (Part 1)

Published in

Linagora Engineering

Oct 26, 2017

OpenPaaS is an open collaboration platform for businesses. Linagora developed Hubl.in, a free and open-source video-conferencing tool built into OpenPaaS. Hubl.in is based on the WebRTC standard, which provides browsers with tools for real-time communication, and Linagora would like to improve its scalability.

This is a two-part article about scalability in video-conferencing. This first part introduces the general context of our project and some video-conferencing notions: the main topologies used to build a video-conferencing tool, the differences between them, and which one best fits your own tool. Then, in part 2, we will look at existing tools for each topology, with their advantages and drawbacks, and explain which one was selected to improve our amazing Hubl.in.

What is video-conferencing?

Video-conferencing allows two or more people to communicate in real time, optionally sharing data. The real-time communication can include audio, video, chat, file sharing and screen sharing. It mostly runs over IP networks or telephone networks.

Is it easy to implement?

It is simple to build basic video-conferencing software. You only need to rely on the audio/video device interfaces (exposed as endpoints) to connect two clients directly over a network (this is exactly what Skype and FaceTime did at the beginning).

Hubl.in is built on the same basics, using the WebRTC standards.

WebRTC is an open-source project that provides browsers, mobile applications and IoT (Internet of Things) devices with Real-Time Communications (RTC) capabilities through simple APIs.

Scalability

Scalability in video-conferencing means supporting a growing number of participants communicating simultaneously without degrading the quality of the call.

There are three main topologies for setting up a video-conference:

Peer-To-Peer (Mesh)
Selective Forwarding Unit (SFU)
Multipoint Control Unit (MCU)

Peer-To-Peer (Mesh)

Mesh is certainly the lowest-cost solution for video-conferencing: each client sends all its streams directly to every other client. Early video-conferencing software (e.g. Skype, FaceTime) and open-source WebRTC software (e.g. Hubl.in) started with peer-to-peer because it is:

A low-cost solution that needs no media infrastructure (server, cloud, etc.); in WebRTC, only a signaling server is needed to set up the call
Easy to use and implement
Respectful of data privacy, since media streams normally travel directly between clients rather than through a server

OK! P2P is apparently low-cost and simple to use, but what happens if the number of participants increases?

When the number of participants increases:

Each participant's upload bandwidth gets saturated, because every client sends its audio and video streams to all the other participants.
Each client's processing capacity (e.g. CPU) gets overwhelmed, because it has to encode its outgoing streams and decode all incoming streams simultaneously.

As a result, the quality of the video-conference degrades (frozen video, sound cut-offs, etc.).
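To make this growth concrete, here is a small Python sketch that counts what each client has to handle in a full mesh. It assumes every client sends one audio+video stream to each other participant at the same hypothetical bitrate (1,500 kbps, an illustrative value, not a measurement); the `mesh_load` helper is made up for this example.

```python
from dataclasses import dataclass


@dataclass
class MeshLoad:
    participants: int
    upstreams_per_client: int    # copies of its own stream each client uploads
    downstreams_per_client: int  # remote streams each client receives and decodes
    peer_connections: int        # distinct client-to-client links in the call
    upload_kbps_per_client: int


def mesh_load(participants: int, stream_kbps: int = 1500) -> MeshLoad:
    """Estimate the load of a full-mesh call.

    stream_kbps is a hypothetical bitrate for one audio+video stream,
    chosen only for illustration.
    """
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    peers = participants - 1
    return MeshLoad(
        participants=participants,
        upstreams_per_client=peers,
        downstreams_per_client=peers,
        peer_connections=participants * peers // 2,
        upload_kbps_per_client=peers * stream_kbps,
    )


for n in (2, 4, 8):
    load = mesh_load(n)
    print(n, load.upstreams_per_client, load.peer_connections,
          load.upload_kbps_per_client)
```

With 8 participants, each client already uploads 7 copies of its stream (about 10.5 Mbps at the assumed bitrate), and the call needs 28 peer connections.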

Is there any other way to resolve these issues?

Fortunately, there are two other ways to resolve P2P scalability issues: SFU and MCU. Let’s start with the SFU (Selective Forwarding Unit).

Selective Forwarding Unit (SFU)

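As its name suggests, a Selective Forwarding Unit is a server that receives the audio and/or video streams of the participants and selectively forwards them. Each client sends its streams to the SFU (and can choose which streams to send), then receives one or more streams of the other participants from it. Every client can ask for one high bit-rate (highest quality) stream and one or more low bit-rate streams, and the SFU forwards streams according to these requests. This assumes each sender uploads several versions of its stream at different bit-rates, a technique known as simulcast.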

In the picture above:

All users (1, 2 and 3) send their streams to the SFU.
User 3 is the host of the conference and, in this case, the speaker.
Users 1 and 2 ask the SFU for a high bit-rate stream (large red arrow) of the speaker (user 3).
User 3 asks for a high bit-rate stream (large blue arrow) of user 1.
The SFU responds by sending users 1 and 2 a high bit-rate stream of user 3.
The SFU forwards all the other streams at their lowest bit-rate.
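The selection shown in the figure can be expressed as a forwarding table. The following Python sketch is a simplified model, not the API of any real SFU: it assumes each sender uploads a high and a low bit-rate version of its stream (simulcast), and the `forwarding_plan` function and its `(receiver, sender)` request format are hypothetical.

```python
from typing import Dict, List, Set, Tuple

HIGH = "high"  # full-quality simulcast layer
LOW = "low"    # reduced bit-rate simulcast layer


def forwarding_plan(
    participants: List[int],
    high_requests: Set[Tuple[int, int]],
) -> Dict[int, Dict[int, str]]:
    """Decide which layer of each sender the SFU forwards to each receiver.

    high_requests holds (receiver, sender) pairs: the receiver asked for the
    sender's high bit-rate stream. Every other stream is forwarded at the
    lowest bit-rate. A client's own stream is never sent back to it.
    """
    plan: Dict[int, Dict[int, str]] = {}
    for receiver in participants:
        plan[receiver] = {
            sender: HIGH if (receiver, sender) in high_requests else LOW
            for sender in participants
            if sender != receiver
        }
    return plan


# Scenario from the figure: user 3 speaks, users 1 and 2 want user 3 in
# high quality, and user 3 wants user 1 in high quality.
plan = forwarding_plan([1, 2, 3], {(1, 3), (2, 3), (3, 1)})
for receiver, layers in plan.items():
    print(receiver, layers)
```

Keeping the decision as a pure function of the participants and their requests makes it easy to recompute whenever someone joins, leaves, or changes the stream they want to watch.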

So clients don’t have any control over the SFU server?

Quite the opposite: by exchanging signaling messages with the SFU media server, clients have full control over the streams they receive. This is what makes the SFU so flexible.

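Why use an SFU instead of P2P?

The SFU is a centralized topology: the server receives the streams of all participants and, according to each client's requests, selects which streams to forward to each one. Each client therefore uploads its streams only once, to the SFU, instead of once per participant as in P2P. Since the SFU only forwards streams without decoding or re-encoding them, the latency it adds is minimal.

This is why the SFU is the most popular topology in the WebRTC community.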
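The key difference from P2P shows up in each client's upload. This hypothetical `upload_kbps` helper compares both topologies, assuming the client sends a 1,500 kbps high layer to every peer in a mesh, and one 1,500 kbps high layer plus one 300 kbps low layer to the SFU (illustrative bitrates, not measurements).

```python
from typing import Tuple

# Hypothetical bitrates (kbps) of the high and low simulcast layers.
LAYERS_KBPS: Tuple[int, int] = (1500, 300)


def upload_kbps(topology: str, participants: int,
                layers_kbps: Tuple[int, int] = LAYERS_KBPS) -> int:
    """Estimate one client's upload bitrate in a mesh or with an SFU."""
    high, low = layers_kbps
    if topology == "mesh":
        # One high-quality copy of the stream for every other participant.
        return (participants - 1) * high
    if topology == "sfu":
        # Each simulcast layer is uploaded once, to the SFU only.
        return high + low
    raise ValueError(f"unknown topology: {topology}")


for n in (3, 6, 10):
    print(n, upload_kbps("mesh", n), upload_kbps("sfu", n))
```

Under these assumptions the SFU upload stays at 1,800 kbps whatever the number of participants, while the mesh upload grows linearly (13,500 kbps for 10 participants).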

What is the main drawback of the SFU?

The main drawback of the SFU is that encoding and decoding still happen in the browsers: each client decodes every stream it receives, which slows down processing once the number of participants gets high.

Multipoint Control Unit (MCU)

A Multipoint Control Unit, also called a “video-conferencing gateway” or “bridge”, is a centralized video-conferencing infrastructure.

With an MCU, each user (client) no longer sends its audio and video streams to every other participant. Once it has received all the streams, the MCU decodes them, combines them into a single stream, encodes that stream and sends it to all participants.
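For video, one common way to combine streams is to composite the decoded frames into a single picture. The sketch below only computes where each participant's video would go in a 1280x720 output grid; the actual decoding, scaling and re-encoding would be done by a media framework such as GStreamer, and the `grid_layout` helper and its near-square grid policy are assumptions for illustration.

```python
import math
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def grid_layout(participants: int, width: int = 1280,
                height: int = 720) -> List[Tile]:
    """Give each participant's video one tile of the combined output frame."""
    if participants < 1:
        raise ValueError("need at least one participant")
    cols = math.ceil(math.sqrt(participants))  # near-square grid
    rows = math.ceil(participants / cols)
    tile_w, tile_h = width // cols, height // rows
    return [
        ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i in range(participants)
    ]


print(grid_layout(4))
```

Four participants get four 640x360 tiles; with five, the grid grows to three columns and two rows.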

Because each user receives only one stream, the traffic between the MCU server and the users is noticeably reduced.

Is mixing all participants’ streams into one stream the only advantage over the SFU?

In addition to mixing all streams into one stream before sending it to all participants, an MCU can also:

Support different clients and call signaling protocols, such as SIP or H.323.
Equalize and mix all audio streams, filtering noise, reducing echo, etc.
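Audio mixing is the simplest part of the MCU's work to illustrate. This Python sketch assumes frames of 16-bit PCM samples that are already decoded and time-aligned; it sums the samples and clips the result to the int16 range. The `mix_pcm` helper is hypothetical, and the noise filtering and echo reduction mentioned above are left out.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm(streams: Sequence[Sequence[int]]) -> List[int]:
    """Mix equally long frames of 16-bit PCM samples into a single frame.

    Samples are summed and clipped to the int16 range; a production mixer
    would also normalize levels, suppress noise and cancel echo.
    """
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all frames must have the same number of samples")
    return [
        max(INT16_MIN, min(INT16_MAX, sum(samples)))
        for samples in zip(*streams)
    ]


print(mix_pcm([[1000, -2000, 30000], [500, -500, 10000]]))
```

The last sample (30000 + 10000) exceeds the int16 range and is clipped to 32767, which is why real mixers usually lower each input's gain before summing.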

Moreover, centralizing the encoding and decoding of the streams decreases both bandwidth consumption and computation load on the client endpoints.
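The trade-off between the three topologies can be summarized by counting streams. The sketch below uses the simplified model of this article (simulcast layers counted as a single stream, one mixed stream sent by the MCU to each participant); `stream_counts` is a hypothetical helper, not part of any library.

```python
from typing import Dict


def stream_counts(topology: str, participants: int) -> Dict[str, int]:
    """Count streams per client and decodes on the server (simplified model).

    Simulcast layers are counted as a single stream, and the MCU is assumed
    to send one mixed stream to each participant.
    """
    peers = participants - 1
    if topology == "mesh":
        return {"uploaded": peers, "decoded": peers, "server_decoded": 0}
    if topology == "sfu":
        # The SFU forwards streams without decoding them.
        return {"uploaded": 1, "decoded": peers, "server_decoded": 0}
    if topology == "mcu":
        # The MCU decodes every incoming stream before mixing.
        return {"uploaded": 1, "decoded": 1, "server_decoded": participants}
    raise ValueError(f"unknown topology: {topology}")


for topology in ("mesh", "sfu", "mcu"):
    print(topology, stream_counts(topology, 10))
```

In the MCU case, each client decodes a single stream, but the server has to decode (and then re-encode) all of them.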

So why is the MCU not as popular as the SFU?

Without doubt, scalability is the main advantage of the MCU topology. But the implementation is more complex, and it takes a lot of computing resources on the server, which has to decode, mix and re-encode every stream.

The SFU could be the best balance between P2P, which overwhelms the clients, and the MCU, where all processing is centralized.

Which topology to use for my video-conference application?

It depends on your use case:

If you need to develop an application for fewer than 3 participants, you can go with a peer-to-peer topology: it is simple to implement and certainly the lowest-cost topology.
If you need an application that supports between 5 and 10 participants and you have good bandwidth, go with an SFU, especially if you are a fan of WebRTC: it is a low-cost centralized topology.
If you want to invest in a media server to support more than 10 participants, or you don’t have good bandwidth, go with an MCU server. Don’t forget to brush up on your skills in signal, image and video processing: you will need tools like GStreamer to process and mix your media streams.
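These rules of thumb can be written as a small decision function. It is only a sketch: the article does not say what to do with 3 or 4 participants, so this hypothetical `choose_topology` assumes an SFU for them, and "good bandwidth" is reduced to a boolean.

```python
def choose_topology(participants: int, good_bandwidth: bool) -> str:
    """Apply the rules of thumb above (3-4 participants: assumed SFU)."""
    if participants < 2:
        raise ValueError("a conference needs at least two participants")
    if participants < 3:
        return "p2p"  # simplest and cheapest for a one-to-one call
    if participants > 10 or not good_bandwidth:
        return "mcu"  # one mixed stream per client, heavy server
    return "sfu"      # 3-10 participants with good bandwidth


print(choose_topology(2, good_bandwidth=False))  # p2p
print(choose_topology(8, good_bandwidth=True))   # sfu
print(choose_topology(8, good_bandwidth=False))  # mcu
```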

Some articles describe a hybrid topology, in which participants using SIP devices rely on an MCU for decoding, mixing and encoding, while the other participants, using PCs or laptops, send their streams to an SFU. Be sure to provision plenty of bandwidth for the SFU part, because of the high number of streams exchanged between the centralized server and the clients.
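A hybrid deployment needs a way to route each participant to the right media server. This hypothetical Python sketch assigns participants by device type only; how the MCU and SFU halves exchange media with each other is left out, and the `Participant` and `assign_media_server` names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Participant:
    name: str
    device: str  # "sip" for SIP phones or room systems, anything else for PCs/laptops


def assign_media_server(participants: List[Participant]) -> Dict[str, List[str]]:
    """Route SIP devices to the MCU and every other client to the SFU."""
    routes: Dict[str, List[str]] = {"mcu": [], "sfu": []}
    for p in participants:
        routes["mcu" if p.device == "sip" else "sfu"].append(p.name)
    return routes


call = [Participant("alice", "browser"), Participant("room-1", "sip"),
        Participant("bob", "browser")]
print(assign_media_server(call))
```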

Stay tuned …

In the next part, we will discuss the existing SFU and MCU tools.