Scalability in video-conferencing (Part 2)
In the first part of this series on scalability in video-conferencing, I introduced the three most popular video-conferencing topologies. In this second part, we will discuss different tools built on these topologies, then I will explain which one was chosen to scale Hubl.in, Linagora’s open-source video-conferencing tool.
Before starting, let’s recall some basics and important points from the first part:
- Hubl.in is a free and open-source video-conferencing tool developed by LINAGORA. It is based on WebRTC, the standard that gives browsers tools for real-time communication.
- Three video-conferencing topologies: Peer-To-Peer (Mesh), Selective Forwarding Unit (SFU), and Multipoint Control Unit (MCU).
- LINAGORA wants to make Hubl.in, which is based on a P2P topology, scalable and able to support more than 5 participants per conference room.
Now that I’ve set the context, let’s discuss some of the most popular video-conferencing server tools. Note that WebRTC support is the first requirement these tools must meet.
Let’s begin with the most popular tool within the WebRTC community:
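To make the scalability problem concrete, here is a small sketch that counts the media streams a single client has to handle in each topology. It assumes one bundled audio/video stream per participant and ignores simulcast and other optimizations; the function name `streams_per_client` is mine, not part of any tool discussed here.

```python
from typing import Tuple


def streams_per_client(topology: str, participants: int) -> Tuple[int, int]:
    """Return (uplink, downlink) media streams handled by ONE client."""
    if participants < 2:
        raise ValueError("a conference needs at least two participants")
    others = participants - 1
    if topology == "mesh":
        # P2P: one copy sent to, and one received from, every other peer.
        return others, others
    if topology == "sfu":
        # SFU: send once to the server, receive every other stream forwarded.
        return 1, others
    if topology == "mcu":
        # MCU: send once, receive one mixed stream.
        return 1, 1
    raise ValueError(f"unknown topology: {topology}")


for n in (2, 3, 5, 6):
    row = {t: streams_per_client(t, n) for t in ("mesh", "sfu", "mcu")}
    print(n, row)
```

With 6 participants, a mesh client uploads 5 copies of its own stream, while an SFU client uploads only one. The upload side is usually what limits a P2P room to a handful of participants.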
Janus
The Janus WebRTC Gateway provides the means to set up a WebRTC media communication with a browser, exchange JSON messages with it, and relay RTP/RTCP and messages between browsers and the server-side application logic they’re attached to. Any specific feature or application needs to be implemented in server-side plugins, which browsers can then contact via the gateway to use the functionality they provide.
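To give an idea of what these JSON messages look like, here is a minimal sketch that builds the first two requests a client typically sends: one to create a session and one to attach to a plugin. The `janus`, `transaction` and `plugin` fields and the plugin name `janus.plugin.videoroom` follow the Janus API as I understand it, so check them against the current Janus documentation. The helper `janus_request` is my own name, and nothing is sent over the network here.

```python
import json
import uuid


def janus_request(kind: str, **fields) -> str:
    """Serialize one Janus-style request with a fresh transaction id."""
    # The transaction id lets the client match a reply to its request.
    message = {"janus": kind, "transaction": uuid.uuid4().hex}
    message.update(fields)
    return json.dumps(message)


# 1. Ask the gateway for a session.
create_session = janus_request("create")
# 2. Attach that session to a plugin (here, the video room plugin).
attach_plugin = janus_request("attach", plugin="janus.plugin.videoroom")

print(create_session)
print(attach_plugin)
```

In a real client these strings would be posted to the gateway's HTTP interface (or sent over another supported transport), and the replies would carry the session and handle identifiers to use in later requests.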
Architecture
The figure below shows how the Janus architecture is organised:
- Plugin: the plugins that are provided can be changed or extended to match our requirements, or simply used as a reference should we want to write a new plugin from scratch. Examples of such plugins are implementations of applications like echo tests, conference bridges, media recorders, etc.
- Protocols: all the protocols that are implemented in the gateway are listed in this part.
- Core: contains the core of the gateway.
- Janus Interface (HTTP): the HTTP interface that the Janus gateway exposes so that external applications can use it.
- Janus Interface & SDP helper (JS): the Session Description Protocol (SDP) is a format for describing the parameters of streaming media communications. Each SDP coming from the web application (the peers) is anonymized: all transport information is removed, leaving only the relevant information, before it is sent to the plugins. The same is done in the other direction: transport information is added to the plugins’ SDP answers, and all other information is anonymized or removed.
- Web Application: the application we want to match with the Janus gateway.
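The SDP anonymization step described above can be illustrated with a short sketch. This is not Janus code (Janus itself is written in C); it is only an illustration of the idea, and the list of lines I treat as "transport information" (connection address, ICE and DTLS attributes) is my own assumption.

```python
# SDP line prefixes treated here as transport information (an assumption).
TRANSPORT_PREFIXES = (
    "c=",              # connection address
    "a=candidate:",    # ICE candidates
    "a=ice-ufrag:",    # ICE credentials
    "a=ice-pwd:",
    "a=ice-options:",
    "a=fingerprint:",  # DTLS certificate fingerprint
    "a=setup:",        # DTLS role
)


def strip_transport(sdp: str) -> str:
    """Return the SDP without the lines that describe the transport."""
    lines = sdp.replace("\r\n", "\n").split("\n")
    kept = [l for l in lines if l and not l.startswith(TRANSPORT_PREFIXES)]
    return "\r\n".join(kept) + "\r\n"


offer = "\r\n".join([
    "v=0",
    "o=- 46117317 2 IN IP4 127.0.0.1",
    "s=-",
    "t=0 0",
    "m=audio 9 UDP/TLS/RTP/SAVPF 111",
    "c=IN IP4 0.0.0.0",
    "a=ice-ufrag:abcd",
    "a=ice-pwd:efghijklmnopqrstuvwx",
    "a=fingerprint:sha-256 AA:BB",
    "a=setup:actpass",
    "a=mid:0",
    "a=sendrecv",
    "a=rtpmap:111 opus/48000/2",
    "a=candidate:1 1 udp 2122260223 192.0.2.1 54400 typ host",
]) + "\r\n"

print(strip_transport(offer))
```

What remains is the media description (codecs, direction, media identifiers), which is the part a plugin needs in order to decide what to do with the stream.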
You can check all the available Janus demos at: https://janus.conf.meetecho.com/demos.html
What are the main pros and cons of Janus?
Without any doubt, one of the main advantages of the Janus gateway is its plugin philosophy, but it is not the only one:
- Easy to use.
- The developer community is quick to answer.
- A complete and multi-purpose WebRTC gateway (audio, video, screen sharing, etc).
- Choice of scalability topology: SFU and MCU (for audio streams).
On the other hand, we noticed some cons:
- Each functionality is a plugin.
- No available plugin for MCU video streams.
- Some parts of the documentation are out of date.
Jitsi
Jitsi is free software distributed under the Apache 2 license that aims to provide a video-conferencing solution accessible from a web browser. It is the software that powers the Framatalk online service. Jitsi is based on the XMPP protocol and is also compatible with WebRTC. While some video-conferencing tools rely on an MCU to mix the streams, Jitsi uses an SFU, Jitsi Videobridge, which forwards the streams instead of mixing them all into one.
Architecture
The next figure shows how the different parts of Jitsi interact:
- Jitsi Meet: a WebRTC video-conference frontend.
- HTTP web servers (Nginx, Apache, Jetty).
- XMPP: an open standard protocol for messaging, an alternative to commercial messaging and chat providers.
- Prosody: an open-source and modern XMPP communication server. It aims to be easy to set up and configure.
- Jicofo: Jitsi Conference Focus is the server that manages the connection between the participants and the videobridges.
- Videobridge: an SFU server that manages all conference media streams.
Like Janus, Jitsi also has pros and cons.
Pros
- Based on the XMPP protocol for real-time multimedia exchange.
- Scalability using SFU.
- Licensed under the Apache 2 license.
- A lot of the applications based on it are complete (screen sharing, chat…).
- Jitsi has its own video-conference front end. If you have not developed one yourself, you can use the whole Jitsi package (the video-conference tool and the gateway).
Cons
- Installation is difficult.
- Unstructured documentation.
- Scalability using only SFU.
Licode
Licode is an open-source WebRTC communications platform. Although it was originally an MCU, it can now also behave as an SFU. Licode itself is implemented in C++. With Licode, you can host your own WebRTC conference provider and build applications on top of it with easy-to-use APIs:
- Client-side: this API handles connections to rooms and streams in your web applications.
- Server-side: this API lets your server communicate with Nuve.
Architecture
- Nuve: manages services (custom apps), rooms and users, and generates tokens for delegated authentication so that custom apps can provide access to users. It balances the rooms among the available ErizoControllers.
- MongoDB: only used by Nuve to store information about rooms and tokens. No user information is managed by Licode.
- Erizo Controller: manages control, signalling and data streams for the rooms assigned to it by Nuve. Newly started ErizoControllers are automatically discovered by Nuve as long as they are connected to the same RabbitMQ instance.
- MCU (Erizo Agent + ErizoJS (ErizoAPI + ErizoC++)): a distributed MCU. A single Licode room can now easily be distributed among an array of servers.
- RabbitMQ: the message broker that enables the distribution of the architecture. It handles all the messages among the components of Licode. It does not handle media or communicate with the clients.
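To illustrate what "balancing the rooms among the available ErizoControllers" can mean, here is a small sketch of a least-loaded assignment. I do not know which policy Nuve actually applies, so the policy, the function `assign_room` and the controller names below are purely hypothetical.

```python
from collections import Counter
from typing import Dict, List


def assign_room(controllers: List[str], load: Counter) -> str:
    """Pick the controller that currently hosts the fewest rooms."""
    # min() keeps the first controller in case of a tie.
    target = min(controllers, key=lambda name: load[name])
    load[target] += 1
    return target


controllers = ["erizo-controller-1", "erizo-controller-2"]  # hypothetical names
load: Counter = Counter()  # rooms currently hosted per controller

assignment: Dict[str, str] = {}
for room in ("room-a", "room-b", "room-c"):
    assignment[room] = assign_room(controllers, load)

print(assignment)
```

A newly discovered controller would simply be appended to `controllers`; because it starts with no rooms, it would receive the next ones until the load evens out.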
Pros
- A mature project.
- It guarantees high scalability (thanks to its distributed architecture).
- It provides both client and server APIs, so it is easy to integrate.
- As tested at https://www.knuddels.de/, it can support a good number of rooms, each with a good number of participants.
Cons
- Mainly based on and focused on MCU.
- Screen sharing is not stable (on Firefox).
Kurento
Kurento is a WebRTC media server and a set of client APIs that simplify the development of advanced video applications for web and smartphone platforms. Kurento features include group communications, transcoding, recording, mixing, broadcasting and routing of audiovisual flows.
It also provides advanced media processing capabilities involving computer vision, video indexing, augmented reality and speech analysis. Kurento’s modular architecture simplifies the integration of third-party media processing algorithms (e.g. speech recognition, sentiment analysis, face recognition), which application developers can use transparently, like the rest of Kurento’s built-in features.
Architecture
As the figure below shows, Kurento has 3 sides:
The Kurento server is the server that manages media streams: media transport, encoding/decoding, transcoding, mixing, media processing, etc. As you can see, the Media Pipeline inside the Kurento server represents a GStreamer pipeline; GStreamer is an open-source framework that allows you to manage and manipulate multimedia streams. Once the streams are processed, they are sent to the client-side application through the HTTP, RTP and WebRTC protocols.
On the other side, we have the application server. This server manages the signalling plane. It contains the business logic and the connectors of the particular multimedia application being deployed. The application can use mature technologies such as HTTP and SIP Servlets. This part of the architecture is in direct contact with the application, which is why developers have to design the application to be as simple and flexible as possible.
Pros
- Modular architecture: anyone can create modules.
- GStreamer dependency provides advanced media processing capabilities involving computer vision and augmented reality.
Cons
- Interest in the project has decreased sharply since Twilio acquired the team developing Kurento.
- Main developers and community are less active.
So, it’s difficult to choose among all these tools, isn’t it?
Yes, it is. To choose one of these tools, you have to list all your criteria and needs. For LINAGORA, the most important criterion is to not use a media server at all this year, which means not using an MCU infrastructure for the moment. Take a look at this table:
As you can see, for an open-source company like LINAGORA, the license, the number of stars and commits on GitHub, and community activity are very important when choosing which tool to use: an active community is ready to answer you whenever you have an issue. According to these criteria, we can say that Janus is the most popular and most used open-source project.
For Hubl.in developers, WebRTC, SFU and MCU compatibility, frameworks and supported APIs are the most important criteria, but not the only ones. The table below shows some of them.
After testing all the tools, we chose Janus as an SFU video-conference bridge:
- In our tests, it was the most stable SFU server.
- Easy to connect with Hubl.in.
- Plugins needed for Hubl.in are already implemented.
- The project community is very responsive.
- The APIs needed for Hubl.in are supported.
As part of the Hubl.in improvements, LINAGORA wanted to increase the number of video-conference participants. The goal of the third year of the OpenPaaS::NG research project is to allow more than 5 participants in the same video-conference room. In this part of the project, speech recognition was also added to Hubl.in: we developed a bot called Hublot, which connects to Hubl.in as a participant and offers recommendations to the connected participants according to the topics discussed. Janus allows LINAGORA’s Hubl.in to connect more than 5 participants in the same video-conference room: exactly 6 participants counting Hublot. The development team decided to switch from easyRTC (P2P) to Janus in SFU mode when needed. The idea is to let users choose which architecture to use. For example, if only 2 or 3 participants connect to the same video-conference room, you can use easyRTC. Otherwise, you can use the Janus gateway, as shown below:
RTC Adapter is a module developed by the Linagora team to let users choose which technology to use.
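The selection rule described above can be sketched in a few lines. This is not the actual RTC Adapter code, whose API is not shown in this article; the function `recommend_backend` and the constant `P2P_MAX_PARTICIPANTS` are hypothetical names, and the threshold of 3 simply comes from the "2 or 3 participants" example.

```python
# Largest room size for which P2P (easyRTC) is still suggested (assumption
# taken from the "2 or 3 participants" example in the text).
P2P_MAX_PARTICIPANTS = 3


def recommend_backend(participants: int,
                      p2p_max: int = P2P_MAX_PARTICIPANTS) -> str:
    """Suggest a backend: P2P for small rooms, the SFU for larger ones."""
    if participants < 1:
        raise ValueError("a room needs at least one participant")
    return "easyrtc" if participants <= p2p_max else "janus"


for size in (2, 3, 4, 6):
    print(size, recommend_backend(size))
```

Since the idea is to leave the final choice to the user, such a function would only provide a default that the user can override.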
LINAGORA also wanted to add screen sharing to Hubl.in, and Janus offers this functionality too.
Note that the development of this part is in progress and not yet finalized.
What can we say about all these technologies?
All the technologies and gateways mentioned in this two-part article are interesting; each one is designed for specific needs. Before choosing one, it is necessary to properly determine the needs, in particular whether we want to set up a media server that mixes the streams (MCU) or not (SFU). It is also necessary to clearly understand the context of the application we want to develop in order to choose the right tool. For example, if the application needs a particular framework, it is necessary to check whether it is supported by the gateway. For Hubl.in, we needed a tool that supports more than 6 participants in the same video-conference room, offers screen sharing, and is compatible with speech recognition, which would make the integration easier.
Coming next…
We will talk about Hublot, the video-conference bot assistant that gave birth to LinTO.
Interesting, isn’t it? Join us!
Don’t hesitate to ask any questions about the topics covered in this two-part article.