MCU vs SFU: Confused about which to choose for a video collaboration tool?
Web conferencing has helped revolutionise modern workplaces and has tremendously affected the day-to-day communication of organisations in the current COVID era. Video conferencing removed the most challenging barrier to collaboration: the geographical distance between people. This has not only helped organisations save on the cost of organising such meet-ups but also let them scale such collaborative events to much larger audiences.
In its initial days, WebRTC was designed to be a peer-to-peer communication technology, and it has been turning the internet browser into a powerful multimedia engine. Because WebRTC is a P2P design by default, it doesn’t scale in its native form. To scale WebRTC applications, you have to leverage topologies designed to extend its capabilities. Today we will discuss the pros and cons of these topologies.
Peer to Peer (P2P/Mesh)
P2P/mesh is the simplest topology from an architecture point of view and the most cost-effective solution you can use in a WebRTC application, because all clients (or peers) talk to each other directly. For the same reason, it is also the least scalable.
Mesh applications can be resource-intensive, because the burden of encoding and decoding streams falls on each client. P2P topologies are meant only for a small number of clients. On the plus side, mesh provides the best end-to-end encryption because it doesn’t depend on a centralised entity to encode, decode, or relay streams. Small organisations can achieve their objective without investing a lot.
Pros:
- Easy to set up
- Better security
- Cost-effective because it doesn’t require a media server
Cons:
- Only meant for a small number of participants
- CPU intensive because the processing of streams is offloaded to each client
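To see why mesh stops scaling, it helps to count connections. The following is a minimal sketch, assuming a full mesh in which every peer sends one stream directly to every other peer; the function name `mesh_load` is made up for illustration.

```python
def mesh_load(participants: int) -> dict:
    """Count connections and streams in a full-mesh call.

    Assumption: every peer sends one stream to every other peer.
    """
    if participants < 1:
        raise ValueError("participants must be at least 1")
    others = participants - 1
    return {
        # Each pair of peers shares one connection: n * (n - 1) / 2.
        "total_connections": participants * others // 2,
        # Every client uploads one copy of its media per remote peer...
        "uploads_per_client": others,
        # ...and receives and decodes one stream from each remote peer.
        "downloads_per_client": others,
    }


for n in (2, 4, 8):
    print(n, mesh_load(n))
```

The per-client load grows linearly with the number of participants, while the total number of connections grows quadratically.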
Selective Forwarding Unit (SFU)
The SFU is the most widely deployed topology in recent times. In simple terms, an SFU is a relay-routing system designed to offload some of the stream processing from the client to the server. An SFU is more upload-efficient than a mesh topology because each endpoint uploads its media once, to the server, instead of once per peer. On the receiving side, the endpoint decodes only a fixed number of individual remote streams from the SFU, not all of the remote streams as in a mesh.
An SFU topology is also a balanced topology, because both the client and the server have some portion of the stream processing to do. In contrast, in a mesh topology the whole burden is on the client, and in an MCU topology the whole burden of stream encoding/decoding is on the server.
In an SFU architecture, each participant still sends just one set of media, similar to an MCU. But the SFU does not make any composite streams. Instead, it sends a different stream to each user according to the resolution requirement and available bandwidth of the receiving client. As depicted in the picture, since there are 4 people in the call, each participant receives 3 streams.
The main purpose of the server (SFU) in this case is to find out what the stream requirements of a particular endpoint are and which remote streams can fulfil them. Based on these two inputs, the SFU selects the best streams and relays them to the endpoint.
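The selection step can be sketched as follows. This is only an illustration, not how any particular SFU implements it: it assumes each sender offers several encodings of its video (the `Layer` class and its fields are hypothetical) and that the receiver's requirements are reduced to a maximum resolution and a bandwidth budget.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """One encoding of a sender's video (hypothetical model)."""
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # approximate bitrate of this encoding


def select_layer(layers: List[Layer], max_height: int,
                 budget_kbps: int) -> Optional[Layer]:
    """Return the best layer that fits the receiver's limits, or None."""
    fitting = [layer for layer in layers
               if layer.height <= max_height
               and layer.bitrate_kbps <= budget_kbps]
    # Prefer the tallest layer, breaking ties by the higher bitrate.
    return max(fitting, key=lambda l: (l.height, l.bitrate_kbps), default=None)


# Example: a receiver showing a 360p tile with 800 kbps to spare.
camera = [Layer(180, 150), Layer(360, 500), Layer(720, 1500)]
print(select_layer(camera, max_height=360, budget_kbps=800))
```

The SFU only picks among streams that already exist and forwards the winner; it never re-encodes anything to make a stream fit.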
There is less work for each participant compared to the original peer-to-peer model, because each peer establishes only one connection (to the SFU) instead of connecting to every other participant.
The main advantage of this topology is that the server does not need to decode and re-encode received streams. It acts as a relay server for streams between the peers, which saves a lot of processing time.
This topology is most suitable when the remote streams received by a single endpoint are limited to roughly 10. The maximum output bandwidth can be approximated as ~max-streams * (number of endpoints in the conference - 1).
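As a rough sketch of how the stream counts add up in an SFU call, the function below counts forwarded streams rather than bandwidth. It assumes each endpoint uploads one stream and receives at most `max_streams` remote streams; the function name and the default cap of 10 are assumptions for illustration, with the cap taken from the rough figure above.

```python
def sfu_stream_counts(endpoints: int, max_streams: int = 10) -> dict:
    """Count streams in an SFU call (illustrative assumptions only)."""
    if endpoints < 1 or max_streams < 0:
        raise ValueError("endpoints must be >= 1 and max_streams >= 0")
    # An endpoint never receives its own stream, hence endpoints - 1.
    received = min(max_streams, endpoints - 1)
    return {
        "uploaded_per_endpoint": 1,
        "received_per_endpoint": received,
        # The SFU forwards `received` streams to every endpoint.
        "forwarded_by_sfu": endpoints * received,
    }


print(sfu_stream_counts(4))   # the 4-person call described earlier
print(sfu_stream_counts(50))  # a larger call, capped at 10 streams each
```

Multiplying these counts by an assumed per-stream bitrate gives a rough bandwidth figure for the endpoint downlink and the server egress.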
Pros:
- Requires less upload bandwidth than a P2P mesh
- Streams are separate, so each can be rendered individually, allowing full control of the layout of streams on the client side
Cons:
- Limited scalability due to bandwidth constraints
- Depends on a server, unlike mesh
- Higher operational costs as some CPU load is shifted to the server
- Requires an extra layer of stream control
Multipoint Conferencing Unit (MCU)
The MCU topology is the most widely used in large conference deployments and is the oldest stable topology in existence. The main reason for its success is its ability to deliver stable, low-bandwidth audio and video streaming. In an MCU topology, the heavy encoding/decoding of streams is confined to the server, so the clients are relieved of most stream processing work. Layout creation is the responsibility of the server itself, and the client only has to render the received layout.
All users establish a connection with the MCU server to share their media. The MCU then makes a composite media stream (layout) containing the video and audio from every user and sends it back to all members: the server decodes, rescales, and mixes all incoming streams into a single new stream, then encodes it and sends it to the clients. Endpoints need not do much work, as the MCU contains most of the logic and very little intelligence is required at the endpoint device. The layout is created per endpoint based on various factors, such as the bandwidth capability of the endpoint, its resolution requirement, and active speaker information.
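To make the idea of a composite layout concrete, here is a minimal sketch of one possible layout rule: an equal-sized, near-square grid. Real MCUs typically offer richer layouts (for example, enlarging the active speaker); the function name and the 1280x720 default canvas are assumptions for illustration.

```python
import math
from typing import List, Tuple


def grid_layout(tiles: int, width: int = 1280,
                height: int = 720) -> List[Tuple[int, int, int, int]]:
    """Return one (x, y, w, h) rectangle per video in a near-square grid."""
    if tiles < 1:
        return []
    # Smallest column count whose square can hold all tiles: ceil(sqrt(n)).
    cols = math.isqrt(tiles - 1) + 1
    rows = (tiles + cols - 1) // cols
    tile_w, tile_h = width // cols, height // rows
    return [((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
            for i in range(tiles)]


# Four participants fill a 2 x 2 grid on the composite frame.
for rect in grid_layout(4):
    print(rect)
```

The server would rescale each decoded video into its rectangle before encoding the composite frame, which is where most of the CPU cost comes from.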
Transcoding multiple audio and video streams into a single stream and then encoding it at multiple resolutions in real time is very CPU intensive, and the more clients connect to the server the higher its CPU requirements.
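A back-of-the-envelope way to see how the load grows is to count codec operations. The sketch below assumes each client sends one stream that the server must decode, and that clients requesting the same output profile can share one encoded composite. Both are simplifying assumptions, and the profile labels are hypothetical.

```python
from typing import Dict, List


def mcu_codec_work(requested_profiles: List[str]) -> Dict[str, int]:
    """Count decode and encode jobs for one MCU conference.

    `requested_profiles` holds one output profile label per client.
    """
    return {
        # One incoming stream per client has to be decoded.
        "decodes": len(requested_profiles),
        # One composite is encoded per distinct output profile.
        "encodes": len(set(requested_profiles)),
    }


print(mcu_codec_work(["720p", "720p", "360p", "360p", "360p"]))
```

If every endpoint gets its own layout, the number of encodes approaches the number of clients, which is the worst case for server CPU.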
The MCU topology is best suited to large conferences.
Pros:
- Large conference sizes are possible
- Composite output simplifies integration with external services
- Simple endpoints with limited CPU can be connected easily
Cons:
- CPU intensive; the more streams, the bigger your server
- Single point-of-failure risk because of centralised processing
- High operational costs due to computational load on server
Which is the right architecture for you?
An SFU is best used in multiparty conferencing applications that don’t have too many concurrent participants. In other words, an SFU is suitable when an endpoint is not expected to consume too many remote streams.
An MCU, by contrast, is best suited to large conference applications, that is, where an endpoint expects video from a lot of remote participants.