How we made Membrane SFU less ICE-y
--
This article was previously published on the Software Mansion blog.
A well-known problem in software development, one you have probably noticed in your own career, is that existing technology and the solutions it provides don't always match the problem you need to solve. In most cases there are plenty of workarounds, additional settings, and proxy options to be found, but in the end the architecture or the code doesn't look too good. Sometimes it's wise to step back and ask whether we really need all that stuff.
WebRTC was made to let people communicate peer-to-peer using browsers (it's mostly used to send media, but that's not its only purpose). It works well thanks to ICE, which lets browsers connect even when they're hidden behind NAT. But what works well for direct communication between browsers also creates unwanted overhead if you are building an SFU: you need to establish an ICE connection between your server and each browser that wants to join. In our case (Membrane is written in Elixir and therefore runs on the Erlang Virtual Machine), we decided to use libnice for that. It's pretty popular and complete, and since it's written in C, it was the easier thing to integrate.
libnice is not bad in general, but it is definitely expensive. Running a separate libnice process for each peer connection is a bit of an overkill, and it won't scale nicely. We started building Membrane with scalability in mind, so we needed to do something about it.
Good things are not cheap
Going back to WebRTC basics, there are two common peer connection problems that ICE helps you solve: a browser is unaware of its public identity because of NAT, and the UDP connections WebRTC requires are not allowed by some firewalls. Depending on the particular situation, they're commonly solved with STUN and TURN servers.
A STUN server allows you to obtain your public IP address, which can then be used by the other side to establish a connection. A request to the STUN server creates a binding in the NAT table from our private IP address to a public one. The STUN server then receives our request and replies with the public IP address (and port) it saw the request coming from.
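To show just how little a STUN server has to do, here's a minimal sketch of a binding request in Elixir. The public STUN server address below is only an example, and the module is an illustration, not part of Membrane:

```elixir
defmodule StunSketch do
  @magic_cookie 0x2112A442

  # Ask a STUN server "Who am I?" and decode the XOR-MAPPED-ADDRESS it returns.
  def who_am_i(server \\ ~c"stun.l.google.com", port \\ 19302) do
    transaction_id = :crypto.strong_rand_bytes(12)
    # Binding request: type 0x0001, zero-length attribute section.
    request = <<0x0001::16, 0::16, @magic_cookie::32, transaction_id::binary>>

    {:ok, socket} = :gen_udp.open(0, [:binary, active: false])
    :ok = :gen_udp.send(socket, server, port, request)
    {:ok, {_ip, _port, response}} = :gen_udp.recv(socket, 0, 5_000)
    :gen_udp.close(socket)

    parse_xor_mapped_address(response)
  end

  defp parse_xor_mapped_address(
         <<_type::16, _len::16, @magic_cookie::32, _tid::96, attrs::binary>>
       ),
       do: find_attr(attrs)

  # XOR-MAPPED-ADDRESS (type 0x0020), IPv4: un-XOR the port and address.
  defp find_attr(<<0x0020::16, 8::16, _::8, 0x01::8, xport::16, xaddr::32, _::binary>>) do
    port = Bitwise.bxor(xport, Bitwise.bsr(@magic_cookie, 16))
    <<a, b, c, d>> = <<Bitwise.bxor(xaddr, @magic_cookie)::32>>
    {:ok, {{a, b, c, d}, port}}
  end

  # Skip any other attribute, remembering that values are padded to 4 bytes.
  defp find_attr(<<_type::16, len::16, rest::binary>>) do
    padded = div(len + 3, 4) * 4
    <<_value::binary-size(padded), tail::binary>> = rest
    find_attr(tail)
  end

  defp find_attr(<<>>), do: {:error, :no_mapped_address}
end
```

Called from behind a NAT, `StunSketch.who_am_i/0` should return the public address and port the NAT assigned to our socket, which is all STUN is there for.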
If that is not enough to establish a connection (e.g. because of a restrictive firewall or symmetric NAT), a TURN server can help you. TURN shares its public IP with you by creating an association between your address and one of its ports, letting you present yourself to the other side under that address. It then relays all the media you're transmitting. Moreover, because the other peer connects to TURN, the transmission between you and TURN doesn't have to be UDP. Generally, ICE requires UDP (there are some exceptions), but the ICE connection is now between the other peer and TURN.
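To give a feel for the relay hop TURN adds, here's a hedged sketch of that single step in Elixir: take a ChannelData message that arrived from the client (possibly over TCP) and push its payload to the remote peer over UDP. Allocations, authentication, and channel binding are omitted, and the module name is made up for illustration:

```elixir
defmodule TurnRelaySketch do
  # ChannelData: 2-byte channel number (0x4000..0x7FFF), 2-byte length, payload.
  # `bindings` maps a channel number to the peer address learned from a
  # prior ChannelBind request.
  def relay(<<channel::16, len::16, payload::binary-size(len), _pad::binary>>, socket, bindings)
      when channel in 0x4000..0x7FFF do
    {peer_addr, peer_port} = Map.fetch!(bindings, channel)
    # The actual relay work: strip the TURN framing, forward the payload over UDP.
    :gen_udp.send(socket, peer_addr, peer_port, payload)
  end

  def relay(_other, _socket, _bindings), do: {:error, :not_channel_data}
end
```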
Both STUN and TURN are commonly used, but while STUN is quite cheap and easy (all it does is provide a simple answer to the "Who am I?" question), TURN needs a powerful machine, as its primary role is to relay all the media and address it properly, sometimes handling a lot of bindings at a time.
Do we really need it?
In an SFU architecture, the WebRTC naming conventions might be a little misleading. We still have peers, but one of them is the SFU server. So we've got a bunch of browsers connecting to the SFU, and inside, each of those connections operates through its own libnice process. If we assume that some of the browsers need to connect through a TURN server, we end up with two servers (SFU and TURN) that have to send media between each other.
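That per-peer shape looks roughly like this in Elixir; `SFU.PeerConnection` here stands for a hypothetical GenServer wrapping a libnice agent, not the actual Membrane module:

```elixir
defmodule SFU.PeerSupervisor do
  use DynamicSupervisor

  def start_link(opts), do: DynamicSupervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts), do: DynamicSupervisor.init(strategy: :one_for_one)

  # Every browser that joins gets its own peer-connection process,
  # which drives a separate libnice agent underneath.
  def handle_new_peer(peer_id) do
    DynamicSupervisor.start_child(__MODULE__, {SFU.PeerConnection, peer_id: peer_id})
  end
end
```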
Of course, an external TURN server is a must-have in peer-to-peer connections, but in an SFU architecture its work is limited to receiving TURN messages from the browser, translating them into UDP datagrams, and sending them to the SFU server, and the other way around, of course. That proxy is surely expensive: it requires its own machine to run on and doubles the number of expensive media-transferring connections. But it doesn't have to be like that.
First, let's give all browsers only one option: connecting via TURN. It might sound ridiculous, but there is a method to this madness. Then let's look at the two connections we've got: the browser-TURN connection is crucial, as it requires a dedicated format, but the SFU-TURN connection is something we can easily simplify. Imagine the TURN and SFU servers running on the same machine: we save a lot of bandwidth and get faster connections. And what if they both run inside the same Erlang VM and communicate via Erlang messages? That speeds things up even more and lets us control all the processes in one place. Moreover, there are existing TURN implementations in Erlang, so it's not too much work. To be precise, we decided on a quasi TURN server, because browser-TURN communication is twofold: connectivity checks are passed on to and handled by libnice, while media packets bypass it and are sent internally, straight into the SFU process.
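A rough sketch of that routing, with illustrative module and message names rather than the real Membrane API: packets arriving on the TURN socket are classified by their first byte (in the spirit of RFC 7983) and either handed to libnice or delivered straight to the SFU endpoint process as an Erlang message.

```elixir
defmodule QuasiTurn.Demux do
  # A first byte in 0..3 means a STUN message, i.e. an ICE connectivity
  # check: let libnice deal with it.
  def route(<<first, _::binary>> = packet, libnice_pid, _endpoint_pid) when first in 0..3 do
    send(libnice_pid, {:handle_stun, packet})
  end

  # A first byte in 128..191 means RTP/RTCP (version bits 10): media skips
  # libnice entirely and goes directly to the SFU endpoint process.
  def route(<<first, _::binary>> = packet, _libnice_pid, endpoint_pid) when first in 128..191 do
    send(endpoint_pid, {:media_packet, packet})
  end

  def route(_packet, _libnice_pid, _endpoint_pid), do: :drop
end
```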
Of course, we still can't live without libnice, but we've limited its responsibility, and in the near future we're planning to make its usage fully optional. A full Elixir ICE implementation is a lot of work, much more than implementing a TURN server, but, as I mentioned above, an SFU doesn't need the full implementation. With all communication going through TURN, we only need to implement a small part of the negotiation and the connectivity checks. That will help us keep the whole connection under the control of the SFU application.
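For illustration, here's roughly how small that connectivity-check part can be. A real ICE check also requires MESSAGE-INTEGRITY and FINGERPRINT attributes, which this hedged sketch leaves out for brevity:

```elixir
defmodule MiniIce do
  @magic_cookie 0x2112A442

  # Answer a STUN binding request (an ICE connectivity check) coming from
  # the browser, given the source IP and port we observed it from.
  def handle_check(
        <<0x0001::16, _len::16, @magic_cookie::32, tid::binary-size(12), _attrs::binary>>,
        {a, b, c, d},
        port
      ) do
    xport = Bitwise.bxor(port, Bitwise.bsr(@magic_cookie, 16))
    <<xaddr::32>> =
      <<Bitwise.bxor(a, 0x21), Bitwise.bxor(b, 0x12), Bitwise.bxor(c, 0xA4), Bitwise.bxor(d, 0x42)>>

    # Binding success response (0x0101) carrying a single XOR-MAPPED-ADDRESS attribute.
    attr = <<0x0020::16, 8::16, 0::8, 0x01::8, xport::16, xaddr::32>>
    {:reply, <<0x0101::16, byte_size(attr)::16, @magic_cookie::32, tid::binary, attr::binary>>}
  end

  def handle_check(_packet, _ip, _port), do: :ignore
end
```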