Manage Dynamic Multi-Peer Connections in WebRTC
If you are reading this, you are probably thinking about writing a video conference / chat application. Welcome to the club! I’ve been there myself, but when I started looking at examples I realized most of them only supported a fixed set of peers, in most cases just two. In this post, I’ll walk you through how to manage dynamic multi-peer connections in WebRTC.
WebRTC (Web Real-Time Communication) is a technology that enables Web applications and sites to capture and optionally stream audio and/or video media, as well as to exchange arbitrary data between browsers without requiring an intermediary. The set of standards that comprise WebRTC makes it possible to share data and perform teleconferencing peer-to-peer, without requiring that the user install plug-ins or any other third-party software.
Let’s start by focusing on the following passage:
exchange arbitrary data between browsers without requiring an intermediary
This is very important, and most of the tutorials out there don’t talk much about it. The sentence is not entirely true: if you want to build a video conference application with multiple users, you need an intermediary server to discover your public IP address so that other browsers can find you behind a NAT/firewall. Usually, a STUN Server is used for this. STUN Servers let peers learn each other’s public addresses, but if it feels like something is still missing, it’s because it is: once the peers know their addresses they try to communicate directly, and when a direct connection isn’t possible (for example, behind a symmetric NAT), the traffic is relayed through a TURN (Traversal Using Relays around NAT) server.
Don’t worry, there are plenty of free STUN Servers. You can find a cool list here. Now, we know that for browsers in different networks to communicate with each other an intermediary is required: our STUN Server.
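Under the hood, those intermediary servers are just a configuration object that we will later hand to every RTCPeerConnection. Here is a minimal sketch of such a configuration, reused below as STUN_SERVERS; the URLs and credentials are placeholders, not real servers:

// Placeholder ICE servers: swap in STUN/TURN servers you control or trust.
const STUN_SERVERS: RTCConfiguration = {
  iceServers: [
    { urls: 'stun:stun.example.org:3478' },
    {
      urls: 'turn:turn.example.org:3478',
      username: 'user',        // TURN servers usually require credentials
      credential: 'password',
    },
  ],
};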
WebRTC — Peer connection architecture
WebRTC has a set of specifications that allow us to start exchanging data between browsers. This exchange is based on the following assumptions:
- This exchange is done using the Session Description Protocol (SDP). Each SDP will contain multimedia information about the sender.
- One user sends an offer and the other replies with an answer.
- After the offer/answer exchange is completed, each peer sends the other information about its network and how it can be reached. This network information exchange is done using the ICE protocol.
Setting up multi-peer connections
Because a demo is worth a thousand words, in this section we will set up a working example of WebRTC where the peers will exchange audio and video.
What’s our goal? We want to have an app where, as users, we can chat with others using audio and video. For this, we need our friend WebRTC and a messaging mechanism to send and receive events from the users (these events will represent the user’s “intentions”).
The messaging service is very simple, and you can use whatever you like: socket.io, gorilla, firebase, etc. Let’s start by defining a wrapper named SocketCommunication (the implementation is not relevant for this post, but it should be rather simple).
const socket = new SocketCommunication('IP_ADDRESS');
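The exact implementation is up to you. As a rough idea, a wrapper on top of socket.io-client could look like the sketch below; the class shape, the send/on signatures and the message format are assumptions made for this post, not a reference implementation.

import { io, Socket } from 'socket.io-client';

// Hypothetical signaling wrapper: forwards events to the server, which
// relays them to the target user (or to everyone when no target is given).
class SocketCommunication {
  private socket: Socket;

  constructor(address: string) {
    this.socket = io(address);
  }

  send(event: string, payload: unknown, from: string, to?: string): void {
    this.socket.emit(event, { payload, from, to });
  }

  on(event: string, handler: (payload: any, from: string) => void): void {
    this.socket.on(event, (message: { payload: any; from: string }) =>
      handler(message.payload, message.from)
    );
  }
}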
When a user enters the chat, the application has to ask the browser for permission to access audio and video. This access will be stored in a MediaStream object, with audio and video kept in separate tracks.
let localStream: MediaStream;
// ...
navigator.mediaDevices
  .getUserMedia({ audio: true, video: true })
  .then((stream: MediaStream) => setLocalStream(stream)) // setLocalStream stores the stream in localStream
  .catch((error: DOMException) => console.warn(error.message));
Let’s imagine that three different users want to connect to our chat application and that each user is identified by a unique ID. The first one sends a connect event, and the messaging server responds with an event indicating that the user successfully joined the chat room. After the connection succeeds, the user sends a start_call event to the messaging service, but in this case no one will receive it because there are no other users in the chat yet.
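With the wrapper sketched above, that flow could look roughly like this; the connect, connect_success and start_call event names, as well as my_userid, are assumptions made for this sketch:

// Announce ourselves to the messaging server.
socket.send('connect', null, my_userid);

// Once the server confirms we joined the room, ask everyone else to start a call.
socket.on('connect_success', () => {
  socket.send('start_call', null, my_userid);
});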
Now, the second user joins the chat and initiates the call. This time, the start_call will be received by the first client. The first client will have to create an RTCPeerConnection associated with this second user, in order to have direct media communication with him.
const peerConnection = new RTCPeerConnection(STUN_SERVERS);

// Now add your local media stream tracks to the connection
localStream.getTracks().forEach((track: MediaStreamTrack) => {
  peerConnection.addTrack(track);
});
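Since every remote user needs his own connection, it helps to keep them in a map keyed by user ID and create them lazily. This is just one way to organize it, and the helper name is made up for this post:

// One RTCPeerConnection per remote user, keyed by that user's ID.
const peerConnections = new Map<string, RTCPeerConnection>();

function getOrCreatePeerConnection(userId: string): RTCPeerConnection {
  const existing = peerConnections.get(userId);
  if (existing) {
    return existing;
  }

  const connection = new RTCPeerConnection(STUN_SERVERS);
  localStream.getTracks().forEach((track: MediaStreamTrack) => connection.addTrack(track));
  peerConnections.set(userId, connection);
  return connection;
}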
After the RTCPeerConnection is created, the first client has to create an offer, set it as his local description, and send it to the second client.
const sessionDescription: RTCSessionDescriptionInit = await peerConnection.createOffer();
await peerConnection.setLocalDescription(sessionDescription);

socket.send('offer', sessionDescription, from_userid, to_userid);
When the second user receives the offer, he will have to do the same thing the first client did: create an RTCPeerConnection associated with the first user. Then he’ll have to create an answer and send it to the first client. This time, the remote description will be the received offer and the local description will be the created answer.
peerConnection.setRemoteDescription(new RTCSessionDescription(offer)).then(async () => {
  const sessionDescription = await peerConnection.createAnswer();
  await peerConnection.setLocalDescription(sessionDescription);

  socket.send('answer', sessionDescription, from_userid, to_userid);
});
The first user will then receive an answer event and set the second user’s answer description on his RTCPeerConnection object.
peerConnection.setRemoteDescription(new RTCSessionDescription(answer))
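One detail the offer/answer exchange above doesn’t cover is ICE itself: each peer also has to forward the ICE candidates it gathers to the other side through the messaging service, and feed the candidates it receives into its RTCPeerConnection. A minimal sketch, assuming an ice_candidate event name:

// Forward locally gathered ICE candidates to the remote peer.
peerConnection.onicecandidate = (event: RTCPeerConnectionIceEvent) => {
  if (event.candidate) {
    socket.send('ice_candidate', event.candidate, from_userid, to_userid);
  }
};

// Hand candidates received from the remote peer to the connection.
socket.on('ice_candidate', (candidate: RTCIceCandidateInit) => {
  peerConnection.addIceCandidate(new RTCIceCandidate(candidate));
});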
The logic described in this section works for any number of users: if a third user logs in, he sends a start_call event, the other two users each send him their offers, and he answers each of them with his own media description.
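Tying this back to the map of connections from earlier, each existing client’s start_call handler could look roughly like this; the handler signature and my_userid are assumptions:

// A new user announced himself: create a dedicated connection and send him an offer.
socket.on('start_call', async (_payload: unknown, newUserId: string) => {
  const connection = getOrCreatePeerConnection(newUserId);
  const offer = await connection.createOffer();
  await connection.setLocalDescription(offer);
  socket.send('offer', offer, my_userid, newUserId);
});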
Now, every single user in your application is connected and sharing media. If a user wants to change his media tracks (disable/enable audio or video), he only has to toggle the track on his local stream, and the other clients will pick up the change automatically because those tracks are already being sent over the established peer connections.
localStream.getTracks().forEach((track: MediaStreamTrack) => {
  // track.kind is either 'audio' or 'video'
  if (track.kind === 'video') {
    track.enabled = !track.enabled;
  }
});
And that’s it! I hope you enjoyed the read and that it was helpful for your project.