WebRTC for Real Time Communication
Recently, I wanted to include a video chat feature in my project. After doing some research, I decided that WebRTC was probably the best approach to accomplish what I wanted. Learning how to use WebRTC involved a lot of new technology and terms that I was unfamiliar with, so I decided to take a deeper dive to conceptually understand what I was implementing.
WebRTC stands for Web Real-Time Communication. WebRTC allows users to exchange audio, video, and data in real time by forming a peer-to-peer connection, so the media itself does not have to pass through an intermediary server. Prior to WebRTC, plugins such as Flash or native applications were necessary for live video communication. WebRTC is free and open source, and the project was originally released by Google.
WebRTC has two main jobs when transferring media: first, it must capture streams from media devices, such as webcams and mics; then it must form a peer-to-peer connection and pass those streams over the connection. WebRTC makes use of several APIs available in the browser to accomplish this. By now, almost all modern browsers have WebRTC built in. WebRTC officially became a W3C and IETF standard in January 2021 (4), although the effort was initiated in 2009 by Google, and in 2013 Chrome and Firefox demonstrated a video call between the two browsers.
First, to begin a WebRTC connection with a peer, you need to get access to your camera and mic. navigator.mediaDevices.getUserMedia can accomplish this, or getDisplayMedia if you wish to share your screen. Then, you need to form a connection with another user. RTCPeerConnection is the API used for the connection between two peers. The RTCPeerConnection object defines how the connection is set up, keeps track of its state, and is closed when a user disconnects. It also gathers the ICE candidates that make the connection possible.
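To make the capture step more concrete, here is a minimal sketch of requesting the camera and mic and handing the resulting tracks to a peer connection. The helper name attachLocalMedia and the small "Like" interfaces are my own inventions for illustration; they only mirror the parts of navigator.mediaDevices and RTCPeerConnection that the sketch touches, so the same function can be tried outside a browser with stand-in objects.

```typescript
// Hypothetical minimal shapes (not part of WebRTC itself) that mirror the
// browser objects this sketch relies on.
interface TrackLike {
  kind: string; // "audio" or "video" on a real MediaStreamTrack
}
interface StreamLike {
  getTracks(): TrackLike[];
}
interface DevicesLike {
  getUserMedia(constraints: { audio: boolean; video: boolean }): Promise<StreamLike>;
}
interface PeerLike {
  addTrack(track: TrackLike, stream: StreamLike): unknown;
}

// Ask for camera and mic, then add every captured track to the connection.
// Resolves with the kinds of the tracks that were added.
async function attachLocalMedia(devices: DevicesLike, peer: PeerLike): Promise<string[]> {
  const stream = await devices.getUserMedia({ audio: true, video: true });
  const kinds: string[] = [];
  for (const track of stream.getTracks()) {
    peer.addTrack(track, stream);
    kinds.push(track.kind);
  }
  return kinds;
}

// In a browser, the real objects could be passed in directly:
//   const peer = new RTCPeerConnection();
//   attachLocalMedia(navigator.mediaDevices, peer);
```

In a real page the getUserMedia promise rejects if the user denies permission, so the call would normally be wrapped in a try/catch.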
Even though a peer-to-peer connection sends its data without an intermediary server, it does require a server to initiate the connection. WebRTC relies on what is known as a ‘signaling server’ or a ‘signaling channel’ to accomplish this; the WebRTC standard does not specify how signaling is implemented, so each application chooses its own transport. The initiating client who wants to start the peer-to-peer connection creates an “offer” to send to the other peer through the signaling server. Then, the other client receives the offer from the signaling server and returns an “answer” to the initiating peer through the signaling server. The initiating peer now has the session details of both itself and the peer it wishes to communicate with. The peers also exchange ICE candidates through the signaling server. Each ICE candidate describes one way a peer can be reached, either directly or, if a direct peer-to-peer connection is not possible, through what is known as a TURN server.
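The offer and answer exchange was easier for me to follow once I traced the messages. Below is a small simulation of that flow, not real WebRTC code: InMemorySignaling and Peer are hypothetical classes that stand in for a signaling server and two browsers, and the "sdp" strings are placeholders. In a browser, the offer and answer would come from RTCPeerConnection's createOffer and createAnswer and be applied with setLocalDescription and setRemoteDescription.

```typescript
interface SignalMessage {
  kind: "offer" | "answer";
  from: string;
  to: string;
  sdp: string; // placeholder text here; a real message carries an SDP blob
}

// Hypothetical stand-in for a signaling server: it only relays messages.
class InMemorySignaling {
  private handlers = new Map<string, (msg: SignalMessage) => void>();
  log: string[] = [];

  register(peerId: string, handler: (msg: SignalMessage) => void): void {
    this.handlers.set(peerId, handler);
  }

  send(msg: SignalMessage): void {
    this.log.push(`${msg.from} -> ${msg.to}: ${msg.kind}`);
    const handler = this.handlers.get(msg.to);
    if (handler) handler(msg);
  }
}

// Hypothetical peer that records the descriptions it has seen.
class Peer {
  id: string;
  localDescription?: string;
  remoteDescription?: string;
  private signaling: InMemorySignaling;

  constructor(id: string, signaling: InMemorySignaling) {
    this.id = id;
    this.signaling = signaling;
    signaling.register(id, (msg) => this.onSignal(msg));
  }

  // The initiating peer creates an offer and sends it through signaling.
  call(remoteId: string): void {
    this.localDescription = `offer-from-${this.id}`;
    this.signaling.send({ kind: "offer", from: this.id, to: remoteId, sdp: this.localDescription });
  }

  // The receiving peer stores the remote description and answers an offer.
  private onSignal(msg: SignalMessage): void {
    this.remoteDescription = msg.sdp;
    if (msg.kind === "offer") {
      this.localDescription = `answer-from-${this.id}`;
      this.signaling.send({ kind: "answer", from: this.id, to: msg.from, sdp: this.localDescription });
    }
  }
}

const channel = new InMemorySignaling();
const alice = new Peer("alice", channel);
const bob = new Peer("bob", channel);
alice.call("bob");
// channel.log is now ["alice -> bob: offer", "bob -> alice: answer"]
```

After the call, each peer holds its own description and the other peer's description, which is the state the real API needs before media can flow.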
Network Address Translation (NAT) is another hurdle to overcome for peer-to-peer communication. Session Traversal Utilities for NAT (STUN) servers are used to get over this difficulty: if the private IP address of a computer is different from the public IP address of its router, the STUN server tells each peer its own public IP address and port. To build its Interactive Connectivity Establishment (ICE) candidates, a browser makes a request to the STUN server to learn its public address. The offer and answer themselves are session descriptions written in the Session Description Protocol (SDP). In addition to the connection details needed to link the two peers, the SDP describes what type of media is being sent and which codecs are used. A codec compresses and decompresses media: an encoder compresses the audio or video before it is sent, and a decoder decompresses it on the receiving side.
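Since the SDP is plain text, it is possible to peek inside it. The sketch below pulls the media types and codec names out of a session description by reading its "m=" and "a=rtpmap:" lines. The function name parseSdpMedia and the sample SDP are made up for illustration, and a real description from a browser contains many more lines than this one.

```typescript
interface MediaSection {
  kind: string; // e.g. "audio" or "video"
  port: number;
  codecs: string[];
}

// Collect each media section and the codec names listed under it.
function parseSdpMedia(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  let current: MediaSection | undefined;
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <payload types...>
      const fields = line.slice(2).split(" ");
      current = { kind: fields[0] ?? "", port: Number(fields[1] ?? "0"), codecs: [] };
      sections.push(current);
    } else if (line.startsWith("a=rtpmap:") && current) {
      // a=rtpmap:<payload type> <codec name>/<clock rate>[/<channels>]
      const encoding = line.slice("a=rtpmap:".length).split(" ")[1] ?? "";
      const name = encoding.split("/")[0] ?? "";
      if (name !== "") current.codecs.push(name);
    }
  }
  return sections;
}

// Shortened, hand-written sample; not captured from a real browser.
const exampleSdp = [
  "v=0",
  "o=- 46117317 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "c=IN IP4 0.0.0.0",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");

const mediaSections = parseSdpMedia(exampleSdp);
```

For the sample above, mediaSections holds one audio section offering opus and one video section offering VP8 and H264.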

A peer-to-peer connection is not always possible or easy due to NAT restrictions and firewalls. Sometimes, even after peers attempt to start a connection, one peer may have a firewall that blocks direct peer-to-peer traffic. A common restriction is known as symmetric NAT, where the router uses a different public port for each destination and only accepts replies from a destination the peer has already contacted, so the address learned from the STUN server does not work for the other peer. If this is the case, a Traversal Using Relays around NAT (TURN) server must be used. This is an alternative to a true peer-to-peer connection: the data is passed back and forth through an intermediary server. The obvious drawback of a TURN server is that it adds overhead and latency, since the relay has to carry all of the media traffic. If a direct connection is possible, it should be used since the data flow will generally be faster.
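Here is a rough sketch of how STUN and TURN servers are listed for a connection, and how to tell from a candidate string whether traffic would go through a relay. The hostnames and credentials are placeholders I made up, and candidateType and usesTurnRelay are hypothetical helpers. The "typ" field is part of the standard candidate format, where "host" is a local address, "srflx" is a public address learned from STUN, and "relay" is an address on a TURN server.

```typescript
// Placeholder servers: these hostnames and credentials are invented for illustration.
const iceConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    { urls: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-pass" },
  ],
};
// In a browser this object could be passed as: new RTCPeerConnection(iceConfig)

type CandidateType = "host" | "srflx" | "prflx" | "relay";

// Read the "typ" field out of an ICE candidate string.
function candidateType(candidate: string): CandidateType | undefined {
  const match = /\btyp (host|srflx|prflx|relay)\b/.exec(candidate);
  return match ? (match[1] as CandidateType) : undefined;
}

// A candidate pair goes through TURN if either side is a relay candidate.
function usesTurnRelay(localCandidate: string, remoteCandidate: string): boolean {
  return candidateType(localCandidate) === "relay" || candidateType(remoteCandidate) === "relay";
}

// Hand-written sample candidates using documentation-only IP ranges.
const hostCandidate = "candidate:1 1 udp 2122260223 192.168.1.10 46154 typ host";
const stunCandidate =
  "candidate:2 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154";
const turnCandidate =
  "candidate:3 1 udp 41885439 198.51.100.20 50000 typ relay raddr 203.0.113.7 rport 46154";

const directPair = usesTurnRelay(hostCandidate, stunCandidate); // false
const relayedPair = usesTurnRelay(turnCandidate, stunCandidate); // true
```

Listing the TURN server does not force its use; ICE tries the direct candidates too and falls back to the relay only when they fail, which matches the advice above to prefer a direct connection when one is available.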
Last year, the pandemic made video conferencing an important and necessary norm in our society. Now that WebRTC is standardized and real-time communication is an expectation, WebRTC is only going to grow. Newer codecs are making media transfer more efficient. WebRTC can also be used in native Android and iOS apps. Contributors to WebRTC are working to add more features and optimize the existing ones, including end-to-end encryption for server-mediated video conferencing and live processing of audio and video feeds (7).
Further reading on WebRTC:
(2) https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API
(3) https://bloggeek.me/what-is-webrtc/
(4) https://sdtimes.com/softwaredev/the-w3c-and-ietf-make-webrtc-an-official-standard/
(5) https://webrtc-security.github.io/
(6) https://www.twilio.com/docs/stun-turn/faq
(7) https://www.w3.org/2021/01/pressrelease-webrtc-rec.html.en