WebRTC the magic behind communications over the internet — part 2: How it works
Hey everyone, this is my second article on WebRTC. Previously, I discussed what WebRTC is and the various strategies and architectures used to implement it. You can check it out below.
Now I will discuss how it actually works: how connections are established between clients.
The Basic Architecture
Before going further, let’s first look at the architecture of the connection, i.e. what is involved in establishing a connection between two peers.
The above image shows the various components involved in establishing the connection between the two clients.
So let’s define each of them:
- Signaling server — A signaling server manages the connection between the clients. It is used to set up the connection, negotiate the connection request, and close the connection. It carries the SDP offers and answers and the ICE candidates between the peers. ICE (Interactive Connectivity Establishment) itself runs on the clients: it collects candidates, exchanges them, and then attempts to connect a session using them. There is no standard protocol for signaling the offers and answers, but the commonly used transports are:
— Long-polling
— HTTP Streams
— WebSockets
- STUN server — STUN (Session Traversal Utilities for NAT) is a protocol that allows clients to discover the public IP address and port that their NAT maps them to. This information is used to establish the connection between the clients. In roughly 15–20% of cases STUN alone may not be enough to connect the clients, and in those cases the TURN server comes into use. This can happen due to the following issues:
— Firewall restrictions: Firewalls might block communication with STUN servers, preventing clients from discovering their public IP address.
— NAT device limitations: Some NAT devices might not be compatible with STUN or may have specific configurations that hinder STUN functionality.
— Network issues: Network congestion or outages could disrupt communication between the client and the STUN server.
— STUN server overload: If a STUN server is overloaded with requests, it might become unresponsive and fail to provide a timely response.
- TURN Server — TURN (Traversal Using Relays around NAT) is a protocol for relaying network traffic. It is used when a direct connection cannot be established with STUN. It relays audio streams, video streams, and other real-time data between clients. It does not carry signaling data. It has a public IP address through which clients can establish a connection with it. Free public TURN servers are rare, because the traffic that goes through them can make them costly to run.
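To make the STUN and TURN roles a bit more concrete, here is a minimal sketch of how a client might describe both servers in its connection configuration. The `IceServer` and `PeerConfig` interfaces are local stand-ins that mirror the shape of the browser's `RTCIceServer` and `RTCConfiguration`, and the helper name, server URLs, and credentials are all made-up placeholders, not real services.

```typescript
// Local stand-ins mirroring the browser's RTCIceServer / RTCConfiguration
// shapes, declared here so the sketch also compiles outside a browser.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
}

// Hypothetical helper type for TURN authentication.
interface TurnAuth {
  username: string;
  credential: string;
}

// Always lists a STUN server; adds a TURN relay only when one is provided.
function buildPeerConfig(stunUrl: string, turnUrl?: string, turnAuth?: TurnAuth): PeerConfig {
  const iceServers: IceServer[] = [{ urls: stunUrl }];
  if (turnUrl && turnAuth) {
    iceServers.push({
      urls: turnUrl,
      username: turnAuth.username,
      credential: turnAuth.credential,
    });
  }
  return { iceServers };
}

// Placeholder hosts and credentials (assumptions for illustration only).
const stunOnlyConfig = buildPeerConfig("stun:stun.example.org:3478");
const stunAndTurnConfig = buildPeerConfig(
  "stun:stun.example.org:3478",
  "turn:turn.example.org:3478",
  { username: "demo-user", credential: "demo-secret" }
);
```

In a browser, an object of this shape would typically be passed to `new RTCPeerConnection(...)`. Listing the TURN server alongside STUN is what lets the client fall back to a relay when a direct path cannot be found.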
Besides these components, a few important protocols complete the architecture:
- UDP — User Datagram Protocol is the underlying transport that WebRTC prefers for real-time media. UDP sacrifices reliability for speed: minor packet loss is preferable to the delays caused by retransmission in TCP. This ensures a smoother communication experience for calls, even if it means occasionally missing a frame of video.
- SDP — Session Description Protocol acts like a menu for multimedia calls. It describes what media (audio, video) is available, how to connect (codecs, ports), and timing info, but doesn’t deliver the actual media itself. Clients exchange this info before connecting.
- ICE candidates — SDP describes the media itself, but figuring out how to connect requires more information. ICE candidates act like connection details, exchanged with SDP, that tell peers how to reach each other directly or through a TURN server if needed.
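To get a feel for what SDP and ICE candidates look like on the wire, here is a small sketch that pulls the media sections out of an SDP blob and the address details out of a candidate line. The function names and the sample SDP and candidate strings are made up for illustration (the IP addresses are documentation-only addresses), and the parsing is deliberately simplified compared to a full SDP parser.

```typescript
interface MediaSection {
  kind: string;          // "audio", "video", ...
  port: number;
  protocol: string;      // e.g. "UDP/TLS/RTP/SAVPF"
  payloadTypes: string[];
}

// Collects every "m=" line, which opens one media section in SDP.
function parseMediaSections(sdp: string): MediaSection[] {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith("m="))
    .map((line) => {
      const [kind, port, protocol, ...payloadTypes] = line.slice(2).trim().split(/\s+/);
      return { kind, port: Number(port), protocol, payloadTypes };
    });
}

interface ParsedCandidate {
  protocol: string;
  address: string;
  port: number;
  type: string;          // "host", "srflx" (via STUN) or "relay" (via TURN)
}

// Reads the leading fields of a candidate line:
// candidate:<foundation> <component> <protocol> <priority> <address> <port> typ <type>
function parseCandidate(line: string): ParsedCandidate | null {
  const match = /^(?:a=)?candidate:(\S+) (\d+) (\S+) (\d+) (\S+) (\d+) typ (\S+)/.exec(line.trim());
  if (!match) return null;
  return {
    protocol: match[3].toLowerCase(),
    address: match[5],
    port: Number(match[6]),
    type: match[7],
  };
}

// Sample data, invented for this sketch.
const sampleSdp = [
  "v=0",
  "o=- 46117317 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const sampleSections = parseMediaSections(sampleSdp);
const sampleCandidate = parseCandidate(
  "candidate:842163049 1 udp 1694498815 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154"
);
```

Here the SDP says what media is on offer (one audio and one video section), while the candidate says where the peer can be reached (a public address learned through STUN).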
Steps to establishing a direct connection between two clients
Now that we have an idea of the basic architecture and know the components and protocols involved, let’s see how the connection is established step by step.
- Step 1: WebRTC Offer
Client 1 creates a WebRTC offer using SDP (Session Description Protocol) and sends it to the Signaling Server. The Signaling Server then forwards the offer to Client 2.
- Step 2: WebRTC Answer
Client 2 receives the offer and creates a WebRTC answer using SDP. This answer is sent back to the Signaling Server, which forwards it to Client 1. Now, both clients have each other’s SDP information.
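The offer and answer round trip from Steps 1 and 2 can be sketched with a tiny in-memory stand-in for the signaling server. Everything here is an assumption made for illustration: the `InMemorySignaling` class, the client IDs, and the placeholder SDP strings. A real setup would use a network transport such as WebSockets, and in a browser the SDP would come from `RTCPeerConnection.createOffer()` and `createAnswer()`.

```typescript
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string };

type SignalHandler = (from: string, message: SignalMessage) => void;

// Hypothetical signaling server: it only forwards messages between clients
// and never touches the media itself.
class InMemorySignaling {
  private handlers = new Map<string, SignalHandler>();

  join(clientId: string, handler: SignalHandler): void {
    this.handlers.set(clientId, handler);
  }

  send(from: string, to: string, message: SignalMessage): void {
    const handler = this.handlers.get(to);
    if (!handler) throw new Error(`unknown client: ${to}`);
    handler(from, message);
  }
}

const signaling = new InMemorySignaling();
// Records which remote SDP each client ends up holding.
const remoteSdpSeenBy: Record<string, string> = {};

// Client 1 waits for the answer to its offer.
signaling.join("client1", (_from, message) => {
  if (message.kind === "answer") remoteSdpSeenBy["client1"] = message.sdp;
});

// Client 2 stores the offer and replies with an answer (Step 2).
signaling.join("client2", (from, message) => {
  if (message.kind === "offer") {
    remoteSdpSeenBy["client2"] = message.sdp;
    signaling.send("client2", from, { kind: "answer", sdp: "answer-sdp-from-client2" });
  }
});

// Step 1: Client 1 sends its offer through the signaling server.
signaling.send("client1", "client2", { kind: "offer", sdp: "offer-sdp-from-client1" });
```

After the last line runs, each client holds the other's SDP, which is the state described at the end of Step 2.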
- Step 3: ICE Candidates
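Both Client 1 and Client 2 gather their ICE (Interactive Connectivity Establishment) candidates: host candidates from their own network interfaces, server-reflexive candidates from a STUN server, and relay candidates from a TURN server if one is configured. These candidates contain information about their network addresses and ports.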
- Step 4: ICE Candidate Exchange
The clients exchange their ICE candidates with each other, typically through the Signaling Server again. The clients then run connectivity checks on the candidates during this step to determine the best connection method.
- Step 5: Direct Connection (Preferred)
If the ICE candidates allow for a direct connection between the clients, a peer-to-peer connection is established. This is the preferred scenario for optimal performance.
- Step 6: TURN Server Connection (Fallback)
If a direct connection isn’t possible due to network restrictions (firewalls, NAT), the clients will attempt to connect through a TURN server. The TURN server acts as a relay, forwarding media traffic between the clients.
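Why is the direct route preferred over the relay? One way to see it is through the priority that ICE assigns to each candidate. The sketch below uses the priority formula and the recommended type preferences from the ICE specification (RFC 8445), with a default local preference of 65535 and component ID 1. The `selectRoute` helper and its `reachable` flag are simplifications invented for this example: real ICE ranks pairs of local and remote candidates and learns reachability from connectivity checks.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Recommended type preferences from RFC 8445: direct routes score highest,
// relayed routes lowest.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(type: CandidateType, localPreference = 65535, componentId = 1): number {
  return Math.pow(2, 24) * TYPE_PREFERENCE[type] + Math.pow(2, 8) * localPreference + (256 - componentId);
}

// Hypothetical result of a connectivity check for one candidate.
interface CheckedCandidate {
  type: CandidateType;
  reachable: boolean;
}

// Picks the highest-priority candidate type that passed its check,
// or null when nothing is reachable at all.
function selectRoute(checked: CheckedCandidate[]): CandidateType | null {
  const usable = checked.filter((c) => c.reachable);
  if (usable.length === 0) return null;
  usable.sort((a, b) => candidatePriority(b.type) - candidatePriority(a.type));
  return usable[0].type;
}

// Step 5: a direct (STUN-derived) route works, so it wins over the relay.
const directRoute = selectRoute([
  { type: "host", reachable: false },
  { type: "srflx", reachable: true },
  { type: "relay", reachable: true },
]);

// Step 6: only the TURN relay is reachable, so it is used as the fallback.
const fallbackRoute = selectRoute([
  { type: "host", reachable: false },
  { type: "srflx", reachable: false },
  { type: "relay", reachable: true },
]);
```

Because relay candidates get a type preference of 0, they only win when no direct candidate is reachable, which matches the "preferred" and "fallback" roles of Steps 5 and 6.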
Now we know how the connections are made to cater to real-life connections :) ...obviously using video calls. But this may just be the beginning! WebRTC’s potential stretches far beyond what we’ve covered. I would love to hear any suggestions on what you would like to explore next.
Stay tuned for the next exploration!!!