WebRTC in a Nutshell (Ep-III)

NAT, ICE, STUN, TURN

Emin DENİZ
Orion Innovation techClub
10 min read · May 4, 2022

In the previous article on WebRTC in a Nutshell, I tried to explain the fundamental communication concepts of WebRTC. This article will dive into network problems that we face in communication and solutions to those problems in WebRTC.

IP Communication

If you want to send physical mail to someone, you need their address and they need a mailbox. That is how the post office can deliver your mail from your address to your friend.

The same principle applies to internet communication. To reach any device on the internet, you need its IP address. Most of us have been living with IP addresses for a long time, so I won’t try to explain what they are. But if you want to learn more, you can check this wiki.

An IP address is a unique number that identifies a device on the internet. The IP addresses we commonly know are IPv4 addresses, and they have been in our lives since 1982. If you ever see an address like 128.12.87.123 or 1.22.45.34, it is IPv4; each of its four numbers is between 0 and 255. Every device that is connected to the internet has an IPv4 address. But did you ever think about how many IP addresses can exist?

Decomposition of IP (https://en.wikipedia.org/wiki/IPv4)

The picture above shows a sample IPv4 address with its binary notation. The key point is that a single IPv4 address has 32 bits. That means there can be at most 2³² = 4,294,967,296 IPv4 addresses. Imagining more than 4 billion internet devices in the world was far-fetched in the '90s. But today more than 4 billion people use the internet, with tens of billions of connected devices.
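To make the 32-bit idea concrete, here is a minimal sketch that turns a dotted IPv4 address into the single number it represents. The function name `ipv4ToInt` and the constant `IPV4_ADDRESS_COUNT` are my own, chosen for illustration; the sample address 172.16.254.1 is the one used on the Wikipedia page.

```typescript
// Convert a dotted-decimal IPv4 address to its 32-bit integer value.
// Throws if the text is not four decimal octets in the range 0..255.
function ipv4ToInt(address: string): number {
  const parts = address.split(".");
  if (parts.length !== 4) {
    throw new Error(`expected 4 octets, got ${parts.length}`);
  }
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) {
      throw new Error(`invalid octet: "${part}"`);
    }
    const octet = Number(part);
    if (octet > 255) {
      throw new Error(`octet out of range: ${octet}`);
    }
    // Multiply instead of shifting: JavaScript bit shifts work on signed 32-bit values.
    value = value * 256 + octet;
  }
  return value;
}

// Each of the 4 octets holds 8 bits, so there are 2^32 possible addresses.
const IPV4_ADDRESS_COUNT = Math.pow(2, 32); // 4294967296

const sample = ipv4ToInt("172.16.254.1"); // 2886794753
```

An address with an octet above 255, such as 1.22.453.34, is rejected, because 453 does not fit in 8 bits.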

Wait a minute, how can we connect that many devices with only about 4 billion IP addresses?

The answer is NAT.

NAT (Network Address Translation)

NAT enables us to create private IP networks whose devices are not directly reachable from the internet. Using NAT, the devices in such a subnetwork share a small number of public IP addresses, often just one.

Network Address Translation (NAT) Representation

In the diagram above, you can see that 5 different devices are connected to the same router in the home network. When any of those devices sends data to the internet, it sends the packets to the router first. Then the router sends those packets to the internet. Before it forwards a packet, the router stamps it with its own public IP address in place of the device's private one.

Network Address Translation (NAT) Package Transfer

The GIF above shows that flow. The same approach applies in reverse when Device1 needs to receive a packet: the router keeps track of which packet has to go to which device. So we can connect multiple devices to a single router that has an IP known to the internet (a public IP).

That is how we can connect far more devices to the internet than there are IPv4 addresses!
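The bookkeeping the router does can be sketched as a small translation table. This is a toy model, not how any particular router is implemented: the class `ToyNat`, the starting port 40000, and the public address 203.0.113.7 (from a range reserved for documentation) are all assumptions made for the example.

```typescript
interface Endpoint {
  ip: string;
  port: number;
}

// A toy NAT: maps each private (ip, port) pair to a port on one shared public IP.
class ToyNat {
  private readonly publicIp: string;
  private nextPort = 40000; // hypothetical first port handed out for mappings
  private outbound = new Map<string, number>(); // "privateIp:port" -> public port
  private inbound = new Map<number, Endpoint>(); // public port -> private endpoint

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  // Rewrite the source of an outgoing packet to the router's public address.
  translateOutbound(source: Endpoint): Endpoint {
    const key = `${source.ip}:${source.port}`;
    let publicPort = this.outbound.get(key);
    if (publicPort === undefined) {
      publicPort = this.nextPort++;
      this.outbound.set(key, publicPort);
      this.inbound.set(publicPort, source);
    }
    return { ip: this.publicIp, port: publicPort };
  }

  // Find which private device an incoming packet belongs to, if any.
  translateInbound(publicPort: number): Endpoint | undefined {
    return this.inbound.get(publicPort);
  }
}

const nat = new ToyNat("203.0.113.7");
const device1: Endpoint = { ip: "192.168.1.2", port: 5000 };
const seenByInternet = nat.translateOutbound(device1); // { ip: "203.0.113.7", port: 40000 }
const deliveredTo = nat.translateInbound(seenByInternet.port); // back to device1
```

Note that an incoming packet for a port with no mapping has nowhere to go; that detail matters for the problem described later in this article.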

If you want to learn more about NAT, please read this wiki. But for the sake of the article, I will focus on the problems that we encounter in communication.

I had an important quote in the first article of WebRTC in a Nutshell.

WebRTC takes care of the media transportation in media channels, but it expects applications to handle signaling.

That means that, as an app developer, my responsibility is to send the SDP to the correct device using the Offer/Answer model. After that, WebRTC starts to send media packets between the peers. Let’s examine it with examples.

Communicating devices on the same network using WebRTC

In the demonstration above, Alice wants to call Bob using a WebRTC-based application and they are on the same network. Alice initiates the call and sends a call start signal to the application server through her home router. After that, the application server sends a call start signal to Bob’s device via the same router. That is the signaling part of the communication.

After signaling is completed by the application, WebRTC starts media transmission as demonstrated above. That is how WebRTC communication happens between devices on the same network. But as we all expect, we don’t usually communicate with devices on the same network.

Communicating devices on the different networks using WebRTC

Again, Alice wants to call Bob in the demonstration above, but this time they are on different networks. From the signaling perspective, this is not a problem, because as app developers we implemented the platform and our application server knows how to reach Bob's device through its router. Let’s assume that my platform depends on WebSocket for receiving calls. Then Bob has to be connected to the application server via WebSocket to receive the call signal. Or let’s assume that my platform depends on push notifications. In that case my application server just needs to send the call signal via a push server (Firebase, APNS, etc.).

As I said, signaling is not a problem in this case, but peer-to-peer media transmission is a big one. As you can see from the demonstration above, Alice wants to send media packets to Bob. As far as Alice knows, Bob’s IP address is 192.168.1.2. Likewise, as far as Bob knows, Alice’s IP address is 192.168.1.1. But the reality is different.

When Alice sends data packets to 192.168.1.2 via her home router, the router cannot find any device with that IP on the internet, because it is a private address that only has meaning inside Bob's own network. The same problem happens to Bob as well. When he sends data packets to 192.168.1.1 via the cellular network, the cellular network cannot find that IP either. As a result, communication fails.
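Why can't those addresses be found? Addresses such as 192.168.x.x belong to ranges that are reserved for private networks (RFC 1918), so nothing on the public internet routes to them. A minimal sketch of that check, with a function name (`isPrivateIPv4`) that I made up for the example:

```typescript
// Return true if the dotted-decimal IPv4 address lies in one of the
// RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
function isPrivateIPv4(address: string): boolean {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(address)) {
    throw new Error(`not an IPv4 address: ${address}`);
  }
  const octets = address.split(".").map(Number);
  if (octets.some((octet) => octet > 255)) {
    throw new Error(`octet out of range in: ${address}`);
  }
  const [first, second] = octets as [number, number, number, number];
  return (
    first === 10 ||
    (first === 172 && second >= 16 && second <= 31) ||
    (first === 192 && second === 168)
  );
}

const bobIsPrivate = isPrivateIPv4("192.168.1.2"); // true: only valid inside Bob's network
const aliceIsPrivate = isPrivateIPv4("192.168.1.1"); // true: only valid inside Alice's network
```

Both peers only know an address that stops being meaningful the moment a packet leaves the local network.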

How can we solve this problem?

The answer is not a single word. We need 3 abbreviations: STUN, TURN and ICE.

STUN (Session Traversal Utilities for NAT)

The full name of STUN sounds complex, but believe me, it is not. The problem that we encountered in the previous example is that each device knows only the other party's private IP inside its sub-network, not its public IP. So we need each device to learn its own public IP and pass it to the other side.

STUN allows devices to find their public IP.

That is the only use case of STUN: it allows a device to find its own public IP address. STUN acts as a mirror. When a device sends a request to a STUN server, the server responds with the public IP address and port that it saw the request come from.
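For the curious, the "mirror" reply carries the address in an attribute that the STUN specification calls XOR-MAPPED-ADDRESS: the port and IP are XOR-ed with a fixed "magic cookie" (0x2112A442) before they are sent. The sketch below encodes and decodes only the value of that attribute for IPv4; it is not a full STUN client, and the function names are my own.

```typescript
const STUN_MAGIC_COOKIE = 0x2112a442;
const IPV4_FAMILY = 0x01;

// Build the 8-byte value of an XOR-MAPPED-ADDRESS attribute (IPv4 only),
// the way a STUN server would for the address it observed.
function encodeXorMappedAddress(ip: string, port: number): Uint8Array {
  const address = ip.split(".").map(Number).reduce((acc, octet) => acc * 256 + octet, 0);
  const view = new DataView(new ArrayBuffer(8));
  view.setUint8(0, 0); // reserved byte
  view.setUint8(1, IPV4_FAMILY);
  view.setUint16(2, port ^ (STUN_MAGIC_COOKIE >>> 16)); // X-Port
  view.setUint32(4, (address ^ STUN_MAGIC_COOKIE) >>> 0); // X-Address
  return new Uint8Array(view.buffer);
}

// Recover the public IP and port on the client side.
function decodeXorMappedAddress(value: Uint8Array): { ip: string; port: number } {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  if (view.getUint8(1) !== IPV4_FAMILY) {
    throw new Error("only IPv4 is handled in this sketch");
  }
  const port = view.getUint16(2) ^ (STUN_MAGIC_COOKIE >>> 16);
  const address = (view.getUint32(4) ^ STUN_MAGIC_COOKIE) >>> 0;
  const ip = [24, 16, 8, 0].map((shift) => (address >>> shift) & 0xff).join(".");
  return { ip, port };
}

// What the server saw is what the client learns about itself.
const reflected = decodeXorMappedAddress(encodeXorMappedAddress("47.61.61.61", 36768));
```

In a real application you never do this by hand; the browser's ICE implementation talks to the STUN server for you.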

How does STUN find public IP?

Yes, STUN is as simple as that. Using STUN, both Alice and Bob can find their public IPs, and with these public IPs they can communicate with each other.

Media Communication With STUN

As demonstrated above, once both peers know each other's public IP, they can communicate. Communication with STUN is still peer-to-peer: we don’t send any media packets through a server. In most cases STUN solves the NAT problem. But in some restricted networks, it is not enough.

TURN (Traversal Using Relays around NAT)

TURN is another NAT solution that WebRTC allows us to use. In the TURN scenario, data packets are sent to the TURN server, and the TURN server forwards them to the remote peer.

TURN solves almost all NAT-related network problems. But TURN is not peer-to-peer: every data packet travels through the TURN server on its way to the other peer.

TURN kills peer-to-peer communication

TURN should be the last option in WebRTC communication. Because it is not peer-to-peer, the bandwidth used by a TURN server is really expensive. A video call can need 1 Mbit/s of bandwidth. If you have a platform with 1,000 active users, you will need 1 Gbit/s of bandwidth. If you have 1 million active users, you will need 1 Tbit/s.

You may ask, “You told us WebRTC is peer-to-peer, but TURN just kills it. What is the point of using WebRTC then?” The answer is that we are just ensuring communication under all network conditions. Most of the time TURN is not required. We need TURN for more restricted cases like symmetric NAT. According to Google, 86% of the video calls established with WebRTC are peer-to-peer.
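The arithmetic behind those numbers is simple enough to sketch. The function below is a back-of-the-envelope estimate only; its name and parameters are hypothetical, and it counts one 1 Mbit/s stream per user as in the figures above (a relay also has to send each stream back out, so real server traffic is higher).

```typescript
// Rough aggregate relay bandwidth in Mbit/s.
// relayedFraction is the share of users whose media actually goes through TURN.
function relayBandwidthMbps(activeUsers: number, perUserMbps = 1, relayedFraction = 1): number {
  return activeUsers * perUserMbps * relayedFraction;
}

// Worst case: every call is relayed.
const worstCaseThousand = relayBandwidthMbps(1000); // 1000 Mbit/s = 1 Gbit/s
const worstCaseMillion = relayBandwidthMbps(1000000); // 1,000,000 Mbit/s = 1 Tbit/s

// If 86% of calls are peer-to-peer, only about 14% need the relay.
const typicalMillion = relayBandwidthMbps(1000000, 1, 1 - 0.86); // roughly 140,000 Mbit/s
```

Even with most calls going peer-to-peer, the relayed share is still a serious bill, which is why TURN stays the last resort.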

So far so good. But wait, how can we use STUN and TURN servers?

ICE (Interactive Connectivity Establishment)

According to Wikipedia, “Interactive Connectivity Establishment (ICE) is a technique used in computer networking to find ways for two computers to talk to each other as directly as possible in peer-to-peer networking”. As Wikipedia explains ICE allows us to find the most suitable candidates for peer-to-peer communication.

How does it decide the most suitable candidate?

The answer is easy: it tries every possible communication path between the peers. In our example, Alice wants to call Bob. Let’s say Alice and Bob are configured with 2 STUN and 3 TURN servers. In such a case ICE:

  • First checks whether Alice and Bob can reach each other directly, without STUN or TURN. We call these candidates Local (or Host) Candidates.
  • Then asks the STUN servers for each peer's public IP and port, and checks whether the peers can connect through those public addresses. We call these candidates Server Reflexive Candidates.
  • At last, checks whether the peers can communicate through the TURN servers. For these, ICE generates Relay Candidates. If communication can only be established via TURN, ICE selects the most suitable TURN server.
ICE Candidate Selection Logic

As described in the image above, ICE always tries to establish peer-to-peer communication with the best available candidates. You just need to tell ICE which STUN and TURN servers to use.
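How does "best available" turn into a number? The ICE specification (RFC 8445) gives every candidate a priority built mainly from its type, with recommended type preferences of 126 for host, 100 for server reflexive and 0 for relay. Here is a minimal sketch of that formula; the function and variable names are mine, the addresses are example values, and real ICE goes further by ranking pairs of candidates rather than single ones.

```typescript
type CandidateType = "host" | "srflx" | "relay";

// Type preferences recommended by RFC 8445; higher means "try this first".
const TYPE_PREFERENCE: Record<CandidateType, number> = { host: 126, srflx: 100, relay: 0 };

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
// componentId is 1 for RTP and 2 for RTCP; 65535 is the usual local preference
// for a device with a single network interface.
function candidatePriority(type: CandidateType, componentId: number, localPreference = 65535): number {
  return Math.pow(2, 24) * TYPE_PREFERENCE[type] + Math.pow(2, 8) * localPreference + (256 - componentId);
}

const gathered: { type: CandidateType; address: string }[] = [
  { type: "relay", address: "237.30.30.30" },
  { type: "host", address: "192.168.0.196" },
  { type: "srflx", address: "47.61.61.61" },
];

// Highest priority first: host, then server reflexive, then relay.
const ordered = gathered
  .map((candidate) => ({ ...candidate, priority: candidatePriority(candidate.type, 1) }))
  .sort((a, b) => b.priority - a.priority);
```

That ordering is why a direct path wins whenever one works, and the TURN relay is only used when nothing else does.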

I will cover how to create a WebRTC session with sample code in future articles. But I would like to show you how easy it is to define STUN and TURN servers, a.k.a. ICE servers. WebRTC allows us to pass an RTCConfiguration when we create the peer connection. We can set iceServers in the RTCConfiguration as in the code block below.
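A minimal sketch of such a configuration follows. The server addresses and credentials (stun.example.org, turn.example.org, example-user, example-password) are placeholders and not real servers. The `IceServerEntry` interface is declared here only so the snippet stands on its own; it mirrors the `urls`, `username` and `credential` fields of the browser's RTCIceServer.

```typescript
// Shape of one entry in iceServers, mirroring the browser's RTCIceServer fields.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

const iceConfiguration: { iceServers: IceServerEntry[] } = {
  iceServers: [
    // STUN: only a URL is needed (placeholder host).
    { urls: "stun:stun.example.org:3478" },
    // TURN: URL plus username and credential (placeholder values).
    {
      urls: "turn:turn.example.org:3478",
      username: "example-user",
      credential: "example-password",
    },
  ],
};

// In a browser, the configuration is passed when creating the peer connection:
// const peerConnection = new RTCPeerConnection(iceConfiguration);
```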

That is it! Our application now has STUN and TURN Support.

As you can see, there is one simple difference between the STUN and TURN server definitions: TURN servers also have a username and a credential (password). As I mentioned, TURN servers are expensive, so it is logical to protect them with a password so that only your application can use them.

You can find a list of free STUN and TURN servers in this gist. At the time I wrote this article, unfortunately, I couldn’t find any working free TURN server. But the STUN servers are working and you can use them for test purposes.

ICE in the SDP

The last thing I want to mention about ICE is how we can see it in the SDP. Here is a sample SDP that has STUN and TURN candidates.

Let’s break it down.

  • In the SDP above you can see 8 ICE candidates. First of all, each connection has one candidate for RTP and one candidate for RTCP (for a detailed explanation of RTP/RTCP, please see this article).
  • At the top we can see the local candidates for both possible TCP and UDP connections. 192.168.0.196 is my device IP and 46243 is my available port for RTP traffic. As you may guess, 56280 is my available port for RTCP traffic. Host in that line stands for host (local) candidates.
  • Then we can see the Server Reflexive (STUN) candidates. We already saw that 192.168.0.196 is our device IP. So, as you may guess, 47.61.61.61 is my public IP, as returned by the STUN request. My device can open port 36768, and my router can also open port 36768 for RTP communication. Srflx in that line stands for server reflexive (STUN) candidates.
RTP package transmission between ports
  • At last we see the Relay (TURN) candidates. In the TURN case my local (host) candidates are not important; that's why you don’t see them on the relay lines. We already know that 47.61.61.61 is my public IP. So 237.30.30.30 is the IP of the TURN server. My router can open port 54763 and the TURN server can open port 51472 for RTP communication. Relay in that line stands for relay (TURN) candidates.
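If you want to read those lines programmatically, the fields always come in the same order: foundation, component (1 for RTP, 2 for RTCP), transport, priority, address, port, then `typ` and the candidate type, optionally followed by `raddr` and `rport`. The parser below is a minimal sketch of that layout. The sample lines reuse the addresses and ports discussed above, but their foundation and priority values are made up for illustration, and the names `parseCandidateLine` and `IceCandidateLine` are my own.

```typescript
interface IceCandidateLine {
  foundation: string;
  componentId: number; // 1 = RTP, 2 = RTCP
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: string; // host, srflx or relay
  relatedAddress?: string; // the address this candidate was derived from
  relatedPort?: number;
}

const CANDIDATE_PATTERN =
  /^(?:a=)?candidate:(\S+) (\d+) (\S+) (\d+) (\S+) (\d+) typ (\S+)(?: raddr (\S+) rport (\d+))?/;

function parseCandidateLine(line: string): IceCandidateLine {
  const match = CANDIDATE_PATTERN.exec(line.trim());
  if (match === null) {
    throw new Error(`not an ICE candidate line: ${line}`);
  }
  const [, foundation, componentId, transport, priority, address, port, type, raddr, rport] = match;
  const candidate: IceCandidateLine = {
    foundation,
    componentId: Number(componentId),
    transport,
    priority: Number(priority),
    address,
    port: Number(port),
    type,
  };
  if (raddr !== undefined) {
    candidate.relatedAddress = raddr;
    candidate.relatedPort = Number(rport);
  }
  return candidate;
}

// Illustrative lines: foundations (1, 2, 3) and priorities are invented.
const hostLine = parseCandidateLine("a=candidate:1 1 udp 2122260223 192.168.0.196 46243 typ host");
const srflxLine = parseCandidateLine(
  "a=candidate:2 1 udp 1518280447 47.61.61.61 36768 typ srflx raddr 192.168.0.196 rport 36768"
);
const relayLine = parseCandidateLine(
  "a=candidate:3 1 udp 25108223 237.30.30.30 51472 typ relay raddr 47.61.61.61 rport 54763"
);
```

Reading `relatedAddress` on each line shows the chain described above: the srflx candidate points back to my device IP, and the relay candidate points back to my public IP.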

This article covers one of the most complex problems in WebRTC communication and its solutions. I hope it was easy to read and enjoyable for you.

Until we meet again, take good care of yourself!

Thanks to WebRTCHacks for the SDP sample.
