WebRTC in a Nutshell (Ep-II)
RTP/RTCP, SDP & Offer-Answer Model
In the first article of the WebRTC in a Nutshell series, I tried to explain what WebRTC is and why it is so popular. In this article, let’s begin to explore some fundamental concepts of WebRTC.
Most of the tutorials I have seen about WebRTC start by explaining the fundamental APIs of WebRTC. Only then do they try to explain what the ‘Offer-Answer Model’ and ‘RTP/RTCP’ are. This approach is fine, but when I began to learn WebRTC about 7 years ago, I felt like something was missing on my learning journey. So in this series, I prefer to explain ‘RTP/RTCP’, ‘SDP’, and the ‘Offer-Answer Model’ before the APIs.
Note: To explain some concepts more easily, I will use call examples from this point on. The same principles apply to all data transfer scenarios in WebRTC.
I am assuming that most of you already know transport protocols like TCP and UDP. TCP is the protocol you prefer when you want to guarantee that data arrives intact (e.g. mail), and UDP is the one you prefer when the speed of transmission matters more than delivering every packet (e.g. a live video call).
Real-time Transport Protocol (RTP) is, again, a network protocol, this time for delivering audio and video over IP networks. It usually runs on top of UDP. RTP is used extensively in communication and entertainment systems that involve streaming media, such as telephony, video teleconference applications, television services, and web-based push-to-talk features. WebRTC also uses RTP, a telecommunication standard, to transmit real-time media.
RTP Control Protocol (RTCP) is the sister protocol of the Real-time Transport Protocol (RTP). RTCP provides out-of-band statistics and control information for an RTP session. Using RTCP, you can find out how successful your data transmission is, for example ‘How many packets were lost in transmission?’, ‘How much does the packet delay vary (jitter)?’ or ‘What is the round-trip time?’. RTCP is important if you want to answer the question ‘What is my call quality?’
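To make the packet-loss statistic a bit more concrete, here is a minimal sketch of how a receiver could derive the "fraction lost" value that an RTCP receiver report carries, from two snapshots of its packet counters. The ReceiverCounters shape and the function names are assumptions made up for this illustration; only the 8-bit fixed-point scaling (loss ratio multiplied by 256) comes from the RTP specification.

```typescript
// Hypothetical snapshot of a receiver's cumulative counters for one RTP stream.
interface ReceiverCounters {
  expected: number; // packets expected so far, judged from the sequence numbers seen
  received: number; // packets actually received so far
}

// Fraction of packets lost between two snapshots, as an 8-bit fixed-point
// value (0..255): the integer part of (lost / expected) * 256.
function fractionLost(prev: ReceiverCounters, curr: ReceiverCounters): number {
  const expectedInterval = curr.expected - prev.expected;
  const receivedInterval = curr.received - prev.received;
  const lostInterval = expectedInterval - receivedInterval;
  if (expectedInterval <= 0 || lostInterval <= 0) {
    return 0; // nothing was expected, or duplicates made the loss negative
  }
  // Clamp so that a 100% loss still fits into 8 bits.
  return Math.min(255, Math.floor((lostInterval * 256) / expectedInterval));
}

// Convert the 8-bit value back into a percentage for display.
function lossPercent(fraction: number): number {
  return (fraction / 256) * 100;
}

const earlier: ReceiverCounters = { expected: 1000, received: 1000 };
const later: ReceiverCounters = { expected: 1100, received: 1075 };
const fraction = fractionLost(earlier, later); // 25 of 100 packets lost
console.log(`${lossPercent(fraction)}% of packets lost in this interval`);
```

In a real application you would not compute this yourself; WebRTC does it internally and exposes the results as statistics.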
RTP and RTCP packets are transmitted on the Media Channel. As I explained in the first article, WebRTC takes care of media transmission on the media channel. As an app developer, your responsibility is to manage the Signalling Channel. So you don’t usually deal with these protocols directly, and most of the time you don’t need to. But I think it is important to understand RTP/RTCP before we start on SDP and the Offer-Answer Model. If you would like to read the details of RTP/RTCP, please see this RFC document.
SDP (Session Description Protocol)
In real life, you share your contact information (email, phone number, Instagram account, home address, etc.) with people if you want them to reach you. The easiest way to share such information is to give them your business card. Using the sample card below, I can tell people: “You can send me an e-mail, call me or visit me at my home. But don’t forget that I only know English and Turkish. If you don’t know those languages, we will probably have a communication problem.”
To initiate a call, we also need a digital business card that holds the contact information of the users. That digital business card may contain:
- Caller and callee IP addresses
- Which media types both peers support (audio, video, screen share, etc.)
- Which of those media types are currently enabled or disabled (video on/off, hold/unhold, etc.)
- Which codecs both peers support
In the telecommunication world, this digital business card is written in the Session Description Protocol (SDP). An SDP contains the information peers need to talk to each other.
WebRTC also uses SDP as a communication standard to initiate a call. An SDP is just text that can be parsed and manipulated by endpoints. That gives us the flexibility to change call options in response to user actions. For example, if a user wants to put the call on hold, the application can disable the video and audio streams by manipulating the SDP. Or, if your system requires a specific video codec, let's say H.264, you can simply delete every codec other than H.264.
That’s the power of SDP: it is easy to adapt to your requirements.
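As a rough sketch of the hold scenario, the function below rewrites the direction attribute of an SDP text. The function name, the Direction type and the sample SDP fragment are assumptions for illustration only. It only touches direction lines that are already present, and a real application may prefer to change the direction through the WebRTC API rather than by editing the text.

```typescript
type Direction = "sendrecv" | "sendonly" | "recvonly" | "inactive";

// Replace every direction attribute found in the SDP with the requested one.
// Lines that are not direction attributes are passed through unchanged.
function setDirection(sdp: string, direction: Direction): string {
  return sdp
    .split(/\r?\n/)
    .map((line) =>
      /^a=(sendrecv|sendonly|recvonly|inactive)$/.test(line) ? `a=${direction}` : line
    )
    .join("\r\n");
}

// Example SDP fragment assembled for illustration only.
const activeSdp = [
  "m=audio 49170 UDP/TLS/RTP/SAVPF 111 0",
  "a=sendrecv",
  "a=rtpmap:111 opus/48000/2",
  "",
].join("\r\n");

// "inactive" stops media in both directions, which is one way to model hold.
const heldSdp = setDirection(activeSdp, "inactive");
console.log(heldSdp);
```

Calling setDirection(heldSdp, "sendrecv") afterwards would model taking the call off hold.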
Here is an SDP sample that shows what it looks like.
Let’s see what these lines mean, one by one.
o=alice 2890844526 2890844526 IN IP4 10.48.1.2
o= indicates the originator of the session: the username, session ID, session version, and IP address of the originator.
t= indicates the start and stop times of the session. If both are 0 (t=0 0), the session is not bounded by time.
m=audio 49170 UDP/TLS/RTP/SAVPF 111 0
m= indicates a media line, which describes one kind of media that can exist in the session. In this case, it is the audio media line. This line also contains the port (49170) and the transport protocol that will be used for this media (UDP/TLS/RTP/SAVPF). Lastly, this line lists the payload type numbers of the codecs offered for this media (111, 0). We will see what those numbers mean in the attribute lines below.
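Because SDP is plain text, the fields of the m= line described above are easy to pull apart. Here is a minimal sketch; the MediaLine shape and the parseMediaLine name are assumptions for illustration, and the parser ignores less common forms of the line.

```typescript
// Hypothetical structure for the parts of an m= line.
interface MediaLine {
  kind: string;           // "audio", "video", ...
  port: number;           // transport port announced for this media
  protocol: string;       // e.g. "UDP/TLS/RTP/SAVPF"
  payloadTypes: number[]; // payload type numbers, explained by rtpmap lines
}

function parseMediaLine(line: string): MediaLine {
  if (!line.startsWith("m=")) {
    throw new Error("not a media line: " + line);
  }
  // Drop the "m=" prefix and split on whitespace.
  const [kind, port, protocol, ...formats] = line.slice(2).trim().split(/\s+/);
  if (kind === undefined || port === undefined || protocol === undefined || formats.length === 0) {
    throw new Error("incomplete media line: " + line);
  }
  return { kind, port: Number(port), protocol, payloadTypes: formats.map(Number) };
}

const audio = parseMediaLine("m=audio 49170 UDP/TLS/RTP/SAVPF 111 0");
console.log(audio.kind, audio.port, audio.payloadTypes);
```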
c=IN IP4 203.0.113.10
c= indicates connection information: the IP address at which the device that created this SDP expects to receive media.
a= indicates attribute lines. They define attributes of the session and of the media lines. In the first line, the a=sendrecv attribute indicates that the device is willing to send and receive audio. Other possible values are recvonly, sendonly, and inactive, which are used to implement scenarios like hold or video-off.
The rtpmap attribute maps the payload type numbers to audio codecs. In this case, 111 maps to Opus with a 48,000 Hz clock rate, and 0 maps to the PCMU codec with an 8,000 Hz clock rate. There can be more attribute lines in a standard SDP.
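The rtpmap mapping can be read with a small regular expression. This is a sketch under assumptions: the RtpMap shape and the parseRtpmap name are hypothetical, and the two sample lines are typical rtpmap lines for Opus and PCMU that I wrote out for the example.

```typescript
// Hypothetical structure for one rtpmap attribute.
interface RtpMap {
  payloadType: number;     // the number listed on the m= line
  codec: string;           // encoding name, e.g. "opus"
  clockRateHz: number;     // RTP clock rate in Hz
  channels: number | null; // optional parameter, used for audio channels
}

// Parses "a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]".
// Returns null for any line that is not an rtpmap attribute.
function parseRtpmap(line: string): RtpMap | null {
  const match = /^a=rtpmap:(\d+) ([^/\s]+)\/(\d+)(?:\/(\d+))?$/.exec(line.trim());
  if (match === null) {
    return null;
  }
  const [, payloadType, codec, clockRate, channels] = match;
  if (payloadType === undefined || codec === undefined || clockRate === undefined) {
    return null;
  }
  return {
    payloadType: Number(payloadType),
    codec,
    clockRateHz: Number(clockRate),
    channels: channels === undefined ? null : Number(channels),
  };
}

console.log(parseRtpmap("a=rtpmap:111 opus/48000/2"));
console.log(parseRtpmap("a=rtpmap:0 PCMU/8000"));
```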
m=video 51372 UDP/TLS/RTP/SAVPF 98 100
m= again indicates a media line. In this case, it is the video media line. Again, it contains the port, the transport protocol and the payload type numbers.
a= again indicates attribute lines. In the first line, the a=sendrecv attribute indicates that the device is willing to send and receive video.
After that, we see the rtpmap values. In this case, 98 maps to the VP9 video codec and 100 maps to the H.264 video codec, both with a 90,000 Hz clock rate.
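With payload types and rtpmap in hand, the "keep only H.264" idea mentioned earlier can be sketched as a text transformation. Everything here is an assumption for illustration: the keepOnlyCodec name, the sample SDP, and the simplification that only rtpmap, fmtp and rtcp-fb lines carry a payload type. A production implementation would also need to handle things like retransmission payload types.

```typescript
// Keep only the payload types of one codec inside one kind of media section.
function keepOnlyCodec(sdp: string, kind: string, codec: string): string {
  const lines = sdp.split(/\r?\n/);
  const keep = new Set<string>();

  // Pass 1: collect payload types mapped to the wanted codec in the target section.
  let inTarget = false;
  for (const line of lines) {
    if (line.startsWith("m=")) inTarget = line.startsWith(`m=${kind} `);
    const map = /^a=rtpmap:(\d+) ([^/]+)\//.exec(line);
    if (inTarget && map !== null && String(map[2]).toLowerCase() === codec.toLowerCase()) {
      keep.add(String(map[1]));
    }
  }
  if (keep.size === 0) return sdp; // codec not present: leave the SDP untouched

  // Pass 2: rewrite the m= line and drop per-codec attributes of other codecs.
  inTarget = false;
  const out: string[] = [];
  for (const line of lines) {
    if (line.startsWith("m=")) {
      inTarget = line.startsWith(`m=${kind} `);
      if (inTarget) {
        const parts = line.split(" ");
        const kept = parts.slice(3).filter((pt) => keep.has(pt));
        out.push([...parts.slice(0, 3), ...kept].join(" "));
        continue;
      }
    }
    const attr = /^a=(?:rtpmap|fmtp|rtcp-fb):(\d+)/.exec(line);
    if (inTarget && attr !== null && !keep.has(String(attr[1]))) continue;
    out.push(line);
  }
  return out.join("\r\n");
}

// Sample SDP fragment written for this example. Note the SDP name is "H264".
const sampleSdp = [
  "m=audio 49170 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 51372 UDP/TLS/RTP/SAVPF 98 100",
  "a=sendrecv",
  "a=rtpmap:98 VP9/90000",
  "a=rtpmap:100 H264/90000",
  "",
].join("\r\n");

console.log(keepOnlyCodec(sampleSdp, "video", "H264"));
```

The audio section is left as it is, and the video section ends up offering payload type 100 only.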
SDP has a lot of attributes that I cannot explain in a single article. If you want to see the other SDP parameters and their purposes, you can read the RFC document for SDP.
So far, I have explained ‘How does WebRTC transmit data in the Media Channel?’ (RTP/RTCP) and ‘How can we specify session properties in the Signalling Channel?’ (SDP). Let’s now answer the question ‘How should applications transmit session properties (SDP) to each other?’
You might have a magnificent business card, but if you don’t give it to anyone, it is useless. The same rule applies to SDP. We need to exchange SDPs between peers to initiate a call. The Offer-Answer Model is the SDP exchange procedure WebRTC uses, and it is also a telecommunication standard. The transport used for the exchange is the application’s decision. An application can send the SDP in an HTTP/HTTPS request, over a WebSocket, using a push notification, etc. That is totally up to the application.
As the name implies, in this model there is an Offerer and an Answerer. The offerer is the one who starts the signaling procedure, for example by starting a new outgoing call or sending mid-call events like hold and video on/off. The answerer is the one who answers the incoming offer, for example by answering an incoming call or sending suitable answers to mid-call events.
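Whatever transport the application picks, the SDP has to be wrapped in some message. One possible shape is sketched below as a JSON envelope. The SignalingMessage type and the encodeSignal and decodeSignal helpers are hypothetical names invented for this example, not part of WebRTC.

```typescript
// Hypothetical message format for the signalling channel.
interface SignalingMessage {
  type: "offer" | "answer";
  sdp: string;
}

function isSignalType(value: unknown): value is "offer" | "answer" {
  return value === "offer" || value === "answer";
}

// Turn a message into text that can travel over HTTP, a WebSocket, a push, etc.
function encodeSignal(message: SignalingMessage): string {
  return JSON.stringify(message);
}

// Parse and validate text received from the other peer.
function decodeSignal(raw: string): SignalingMessage {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("signalling message must be a JSON object");
  }
  const { type, sdp } = data as { type?: unknown; sdp?: unknown };
  if (!isSignalType(type) || typeof sdp !== "string") {
    throw new Error("signalling message needs a valid type and an sdp string");
  }
  return { type, sdp };
}

const wire = encodeSignal({ type: "offer", sdp: "v=0\r\n" });
console.log(decodeSignal(wire).type);
```

Validating on receipt is a design choice here: the signalling channel is application code, so nothing else will check the message for you.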
The Offer-Answer Model has 4 fundamental steps:
- The offerer creates an offer SDP and sends it to the remote peer.
- The answerer receives the offerer’s SDP and applies it as its remote description.
- The answerer creates an answer SDP and sends it to the offerer.
- The offerer receives the answerer’s SDP and applies it as its remote description.
After that, if everything is ok, the call starts.
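One way to picture these steps is as a small state machine that each peer walks through. The sketch below is a simplified model, not WebRTC code: the state names are borrowed from the signalling states that WebRTC reports, the Step names and the nextState and run helpers are made up for this example, and provisional answers and rollback are left out.

```typescript
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";
type Step = "setLocalOffer" | "setRemoteOffer" | "setLocalAnswer" | "setRemoteAnswer";

// Returns the state a peer ends up in after one step, or throws if the step
// does not make sense in the current state.
function nextState(state: SignalingState, step: Step): SignalingState {
  if (state === "stable" && step === "setLocalOffer") return "have-local-offer";
  if (state === "stable" && step === "setRemoteOffer") return "have-remote-offer";
  if (state === "have-local-offer" && step === "setRemoteAnswer") return "stable";
  if (state === "have-remote-offer" && step === "setLocalAnswer") return "stable";
  throw new Error(`cannot apply ${step} while in state ${state}`);
}

// Apply a sequence of steps starting from "stable".
function run(steps: Step[]): SignalingState {
  return steps.reduce<SignalingState>(nextState, "stable");
}

// The offerer and the answerer take mirror-image paths back to "stable".
const offererSteps: Step[] = ["setLocalOffer", "setRemoteAnswer"];
const answererSteps: Step[] = ["setRemoteOffer", "setLocalAnswer"];
console.log(run(offererSteps), run(answererSteps));
```

Both peers returning to "stable" corresponds to "everything is ok" above; an answer arriving without a pending offer would throw.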
We have talked about a lot of telecommunication concepts so far. Let’s see how a WebRTC application should use them. The diagram below shows an SDP exchange procedure in WebRTC using the Offer-Answer Model.
Let’s examine those steps one by one.
- Peer-1 should get the user’s media and then create a PeerConnection object from WebRTC. (I will explain how we can do it in the upcoming article.)
- After the peer connection is created, the application should call the createOffer API of WebRTC.
- WebRTC creates an offer SDP and gives it to the application. After this step, the application has the offer SDP and can manipulate it if it wants.
- The application should set the offer SDP back on WebRTC.
- The application should send the offer SDP to Peer-2.
- The application on Peer-2 receives the offer SDP. Peer-2 should get the user’s media and create a PeerConnection object, if it has not been created so far.
- The application on Peer-2 sets the offer SDP on WebRTC.
- The application on Peer-2 generates an answer SDP using the createAnswer API of WebRTC.
- WebRTC creates an answer SDP and gives it to the application. After this step, the application has the answer SDP and can manipulate it if it wants.
- The application should set the answer SDP back on WebRTC.
- The application on Peer-2 should send the answer SDP to Peer-1.
- The application on Peer-1 sets the answer SDP on WebRTC.
- If everything is ok, WebRTC starts the RTP media stream on the media channel.
There might be a lot of steps above but most of them are repetitive tasks as you can see.
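The steps above can be sketched in code as three small handlers. This is only an outline under stated assumptions: PeerConnectionLike is a hypothetical interface that mirrors the method names of the browser's RTCPeerConnection (createOffer, createAnswer, setLocalDescription, setRemoteDescription) rather than the real type, SendToRemote stands for whatever signalling transport the application chose, and getting the user's media and creating the connection are left out.

```typescript
// Minimal, hypothetical shapes modelled on the browser API; not the real types.
interface SessionDescription {
  type: "offer" | "answer";
  sdp: string;
}

interface PeerConnectionLike {
  createOffer(): Promise<SessionDescription>;
  createAnswer(): Promise<SessionDescription>;
  setLocalDescription(description: SessionDescription): Promise<void>;
  setRemoteDescription(description: SessionDescription): Promise<void>;
}

// How the SDP travels (WebSocket, HTTP, push) is up to the application.
type SendToRemote = (description: SessionDescription) => Promise<void>;

// Peer-1: create the offer, set it locally, then send it to Peer-2.
async function startCall(pc: PeerConnectionLike, send: SendToRemote): Promise<void> {
  const offer = await pc.createOffer(); // WebRTC generates the offer SDP
  // The application could inspect or adjust offer.sdp here.
  await pc.setLocalDescription(offer);  // set the offer SDP back on WebRTC
  await send(offer);                    // signalling channel
}

// Peer-2: apply the offer, create the answer, set it locally, then send it back.
async function onOffer(
  pc: PeerConnectionLike,
  offer: SessionDescription,
  send: SendToRemote
): Promise<void> {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer(); // WebRTC generates the answer SDP
  // The application could inspect or adjust answer.sdp here.
  await pc.setLocalDescription(answer);
  await send(answer);
}

// Peer-1: apply the answer. After this, media can start flowing.
async function onAnswer(pc: PeerConnectionLike, answer: SessionDescription): Promise<void> {
  await pc.setRemoteDescription(answer);
}
```

Notice that onOffer is almost a mirror image of startCall, which is why most of the steps feel repetitive.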
This is the answer to “How do we create a WebRTC session?”
If you would like to read the third article, here is the link: