Ways to stream audio & video between devices

Artem Hryhorov
Published in Appus Studio
Jan 5, 2021

This article focuses on video calls: how they work under the hood in general (you can find some example apps on our site) and how different APIs and SDKs compare. Let’s get started.

1. How do voice and video calls work?

2. Common protocols.

3. WebRTC — main technology.

4. Libraries, SDKs and APIs.

1. How do voice and video calls work?

To connect two devices to each other we need 5 things: a STUN server, a TURN server, signaling and, of course, 2 clients. Let’s talk about each in more detail.

A. Signaling

In a few words, the purpose of signaling is to let the two clients tell each other that they want to connect for a call and to exchange the information needed to set it up. I will not describe the ways of organizing signaling, but in most cases WebSocket is used for this. After the initial signaling, we need to build a peer-to-peer connection (P2P means that both devices have equal permissions and responsibilities for processing data). And finally, of course, we need the public IP addresses of both clients to connect them.
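The exchange described above can be sketched without any network at all. This is a minimal in-memory illustration, not a real signaling server: the `SignalingHub` class, its method names and the message fields (`type`, `from`, `payload`) are assumptions made for this example. In a real application the `send` call would typically travel over a WebSocket.

```python
from collections import defaultdict, deque


class SignalingHub:
    """Hypothetical in-memory stand-in for a signaling server."""

    def __init__(self):
        # One inbox (FIFO queue) per client id.
        self.inboxes = defaultdict(deque)

    def send(self, sender, recipient, msg_type, payload):
        # The hub only forwards messages; it never inspects the payload.
        self.inboxes[recipient].append(
            {"type": msg_type, "from": sender, "payload": payload}
        )

    def receive(self, client):
        # Return the oldest pending message, or None if the inbox is empty.
        inbox = self.inboxes[client]
        return inbox.popleft() if inbox else None


hub = SignalingHub()

# Alice announces that she wants to call Bob and describes her session.
hub.send("alice", "bob", "offer", "alice-session-description")

# Bob reads the offer and replies with his own description.
offer = hub.receive("bob")
hub.send("bob", offer["from"], "answer", "bob-session-description")

answer = hub.receive("alice")
print(offer["type"], "->", answer["type"])
```

The hub never looks inside the payload, which is the point: signaling only carries the messages, and the clients decide what they mean.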

B. STUN Server

STUN originally stood for Simple Traversal of UDP through NATs; the current standard expands it as Session Traversal Utilities for NAT. It is used to discover a client’s public IP address. Behind a NAT (Network Address Translation) device, a client only knows its local IP address, which can’t be used publicly to connect peer to peer, and for WebRTC we need the public one. A STUN server provides that by reporting the address it sees the client’s request coming from. So what’s wrong? STUN alone does not work when a client is behind a symmetric NAT, and for that case we use TURN.
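To make the "what is my public address?" question concrete, here is a rough sketch of the two STUN messages involved, using the message layout from RFC 5389. It handles IPv4 only and sends nothing over the network: the `fake_binding_response` helper and the address `203.0.113.7:54321` are assumptions that stand in for a real server's answer.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    # Header: message type, attribute length, magic cookie, transaction id.
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, transaction_id)


def parse_xor_mapped_address(message):
    """Extract (ip, port) from the XOR-MAPPED-ADDRESS attribute (IPv4 only)."""
    msg_type, length, cookie, _tid = struct.unpack("!HHI12s", message[:20])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Success response")
    offset = 20
    while offset + 4 <= 20 + length:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _reserved, family, xport = struct.unpack("!BBH", value[:4])
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            # Port and address are XOR-ed with the magic cookie.
            port = xport ^ (MAGIC_COOKIE >> 16)
            (xaddr,) = struct.unpack("!I", value[4:8])
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")


def fake_binding_response(request, public_ip, public_port):
    """Hypothetical helper: the answer a STUN server might send back."""
    tid = request[8:20]
    xport = public_port ^ (MAGIC_COOKIE >> 16)
    (addr,) = struct.unpack("!I", socket.inet_aton(public_ip))
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01,
                       xport, addr ^ MAGIC_COOKIE)
    return struct.pack("!HHI12s", BINDING_SUCCESS, len(attr), MAGIC_COOKIE, tid) + attr


request = build_binding_request()
response = fake_binding_response(request, "203.0.113.7", 54321)
print(parse_xor_mapped_address(response))
```

In a real client the request bytes would be sent over UDP to a STUN server, and the response would come from that server instead of the helper.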

C. TURN Server

Let’s start again by defining the abbreviation: TURN stands for Traversal Using Relays around NAT. TURN solves the problem that STUN can’t: it works with symmetric NAT, where the public address and port assigned to a client change depending on the destination, so the address reported by STUN is useless to the other peer. A TURN server can be compared to a mediator. In simple words, it relays the traffic between the two clients, so each of them only needs a connection to the relay instead of a direct connection to the other. Basically, we can say that TURN is an extension of STUN. Note that signaling methods and protocols are not part of WebRTC! We will talk about why that is, and what WebRTC is in general, later.
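Why a symmetric NAT defeats STUN can be shown with a toy model. The `Nat` class below is an assumption made for illustration, not a real NAT implementation: it only models how public ports are assigned and ignores inbound filtering. All addresses are documentation examples.

```python
import itertools


class Nat:
    """Toy NAT model (hypothetical, for illustration only)."""

    def __init__(self, public_ip, symmetric):
        self.public_ip = public_ip
        self.symmetric = symmetric
        self.mappings = {}
        self.ports = itertools.count(40000)

    def public_endpoint(self, local_endpoint, destination):
        # A symmetric NAT creates a separate mapping per destination;
        # other NAT types reuse one mapping for every destination.
        key = (local_endpoint, destination) if self.symmetric else local_endpoint
        if key not in self.mappings:
            self.mappings[key] = (self.public_ip, next(self.ports))
        return self.mappings[key]


def stun_address_is_usable(nat):
    local = ("192.168.0.10", 5000)
    seen_by_stun = nat.public_endpoint(local, ("stun.example.org", 3478))
    seen_by_peer = nat.public_endpoint(local, ("198.51.100.20", 6000))
    # The peer was told the STUN-discovered address; it only works if the
    # NAT uses the same mapping for traffic towards the peer.
    return seen_by_stun == seen_by_peer


print("non-symmetric NAT:", stun_address_is_usable(Nat("203.0.113.7", symmetric=False)))
print("symmetric NAT:", stun_address_is_usable(Nat("203.0.113.7", symmetric=True)))
```

With the symmetric NAT the two endpoints differ, so the peer would send to a port that was never opened for it. A relay avoids the problem because both clients only ever talk to the relay's address.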

2. Common protocols for video calls

In this section I want to describe two main terms you will meet, SIP and VoIP. We will talk about their purpose, advantages and disadvantages. So, let’s start with SIP.

SIP (Session Initiation Protocol) is a signaling protocol used to manage the whole life cycle of a real-time session: it lets users establish a session, modify it and close it when finished. You can use SIP for many types of communication, such as chat, video calls and file sharing.

Pros & cons:

+ Diversity: it can support users with different capabilities (only video, only sound, etc.)

+ Independence: SIP can be used for one-way or two-way communication.

+ Standardisation: it is an open standard, which ensures support from providers

+ Flexibility: SIP works independently of the type of session or the media used.

- Higher load: increased load on gateways due to expensive message processing

VoIP (Voice over Internet Protocol), or IP telephony, is a family of technologies (often loosely called a protocol) for making voice calls over IP networks. VoIP converts your voice into small packets of data. The other side receives these packets and decodes them back into voice.
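The "small packets of data" idea can be sketched as follows. This is a simplified illustration and not a real VoIP stack: it uses no audio codec, the 2-byte sequence number is a made-up packet format, and the 8 kHz sample rate and 20 ms frame length are assumed values.

```python
import struct

SAMPLE_RATE = 8000          # samples per second (assumed narrowband audio)
FRAME_MS = 20               # assumed frame duration in milliseconds
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000


def packetize(samples):
    """Split 16-bit PCM samples into numbered packets."""
    packets = []
    for seq, start in enumerate(range(0, len(samples), SAMPLES_PER_FRAME)):
        frame = samples[start:start + SAMPLES_PER_FRAME]
        payload = struct.pack(f"!{len(frame)}h", *frame)
        # 2-byte sequence number followed by the raw audio payload.
        packets.append(struct.pack("!H", seq) + payload)
    return packets


def depacketize(packets):
    """Reorder packets by sequence number and rebuild the sample list."""
    samples = []
    for packet in sorted(packets, key=lambda p: struct.unpack("!H", p[:2])[0]):
        payload = packet[2:]
        samples.extend(struct.unpack(f"!{len(payload) // 2}h", payload))
    return samples


voice = [(i * 37) % 2000 - 1000 for i in range(400)]   # fake audio, 50 ms long
packets = packetize(voice)
packets.reverse()                                       # simulate reordering
restored = depacketize(packets)
print(len(packets), "packets, restored correctly:", restored == voice)
```

Real systems compress each frame with a codec and carry it in RTP, which adds timestamps and other fields on top of the sequence number.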

Pros & cons:

+ Portability: a VoIP number, also known as a virtual number, is completely portable

+ Quality: it has reasonably high voice quality

+ Cost: it costs less than the average landline telephone system

- Location tracking: unfortunately, VoIP does not support location tracking, which would be useful in some emergency situations.

There are a lot of other protocols that are very useful for calling, but describing them would take at least two more articles.

Here is a list of other commonly used protocols and related technologies:

* SDP (Session Description Protocol)

* RTP (Real-time Transport Protocol)

* PSTN (Public Switched Telephone Network)

* ICE (Interactive Connectivity Establishment)
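To give one of these names some shape: SDP is a plain-text format that describes which media a client wants to send or receive. The sketch below parses a hand-written, shortened session description. The sample text and the `parse_media_sections` function are assumptions for illustration, and a real WebRTC offer contains many more lines.

```python
SAMPLE_SDP = """\
v=0
o=- 46117317 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
"""


def parse_media_sections(sdp):
    """Return one dict per m= line, with the codecs declared via a=rtpmap."""
    sections = []
    for line in sdp.splitlines():
        key, _, value = line.partition("=")
        if key == "m":
            # m=<media> <port> <proto> <format> ...
            kind, port, proto, *formats = value.split()
            sections.append({"kind": kind, "port": int(port), "proto": proto,
                             "formats": formats, "codecs": {}})
        elif key == "a" and value.startswith("rtpmap:") and sections:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
            payload_type, encoding = value[len("rtpmap:"):].split(" ", 1)
            sections[-1]["codecs"][payload_type] = encoding
    return sections


media = parse_media_sections(SAMPLE_SDP)
for section in media:
    print(section["kind"], section["codecs"])
```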

3. WebRTC — main technology

So, at last, we have reached the part about the most important technology for video/audio calling: WebRTC.

WebRTC (Web Real-Time Communication) is an open-source project: a collection of standards, protocols and JavaScript APIs that together provide peer-to-peer audio, video and data sharing in web browsers and mobile applications. WebRTC is a complicated technology that relies on a lot of protocols. So, let’s quickly go over them.

The following protocols are necessary to establish and maintain a peer-to-peer connection (we mentioned them before):

• ICE

• STUN

• TURN

WebRTC also cares about security: all data transferred between peers is secured with DTLS (Datagram Transport Layer Security). On the transport layer, UDP (User Datagram Protocol) is used, because low latency and timeliness matter more than guaranteed delivery. The different kinds of streams are carried by these protocols:

• SCTP (Stream Control Transmission Protocol), used for data channels

• SRTP (Secure Real-time Transport Protocol), used for audio and video

But! It is important to keep in mind that even with optimization and compression there are some significant restrictions:

• An HD-quality stream requires good bandwidth (about 1–2 Mbps)

• An HD stream requires at least a 3.5G connection
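To put the 1–2 Mbps figure into perspective, a quick back-of-the-envelope calculation shows how much data such a stream transfers. This is a simple sketch that assumes a constant bitrate and decimal megabytes; the function name is made up for this example.

```python
def megabytes_transferred(bitrate_mbps, minutes):
    """Data volume in megabytes for a stream at a constant bitrate."""
    megabits = bitrate_mbps * minutes * 60   # Mbps * seconds
    return megabits / 8                      # 8 bits per byte


for rate in (1, 2):
    per_minute = megabytes_transferred(rate, 1)
    per_hour = megabytes_transferred(rate, 60)
    print(f"{rate} Mbps: {per_minute:.1f} MB per minute, {per_hour:.0f} MB per hour")
```

So an hour-long HD call in one direction is roughly 450 to 900 MB, which matters on metered mobile connections.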

4. Libraries, SDKs and APIs

If you are a developer, I hope the previous part of this article was interesting, but this part will be more useful. We will discuss the ways you can set up an audio/video call. I will not show SDK-specific code, because the article is already quite long, but we will go over some SDKs and APIs that you can use in production. First, it is worth noting that most of them are paid and almost all of them use WebRTC as their foundation. Here is a list of the most useful APIs and SDKs, in my opinion:

• QuickBlox

• OpenTok

• Agora

QuickBlox is a platform whose instant messaging API lets you add chat and calling functionality to any mobile or web application. Conveniently, QuickBlox provides iOS, Android, JavaScript, React Native and Flutter SDKs. What about pricing? QuickBlox offers a free plan with 500 available users and a file size limit of 10 MB. There are also a lot of paid plans, so you can choose the one that suits you. We prefer to use it in our company.

OpenTok is the JS library that lets you use Vonage Video API-powered video sessions on the web. As the official documentation says: “All applications that use the Vonage Video API are composed of two parts: the client side, which uses the OpenTok client SDKs and runs in a user’s browser or mobile app and the server side, which uses the OpenTok server SDKs and runs on your server to pass authentication information to the client”. Client SDKs are also available for iOS and Android. Unfortunately, OpenTok does not have a free plan, only a paid one at $9.99 per month.

Agora is a Real-Time Engagement Platform for audio and video connections. Agora provides the building blocks to enable a wide range of real-time engagement possibilities, and it also has a convenient Android SDK. You can use Agora for free for the first 10,000 minutes every month.

There are a lot of other APIs and SDKs, but they are very similar and differ mainly in additional functionality. Also, you can find other useful articles in our blog.
