
WebRTC: Up and Real-timing. Part 1

Timi Aiyemo · The Andela Way · Jan 17, 2018


The History

How expensive, in both time and resources, do we actually think messaging apps are to build, and how difficult have some tech gods made this out to be? This question has beaten on the minds of many, and this article will introduce some concepts that might be familiar to some and novel to others, as we discuss WebRTC whilst demystifying the instant messaging/video conferencing myths.

WebRTC is still in its infant years, with some browsers still not supporting it, but the browser most of you are reading this on right now hopefully already does.

WebRTC support as of the time of this article

Introduction

The internet is built on rules, rules we like to call protocols. Two of the most popular are the Transmission Control Protocol (TCP), or the stateful protocol, and the User Datagram Protocol (UDP), or the stateless protocol, and they dictate how information is structured as it is passed from one computer to another.

The stateful/stateless names are very informal and I am only using them here to create some context, so please do not eat me raw for them.

A core characteristic of TCP is that data is sent and delivered in order, and the state of each packet, whether delivered or lost, is known, so that a complete structure can be assembled. An example is a browser loading an image from a server. From a user's perspective, the image loads gradually until completion, because both the browser and the server know the structure of the complete image and can therefore transfer it in bits and pieces, with no loss of packets, until the whole image has arrived. If for some reason a packet is lost, it is resent. Thus, TCP is said to be stateful. UDP, on the other hand, has no notion of statefulness, so any packet lost is lost forever everrrr everrr everr ever…*fades out*. This protocol is used in communications like realtime video conferencing, which is why, when your connection breaks and recovers, you simply pick up from whatever the streaming server or peer is currently sending rather than replaying what you missed.

So why are we talking about this? What does all this nonsense have to do with WebRTC? Well, to appreciate WebRTC and how it works, we will have to understand some more “nonsense” and networking jargon.

Good programmers know how things work, great programmers know why things work. — I really can’t recall who said this.

Now, all realtime technology is built on either a peer-to-peer or a socket/relay basis. WebRTC is mostly peer-to-peer (p2p), so that's what we will be dealing with, and we will start exploring its complexities with an example.

It’s Alice again.

Alice wants to send some data to Bob. Assuming they are both on the same network, all Alice's computer does is locate Bob's address on the network, and the port on which Bob can receive her message, and then send the message directly to him. But what happens if Alice and Bob are on two different networks? Locating Bob's address becomes slightly harder, so an intermediary comes into play, say Gwiggle Inc. located at www[dot]gwiggle[dot]com, where both Alice and Bob can connect and share their details; they then discard the intermediary and maintain their connection directly with each other.

To expand further: for Alice to connect directly with Bob, she needs to know her public IP, so she connects to a Session Traversal Utilities for NAT (STUN) server or a Traversal Using Relays around NAT (TURN) server, or sometimes both. What a STUN server does is let Alice query for her public IP, which she does not know internally because of devices called Network Address Translators (NATs) that keep a table of local/private IPs mapped to public IPs. NATs mitigate a very real problem called IPv4 address exhaustion. Once Alice knows her public address, she can send it to Bob through Gwiggle Inc.'s server, Bob does the same with his, and now both Alice and Bob can connect to each other with no further aid. We will catch up on where TURN servers fit into this whole process later.
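If you want to watch this public-address discovery happen, here is a minimal sketch you can paste into a browser console. It assumes Google's public STUN server at stun.l.google.com:19302 is reachable; the 'srflx' (server reflexive) candidates that get logged carry the public IP and port the STUN server observed for you.

// Ask a STUN server what our public-facing address looks like.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

pc.onicecandidate = (event) => {
  // 'srflx' (server reflexive) candidates carry the public IP and port
  // that the STUN server observed for this machine.
  if (event.candidate && event.candidate.candidate.includes('srflx')) {
    console.log('Public address candidate:', event.candidate.candidate);
  }
};

// Candidate gathering only starts once a local description exists,
// so create a throwaway data channel and an offer to kick it off.
pc.createDataChannel('probe');
pc.createOffer().then((offer) => pc.setLocalDescription(offer));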

So what is WebRTC?

WebRTC, which stands for Web Real Time Communication, is a free, open source, peer-to-peer project that enables applications to employ realtime, plugin-free text, voice and video communication. Built and maintained by Google, it suggests some standards and protocols (geek terms for saying it adds capabilities to something so that thing can do stuff it wouldn't ordinarily be able to do) that should be strictly adhered to when building technology that transmits data between two points (in this case browsers) in realtime, and peers are connected in an Alice-and-Bob model similar to the one described above, a process called signalling.

Built to be implemented primarily in JavaScript (yes yes, it's JS baby!!!), WebRTC is probably the simplest, easiest way to add a realtime communication experience to your web (and mobile) applications, and this article is poised to drop the gems on how to do so.

So without further ado…

…the dive in.

WebRTC provides a handful of browser APIs that enable browsers to send and receive media in realtime, so let's look at them using real world scenarios.
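For orientation, a quick feature check like the sketch below (assuming a reasonably modern browser) tells you whether your browser exposes the main interfaces we will be leaning on.

// A quick check of the main WebRTC-related interfaces in the browser.
console.log('RTCPeerConnection:', typeof RTCPeerConnection);  // negotiates and holds the peer connection
console.log('RTCDataChannel:', typeof RTCDataChannel);        // arbitrary data between connected peers
console.log('getUserMedia:', typeof (navigator.mediaDevices && navigator.mediaDevices.getUserMedia)); // camera/mic capture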

Alice says ‘Hi’: The call.

The first thing we want to do to start a WebRTC session is set up our STUN and TURN configuration. Next, we create a peer, Alice in this instance, that will make a call to the other peer, Bob. I mean, this only makes sense, since calls, like regular phone calls, can only be made by “things”.

Alice creates her peer instance to begin the process
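The original snippet was embedded as an image; a minimal sketch of this step might look like the following, assuming Google's public STUN server and a made-up TURN entry (the TURN URL and credentials are placeholders you would get from your own TURN provider):

// STUN/TURN configuration shared by both peers.
const configuration = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    // Hypothetical TURN server: replace with your provider's details.
    { urls: 'turn:turn.gwiggle.com:3478', username: 'alice', credential: 'secret' }
  ]
};

// Alice's peer instance.
const alicePeer = new RTCPeerConnection(configuration);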

Next, Alice will create and then send an offer to Bob; this step is basically like dialing a phone call. The offer is a session description of the WebRTC session we are about to begin, and it is expressed in the Session Description Protocol (SDP) format. Alice also saves this description as her local session description.

Alice creates, saves and sends (makes a call) an offer to Bob
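Again, the original code lived in an image, so here is a hedged sketch of what this step might look like. signalingChannel is a hypothetical stand-in for whatever transport you choose to reach Bob with (a WebSocket, an HTTP request, anything):

// Alice creates an offer, saves it as her local description,
// and sends it to Bob over our chosen signalling transport.
alicePeer.createOffer()
  .then((offer) => alicePeer.setLocalDescription(offer))
  .then(() => {
    // signalingChannel is hypothetical: any channel that reaches Bob will do.
    signalingChannel.send({ type: 'offer', sdp: alicePeer.localDescription });
  })
  .catch((error) => console.error('Could not create the offer:', error));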

WebRTC does not dictate how you send the call; that's totally up to you, the developer. When Bob receives the call, he simply saves the description sent with it as his remote description. He then creates his own session description, saves it as his local description, and sends it back to Alice.

Bob receives Alice's call (her description), saves it, creates his own description and then sends it (accepts the call) back to Alice as a response.
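A sketch of Bob's side, under the same assumptions (the signalingChannel and the { type, sdp } message shape are placeholders, not part of WebRTC itself):

// Bob receives Alice's offer, answers it, and sends the answer back.
const bobPeer = new RTCPeerConnection(configuration);

signalingChannel.onmessage = (message) => {
  if (message.type === 'offer') {
    bobPeer.setRemoteDescription(message.sdp)            // save Alice's description as remote
      .then(() => bobPeer.createAnswer())                // create Bob's own description
      .then((answer) => bobPeer.setLocalDescription(answer))
      .then(() => {
        signalingChannel.send({ type: 'answer', sdp: bobPeer.localDescription });
      });
  }
};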

When Alice receives Bob's session description, she saves it as her remote session description.

Alice receives Bob’s response (description), saves it and can now begin sharing with him.
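And back on Alice's side, a matching sketch under the same assumptions:

// Alice receives Bob's answer and completes the handshake.
signalingChannel.onmessage = (message) => {
  if (message.type === 'answer') {
    alicePeer.setRemoteDescription(message.sdp)
      .then(() => console.log('Signalling complete: Alice and Bob can now share directly.'));
  }
};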

These steps mark the most basic flow and concepts involved in starting a WebRTC session. The session descriptions carry the information (media formats, network details and so on) each peer needs to keep track of the state of the session and decide what to do with it.

With these, you are pretty much set up to begin the WebRTC process. In the next part, we will work through completing our WebRTC application by attaching a media stream that we will share between the peers.
