Getting Started with WebRTC and Test Driven Development
A step-by-step guide to making WebRTC development easier, more fun, and more productive with TDD
When we first got into WebRTC, there was a large barrier to entry due to a lack of decent resources and documentation. Everything was either too complicated, dependent on oversimplified WebRTC frameworks, or so contrived that anything we learned was not useful for building real applications.
This is a shame, because WebRTC is one of the most exciting technologies out there today. We believe that this technology is undervalued, in part because it is so difficult for developers to even get started building anything meaningful with it.
The goal of this article is to demystify the process of creating software that uses WebRTC. We hope that once you have worked through this entire article, you will understand what WebRTC is and how to develop applications that use peer-to-peer communication.
A Quick Primer on WebRTC
Before we dive into the core of the tutorial, let’s go over some WebRTC basics if you haven’t worked with it before or would like a refresher. Feel free to skip this section if you’re already comfortable with the fundamentals of WebRTC.
WebRTC is a technology for sending data, video, and even screen shares over a peer-to-peer (P2P) connection, meaning the data doesn’t need to pass through a server. It’s a combination of protocols and APIs, with a lot of what’s going on under the hood hidden from the developer. It’s included in both Chrome and Firefox, and consists of these 3 components:
getUserMedia, which allows a web browser to access the camera and microphone and to capture media
RTCPeerConnection, which manages the peer-to-peer connection
RTCDataChannel, which allows browsers to share arbitrary data
Before a P2P connection can be made, a process called Signaling must first occur. This involves a lot of back and forth from clients to the server, with the eventual goal of each client having the other client’s IP address and all other related information regarding the data/video that the other client wants to share. How this is done is not defined by WebRTC standards, so it’s up to the developer to determine how to set up signaling. This tutorial will cover signaling using Socket.io.
Consider the diagram below, but don’t get too hung up on the details at this point:
As you can see, it’s quite an involved process just to get a peer-to-peer connection going at all! Notice that the following steps are occurring one after another:
- Peer A sends an “offer” to Peer B
- Peer B responds with an “answer” back to Peer A
- Peer A sends a “candidate” to Peer B
- Peer B sends a “candidate” back to Peer A
Let’s run down, in short order hopefully, what’s going on here. First some terminology:
‘offer’ — contains info about the client and what the client wants to send
‘answer’ — same as offer, but as a reply
‘candidate’ — short for ICE candidate, contains an IP address and port pair
‘local/remote description’ — both of these need to be set on both clients to begin a connection (contains info about yourself and the other client).
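To make the terminology a little more concrete, here is a rough sketch of what these payloads can look like as plain JavaScript objects. The SDP text is elided, the address is a documentation-only placeholder, and parseCandidate is a hypothetical helper of our own for illustration, not part of WebRTC.

```javascript
// Example signaling payloads. The SDP body is elided and the address below
// comes from the documentation-only range 192.0.2.0/24; neither is real output.
const exampleOffer = { type: 'offer', sdp: 'v=0 ...' };
const exampleAnswer = { type: 'answer', sdp: 'v=0 ...' };
const exampleCandidate = {
  candidate: 'candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host',
  sdpMid: '0',
  sdpMLineIndex: 0,
};

// Hypothetical helper: pull the address/port pair out of a candidate line.
// Field order after the "candidate:" prefix:
// foundation, component, transport, priority, address, port, "typ", type
function parseCandidate(line) {
  const fields = line.replace(/^candidate:/, '').split(' ');
  return {
    transport: fields[2],
    address: fields[4],
    port: Number(fields[5]),
    type: fields[7],
  };
}

console.log(exampleOffer.type, exampleAnswer.type);
console.log(parseCandidate(exampleCandidate.candidate));
```

In the application itself you never need to parse candidates by hand; the browser produces and consumes them for you.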
The ICE candidates tell the other client how to navigate to get to your browser on your computer. This is done behind the scenes. Some other important things to note about the diagram:
- Notice that signaling is asymmetrical: one of the clients must start it, and the other must receive (for 2-way connections).
- The only parts missing from the diagram are the STUN and TURN servers. These servers play an important role in establishing a connection between clients who are on different local networks.
OK, this should be enough to get you started with the rest of the tutorial. As you’re working through it, come back to the diagram above a few times and use it as a reference for what is happening. There are many other resources online that cover what’s going on here in more depth, but this should be sufficient for now.
In this tutorial, we will be creating a chat app that will enable two clients to stream video, audio, and text chat data through a peer-to-peer WebRTC connection. We will be using the cold-brew library to employ a Test-Driven Development strategy, so we highly recommend that you write the code along with us as we go through the tutorial. We really mean it.
However, if you just want to see the end result, you can clone the source code from our GitHub Repository.
We know this is a long tutorial, but we promise that coming out of it, you will have a strong knowledge of the fundamentals of WebRTC development.
Disclosure: The cold-brew library was written by the authors of this article. We hope that you enjoy using it! If you have any feedback for us about either the tutorial or the library itself, please feel free to comment here, or leave us a message on our contact form. We look forward to hearing from you!
And, without further ado…let’s do this!
We will be using macOS and Linux terminal commands throughout this tutorial. Sorry, Windows users…please adjust accordingly.
First, create the project directory and move into it. We will be calling it “cold-brew-tutorial”.
mkdir cold-brew-tutorial
cd cold-brew-tutorial
Now we are going to want to create all the files we will be using:
mkdir test
touch main.js server.js index.html styles.css
touch test/chat.spec.js test/video.spec.js
For the final piece of your file structure we are going to create a package.json:
npm init -y
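Now your directory should look like this:
cold-brew-tutorial/
  index.html
  main.js
  package.json
  server.js
  styles.css
  test/
    chat.spec.js
    video.spec.js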
The first thing we will do is install all requisite dependencies.
First let’s install the client-side dependencies:
npm install --save jquery socket.io-client webrtc-adapter
Next the server-side dependencies:
npm install --save express socket.io
Finally our testing dependencies:
npm install --save cold-brew mocha chai selenium-webdriver supertest
Here we will be using a new library that we have been working on called cold-brew to do test driven development with WebRTC.
The final step in our setup will be to set up scripts for running tests and our server. Open up your package.json and locate scripts:
We are going to have three separate test scripts (one for chat, one for video, and one to run both) plus a simple start script to run our server:
With this scaffolding in place, we are ready to start writing our application.
Setting up our Server
The first thing we need to do is set up our server, which will serve the index.html file to the browser when a request is made to the ‘/’ route.
But before writing this server code, let’s set up a test in our chat.spec.js:
From this point on, whenever we show a code snippet, we will put a caption indicating which file the code should go in. Notice that we are using the supertest library to make a request to the server and ensure that we get the correct response back. If you run the test now, the test should fail. This is a good thing — we have not written anything in our server yet.
npm run test-chat
Now that we have our test, let’s write the actual server code. We will be creating a simple Express server that serves the index.html at the ‘/’ route and statically serves all other content.
Now when we run our test script, it should pass.
npm run test-chat
Setting up our HTML
Next we will write the HTML for our chat application. Once again, before we do so, let’s write a test to make sure that the server is giving the correct HTML page.
We will first need to require expect from chai:
Next, place the following inside the “describe” block, but outside the other “it” block:
Like last time, if you run this test, it will fail — which is good, because we haven’t written the html code yet.
Now let’s write a base html page for our application. We will be creating a text chat div that will contain all pieces necessary for the messenger portion of the application:
When you run the test again, you should see two green checkmarks! Awesome! These tests may seem unnecessary, but they are setting us up to not worry about potential problems that may arise later on in development.
First, as you may have noticed, we included a form element on the html page. By default, this form will reload the page if you try to type something into the text entry box and then submit it by pressing enter or clicking the button. We don’t want it to do this, so to plug in our own behavior, we will first need to prevent this default action.
Let’s write a test for this condition. First, we will need to require cold-brew and some necessary functions from selenium-webdriver:
Then, let’s add our test case into chat.spec.js:
There’s a lot going on in this test, so let’s break it down. First, inside the beforeEach hook, we use cold-brew to create a client. This creates an automated instance of Google Chrome. Then, we navigate the client to the address of the application to test (in this case, http://localhost:3000/). Next, we locate the text input box inside of the form, type “hello world”, hit ENTER, and click on the button inside the form. Finally, we check the current url of the client and expect it to be the same as the initial address. To clean up afterward, we quit the client window in the afterEach hook.
Note: If you have used Selenium Webdriver in the past, the client.executeScript call may look familiar. This is because cold-brew uses Selenium Webdriver behind the scenes, and the client object created by cold-brew responds to the Selenium Webdriver API.
When you run the test, you should see a Google Chrome window open and then quickly close again. The test should fail because the url changed from “http://localhost:3000/” to “http://localhost:3000/?” due to the fact that the page reloads when the form is submitted. However, as we mentioned before, we want to prevent this default behavior so that we can handle the submitted chat message ourselves. Let’s write the code to prevent the page from reloading when the form is submitted:
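A minimal sketch of that handler is shown here, assuming plain DOM APIs rather than jQuery. The helper name preventFormReload and the selector in the usage comment are our own assumptions for illustration.

```javascript
// Hypothetical helper: stop the browser's default submit behavior
// (a page reload) so the app can handle the chat message itself.
// Listening for "submit" covers both pressing ENTER and clicking the button.
function preventFormReload(form) {
  form.addEventListener('submit', (event) => {
    event.preventDefault();
  });
}

// In the browser, once the document is ready
// (the selector is an assumption about index.html):
// preventFormReload(document.querySelector('form'));
```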
The test should pass now.
Don’t worry, we’re almost to WebRTC.
Our final pre-WebRTC piece of functionality is making the message that is typed into the input form appear on the screen. After that, we are ready to move on to WebRTC.
We will first write a test that uses a few new methods from the cold-brew library. It will be located right after our last test, inside the same describe block.
In this test, we type “Hello World” into the input in the form and then hit enter. Then we make sure that this message appears on the page.
We can do this by adding an event listener to the form button with the id of “sendMessage” inside of the document ready function.
This event listener will get the text input from the input text field, pass it into a helper function, then reset the value of the input text field to being blank.
Now let’s create a helper function under our document ready function called handleIncomingMessage.
This function simply creates a new message element with the class of message, sets its text to the passed in message text, and appends it to the chat window.
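The two pieces just described might look roughly like this. It is a sketch using plain DOM calls instead of jQuery; the "chat-window" id, the sendFromInput name, and the optional doc parameter (there so the function can be exercised outside a browser) are assumptions of ours.

```javascript
// Create a message element, set its text, and append it to the chat window.
// `doc` defaults to the browser's document; "chat-window" is an assumed id.
function handleIncomingMessage(message, doc = document) {
  const element = doc.createElement('div');
  element.className = 'message';
  element.textContent = message;
  doc.getElementById('chat-window').appendChild(element);
  return element;
}

// Hypothetical listener body: read the text field, hand the text to the
// helper, then reset the field to blank.
function sendFromInput(input, onMessage) {
  const text = input.value;
  onMessage(text);
  input.value = '';
}

// In the browser this could be wired up along these lines:
// button.addEventListener('click', () => sendFromInput(input, handleIncomingMessage));
```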
The first thing that needs to happen when creating a WebRTC application is that the server needs to keep track of when people join and leave the page. A good way to accomplish this is to have each client emit a message through its signaling channel when it arrives on the page. We will be using socket.io for this.
Let’s write a test that will wait for this event to occur. This will be inside the same describe function:
This test is a little different from what we have done previously. We are calling the waitUntilSendSignalling function, which waits for the event named by the string you pass in to be emitted and resolves to true when that happens.
We are first going to have to require a few new files into our index.html:
We will never use the webrtc-adapter directly. It normalizes the API between browsers.
So now let’s go back to our main.js and set up our first socket event for signaling:
First we will create a socket variable and set it to null before our document ready function:
The first line of our document ready function should use a cold-brew client-side function called observeSignaling that is called on a socket:
Then we will finally want to emit a socket event called “join”:
This won’t do much yet, but it will emit a “join” event to the server. Let’s set the server up to listen for that socket emit now.
Let’s first require a few new dependencies:
Now let’s change our app.listen to server.listen:
And now finally let’s set up a server-side socket directly under our require statements:
Now that the server is responding back to the client we can build out our original emit function a little more:
The first thing we will do is add an isInitiator variable underneath the socket we created earlier. We will default it to false because we actually want the second client that arrives on the page to initiate the connection.
We can now add a little more logic to the “join” event we had in our main.js:
Because we are responding on the server, we can now give this event a callback that takes in the response from the server. We can then check whether this is the first or second person on the page to determine whether that user will initiate a WebRTC connection. We can also add an unload event listener to the window object. It is registered when the page loads, but it will not fire until the user closes the window.
So let’s write a few lines on our server to accept the socket event that is being emitted:
We can simply decrement the number of clients when a user leaves the page.
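As a rough sketch of this client-side logic: the function below emits “join” with an acknowledgement callback and registers the unload listener. It assumes the server answers with the current number of clients; the “leave” event name, the joinAndPickRole name, and passing window in as a parameter are our own choices for illustration.

```javascript
// Emit "join" and use the server's acknowledgement to pick a role.
// Assumption: the server replies with the number of clients on the page.
function joinAndPickRole(socket, win, onRole) {
  socket.emit('join', (numClients) => {
    // The second client to arrive is the one that initiates signaling.
    const isInitiator = numClients === 2;
    onRole(isInitiator);
  });

  // Registered right away, but fired only when the page is being closed.
  // "leave" is a hypothetical event name.
  win.addEventListener('unload', () => socket.emit('leave'));
}

// In the browser:
// joinAndPickRole(socket, window, (initiator) => { isInitiator = initiator; });
```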
Now we are actually going to add another function to our module.exports to deal with a complication that happens as we run tests. We are going to have to reset the number of clients whenever a browser closes.
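The server-side bookkeeping can be sketched independently of socket.io, which also makes it easy to reason about in tests. Only resetNumClients is named in this tutorial; handleJoin, handleLeave, and the “leave” event name in the wiring comment are assumptions.

```javascript
// Number of clients currently on the page.
let numClients = 0;

// Called when a client emits "join"; replies with the updated count
// through the acknowledgement callback.
function handleJoin(acknowledge) {
  numClients += 1;
  acknowledge(numClients);
}

// Called when a client leaves; never drops below zero.
function handleLeave() {
  if (numClients > 0) {
    numClients -= 1;
  }
}

// Used by the test suite to start every test from a clean slate.
function resetNumClients() {
  numClients = 0;
}

// Possible wiring inside the server's connection callback:
// io.on('connection', (socket) => {
//   socket.on('join', handleJoin);
//   socket.on('leave', handleLeave);
// });
// resetNumClients would then be added to module.exports for the tests.
```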
Now we will have to import the resetNumClients function into our chat.spec.js file.
Signaling: Offer and Answer
In order to establish a peer-to-peer connection using WebRTC, both clients need to create an RTCPeerConnection object. Then, each client needs to obtain their Session Description, an object that indicates what kind of data they want to send to the other client through the peer-to-peer connection, which they can do by calling built-in methods of the RTCPeerConnection object. The overall process looks like this:
- One client, the “initiator”, obtains their session description, called the offer, sets it as their localDescription, and then sends the offer to the other client through the signaling channel.
- The other client receives the offer and sets it as their remoteDescription. Then, they obtain their own session description, called the answer, set it as their localDescription, and send it back to the initiator through the signaling channel.
- The initiator receives the answer and sets it as their remoteDescription.
In the image above, Amy is the initiator.
In our chat application, the first person to arrive on the page cannot be the initiator — there is no one else for them to send an offer to! Instead, the second person to arrive on the page will be the initiator. Let’s write a test to indicate this. This test will be within a completely new describe block at the bottom of the file:
In order to make this work, we need to add a few more variables and functions.
First, at the top of main.js, outside the $(document).ready function, add the following two variables:
The peerConnection variable will store a reference to the RTCPeerConnection object that we create to communicate with the other client. Typically, the SERVERS variable would contain a list of STUN and TURN servers, which are external servers that assist in the signaling process. However, we will not need them in this situation because both of our clients are running on the same local network. In the future, if you wish to continue developing with WebRTC, free public STUN servers are easy to find, while TURN servers relay traffic and usually have to be self-hosted or paid for.
With that, we can (finally) write a helper function that will create the RTCPeerConnection:
Ordinarily, we would use the RTCPeerConnection constructor to create the RTCPeerConnection object. However, the cold-brew library provides the coldBrewRTC factory function, which creates and returns an RTCPeerConnection object with the added benefit that it can be observed from the external test script.
Once the RTCPeerConnection objects have been created, the initiator should send the offer to the other client. Let’s create another helper function to manage this:
Finally, inside the $(document).ready function, let’s invoke these helper functions to send the offer. However, we need to be careful to get the timing right.
The initiator should be the one sending the offer. However, the initiator doesn’t know they are the initiator until the server responds to their “join” event emission. Therefore, we need to invoke the helper functions inside the callback for the “join” event emission:
At this point, the test should pass!
Now, let’s write another test to cover the rest of the signaling process (place this inside the same describe block as the previous test):
In order to make the rest of the signaling process happen, we’ll need to modify some of what we already have in our main.js file.
First, let’s upgrade the initiateSignaling function so that the initiator will be prepared to receive the answer from the other client:
Notice that we added two new pieces to this function:
- When the initiator creates the offer, they set the offer as their local description before sending it to the other client through the socket.
- We add an event listener to the socket to handle the answer when the other client responds.
The other client (the one who isn’t the initiator) has different responsibilities: They need to be prepared to receive the offer from the initiator, and respond with an answer when the offer comes in. Let’s write another helper function to handle these responsibilities:
Notice that this is essentially the inverse of what the initiator is doing. When the offer comes in, the non-initiator sets the offer as their remote description, creates the answer, sets the answer as their local description, and then sends the answer back to the initiator.
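Put together, the two helper functions could look something like the sketch below. It uses the promise-based RTCPeerConnection methods; the “offer” and “answer” event names and the choice to pass peerConnection and socket in as parameters are assumptions made for illustration.

```javascript
// Initiator: listen for the answer, then create the offer, store it as the
// local description, and send it through the signaling channel.
async function initiateSignaling(peerConnection, socket) {
  // Register the listener first so a fast answer cannot be missed.
  socket.on('answer', async (answer) => {
    await peerConnection.setRemoteDescription(answer);
  });

  const offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);
  socket.emit('offer', offer);
}

// Non-initiator: when the offer arrives, store it as the remote description,
// create the answer, store it as the local description, and send it back.
function prepareToReceiveOffer(peerConnection, socket) {
  socket.on('offer', async (offer) => {
    await peerConnection.setRemoteDescription(offer);
    const answer = await peerConnection.createAnswer();
    await peerConnection.setLocalDescription(answer);
    socket.emit('answer', answer);
  });
}
```

After both functions have run their course, each client has a local and a remote description set, which is exactly the state the offer/answer exchange is meant to reach.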
Now, let’s add to the conditional statement inside the “join” callback, to ensure that both clients are handling their responsibilities:
The last thing we need to do to complete this part of the signaling process is to ensure that the server is passing along the offer and answer when they are sent by the clients. Let’s add to the “connection” callback in the server code to make this happen:
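One way to express this relay is a small helper that forwards a list of events to every client except the sender, using socket.io’s socket.broadcast.emit. The helper name relayEvents and the event names in the usage comment are assumptions.

```javascript
// Forward each listed event from one client to every other connected client.
// socket.broadcast.emit sends to all sockets except the one that sent it.
function relayEvents(socket, eventNames) {
  eventNames.forEach((eventName) => {
    socket.on(eventName, (payload) => {
      socket.broadcast.emit(eventName, payload);
    });
  });
}

// Inside the server's "connection" callback:
// io.on('connection', (socket) => {
//   relayEvents(socket, ['offer', 'answer']);
// });
```

With only two clients on the page, broadcasting to “everyone else” is the same as sending to the other peer.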
At this point, the test should pass! Congratulations, you have finished the first part of the signaling process.
Signaling: ICE Candidates
Aside from exchanging information about each client in their session descriptions, the two clients also need to exchange information about communication methods that they can use to reach each other. These communication methods are known as ICE Candidates and they must also be exchanged through the signaling channel.
Each client will be sending at least one ICE Candidate to the other. Let’s write a test to make sure this happens.
Just as WebRTC becomes more complicated the further you get in the signaling process, our tests are going to become more complicated.
- The first thing that happens is that we wait for the first client to attach the onicecandidate event listener to the peer connection object.
- Then we wait for client 1 to send the ice candidate through the signaling channel.
- Then we have to wait for client 2 to receive the ice candidate through the signaling channel.
- Then we have to wait until client 2 attaches the onicecandidate event listener to their peer connection object.
- Then we wait for client 2 to send their ice candidates through the signaling channel.
- And finally we have to wait for client 1 to receive the ice candidate.
We will do all of these in the main.js createRTC function that we created earlier.
In order to make this work, we will need to edit our createRTC helper function. First we will add a socket parameter. Then we will attach the onicecandidate event listener to the peerConnection object. When that event fires, we want to pass the ICE candidate through the signaling channel.
Then we are going to want to write the inverse so we can also receive the ice candidates. So we will create a “receive ice candidate” event on the signaling channel that will fire whenever a client receives an ice candidate. Inside of the callback we call the addIceCandidate function to add that ice candidate to the receiving client.
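The candidate exchange described here can be sketched as follows. The “receive ice candidate” event comes from the description above; the “send ice candidate” event name and the standalone function exchangeIceCandidates are assumptions (in the tutorial this logic lives inside createRTC).

```javascript
// Send local ICE candidates out and accept remote ones.
function exchangeIceCandidates(peerConnection, socket) {
  // Fires once per local candidate. The last event carries a null
  // candidate to signal that gathering has finished, so skip it.
  peerConnection.onicecandidate = (event) => {
    if (event.candidate) {
      socket.emit('send ice candidate', event.candidate);
    }
  };

  // Hand candidates from the other client to the local peer connection.
  socket.on('receive ice candidate', async (candidate) => {
    try {
      await peerConnection.addIceCandidate(candidate);
    } catch (error) {
      console.error('Could not add ICE candidate', error);
    }
  });
}
```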
Next we will have to edit the createRTC invocation we wrote earlier and add the socket in as the argument:
The next thing we will have to do is add listeners to the server so that it can pass ICE candidates through the signaling channel to other clients:
We just add that one socket event on our signaling channel. Every time an ICE candidate is sent, the server broadcasts it back out to the other clients.
It seems like if we run our tests right now they should work, but they won’t. This is one of those tricky parts of WebRTC: the onicecandidate event will only fire if there is some sort of data channel or stream created. So let’s do that.
First we will add a new variable outside of the $(document).ready function with our other variables, the “dataChannel” variable.
Next, inside of the “initiateSignaling” function we wrote earlier, we are going to create the dataChannel.
We set the dataChannel variable to the result of calling “createDataChannel” on the peerConnection, which creates a data channel with the options we specify. We are also setting the channel to be unreliable. An unreliable data channel has much less overhead, but it comes with a caveat: occasionally, due to network conditions, messages will be lost or will arrive out of order. Because this is an example application carrying non-critical data, we are okay with that.
Now if you run that test it should pass!
We’re almost there!
So now we are going to configure the data channel and send messages.
Let’s actually remove the last few lines of code we wrote and replace them with a helper function:
Those lines of code should now become:
And now let’s write that helper function that is going to configure our data channel.
The first thing we do in this function is create the data channel the same way we did before. Then we attach an onopen event listener to it that fires when the data channel has opened. Inside that listener we add another event listener to the data channel: onmessage, which takes a message event as a parameter. This onmessage listener parses the message event data and then calls the “handleIncomingMessage” function that we wrote earlier with the message from the parsed data.
Next we are going to add a data channel event listener to the other client in the “prepareToReceiveOffer” function.
We are going to add the ondatachannel event listener to this client’s peerConnection object. Inside this event listener we set the data channel to be the same data channel that the initiating client is using. Then we add the same onmessage event listener to this client’s data channel. It will parse and handle the message in the same way as the initiating client did.
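Both sides of the data channel setup might be sketched like this. The channel label “chat”, the function names, and the callback parameters are assumptions; the options { ordered: false, maxRetransmits: 0 } are one current way to request an unreliable channel and may differ from the option used when this tutorial was written.

```javascript
// Parse an incoming message event and pass the text to the handler.
function attachMessageHandler(channel, onMessage) {
  channel.onmessage = (messageEvent) => {
    const data = JSON.parse(messageEvent.data);
    onMessage(data.message);
  };
}

// Initiator: create the channel and start listening once it has opened.
function createChatChannel(peerConnection, onMessage) {
  const channel = peerConnection.createDataChannel('chat', {
    ordered: false,     // messages may arrive out of order
    maxRetransmits: 0,  // lost messages are not resent
  });
  channel.onopen = () => attachMessageHandler(channel, onMessage);
  return channel;
}

// Non-initiator: pick up the channel the initiator created.
function acceptChatChannel(peerConnection, onMessage, onChannel) {
  peerConnection.ondatachannel = (event) => {
    attachMessageHandler(event.channel, onMessage);
    onChannel(event.channel);
  };
}
```

In the application, onMessage would be handleIncomingMessage, and onChannel would store the channel in the shared dataChannel variable.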
Now finally we can edit our initial event listener to create a data object that will be sent through the data channel every time a message is sent.
We now build a data object containing the message we want to send, convert it to a string, and call the dataChannel.send method with the stringified message.
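A small sketch of that sending step, with a guard for the case where the channel is not open yet. The function name sendChatMessage and the { message } payload shape are assumptions chosen to match the parsing described earlier.

```javascript
// Serialize the chat text and send it over the data channel.
// Returns false (and sends nothing) if the channel is not open yet.
function sendChatMessage(dataChannel, text) {
  if (!dataChannel || dataChannel.readyState !== 'open') {
    return false;
  }
  dataChannel.send(JSON.stringify({ message: text }));
  return true;
}
```

The receiving side reverses this with JSON.parse and reads the message property.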
Try it out!!!
In your terminal type in the npm run start command we wrote at the start of the tutorial. Now open up your browser. Open up two separate tabs at “localhost:3000” and send a message!
If you did everything correctly, the message should send from one tab to the other!
You have created a WebRTC connection that can send messages through a data channel!!!
Check out part 2 if you would like to learn about how to add a video and audio stream to this application also!
And, as we mentioned above, please send us any feedback you have about this tutorial, or the cold-brew library in general! We are very proud of the work we have done on this library, but we’re always looking to make it better and more useful.
Team Cold-Brew (JC Zhang, William Barnes, Daniel King)