Getting Started with WebRTC: A Practical Guide with Example Code

Alex Liu
16 min read · Jul 20, 2023


WebRTC example app

WebRTC (Web Real-Time Communication) is a powerful technology that enables real-time audio, video, and data sharing directly between web browsers and mobile applications. Whether you’re building a video conferencing app, a live streaming platform, or interactive web applications, WebRTC has become a game-changer in the world of communication.

In this blog, we’ll embark on a journey to learn WebRTC from scratch, exploring its core concepts, understanding the architecture, and diving into some hands-on coding examples. By the end, you’ll have the foundational knowledge to build your own WebRTC-powered applications.

Understanding the Basics of WebRTC

What is WebRTC?

WebRTC (Web Real-Time Communication) is a collection of open-source technologies that enable real-time communication over the internet directly between web browsers and mobile applications. It allows for peer-to-peer audio, video, and data sharing without the need for any plugins or additional software. WebRTC is widely used for building applications such as video conferencing, voice calling, live streaming, online gaming, and more.

WebRTC peer to peer connection

How WebRTC Works

  1. Media Capture: WebRTC allows web browsers and mobile apps to access the user’s media devices, such as cameras and microphones, to capture audio and video streams.
  2. Signaling: Before two peers can communicate, they need to establish a connection. The signaling process involves exchanging metadata and control messages between the peers to negotiate the session setup and handle network specifics.
  3. Peer Connection: Once the signaling process is completed, a direct peer-to-peer connection is established between the two devices. WebRTC uses a secure and efficient peer connection protocol to transmit audio, video, and data streams between them.
  4. Codecs and Encryption: WebRTC supports a range of audio and video codecs to efficiently encode and transmit media streams. Additionally, it employs encryption to secure communication between peers, ensuring privacy and data integrity.
  5. NAT and Firewall Traversal: WebRTC is designed to work across different networks and handle Network Address Translators (NAT) and firewalls. It uses techniques like Interactive Connectivity Establishment (ICE) to discover and establish direct communication paths.
  6. Data Channel: In addition to audio and video streams, WebRTC also includes a Data Channel that allows peers to exchange non-media data directly, enabling real-time data sharing.

WebRTC is supported by major web browsers, including Google Chrome, Mozilla Firefox, Safari, and Microsoft Edge. Its adoption has been driven by its open-source nature, ease of implementation, and the ability to build seamless real-time communication applications without the need for third-party plugins.

Get to know the necessary WebRTC APIs

To work with WebRTC (Web Real-Time Communication), you need to familiarize yourself with the necessary APIs and libraries that facilitate real-time communication between web browsers. WebRTC enables peer-to-peer audio, video, and data streaming directly within web applications, making it ideal for building video conferencing, voice calling, and other real-time communication features. Below are the essential components you should get to know:

getUserMedia API

This API enables access to the user’s media devices (camera and microphone) and provides MediaStream objects that can be used with RTCPeerConnection.
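A minimal sketch of the call, assuming a video element with id preview exists on the page (the demo's full version appears later in this guide):

navigator.mediaDevices.getUserMedia({audio: true, video: true})
  .then((stream) => {
    // preview the capture by attaching the stream to the video element
    document.getElementById('preview').srcObject = stream;
  })
  .catch((error) => console.error('getUserMedia error:', error));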

RTCPeerConnection API

This API is the heart of WebRTC and is responsible for establishing and managing peer-to-peer connections between browsers. It handles ICE (Interactive Connectivity Establishment) negotiation, NAT traversal, and media stream transmission.
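In its simplest form, creating a connection looks like this (the STUN address is Google's public test server; the configuration object is covered in the STUN/TURN section below):

const peerConnection = new RTCPeerConnection({
  iceServers: [{urls: 'stun:stun.l.google.com:19302'}],
});
peerConnection.onicecandidate = (event) => {
  // each ICE candidate gathered for this connection arrives here
};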

RTCDataChannel API

This API provides peer-to-peer data communication without relaying the data through a server. It is useful for sending arbitrary application data directly between peers.
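A minimal sketch (the channel label 'chat' is arbitrary):

const channel = peerConnection.createDataChannel('chat');
channel.onopen = () => channel.send('hello');
channel.onmessage = (event) => console.log('received:', event.data);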

Signaling

WebRTC requires signaling to exchange connection details between peers before establishing a direct connection. This process is not defined by the WebRTC standard and requires a separate signaling mechanism, such as WebSocket or a server-side application.
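For reference, the signaling messages used later in this demo are plain JSON envelopes sent over a WebSocket; the field values below are illustrative:

{
  "type": "send_offer",
  "body": {
    "channelName": "demo",
    "userId": "1234",
    "sdp": {"type": "offer", "sdp": "..."}
  }
}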

Other WebRTC APIs

  • MediaStream: Allows access to audio and video streams from user media devices like cameras and microphones.
  • RTCIceCandidate: Represents an ICE candidate, used during peer-to-peer connection establishment.
  • RTCSessionDescription: Represents the session description that sets up the connection (see the sketch below).
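Here is a short sketch of how these objects appear during session setup; assume it runs inside an async signaling handler where offer and candidate arrived from the remote peer:

// apply the remote offer, then create and apply our answer
await peerConnection.setRemoteDescription(new RTCSessionDescription(offer));
const answer = await peerConnection.createAnswer();
await peerConnection.setLocalDescription(answer);

// add an ICE candidate relayed from the remote peer
await peerConnection.addIceCandidate(new RTCIceCandidate(candidate));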

What are STUN, TURN, and ICE?

ICE, STUN and TURN

ICE (Interactive Connectivity Establishment), STUN (Session Traversal Utilities for NAT), and TURN (Traversal Using Relays around NAT) are important components of the WebRTC framework that enable real-time communication over the internet. They are used to establish peer-to-peer connections between clients, even when they are located behind firewalls or Network Address Translation (NAT) devices.

ICE (Interactive Connectivity Establishment) ICE is a technique that combines STUN and TURN servers to discover and establish the best connection path between WebRTC clients, enabling real-time communication even in challenging network environments.

STUN (Session Traversal Utilities for NAT) STUN is a protocol used to discover the public IP address and port of a client that is located behind a NAT device. With that information, two peers can attempt to establish a direct connection to each other.

TURN (Traversal Using Relays around NAT) TURN servers act as intermediaries when direct peer-to-peer connections are not possible due to restrictive network configurations. They relay media streams between clients, ensuring reliable communication.
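In code, STUN and TURN servers are handed to the RTCPeerConnection as ICE servers. The STUN address below is Google's public test server; the commented TURN entry is a placeholder you would replace with your own server and credentials:

const configuration = {
  iceServers: [
    {urls: 'stun:stun.l.google.com:19302'},
    // {urls: 'turn:turn.example.com:3478', username: 'user', credential: 'pass'},
  ],
};
const peerConnection = new RTCPeerConnection(configuration);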

Setting Up the Development Environment

Let’s create a simple web page with React

First, make sure you have Node.js installed on your machine. Then, open your terminal or command prompt and run the following command to create a new React app:

npx create-react-app simple-webrtc

Next, navigate to the project directory and start the development server:

cd simple-webrtc
npm start

Then, open the project in your code editor. You’ll find the main code files in the src folder. You can edit App.js to modify the content of the web page.

import React from 'react';
import './App.css';

function App() {
  return (
    <div className="App">
      <h1>Welcome to My Simple Web Page</h1>
      <p>This is a basic web page built with React.</p>
    </div>
  );
}

export default App;

In my web page example, I will use Ant Design as the UI library to make my life easier (installed with npm install antd). After editing, my React page will look like this:

import React, {useState} from 'react';
import {Button, Typography, Input} from 'antd';
import '../App.css';

const {Title, Paragraph} = Typography;
const {TextArea} = Input;

function App() {
  // the Send Message button stays disabled until the data channel opens
  const [sendButtonDisabled, setSendButtonDisabled] = useState(true);

  const renderHelper = () => {
    return (
      <div className="wrapper">
        <Input
          placeholder="User ID"
          style={{width: 240, marginTop: 16}}
        />
        <Input
          placeholder="Channel Name"
          style={{width: 240, marginTop: 16}}
        />
        <Button
          style={{width: 240, marginTop: 16}}
          type="primary"
        >
          Call
        </Button>
        <Button
          danger
          style={{width: 240, marginTop: 16}}
          type="primary"
        >
          Hangup
        </Button>
      </div>
    );
  };

  const renderTextarea = () => {
    return (
      <div className="wrapper">
        <TextArea
          style={{width: 240, marginTop: 16}}
          placeholder="Send message"
        />
        <TextArea
          style={{width: 240, marginTop: 16}}
          placeholder="Receive message"
          disabled
        />
        <Button
          style={{width: 240, marginTop: 16}}
          type="primary"
          disabled={sendButtonDisabled}
        >
          Send Message
        </Button>
      </div>
    );
  };

  return (
    <div className="App">
      <div className="App-header">
        <Title>WebRTC</Title>
        <Paragraph>This is a simple demo app that demonstrates how to build a WebRTC application from scratch, including a signaling server. It serves as a step-by-step guide to help you understand the process of implementing WebRTC in your own projects.</Paragraph>
        <div className="wrapper-row" style={{justifyContent: 'space-evenly', width: '50%'}}>
          {renderHelper()}
          {renderTextarea()}
        </div>
        <div
          className="playerContainer"
          id="playerContainer"
        >
          <video
            id="peerPlayer"
            autoPlay
            style={{width: 640, height: 480}}
          />
          <video
            id="localPlayer"
            autoPlay
            style={{width: 640, height: 480}}
          />
        </div>
      </div>
    </div>
  );
}

export default App;

Now, we have successfully created a basic web page for WebRTC.

Simple web page for WebRTC

Building a Basic WebRTC Video Call

Step 1: Setting up the local media stream (camera and microphone).

let localStream;

const setupDevice = () => {
  console.log('setupDevice invoked');
  // request camera and microphone access via the promise-based API
  navigator.mediaDevices.getUserMedia({audio: true, video: true})
    .then((stream) => {
      // render local stream on DOM
      const localPlayer = document.getElementById('localPlayer');
      localPlayer.srcObject = stream;
      localStream = stream;
    })
    .catch((error) => {
      console.error('getUserMedia error:', error);
    });
};

Handling media streams and constraints in WebRTC is crucial for controlling the audio and video behavior during real-time communication. You can specify constraints when requesting media from the user, such as resolution, frame rate, or specific devices. Constraints help tailor the media capture to meet specific requirements.

const constraints = {
  video: {
    width: {ideal: 1280},
    height: {ideal: 720},
    frameRate: {ideal: 30},
  },
  audio: true,
};

navigator.mediaDevices.getUserMedia(constraints)
  .then((stream) => {
    // Handle the media stream as needed.
  })
  .catch((error) => {
    // Handle the error if constraints cannot be satisfied.
  });

Step 2: Establishing the RTCPeerConnection.

// null uses the browser's default ICE configuration; supply STUN/TURN servers here in production
const servers = null;
const pcConstraints = {
  'optional': [
    {'DtlsSrtpKeyAgreement': true},
  ],
};
let localPeerConnection;

// When the user clicks the Call button, we create the p2p connection with RTCPeerConnection
const callOnClick = () => {
  console.log('callOnClick invoked');
  if (localStream.getVideoTracks().length > 0) {
    console.log(`Using video device: ${localStream.getVideoTracks()[0].label}`);
  }
  if (localStream.getAudioTracks().length > 0) {
    console.log(`Using audio device: ${localStream.getAudioTracks()[0].label}`);
  }
  localPeerConnection = new RTCPeerConnection(servers, pcConstraints);
  localPeerConnection.onicecandidate = gotLocalIceCandidateOffer;
  localPeerConnection.onaddstream = gotRemoteStream;
  localPeerConnection.addStream(localStream);
  localPeerConnection.createOffer().then(gotLocalDescription);
};

// handle the generated offer sdp
const gotLocalDescription = (offer) => {
  console.log('gotLocalDescription invoked:', offer);
  localPeerConnection.setLocalDescription(offer);
};

// handle the received remote stream
const gotRemoteStream = (event) => {
  console.log('gotRemoteStream invoked');
  const remotePlayer = document.getElementById('peerPlayer');
  remotePlayer.srcObject = event.stream;
};

// handle ICE candidates
const gotLocalIceCandidateOffer = (event) => {
  console.log('gotLocalIceCandidateOffer invoked', event.candidate, localPeerConnection.localDescription);
  // when candidate gathering finishes, send the complete sdp
  if (!event.candidate) {
    const offer = localPeerConnection.localDescription;
    // send offer sdp to signaling server via websocket
    sendWsMessage('send_offer', {
      channelName,
      userId,
      sdp: offer,
    });
  }
};


We handle ICE candidates in the gotLocalIceCandidateOffer function. If event.candidate is null, ICE candidate gathering is complete, so we send the finished SDP via signaling. There are two common strategies for handling ICE candidates: bundle the gathered candidates into the SDP and send everything together once gathering completes (as this demo does), or send each ICE candidate to the remote user via signaling as soon as it is gathered (known as trickle ICE), with the remote user adding it to their local peer connection.
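If you prefer the trickle approach, a sketch like the following could ride on the same signaling channel. The send_ice_candidate and ice_candidate_received message types match the signaling server shown later; the handler name onIceCandidateReceived is illustrative:

// send each candidate to the peer as soon as it is gathered
localPeerConnection.onicecandidate = (event) => {
  if (event.candidate) {
    sendWsMessage('send_ice_candidate', {
      channelName,
      userId,
      candidate: event.candidate,
    });
  }
};

// on the receiving side, add each relayed candidate to the connection
const onIceCandidateReceived = (candidate) => {
  localPeerConnection.addIceCandidate(new RTCIceCandidate(candidate));
};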

At this stage, we have completed setting up the RTCPeerConnection and generating an offer SDP. However, in order to establish a connection with the remote browser, we require a signaling server to exchange the SDP.

Implementing Signaling Server

The signaling server plays a crucial role in WebRTC communication. It facilitates the exchange of session information (SDP) between peers, allowing them to establish a direct peer-to-peer connection. The signaling process involves sending the SDP offer generated by the local browser to the remote browser, and vice versa.

Signaling Server

Once the signaling server receives the SDP offer from the local browser, it forwards it to the remote browser. The remote browser then generates its SDP answer and sends it back through the signaling server to the local browser.

This exchange of SDP offers and answers enables both browsers to negotiate the parameters for the media stream, such as codecs, supported resolutions, and other settings required for successful peer-to-peer communication.

The signaling server doesn’t transmit the actual media streams; it solely acts as a messenger to exchange the SDP between peers. Once the SDP exchange is complete, the media streams are transmitted directly between the peers, creating a direct and secure connection for real-time communication.

Remember, you can implement the signaling server using various technologies, such as WebSockets, HTTP, or any other suitable communication protocol. The choice of signaling server technology depends on the specific requirements of your WebRTC application.

Establish a Node.js server with Express.js

const debug = require('debug')(`${process.env.APPNAME}:index`);
const app = require('express')();
const server = require('http').Server(app);
const wss = require('./wss');

const HTTPPORT = 4000;
const WSSPORT = 8090;

// init the websocket server on 8090
wss.init(WSSPORT);

// init the http server on 4000
server.listen(HTTPPORT, () => {
  debug(`${process.env.APPNAME} is running on port: ${HTTPPORT}`);
});

WebSocket on Node.js

const debug = require('debug')(`${process.env.APPNAME}:wss`);
const WebSocket = require('ws');

// channels maps channelName -> {userId: socket}
let channels = {};

function init(port) {
  debug('ws init invoked, port:', port);
  const wss = new WebSocket.Server({port});
  wss.on('connection', (socket) => {
    debug('A client has connected!');
    socket.on('error', debug);
    socket.on('message', (message) => onMessage(wss, socket, message));
    socket.on('close', (message) => onClose(wss, socket, message));
  });
}

function send(wsClient, type, body) {
  debug('ws send', body);
  wsClient.send(JSON.stringify({
    type,
    body,
  }));
}

function clearClient(wss, socket) {
  // remove the client from every channel it joined
  Object.keys(channels).forEach((cname) => {
    Object.keys(channels[cname]).forEach((uid) => {
      if (channels[cname][uid] === socket) {
        delete channels[cname][uid];
      }
    });
  });
}

function onMessage(wss, socket, message) {
  debug(`onMessage ${message}`);
  const parsedMessage = JSON.parse(message);
  const type = parsedMessage.type;
  const body = parsedMessage.body;
  const channelName = body.channelName;
  const userId = body.userId;

  switch (type) {
    case 'join': {
      // join channel
      if (!channels[channelName]) {
        channels[channelName] = {};
      }
      channels[channelName][userId] = socket;
      const userIds = Object.keys(channels[channelName]);
      send(socket, 'joined', userIds);
      break;
    }
    case 'quit': {
      // quit channel; drop the channel once it is empty
      if (channels[channelName]) {
        delete channels[channelName][userId];
        if (Object.keys(channels[channelName]).length === 0) {
          delete channels[channelName];
        }
      }
      break;
    }
    case 'send_offer': {
      // relay the offer sdp to the peer
      const sdp = body.sdp;
      const userIds = Object.keys(channels[channelName]);
      userIds.forEach((id) => {
        if (userId.toString() !== id.toString()) {
          const wsClient = channels[channelName][id];
          send(wsClient, 'offer_sdp_received', sdp);
        }
      });
      break;
    }
    case 'send_answer': {
      // relay the answer sdp to the peer
      const sdp = body.sdp;
      const userIds = Object.keys(channels[channelName]);
      userIds.forEach((id) => {
        if (userId.toString() !== id.toString()) {
          const wsClient = channels[channelName][id];
          send(wsClient, 'answer_sdp_received', sdp);
        }
      });
      break;
    }
    case 'send_ice_candidate': {
      // relay an individual ice candidate to the peer (trickle ICE)
      const candidate = body.candidate;
      const userIds = Object.keys(channels[channelName]);
      userIds.forEach((id) => {
        if (userId.toString() !== id.toString()) {
          const wsClient = channels[channelName][id];
          send(wsClient, 'ice_candidate_received', candidate);
        }
      });
      break;
    }
    default:
      break;
  }
}

function onClose(wss, socket, message) {
  debug('onClose', message);
  clearClient(wss, socket);
}

// expose init so index.js can start the server
module.exports = {init};

WebSocket in React

import React, {useEffect, useRef} from 'react';

const URL_WEB_SOCKET = 'ws://localhost:8090/ws';

function App() {
  const ws = useRef(null);

  useEffect(() => {
    const wsClient = new WebSocket(URL_WEB_SOCKET);
    wsClient.onopen = () => {
      console.log('ws opened');
      ws.current = wsClient;
      // setup camera and join channel after ws opened
      join();
      setupDevice();
    };
    wsClient.onclose = () => console.log('ws closed');
    wsClient.onmessage = (message) => {
      console.log('ws message received', message.data);
      const parsedMessage = JSON.parse(message.data);
      switch (parsedMessage.type) {
        case 'joined': {
          const body = parsedMessage.body;
          console.log('users in this channel', body);
          break;
        }
        case 'offer_sdp_received': {
          const offer = parsedMessage.body;
          onAnswer(offer);
          break;
        }
        case 'answer_sdp_received': {
          gotRemoteDescription(parsedMessage.body);
          break;
        }
        case 'quit': {
          break;
        }
        default:
          break;
      }
    };
    return () => {
      wsClient.close();
    };
  }, []);

  const sendWsMessage = (type, body) => {
    console.log('sendWsMessage invoked', type, body);
    ws.current.send(JSON.stringify({
      type,
      body,
    }));
  };

  // join() sends a 'join' message with the channel name and user id;
  // it, the call handlers, and the rendering code from the earlier
  // snippets complete this component.
}

Be cautious with const ws = useRef(null), and consider why we don't simply assign wsClient = new WebSocket(URL_WEB_SOCKET) to a plain variable. In a function component, local variables are re-created on every render, so a plain ws variable would be reset each time the component re-renders. To keep the WebSocket connection stable across renders, we use the useRef hook: it behaves much like an instance variable on a class and, unlike useState, updating it does not trigger a re-render.

By using useRef, we can maintain a stable reference to the WebSocket instance throughout the component's lifecycle. This allows us to manage the WebSocket connection effectively without being affected by rendering updates. Remember that useRef is mainly used for handling mutable values that persist across renders, making it an ideal choice for managing WebSocket connections in React components.

With the signaling server in place, your WebRTC application will be able to establish connections and enable seamless audio and video communication between remote peers.

Finish the Callee Part

Now we have almost reached the final part of our full WebRTC application: handling the answering logic when the remote user receives a call from their peer. The process is similar to before, but this time we generate an answer SDP and return it to the caller through the signaling server.

const onAnswer = (offer) => {
  console.log('onAnswer invoked');
  setCallButtonDisabled(true);
  setHangupButtonDisabled(false);

  if (localStream.getVideoTracks().length > 0) {
    console.log(`Using video device: ${localStream.getVideoTracks()[0].label}`);
  }
  if (localStream.getAudioTracks().length > 0) {
    console.log(`Using audio device: ${localStream.getAudioTracks()[0].label}`);
  }

  localPeerConnection = new RTCPeerConnection(servers, pcConstraints);
  localPeerConnection.onicecandidate = gotLocalIceCandidateAnswer;
  localPeerConnection.onaddstream = gotRemoteStream;
  localPeerConnection.addStream(localStream);
  localPeerConnection.setRemoteDescription(offer);
  localPeerConnection.createAnswer().then(gotAnswerDescription);
};

const gotRemoteStream = (event) => {
  console.log('gotRemoteStream invoked');
  const remotePlayer = document.getElementById('peerPlayer');
  remotePlayer.srcObject = event.stream;
};

const gotAnswerDescription = (answer) => {
  console.log('gotAnswerDescription invoked:', answer);
  localPeerConnection.setLocalDescription(answer);
};

const gotLocalIceCandidateAnswer = (event) => {
  console.log('gotLocalIceCandidateAnswer invoked', event.candidate, localPeerConnection.localDescription);
  // when candidate gathering finishes, send the complete answer sdp
  if (!event.candidate) {
    const answer = localPeerConnection.localDescription;
    sendWsMessage('send_answer', {
      channelName,
      userId,
      sdp: answer,
    });
  }
};

// back on the caller's side, apply the answer relayed by the signaling server
// (this is the handler invoked in the 'answer_sdp_received' case above)
const gotRemoteDescription = (answer) => {
  console.log('gotRemoteDescription invoked:', answer);
  localPeerConnection.setRemoteDescription(answer);
};

Start Live Streaming

At last, we have completed the WebRTC setup. Now, let's start the web app by running npm start and open two web pages: one for the caller and the other for the callee. Click the Call button on the caller's page, and live streaming via WebRTC will begin.

Live streaming with WebRTC

Apologies for the limitations of this simple demo. For now, both the caller and callee need to be run on the same laptop as it’s not publicly accessible yet. However, I’m planning to deploy this WebRTC app to Vercel.com soon, so everyone can experience real-world WebRTC scenarios.

Understand the Demo App

Below are the typical 10 steps involved in using the WebRTC APIs:

  1. Capture a MediaStream from your local devices (e.g., microphone, webcam).
  2. Attach the MediaStream to a local <video> element (modern code sets srcObject directly; older tutorials obtained a URL blob from the stream).
  3. Use that element to preview the local media.
  4. Create an RTCPeerConnection object.
  5. Add the local stream to the newly created connection.
  6. Send your own session description to the remote peer.
  7. Receive the remote session description from the peer.
  8. Apply the received session description and obtain the remote stream from your RTCPeerConnection.
  9. Attach the remote stream to a second <video> element.
  10. Play the remote peer's audio and/or video.

With a comprehensive end-to-end diagram, you can gain a complete understanding of the entire process of this WebRTC app.

Diagram of WebRTC demo app

Implementing Data Channels

Data Channels in WebRTC are a feature that allows bidirectional, low-latency communication of arbitrary data between two peers in a peer-to-peer connection. Unlike media streams (used for audio and video), data channels provide a way to exchange non-media data directly between browsers, making them suitable for various real-time applications.

Implementing Data Channels involves creating a Data Channel within the RTCPeerConnection and handling its state and message events to exchange data between peers. The Data Channel API provides methods like send() to send data and events like onmessage, onopen, onclose, and onerror for handling communication events.

Enabling data exchange between peers

let sendChannel;
let receiveChannel;

const createDataChannel = () => {
  try {
    console.log('localPeerConnection.createDataChannel invoked');
    sendChannel = localPeerConnection.createDataChannel('sendDataChannel', {reliable: true});
  } catch (error) {
    console.error('localPeerConnection.createDataChannel failed', error);
  }
  sendChannel.onopen = handleSendChannelStateChange;
  sendChannel.onclose = handleSendChannelStateChange;
  localPeerConnection.ondatachannel = gotReceiveChannel;
};

const sendOnClick = () => {
  console.log('sendOnClick invoked', sendMessage);
  sendChannel.send(sendMessage);
  setSendMessage('');
};

const gotReceiveChannel = (event) => {
  console.log('gotReceiveChannel invoked');
  receiveChannel = event.channel;
  receiveChannel.onmessage = handleMessage;
  receiveChannel.onopen = handleReceiveChannelStateChange;
  receiveChannel.onclose = handleReceiveChannelStateChange;
};

const handleMessage = (event) => {
  console.log('handleMessage invoked', event.data);
  setReceiveMessage(event.data);
  setSendMessage('');
};

const handleSendChannelStateChange = () => {
  const readyState = sendChannel.readyState;
  console.log('handleSendChannelStateChange invoked', readyState);
  // enable the Send Message button only while the channel is open
  setSendButtonDisabled(readyState !== 'open');
};

const handleReceiveChannelStateChange = () => {
  const readyState = receiveChannel.readyState;
  console.log('handleReceiveChannelStateChange invoked', readyState);
};

Sending and receiving non-media data.

We have successfully implemented the peer-to-peer data channel using WebRTC. To see it in action, let’s start the web app by running npm start and open two web pages. On the caller's page, click the Call button to initiate the peer connection.

Once connected, enter Hello, World!!! in the caller's text area and click the Send button.

Send message via Data Channel

You will witness this message being received in real-time on the other side, showcasing the seamless data transfer capability of WebRTC.

Receive message from Data Channel

Advanced WebRTC Features

Managing audio and video codecs for optimal performance

Choose codecs that strike a balance between quality and bandwidth consumption. WebRTC supports various codecs, such as VP8, VP9, H.264 for video, and Opus, G.711, G.722 for audio. Consider the target devices and network conditions when selecting codecs. For instance, VP8 is widely supported and offers good quality, while H.264 may be preferable for hardware-accelerated decoding on certain devices.
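Where the browser supports it, you can nudge codec selection with setCodecPreferences. This sketch prefers VP9 on a video transceiver (peerConnection stands for the demo's connection):

const transceiver = peerConnection.addTransceiver('video');
const {codecs} = RTCRtpSender.getCapabilities('video');
// move VP9 to the front of the preference list, keep the rest in order
const preferred = [
  ...codecs.filter((c) => c.mimeType === 'video/VP9'),
  ...codecs.filter((c) => c.mimeType !== 'video/VP9'),
];
if (transceiver.setCodecPreferences) {
  transceiver.setCodecPreferences(preferred);
}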

Securing WebRTC connections using encryption

WebRTC uses the Datagram Transport Layer Security (DTLS) protocol to encrypt media streams. DTLS provides secure encryption for UDP data transport. When establishing a peer connection, WebRTC uses DTLS to negotiate and exchange encryption keys for encrypting media streams.

WebRTC data channels, used for exchanging non-media data, are encrypted as well: they run SCTP (Stream Control Transmission Protocol) over DTLS, so data channel messages are transmitted securely by default.

Ensure that your signaling server and media server (if used) support secure transport protocols such as HTTPS and WSS (WebSocket Secure). HTTPS provides a secure channel for exchanging signaling data, while WSS ensures secure communication for WebSocket connections used in WebRTC.
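In practice this is often just a scheme change on the client once TLS is in place; the hostname below is a placeholder:

// ws:// is acceptable for local development only
const URL_WEB_SOCKET = 'wss://signaling.example.com/ws';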

Implementing screen sharing functionality

Use the navigator.mediaDevices.getDisplayMedia API to capture the user's screen. This API prompts the user to grant permission to share their screen with the application. Be sure to handle cases where the user denies the request or the required permissions are unavailable.
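A sketch of how screen sharing could be swapped into the existing call, assuming the standard track-based sender API; replaceTrack swaps the outgoing video without renegotiation:

const startScreenShare = async () => {
  try {
    const screenStream = await navigator.mediaDevices.getDisplayMedia({video: true});
    const screenTrack = screenStream.getVideoTracks()[0];
    // find the sender currently transmitting video and swap in the screen track
    const sender = localPeerConnection.getSenders()
      .find((s) => s.track && s.track.kind === 'video');
    if (sender) {
      await sender.replaceTrack(screenTrack);
    }
    // the browser UI lets the user stop sharing at any time
    screenTrack.onended = () => console.log('screen sharing stopped');
  } catch (error) {
    console.error('getDisplayMedia error:', error);
  }
};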

Using WebRTC with WebAssembly for performance optimization

WebAssembly provides a way to execute code written in languages like C, C++, and Rust directly in the browser, alongside JavaScript. With near-native performance, it unlocks the potential for computationally-intensive tasks, such as video processing, image recognition, and encryption/decryption, to be executed with greater efficiency.

By offloading performance-critical tasks to WebAssembly modules, developers can optimize their WebRTC applications in various ways:

  • Video and Audio Processing: WebAssembly can handle video and audio encoding/decoding, filtering, and analysis, reducing the burden on the main JavaScript thread and improving overall performance.
  • Encryption and Decryption: WebRTC data channels often require secure communication. Utilizing WebAssembly for encryption/decryption tasks can speed up the process and ensure data security.
  • AI and Machine Learning: Complex AI algorithms and machine learning models can be executed via WebAssembly, enabling real-time processing of large datasets within WebRTC applications.

The End

Congratulations! You’ve now learned the fundamentals of WebRTC and built a basic video call application. This is just the beginning of your WebRTC journey. With WebRTC’s immense potential, you can explore various applications, from video conferencing to online gaming and beyond. Keep experimenting, honing your skills, and stay curious about the ever-evolving world of WebRTC.

Remember, real-time communication is at your fingertips, so embrace this powerful technology and take your web applications to the next level. Happy coding!

Example Code

Explore the complete implementation code on our GitHub repository to delve deeper into WebRTC with React and Node.js.
