Launching Your Own Zoom, Microsoft Teams, or Google Meet

Kumar Vidit
Published in Geek Culture
7 min read · May 16, 2021

Social distancing, working from home, and other measures taken during the pandemic have sent demand for video conferencing apps soaring. Thousands of such apps exist on the Apple App Store and Google Play Store. The most popular platforms are Zoom, Cisco Webex, Google Meet, and Microsoft Teams. If you are looking to create your own video conferencing platform, you have come to the right place!

Introducing the tech behind: Say hello to WebRTC!

WebRTC (Web Real-Time Communication) enables peer-to-peer (p2p) communication between web browsers, letting them exchange video, audio, and arbitrary data directly over the web.

Three core pillars of WebRTC

A. Signaling

Traditionally, WebSockets could be employed to connect two clients but a server would be required to route their messages, as shown below:

Peer to Peer communication via server (Source: Image by author)

WebRTC requires a server only to establish the connection between clients, a process called signaling. The browsers collect the required information or metadata about their peers through that server and then communicate directly with each other, as shown below:

Direct Peer-Peer communication (Source: Image by author)

WebRTC does not mandate a particular signaling protocol, so you are free to choose one. WebSocket, SIP, and XMPP are some of the common approaches.

The signaling process is detailed below:

1. A client initiates the call.

2. The caller creates an offer using the Session Description Protocol (SDP) and sends it to the receiver.

3. The receiver responds to the offer with an answer message containing an SDP description.

4. Once both peers have set their local and remote session descriptions, including browser’s codecs and metadata, they know the media capabilities used for the call.
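The four steps above can be sketched with the standard `RTCPeerConnection` methods. This is a minimal sketch under assumptions: `pc` is an `RTCPeerConnection` created elsewhere, and `sendSignal` is a hypothetical callback that delivers a message to the other peer over whichever signaling channel you picked. The function names are illustrative only.

```javascript
// Offer/answer exchange sketched against the standard RTCPeerConnection methods.
// `sendSignal` is a hypothetical callback that forwards a message to the other
// peer over the signaling channel.

// Steps 1-2: the caller creates an SDP offer and sends it.
async function createAndSendOffer(pc, sendSignal) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendSignal({ type: "offer", sdp: offer.sdp });
}

// Step 3: the receiver stores the offer and replies with an SDP answer.
async function handleOffer(pc, offer, sendSignal) {
  await pc.setRemoteDescription(offer);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  sendSignal({ type: "answer", sdp: answer.sdp });
}

// Step 4: the caller stores the answer; both sides now hold a local and a
// remote session description.
async function handleAnswer(pc, answer) {
  await pc.setRemoteDescription(answer);
}
```

In a browser, the caller would run `createAndSendOffer`, the receiver would run `handleOffer` when the offer arrives, and the caller would finish with `handleAnswer`.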

However, they still can’t connect and exchange media, because an SDP description knows nothing about external Network Address Translators (NATs), the peers’ public IP addresses, or port restrictions. This is where Interactive Connectivity Establishment (ICE) comes into the picture.

B. Interactive Connectivity Establishment (ICE)

ICE is a p2p network communication method for exchanging information about network connections. ICE gathers ICE candidates, which are the IP and port pairs that one browser can attempt to use to connect to another. To do so, it uses two protocols: Session Traversal Utilities for NAT (STUN) and Traversal Using Relays around NAT (TURN).
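To make the "IP and port pair" idea concrete, here is a small sketch that parses a candidate line in the standard SDP `candidate:` format. The helper name `parseIceCandidate` and the sample address are made up for illustration; the candidate types (`host`, `srflx` for addresses learned through STUN, `relay` for addresses allocated on a TURN server) come from the ICE specification.

```javascript
// Parse one ICE candidate line of the form:
//   candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseIceCandidate(line) {
  const fields = line
    .replace(/^a=/, "")
    .replace(/^candidate:/, "")
    .trim()
    .split(/\s+/);
  if (fields.length < 8 || fields[6] !== "typ") {
    throw new Error("Not a valid ICE candidate line");
  }
  const [foundation, component, transport, priority, address, port, , type] = fields;
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address, // the IP half of the pair
    port: Number(port), // the port half of the pair
    type, // "host", "srflx" (via STUN), "prflx", or "relay" (via TURN)
  };
}
```

For example, `parseIceCandidate("candidate:842163049 1 udp 1677729535 203.0.113.5 46154 typ srflx")` yields an object whose `address` is `"203.0.113.5"`, whose `port` is `46154`, and whose `type` is `"srflx"`.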

Flow Diagram

1. For building a list of ICE candidates, the calling browser makes a series of requests to a STUN server.

2. The STUN server responds and returns the public IP address and port pairs.

3. The calling browser creates a list of ICE candidates by adding all pairs to it.

4. Once the browser has finished gathering ICE candidates, it passes them to the receiver’s browser through the signaling channel.

5. Now, the receiver’s browser needs to generate an answer. It follows the same steps as above: gathering ICE candidates, etc., and sends ICE candidates to the calling browser.

6. Once exchanged, a series of connectivity checks are performed.

7. The ICE algorithm in each browser takes a candidate pair from the list it received from the other browser and sends a STUN request to that address.

A. If a response comes, the originating browser considers the verification successful and will consider that IP/port pair as a valid ICE candidate.

  • After completing checks on all of the pairs, the browsers negotiate and settle on one of the remaining valid pairs.
  • Once a pair is selected, media starts flowing between the peers.

Network architecture with only STUN involved (Source: https://temasys.io/)

B. If the browsers can’t find a pair that passes the connectivity checks, they’ll send requests to the TURN server to get a media relay address.

  • A relay address is a public IP address and port that forwards packets to and from the browser that set up the relay address.
  • This relay address is then added to the candidate list and exchanged via the signaling channel.

Network architecture with TURN and STUN involved (Source: https://temasys.io/)

The WebRTC stack includes an ICE Agent that takes care of most of the above. We just need to implement a signaling mechanism to exchange SDPs and to send along new ICE candidates whenever they’re discovered.
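The part left to the application could look roughly like the sketch below. The helper names, the `"ice-candidate"` message type, and the `sendSignal` callback are assumptions made for illustration; `iceServers`, `onicecandidate`, and `addIceCandidate` are the standard `RTCPeerConnection` names.

```javascript
// Build the configuration object passed to `new RTCPeerConnection(config)`.
// The server URLs and TURN credentials are supplied by the caller; this
// sketch does not assume any particular provider.
function buildRtcConfig({ stunUrl, turnUrl, turnUsername, turnCredential }) {
  const iceServers = [{ urls: stunUrl }];
  if (turnUrl) {
    iceServers.push({ urls: turnUrl, username: turnUsername, credential: turnCredential });
  }
  return { iceServers };
}

// Send each locally discovered candidate to the other peer as soon as it appears.
function sendLocalCandidates(pc, sendSignal) {
  pc.onicecandidate = (event) => {
    // A null candidate marks the end of gathering; there is nothing to send.
    if (event.candidate) {
      sendSignal({ type: "ice-candidate", candidate: event.candidate });
    }
  };
}

// Hand a candidate received over the signaling channel to the local ICE agent.
async function addRemoteCandidate(pc, candidate) {
  await pc.addIceCandidate(candidate);
}
```

A connection would then be created with something like `new RTCPeerConnection(buildRtcConfig({ stunUrl: "stun:stun.example.org:3478" }))`, where the hostname is a placeholder for a STUN server you run or rent.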

C. APIs

WebRTC relies on three main JavaScript APIs:

MediaStream: represents a device’s media stream that can include audio and video.

RTCPeerConnection: it allows communication between peers.

RTCDataChannel: it enables real-time communication of arbitrary data.
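As a small illustration of the third API, the sketch below opens a data channel on one side and accepts it on the other. The channel label `"chat"` and both function names are arbitrary choices for this example; `createDataChannel`, `ondatachannel`, and `onmessage` are the standard names.

```javascript
// Caller side: open a channel (the "chat" label is an arbitrary choice)
// and deliver incoming messages to `onMessage`.
function openChatChannel(pc, onMessage) {
  const channel = pc.createDataChannel("chat");
  channel.onmessage = (event) => onMessage(event.data);
  return channel;
}

// Receiver side: the channel arrives through the connection's
// datachannel event once the peers are connected.
function acceptChatChannel(pc, onMessage) {
  pc.ondatachannel = (event) => {
    event.channel.onmessage = (msg) => onMessage(msg.data);
  };
}
```

Once the channel is open, either side can call `channel.send("hello")` to deliver arbitrary data to the other.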

Let’s implement a simple video chat

Server

Create a server.js file that will run the application on port 3000 and handle the WebSocket messages used for signaling.

Code: Creating server.js to run on port 3000

Import the Socket.IO library and handle the messages emitted by the clients:

Code: Handling messages emitted by clients
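One possible shape for these handlers is sketched below. It is written against the Socket.IO server interface (`io.on("connection")`, `socket.join`, `socket.to(room).emit`), but every event name, the `roomId` payload field, and the two-peer limit are assumptions made for this example. Room membership is tracked in a plain `Map` so the sketch does not depend on adapter internals, which differ between Socket.IO versions.

```javascript
const MAX_PEERS_PER_ROOM = 2; // this sketch only covers one-to-one calls

// Hypothetical event names; the client must use the same ones.
const RELAYED_EVENTS = ["start_call", "webrtc_offer", "webrtc_answer", "webrtc_ice_candidate"];

function registerSignaling(io) {
  const rooms = new Map(); // roomId -> Set of socket ids

  io.on("connection", (socket) => {
    socket.on("join", (roomId) => {
      const members = rooms.get(roomId) || new Set();
      if (members.size >= MAX_PEERS_PER_ROOM) {
        socket.emit("full_room", roomId);
        return;
      }
      members.add(socket.id);
      rooms.set(roomId, members);
      socket.join(roomId);
      // The first member creates the room; the second one joins it.
      socket.emit(members.size === 1 ? "room_created" : "room_joined", roomId);
    });

    // Forward signaling payloads, unchanged, to the other peer in the room.
    for (const eventName of RELAYED_EVENTS) {
      socket.on(eventName, (payload) => {
        socket.to(payload.roomId).emit(eventName, payload);
      });
    }

    // Free the slot when a peer leaves.
    socket.on("disconnect", () => {
      for (const [roomId, members] of rooms) {
        members.delete(socket.id);
        if (members.size === 0) rooms.delete(roomId);
      }
    });
  });
}
```

With Socket.IO v3 or v4 installed, `io` would typically be created as `new Server(httpServer)` after `const { Server } = require("socket.io")`. Note that the server never inspects the SDP or ICE payloads; it only relays them.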

Client communications

Add the required functionalities for the application to work to a client.js file.

First, let clients join a room (or create it if nobody has yet):

Code: Enabling clients to join a room
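A client-side sketch of this step might look as follows. The `"join"`, `"room_joined"`, and `"start_call"` event names are assumptions and must match whatever the server listens for; `socket` stands for a connected Socket.IO client, and `mediaDevices` would normally be `navigator.mediaDevices`, passed in explicitly here so the logic is easy to exercise outside a browser.

```javascript
// Ask the browser for camera and microphone access.
async function getLocalStream(mediaDevices) {
  return mediaDevices.getUserMedia({ audio: true, video: true });
}

// Capture local media first, then ask the server to join (or create) the room.
async function joinRoom(socket, mediaDevices, roomId) {
  if (!roomId) throw new Error("A room id is required");
  const localStream = await getLocalStream(mediaDevices);
  socket.emit("join", roomId);
  return localStream;
}

// When this client joined an existing room, tell the room creator to begin
// the offer/answer exchange.
function startCallWhenJoined(socket, roomId) {
  socket.on("room_joined", () => {
    socket.emit("start_call", { roomId });
  });
}
```

The returned stream can be assigned to the `srcObject` of a local `<video>` element so users see their own camera before the call connects.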

Note that the navigator.mediaDevices.getUserMedia method is called to get the client’s media data. If a client joins an existing room (created by another client), the media exchange will start and be managed as follows:

Code: Managing media exchanges
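The media side of that exchange can be sketched in two small helpers. Both function names are made up for this example, and `remoteVideo` is assumed to be a `<video>` element on the page; `addTrack`, `ontrack`, and `event.streams` are the standard `RTCPeerConnection` names.

```javascript
// Send every local audio/video track to the remote peer.
// Call this before creating the offer or answer so the tracks are negotiated.
function addLocalTracks(pc, localStream) {
  localStream.getTracks().forEach((track) => pc.addTrack(track, localStream));
}

// Show the remote peer's media in a <video> element once its tracks arrive.
function showRemoteStream(pc, remoteVideo) {
  pc.ontrack = (event) => {
    remoteVideo.srcObject = event.streams[0];
  };
}
```

Each client would call both helpers on its own `RTCPeerConnection`, so media flows in both directions.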

Both clients are now connected. They will be able to hear and see each other.

Must-have features for your product

Features for MVP (Source: Image by author)

A. Registration

  • Registered users are more likely to use the platform repeatedly. Enable third-party login through Facebook Login and Google Sign-In.
  • Registration can be made optional. For example, Zoom doesn’t mandate users to register. Users can join a conference call via a link shared by another user.

B. Profile Management

Registered users need to manage their personal data, including e-mails, passwords, names, and other details.

C. Contact List

  • Users should be able to find other users on the platform via nickname or real name.
  • Consider integrating with the user’s phonebook. The Google Contacts API is a convenient way to do that. However, make sure to ask for the user’s permission before accessing the phonebook.

D. Video and Voice Calls

  • Voice calls are a key component of a video conferencing app. Making phone calls across the globe is expensive. In-app voice calls are a cheaper alternative and let users stay connected while abroad.
  • Ensure the video is transmitted in at least HD quality by optimizing the real-time connection as best as possible.

E. Group Calls

  • Users love organizing group conferences. So, give that to them. Define a threshold number of users that can join a single call.
  • Develop features for conference hosts, such as muting/unmuting users, inviting, and banning.

F. Text Chat

Sometimes, the user is in an area with a poor network or is in a meeting. Then, text chat becomes more convenient than voice or video calls.

G. End-to-End Encryption

  • Ensure conversations are end-to-end encrypted: the message is encrypted on the sender’s device and decrypted only on the recipient’s device, so that no one in between can read it.
  • Algorithms such as AES-256 (for encryption) and HMAC-SHA256 (for message authentication) help make a video conferencing platform secure.
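To illustrate how the two algorithms fit together, here is a minimal encrypt-then-MAC sketch using Node's built-in crypto module. It assumes both endpoints already share two independent 32-byte keys (`encKey` and `macKey`); how those keys are agreed on is the hard part of any end-to-end scheme and is outside this sketch. The function names are illustrative.

```javascript
const {
  randomBytes,
  createCipheriv,
  createDecipheriv,
  createHmac,
  timingSafeEqual,
} = require("crypto");

// Sender's device: AES-256-CBC for confidentiality, HMAC-SHA256 for integrity.
function encryptMessage(plaintext, encKey, macKey) {
  const iv = randomBytes(16); // fresh IV for every message
  const cipher = createCipheriv("aes-256-cbc", encKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = createHmac("sha256", macKey).update(iv).update(ciphertext).digest();
  return { iv, ciphertext, tag };
}

// Recipient's device: verify the tag first, then decrypt.
function decryptMessage({ iv, ciphertext, tag }, encKey, macKey) {
  const expected = createHmac("sha256", macKey).update(iv).update(ciphertext).digest();
  // Reject anything that was altered in transit before touching the cipher.
  if (tag.length !== expected.length || !timingSafeEqual(tag, expected)) {
    throw new Error("Message authentication failed");
  }
  const decipher = createDecipheriv("aes-256-cbc", encKey, iv);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

A server that only relays the `{ iv, ciphertext, tag }` triple never sees the keys, so it cannot read or silently modify the message.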

Advanced features to consider

Advanced features for implementation (Source: Image by author)

A. Screen Sharing

Screen sharing is useful for tutorials, presentations, prescriptions, and much more. It can be implemented via WebRTC.
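One common way to do this in the browser is to capture the screen with `getDisplayMedia` and swap it in for the outgoing camera track. The sketch below assumes `pc` is an already connected `RTCPeerConnection` that is sending video, and that `mediaDevices` is `navigator.mediaDevices`; the function name is illustrative.

```javascript
// Replace the outgoing camera track with a screen-capture track.
async function startScreenShare(pc, mediaDevices) {
  // The browser shows its own picker so the user chooses what to share.
  const screenStream = await mediaDevices.getDisplayMedia({ video: true });
  const screenTrack = screenStream.getVideoTracks()[0];
  const videoSender = pc
    .getSenders()
    .find((sender) => sender.track && sender.track.kind === "video");
  if (!videoSender) throw new Error("No outgoing video track to replace");
  await videoSender.replaceTrack(screenTrack);
  return screenTrack;
}
```

Calling `replaceTrack` again with the original camera track switches back when the user stops sharing.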

B. Virtual Background

A fun but practical feature. It helps users hide a messy room or appear to take the call from the Eiffel Tower!

C. Noise Cancellation

Who doesn’t hate background noise during an important video conference, be it a loud TV or a barking dog? Deep learning algorithms can isolate a user’s voice and suppress the background sounds.

D. Custom Emojis

Personalizing the experience is trending. Everyone has “inside” jokes, and users want to use them in chats. A custom sticker/emoji pack will provide a boost to the user experience.

E. Custom Masks

AR effects add individuality. Snapchat first launched Masks and they soon became popular in any app that utilizes a front camera.

Strategies to make money


Having built your platform, you will now be looking to monetize your product. The following strategies are worth considering:

A. Advertisements

In a video conferencing app, ads cannot be shown during the call itself. However, it is possible to display promotional banners. Unskippable ads can also be shown at the end of a call, although these may annoy users.

B. Paid Calls

International calls can be offered as a paid feature. For example, Skype provides country-specific call rates that are cheaper than roaming costs. Cheaper international calls will attract users.

C. Freemium

Basic features such as video calls must be free. However, limits can be set on call time or the number of participants in a call. Zoom, for example, offers free conferences of up to 40 minutes with up to 100 attendees. Users who need bigger or longer conferences can choose from Zoom’s paid plans.

D. Paid Stickers

Paid stickers not only generate revenue but also entertain users. Everyone loves funny and creative stickers, and users will be happy to pay a reasonable amount for quality ones.

Hope you found the article insightful. Thank you for reading.
