Zoom — Video conf app at scale

May 18, 2020

Zoom scaled from 20 million to 300 million users virtually overnight.

This kind of scale is possible because Zoom sees its architecture as a competitive advantage.

Everyone will be using video, so how do we scale to everyone?

Zoom started with the goal of video everywhere and this goal has shaped their architecture.

Bitrate Encoding

Zoom identified two problems with traditional video conferencing: it was jittery over poor networks, and it required a lot of CPU-intensive processing. So Zoom chose the H.264 Annex G (SVC) codec over plain H.264 (MPEG-4 Part 10).

In simple terms, Zoom chose Scalable Video Coding (SVC) over Advanced Video Coding (AVC).

AVC is a codec where you send a single stream, and that stream has a single bitrate, so it is fixed at either a lower or a higher resolution. If you want to send multiple bitrates, you have to send multiple streams, which increases bandwidth utilization.

Traditionally, AVC-based systems sent traffic through a datacenter, transcoded it into a common view for everybody else, and then sent the mixed video out to every individual participant. That introduces latency, uses a lot of CPU resources, and makes it hard to scale and deploy new datacenters to meet increased load.
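As a rough illustration of that bandwidth cost, the sketch below adds up the uplink a sender would need to offer three AVC bitrates at once. The bitrates and the function name are assumptions chosen for the example, not figures from Zoom.

```python
# Hypothetical bitrates in kbps for three quality levels (not Zoom's figures).
AVC_STREAMS_KBPS = {"low": 150, "medium": 500, "high": 1200}


def simulcast_uplink_kbps(streams):
    """With AVC, each bitrate is a separate stream, so the sender uploads all of them."""
    return sum(streams.values())


print(simulcast_uplink_kbps(AVC_STREAMS_KBPS))  # 1850
```

With these made-up numbers the sender uploads 1850 kbps to offer three qualities, even though any single receiver only ever watches one of them.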

SVC is a single stream with multiple layers. That allows sending a 1.2 Mbps stream that contains every resolution and bitrate you may need to scale down to under given network conditions. An SVC stream starts with a base layer that represents the minimum video quality, and each subsequent layer on top improves the video quality. To reconstruct a higher-quality stream, the base layer is decoded first, and then the enhancement layers are decoded to produce a video stream with the desired characteristics. SVC was first released in 2007, and in those days SVC support was only practical in ASICs. With today's much faster CPUs, SVC can be done in software.
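The layering idea can be sketched as picking the longest run of layers, starting from the base layer, that fits the bandwidth a receiver has available. This is a minimal sketch, not how an SVC decoder is implemented: the layer names and per-layer bitrates are hypothetical, chosen only so that they add up to the 1.2 Mbps mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int  # incremental bitrate this layer adds (hypothetical value)


# Base layer first, then enhancement layers in dependency order.
# 150 + 350 + 700 = 1200 kbps for the full stream.
LAYERS = [Layer("base", 150), Layer("enh1", 350), Layer("enh2", 700)]


def select_layers(layers, available_kbps):
    """Return the longest prefix of layers that fits the available bandwidth.

    The base layer is always included, because every enhancement layer
    depends on it.
    """
    chosen = [layers[0]]
    used = layers[0].kbps
    for layer in layers[1:]:
        if used + layer.kbps > available_kbps:
            break  # higher layers depend on this one, so stop here
        chosen.append(layer)
        used += layer.kbps
    return chosen


print([layer.name for layer in select_layers(LAYERS, 600)])  # ['base', 'enh1']
```

A receiver with 600 kbps available gets the base layer plus one enhancement layer. A receiver with the full 1200 kbps gets all three, and both are served from the same single stream.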

Multimedia Router

Traditional systems used a Multipoint Control Unit (MCU), in which the bitrate is selected before delivery to the device or user. This required transcoding of the bitstreams, which is CPU intensive and limits the quality and scalability of the system.

Zoom developed a multimedia router optimized for the cloud that separates content processing from the transporting and mixing of streams. This design removes the traditional constraint of transcoding, which adds latency and limits scale.
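One way to picture routing without transcoding is a forwarder that never touches the video payload and only decides which layers each receiver should get. Zoom's router internals are not described in the sources, so the `Packet` and `Router` classes below are hypothetical names and the forwarding rule is an assumption, in the spirit of a layer-aware forwarder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    sender: str
    layer: int  # 0 = base layer, higher numbers = enhancement layers
    payload: bytes


class Router:
    """Hypothetical layer-aware forwarder: it selects packets, it never re-encodes."""

    def __init__(self):
        self.max_layer = {}  # receiver id -> highest layer to forward

    def set_max_layer(self, receiver, layer):
        self.max_layer[receiver] = layer

    def route(self, packet):
        """Return the receivers that should get this packet, payload untouched."""
        return sorted(
            receiver
            for receiver, top in self.max_layer.items()
            if receiver != packet.sender and packet.layer <= top
        )


router = Router()
router.set_max_layer("alice", 2)
router.set_max_layer("bob", 0)    # bob is on a weak link: base layer only
router.set_max_layer("carol", 1)

print(router.route(Packet("alice", 0, b"")))  # ['bob', 'carol']
print(router.route(Packet("alice", 1, b"")))  # ['carol']
```

The per-packet work is a comparison and a lookup, which is far cheaper than decoding and re-encoding video for every participant.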

Application Layer QoS

Zoom developed application layer QoS (Quality of Service) that works between the cloud and the Zoom client. The algorithm constantly monitors network conditions by gathering telemetry data (CPU, jitter, packet loss, etc.) and switches the stream to best adapt to the current conditions.

The adaptation can be initiated from the cloud side: when the cloud does not receive certain packets, it decides to switch the client to a different downstream.

The adaptation can also be initiated from the client side. When the client detects a bad network environment, it can automatically downsize its own upstream video so that it does not consume its own downstream bandwidth.

Based on the telemetry information, the stream can toggle between HTTPS, HTTP, and UDP.
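A telemetry-driven adaptation loop of this kind can be sketched as a function that steps the video layer down when conditions are bad and back up when they are clearly good. The thresholds, the `Telemetry` fields, and the one-step-at-a-time policy are all assumptions for illustration. Zoom's actual algorithm is not described in the sources.

```python
from dataclasses import dataclass


@dataclass
class Telemetry:
    packet_loss: float  # fraction of packets lost, 0.0 to 1.0
    jitter_ms: float
    cpu_load: float     # 0.0 to 1.0


def next_layer(current, top, t):
    """Step down on bad conditions, step up on clearly good ones, else hold.

    All thresholds are hypothetical.
    """
    bad = t.packet_loss > 0.05 or t.jitter_ms > 60 or t.cpu_load > 0.9
    good = t.packet_loss < 0.01 and t.jitter_ms < 20 and t.cpu_load < 0.7
    if bad:
        return max(0, current - 1)    # never drop below the base layer
    if good:
        return min(top, current + 1)  # never exceed the highest layer
    return current


print(next_layer(2, 2, Telemetry(packet_loss=0.10, jitter_ms=30, cpu_load=0.5)))  # 1
print(next_layer(0, 2, Telemetry(packet_loss=0.0, jitter_ms=5, cpu_load=0.2)))    # 1
```

The gap between the "bad" and "good" thresholds means middling conditions hold the current layer, which keeps the stream from flapping between qualities.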
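The transport toggle could look like a simple preference list. The sources do not say in what order or under what conditions Zoom switches, so the preference order and the `usable` flags below are purely hypothetical.

```python
# Hypothetical preference order: the lowest-overhead option first.
PREFERENCE = ["UDP", "HTTPS", "HTTP"]


def pick_transport(usable):
    """Return the first preferred transport that telemetry reports as usable.

    `usable` maps a transport name to True/False. Missing names count as unusable.
    Returns None when nothing is usable.
    """
    for name in PREFERENCE:
        if usable.get(name, False):
            return name
    return None


# Example: a firewall blocks UDP, so the stream falls back to HTTPS.
print(pick_transport({"UDP": False, "HTTPS": True, "HTTP": True}))  # HTTPS
```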

Sources:

Here’s How Zoom Provides Industry-Leading Video Capacity

https://blog.zoom.us/wordpress/2019/06/26/zoom-can-provide-increase-industry-leading-video-capacity/

Most of Zoom runs on AWS, not Oracle — says AWS

https://www.datacenterdynamics.com/en/news/most-zoom-runs-aws-not-oracle-says-aws/

How Zoom’s Unique Architecture Powers Your Video First UC Future

https://www.youtube.com/watch?v=5BMbsFqtD0A

Scalable Video Coding

https://en.wikipedia.org/wiki/Scalable_Video_Coding

Advanced Video Coding

https://en.wikipedia.org/wiki/Advanced_Video_Coding

Written by Vikram Sachdeva