Video@Scale: Instagram Live

Ning Zhang
Mar 31, 2018 · 4 min read


This blog series describes the work of the Video Infrastructure team at Instagram during 2017. This post covers how we built Instagram Live into the largest live video service in the world in its first year. Disclaimer: This is a personal blog. I do not speak for Facebook, its subsidiaries, or associates.

2017 was a great year for Instagram Live. We launched Live worldwide in January, and it was an instant hit! Throughout the year, we improved Live performance and added features like Face Filters and Share to Stories. In October we launched LiveWith, i.e., going live with a friend. The screenshots below showcase some of the key features: LiveWith, Face Filters, and PostLive in Stories.

By year's end, as far as we know, Instagram Live had become the largest live video service in the world in terms of daily active users and the amount of video uploaded and watched each day. We are very excited and proud of how Instagram Live is being used: celebrities go live with fans, businesses connect with customers, users communicate with each other through sign language, and many broadcast their parties, games, and weddings live, or just hang out and have fun together using Live. Instagram Live democratizes media and gives people yet another rich tool to express themselves, connect with others, have fun, and build communities.

The keys to Instagram Live’s success are features and performance. New Face Filters are added frequently so people can have more fun, Live videos can be published to Stories so those who missed the live broadcast can watch them within 24 hours, and LiveWith adds a whole new level of interactivity.

For performance, the challenge is to enable high quality, low latency, and smooth playback at massive scale. These goals conflict with each other. Below is a simplified conceptual diagram of the Live infrastructure:

A Live broadcaster streams live video via RTMP to the Facebook Live Server (FBLS), which then uses layers of caches and PoPs for scale and performance. Here we prioritize smooth playback (measured as stall rate) over latency, adjusting video quality dynamically and dropping frames when necessary. The upload buffer on the broadcaster side, the transcoding on FBLS, the layers of caches, and the playback buffer on the viewer side all add seconds to latency, so the overall latency can get quite large. Here is a band’s creative use of Facebook Live latency :-) This is not what we wanted, so throughout the year we worked hard on improving Live performance (video quality, playback smoothness, and end-to-end latency). Below are some highlights:
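To make that trade-off concrete, here is a back-of-the-envelope latency budget for the pipeline above. The per-stage numbers are purely illustrative assumptions on my part, not measured Instagram or Facebook figures; the point is just that every stage adds seconds, so the glass-to-glass total grows quickly.

```python
# Illustrative latency budget for a live pipeline (made-up numbers, not measurements).
latency_budget_s = {
    "broadcaster upload buffer": 2.0,
    "FBLS transcoding": 1.5,
    "origin + edge caching": 2.0,
    "viewer playback buffer": 3.0,
}

total_s = sum(latency_budget_s.values())
print(f"end-to-end latency ≈ {total_s:.1f} s")  # ≈ 8.5 s with these assumptions
```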

  • Right after the Instagram Live launch, we noticed (and received user feedback) that some Live broadcasts had low video quality, especially on iOS. The Live team was busy developing LiveWith at the time, so I asked Zen, our Android lead on the Playback team, to investigate. Despite the lack of good logging and analytics back then, he had a good hunch: both empirical and aggregate data suggested we couldn’t just blame bad networks; maybe the bandwidth estimator was underestimating network capability? He quickly found and fixed a bug where the bandwidth estimate only went down under certain conditions (see the bandwidth-estimator sketch after this list). That fix alone halved the percentage of low-quality Live videos on iOS.
  • We noticed a high stall rate on iOS after the PostLive launch. Our iOS lead Jaed and engineer Arvind refactored the playback stack and implemented a “sparse cache” to significantly improve fetching and seeking performance. That fixed the stall issue.
  • Improving Live latency without sacrificing video quality, smooth playback, or scalability is a hard problem. Many teams across Facebook have been working on it, as have other companies in the industry. We analyzed and monitored the end-to-end latency across platforms, optimized each step from broadcaster to FBLS to CDN to viewer, and kept reducing latency. Here are some of the key efforts:
  1. Optimize fetching, prefetching, and playback buffer management on the viewer side
  2. Use CDN priming to proactively push video segments through origin and edge caches before they are requested (see the CDN-priming sketch below)
  3. Optimize FBLS to stream video segments as they are transcoded
  4. Use RTC instead of RTMP, tweak streaming protocols, and invent new ones
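To illustrate the kind of bug described in the first bullet above, here is a sketch of the failure mode; the actual estimator code is not public, so this is only an analogy: an estimator whose estimate can ratchet down but never recover will permanently cap video quality after one bad measurement.

```python
# Sketch of a "one-way" bandwidth estimator bug (illustrative, not Instagram's code).
class BandwidthEstimator:
    def __init__(self, initial_kbps=500.0, alpha=0.2):
        self.estimate_kbps = initial_kbps
        self.alpha = alpha  # smoothing factor for the moving average

    def on_sample_buggy(self, measured_kbps):
        # Bug: the estimate only updates when the new sample is lower, so a
        # temporary dip drags the estimate down and nothing ever raises it again.
        if measured_kbps < self.estimate_kbps:
            self.estimate_kbps += self.alpha * (measured_kbps - self.estimate_kbps)

    def on_sample_fixed(self, measured_kbps):
        # Fix: always move the estimate toward the measurement, in both directions,
        # so the broadcaster can switch back to a higher-quality encode.
        self.estimate_kbps += self.alpha * (measured_kbps - self.estimate_kbps)
```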
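And here is a minimal sketch of the CDN-priming idea from item 2, using toy dict-backed caches; the names and structure are mine, not FBLS or CDN internals. Instead of waiting for the first viewer request to miss at the edge and propagate to origin, each segment is pushed through the cache tiers as soon as it is transcoded.

```python
class Cache:
    """A toy cache tier standing in for an origin or edge cache."""
    def __init__(self, name):
        self.name = name
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


def prime(segment_id, data, origin, edges):
    """Push a freshly transcoded segment into origin and edge caches before
    any viewer requests it, so the first request at each edge is already a hit."""
    origin.put(segment_id, data)
    for edge in edges:
        edge.put(segment_id, data)


origin = Cache("origin")
edges = [Cache("edge-1"), Cache("edge-2")]
prime("broadcast42/seg0007.ts", b"<segment bytes>", origin, edges)
assert edges[0].get("broadcast42/seg0007.ts") is not None  # first viewer: cache hit
```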

LiveWith has much tougher latency requirements than Live, so we ended up implementing a hybrid model: when a broadcaster invites a guest to go live together, we switch both of them to RTC and use a Composition Server to generate the RTMP stream for viewers. More details are in Nick Ruff’s talk at Streaming Media West 2017. Our Android engineer Sergey, iOS engineer Jaed, and server engineer Hai led the development and launch of Instagram LiveWith.
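As a rough sketch of that hybrid topology (a toy model of the idea, not the actual implementation): the host and guest move to low-latency RTC so they can converse naturally, while the composed output stays RTMP so the existing viewer pipeline is unchanged.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Broadcast:
    host: str
    guest: Optional[str] = None
    participant_protocol: str = "RTMP"      # solo Live: host streams RTMP to FBLS
    viewer_stream: str = "RTMP from host"

    def invite_guest(self, guest):
        """Switch to the hybrid LiveWith topology."""
        self.guest = guest
        # Host and guest move to RTC for low host<->guest latency...
        self.participant_protocol = "RTC"
        # ...while a composition server mixes both RTC streams and publishes a
        # single RTMP stream for the audience.
        self.viewer_stream = "RTMP composed from host + guest RTC"

b = Broadcast(host="alice")
b.invite_guest("bob")
print(b.participant_protocol, "|", b.viewer_stream)
```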

All posts of this Video@Scale series:
