Video processing at Upthere

Upthere · Sep 27, 2016

The mission of Upthere is to care for humankind’s information. To us, this not only means providing safe and secure storage in the cloud, but also making data consumption a delightful experience, regardless of your device’s OS, manufacturer, or network connectivity. Video constitutes a large percentage of uploaded data, and our goal is to enable our customers to view and share videos in a seamless way, from any device. In this post we share our thought process on implementing video support, discuss some of the challenges we came across, and finally describe the solutions we found to address them.

Cross-platform incompatibility

If you use an Android phone and your friend is an Apple die-hard, you know these issues all too well — videos recorded on one platform don’t necessarily play on another. Similarly, if you take a video on your digital camera, it’s not a given that your grandma will be able to view it without going through cumbersome conversion steps. The incompatibilities become even more apparent with videos created on older devices. When it comes to codecs used to encode videos, universal support is nonexistent, so an intermediate conversion step is required. Our goal was to make it fast and completely transparent — sharing a video with anyone should be as easy as uploading the file and clicking Share.

Network bandwidth limitations

Video files are very large, and their size increases with advances in camera technology. A minute of 1080p, 30FPS video recorded on an iPhone 6s is roughly 130MB in size. A slo-mo 1080p video at 60FPS takes about 200MB per minute. Want to impress your friends with 4K footage of your snowboarding tricks? That will take approximately 375MB per minute. Moving this much data over a cellular connection is impractical, expensive, and in many locations simply impossible due to poor reception. To enable viewing under different conditions, videos need to be converted to a smaller size, which almost invariably has quality implications. We needed to find a solution that was bandwidth-aware and that would strike a balance between quality and the network bandwidth used.

Our approach

Having considered the codecs supported across platforms, we settled on transcoding all incoming videos into H.264 AVC. It was a natural choice for our use case because playback of videos in this format is supported across the widest range of clients (see Table 1). Note that the original video uploaded to Upthere is never altered by our service and the original content is always available to be downloaded.

To support a high-quality playback experience, we chose to stream videos via HTTP Live Streaming (HLS). This protocol was first introduced by Apple in 2009 and has since become the de facto streaming standard supported across a wide variety of clients (see Table 2).

With HLS, the client first downloads a manifest file (master playlist) that contains links to multiple video variants of different resolutions. Furthermore, each variant is chunked up into smaller segments to allow the player to dynamically switch between video variants as network conditions change. The switching logic is built into the client and requires no special server support.
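
For illustration, here is what a minimal master playlist might look like; the bandwidths, resolutions, and file names below are made up for the example:

    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
    360p.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
    720p.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
    1080p.m3u8

The player picks whichever variant its current bandwidth estimate can sustain, fetches that variant's own playlist of short segments, and re-evaluates the choice as it downloads.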

To implement our transcoding pipeline, we used the excellent open-source ffmpeg suite, which works with a large number of video formats and includes support for generating HLS playlists. With this tool at our disposal, it didn’t take long to build a first working prototype. However, we ran into some interesting challenges that needed to be solved before the new video features could be introduced to our users. Hopefully what we learned can help others who are doing similar work.
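
As a rough sketch of the kind of invocation involved (the exact flags, bitrates, and file names in our pipeline differ; these are placeholders), a single 720p HLS variant can be produced with something like:

    ffmpeg -i input.mov \
        -vf scale=-2:720 -c:v libx264 -b:v 2500k \
        -c:a aac -b:a 128k \
        -f hls -hls_time 6 -hls_playlist_type vod \
        -hls_segment_filename '720p_%03d.ts' \
        720p.m3u8

This produces the segment files plus a variant playlist; the master playlist that ties the variants together is a small text file that can be generated separately.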

Figure 2: Transcoding into multiple resolutions

Decreasing transcoding latency

Upthere backend software consists in part of microservices distributed across many servers. While our architecture is itself a great topic for a separate article, suffice it to say that data storage and data processing typically happen on different machines. Thus, our first transcoding prototype consisted of three separate steps: (1) download the video file from the storage cluster to one of the transcoding servers, (2) run ffmpeg to generate streamable video files, and (3) upload the resulting files to our storage cluster. This worked, but incurred significant latency. Transcoding only started once the file was downloaded to local storage, and we had to wait for the resulting streams to be uploaded to remote storage before our users could start viewing them. To improve performance, we needed a way for ffmpeg to access our storage cluster directly and seek randomly inside the file (as ffmpeg tends to do before it starts the conversion process).

It turns out that ffmpeg already includes support for various read/write protocols; the file protocol is just the default option. The one that looked most promising was HTTP, so we got to work implementing internal HTTP access on our storage service. Now ffmpeg was issuing byte-range GET requests to access the offsets it needed and POST requests to store transcoded results. Latency immediately went down, but surprisingly our metrics still indicated a greater than expected number of bytes moving across the network. Looking under the hood, we found that ffmpeg was issuing multiple open-ended byte-range requests (e.g. “Range: bytes=0-”) and taking a long time to drop previous connections. Each of the outstanding requests caused our servers to stream back more than ffmpeg was actually using. To optimize this, we customized our HTTP wrapper to limit the number of bytes returned in each GET response, thus bounding the cost of the open-ended requests.
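
To make the problem concrete, the exchange looked roughly like this (the path and byte counts are illustrative): ffmpeg asks for everything from some offset onward, and our wrapper now answers with a bounded slice instead of streaming to the end of the file:

    GET /internal/blobs/abc123 HTTP/1.1
    Range: bytes=1048576-

    HTTP/1.1 206 Partial Content
    Content-Range: bytes 1048576-5242879/734003200
    Content-Length: 4194304

If ffmpeg actually needs more, it simply issues the next range request, so the change is invisible to it while keeping an abandoned connection from dragging the whole file across the network.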

Parallel transcoding

The next optimization step was generating multiple streams from the same video file in parallel. Our first working prototype generated each stream separately, which meant that ffmpeg was performing the same processing on the input file multiple times. While that is indeed the most commonly used approach, parallel processing can be enabled with so-called filtergraphs. Filtergraphs give ffmpeg instructions on how to split the input into multiple streams and process each one independently. After numerous tries and a few lost hairs, we had a recipe to transcode all the streams in parallel. To learn more about ffmpeg filters, see the FFmpeg Filters Documentation page.
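
The gist of the recipe (simplified, with placeholder bitrates and output names) is a split filter that fans the decoded input out into independently scaled branches, each mapped to its own HLS output:

    ffmpeg -i input.mov \
        -filter_complex "[0:v]split=3[a][b][c];[a]scale=-2:360[v360];[b]scale=-2:720[v720];[c]scale=-2:1080[v1080]" \
        -map "[v360]"  -map 0:a -c:v libx264 -b:v 800k  -c:a aac -f hls -hls_time 6 360p.m3u8 \
        -map "[v720]"  -map 0:a -c:v libx264 -b:v 2500k -c:a aac -f hls -hls_time 6 720p.m3u8 \
        -map "[v1080]" -map 0:a -c:v libx264 -b:v 5000k -c:a aac -f hls -hls_time 6 1080p.m3u8

The input is decoded once, and each branch is scaled and encoded independently, which is where the parallelism comes from.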

Load balancing

As mentioned above, Upthere runs a microservice infrastructure. A document uploaded to our datacenter goes through a processing pipeline (metadata extraction, preview generation, and so on), with each step performed by one of the machines assigned to that step. All services communicate with each other using Kafka, an open-source messaging system for distributed environments. When a document is ready for the next phase, the associated message is enqueued to one of the sub-queues (partitions), allowing work to be distributed evenly to the consuming services. Many producers can send documents to the same partition, but any one partition can only be worked on by one consumer service. This works very well for messages that take a uniform and relatively short time to process, because the load is distributed evenly to all the consumers. However, when we deployed the transcoding service as one of the steps in the pipeline, we observed a different pattern. Most videos are relatively short (a minute or less), so processing doesn’t take long. But once in a while a user would upload a much longer video that could take many minutes to transcode. This effectively resulted in head-of-line blocking: all videos that ended up in the same partition were severely delayed by the long video in front of them. Even though we had a large number of unused transcoding servers that could have done the work, they were idling because they were assigned to different partitions.
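
A stripped-down consumer loop makes the failure mode easy to see; this is just a sketch using the confluent-kafka Python client rather than our actual service code, and the broker, group, and topic names are invented:

    from confluent_kafka import Consumer

    consumer = Consumer({
        'bootstrap.servers': 'kafka:9092',   # hypothetical broker address
        'group.id': 'transcoders',           # hypothetical consumer group
        'auto.offset.reset': 'earliest',
    })
    consumer.subscribe(['videos-ready'])     # hypothetical topic name

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        transcode(msg.value())   # hypothetical helper; a long video here stalls
                                 # every message queued behind it in this partition
        consumer.commit(msg)

Because each partition is consumed strictly in order by a single worker, nothing behind a long transcode in that partition can make progress, no matter how many other workers sit idle.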

To solve this issue, we implemented a distributed queue to effectively utilize our pool of transcoding servers. Now all videos ready to be transcoded go to a shared queue and get picked up by the next available transcoding service. As load increases, we can quickly deploy more servers to absorb it. Our implementation of the distributed queue uses ZooKeeper, another excellent open-source tool.
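
A minimal sketch of the idea using the kazoo client follows; the paths and names are made up, and the real service also handles locking, retries, and re-queueing work if a transcoder dies mid-job:

    from kazoo.client import KazooClient
    from kazoo.exceptions import NoNodeError

    zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')   # hypothetical ensemble
    zk.start()

    QUEUE = '/transcode/pending'                           # hypothetical path
    zk.ensure_path(QUEUE)

    def enqueue(video_id):
        # Producers append work items as sequential znodes.
        zk.create(QUEUE + '/item-', video_id.encode(), sequence=True)

    def claim_next():
        # Any idle transcoder scans the queue in order and claims the first
        # item it manages to delete; losing the race means trying the next one.
        for child in sorted(zk.get_children(QUEUE)):
            path = QUEUE + '/' + child
            try:
                data, _ = zk.get(path)
                zk.delete(path)       # success means this worker owns the item
                return data.decode()
            except NoNodeError:
                continue              # another worker got there first
        return None

Deleting before processing keeps the sketch short, but it means a crashed worker loses its item; a production version would claim items with ephemeral locks instead, so that abandoned work gets picked up again.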

Future work

We’re happy with our initial set of video features, but we’re just getting started! Some of the future work in this area includes utilizing GPUs to make the processing even faster, using machine learning to improve searches and suggestions, and using the power of the cloud to make editing and sharing videos more fun. If you have a favorite video feature you’d like us to implement, we would love to hear from you! Email hi@upthere.com.

__

Yuri Zats
Video Team Lead

Upthere Home is available for Android, iPhone, Mac, and Windows.
