The Challenges of Live Linear Video Ingest — Part Two: System Design and Implementation

by Allison Deal, Senior Software Developer

In Part One of this series, we discussed the engineering and design requirements for our live TV video ingestion system. To recap, our existing video pipeline couldn’t support live video ingestion, so when we built our new system for live ingest, we designed it to be flexible, reliable, and performant for our viewers.

How it Works

Hulu partners with vendors who provide us with HLS feeds of network streams that are already segmented and transcoded into multiple variant streams, each representing the same content at a different video and/or audio quality.

These segmented streams are delivered to us via an HLS contribution model, a protocol that all of our vendors are able to support.

Master Playlist

The vendor first initializes a channel with a master playlist, which describes the various media playlists that will follow; there will be one media playlist per variant.

Each variant contains the same content, but is a different quality of video and/or audio, allowing players to choose which combination is best to serve based on the client’s capabilities and connection speed. The number of media playlists listed varies by network and vendor and ranges between four and eight.
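
As a simplified illustration (the bandwidths, resolutions, codec strings, and playlist names here are made up, not an actual vendor feed), a master playlist with three variants might look like this:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
VIDEO_1.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
VIDEO_2.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=960x540,CODECS="avc1.4d401e,mp4a.40.2"
VIDEO_3.m3u8
```

Each #EXT-X-STREAM-INF entry points at the media playlist for one variant.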

Each rendition (variant) consists of many small audio/video files, each about four seconds long. Consecutive playback of these four-second segments creates a continuous video stream.

Media Segments

After sending the master playlist, vendors post unencrypted, muxed MPEG-TS files containing H.264 video and compressed audio to Hulu’s ingest service.

When the Hulu ingest service receives an MPEG-TS segment, the file is (see the sketch after this list):

  1. Parsed for video metadata, which is temporarily stored in Amazon ElastiCache for Redis. This metadata is later stored permanently and used when the video is played back to the user.
  2. Stored in Amazon S3. These original unencrypted files are not served to users, but kept temporarily for debugging purposes.
  3. Used to generate an AES-CTR encrypted fMP4 copy of the file with PlayReady/Widevine DRM applied. The resulting audio and video fMP4 and init files are stored in a temporary location in S3.
  4. Used to generate an AES-CBC encrypted MPEG-TS copy of the file with FairPlay DRM applied, which is also stored in a temporary location in S3.
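
A minimal Go sketch of that per-segment fan-out is shown below. The Segment, Metadata, and Steps types and the ProcessSegment function are illustrative stand-ins, not Hulu’s actual API:

```go
// Illustrative sketch of the per-segment processing described above.
// All type and method names here are assumptions, not Hulu's actual API.
package ingest

import "context"

// Segment is one MPEG-TS media file posted by a vendor.
type Segment struct {
	Name string // e.g. "VIDEO_1_A.ts"
	Data []byte
}

// Metadata holds whatever the parser extracts from the TS file.
type Metadata struct {
	DurationSec float64
	Bitrate     int
}

// Steps groups the four operations applied to every incoming segment.
// Concrete implementations would wrap Redis, S3, and the C packaging code.
type Steps interface {
	ParseMetadata(data []byte) (Metadata, error)                      // step 1: inspect the TS file
	CacheMetadata(ctx context.Context, name string, m Metadata) error // step 1: temporary store (Redis)
	StoreOriginal(ctx context.Context, s Segment) error               // step 2: unencrypted copy to S3 (debugging only)
	PackageCENC(ctx context.Context, s Segment) error                 // step 3: AES-CTR fMP4 with PlayReady/Widevine
	PackageCBCS(ctx context.Context, s Segment) error                 // step 4: AES-CBC MPEG-TS with FairPlay
}

// ProcessSegment runs the steps for a single segment. The channel and
// rendition are still unknown at this point; association happens later,
// when the media playlist arrives.
func ProcessSegment(ctx context.Context, st Steps, seg Segment) error {
	meta, err := st.ParseMetadata(seg.Data)
	if err != nil {
		return err
	}
	if err := st.CacheMetadata(ctx, seg.Name, meta); err != nil {
		return err
	}
	if err := st.StoreOriginal(ctx, seg); err != nil {
		return err
	}
	if err := st.PackageCENC(ctx, seg); err != nil {
		return err
	}
	return st.PackageCBCS(ctx, seg)
}
```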

Processing each media file as soon as it’s received, before we even know which channel or rendition it belongs to, allows Hulu to serve video to users with minimal latency.

Media Playlist

Following the media segments, the HLS media playlist (declared earlier in the master playlist) is received, listing the recent video segments already posted to us for the given rendition. Our system now has the information it needs about each segment: the channel and rendition that each previously processed media file belongs to, and the order in which the segments should be played back in the stream.

The media playlists use a rolling window so that only the most recent segments remain in the playlist. In the example below, VIDEO_1_A.ts rolls off the top of the playlist when VIDEO_1_E.ts becomes available.
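
For illustration, using the segment names from the example above (the target duration, media sequence number, and exact segment durations are assumed), the rendition’s media playlist after the update might look like this:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:2
#EXTINF:4.000,
VIDEO_1_B.ts
#EXTINF:4.000,
VIDEO_1_C.ts
#EXTINF:4.000,
VIDEO_1_D.ts
#EXTINF:4.000,
VIDEO_1_E.ts
```

The #EXT-X-MEDIA-SEQUENCE value increments each time a segment rolls off the top of the window.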

The ingest service associates the individual segments it has already received with the files listed in the media playlist, and once the segments are confirmed as received, they are moved from their temporary location to permanent storage in S3, which serves as the distribution origin. All master and media playlist locations, media segment metadata, channel configurations, and SCTE-35 messages are stored permanently in an Amazon Aurora MySQL cluster.
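
A rough sketch of that confirmation step, again with illustrative names (SegmentStore, ConfirmPlaylist) rather than our real interfaces:

```go
// Illustrative sketch of the confirmation step: once a segment URI shows
// up in a rendition's media playlist, its already-processed outputs are
// promoted from the temporary S3 prefix to the permanent origin prefix,
// and its metadata is written to the channel's database.
package ingest

import "context"

// SegmentStore abstracts the temporary/permanent S3 locations and the
// Aurora MySQL cluster.
type SegmentStore interface {
	Promote(ctx context.Context, segmentURI string) error // temp prefix -> origin prefix
	RecordSegment(ctx context.Context, channel, rendition, segmentURI string, seq int) error
}

// ConfirmPlaylist walks the segment URIs listed in a media playlist, in
// playback order, and promotes each one now that its channel and
// rendition are known.
func ConfirmPlaylist(ctx context.Context, store SegmentStore, channel, rendition string, uris []string, firstSeq int) error {
	for i, uri := range uris {
		if err := store.Promote(ctx, uri); err != nil {
			return err
		}
		if err := store.RecordSegment(ctx, channel, rendition, uri, firstSeq+i); err != nil {
			return err
		}
	}
	return nil
}
```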

The HLS media playlists also carry SCTE-35 ad and program messages, presented in the #EXT-X-DATERANGE tag of the media playlist. These messages arrive either base64- or hex-encoded; they are parsed, stored for later use during manifest generation, and shared with Hulu metadata services to determine program extensions.
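
Because the payload can arrive in either encoding, the first step is simply normalizing it to raw bytes. A minimal sketch, assuming the value is taken directly from the attribute in the #EXT-X-DATERANGE tag (the package and function names are hypothetical, and parsing the SCTE-35 splice info itself is out of scope here):

```go
// Minimal sketch of normalizing a SCTE-35 payload pulled from an
// #EXT-X-DATERANGE attribute (e.g. SCTE35-OUT). The value is assumed to
// be either hex (optionally 0x-prefixed) or base64, as described above.
package scte35

import (
	"encoding/base64"
	"encoding/hex"
	"strings"
)

// DecodePayload returns the raw SCTE-35 bytes regardless of which
// encoding the vendor used.
func DecodePayload(value string) ([]byte, error) {
	if strings.HasPrefix(value, "0x") || strings.HasPrefix(value, "0X") {
		return hex.DecodeString(value[2:])
	}
	if b, err := hex.DecodeString(value); err == nil {
		return b, nil // bare hex, no prefix
	}
	return base64.StdEncoding.DecodeString(value)
}
```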

Implementation

Hulu’s ingest service API layer is written in Go. Video operations, including video remuxing, SCTE-35 event message parsing, and adding Nielsen watermark ID3 tags, are written in C and invoked from Go via cgo. We chose this combination of Go and C for development simplicity and performance. The Go app runs in AWS on Donki, Hulu’s internal platform for hosting web applications. With Donki, we can easily scale out and add Amazon EC2 backends as new channels are added to Hulu’s live TV offering. Each of these servers can process playlists and video files for any channel, which simplifies scaling and failover.
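
As a rough illustration of that Go/C boundary, a cgo call might look like the following; the C function here is a stand-in, not our actual remux or watermarking code:

```go
// Rough illustration of calling into C from Go via cgo. The C function
// below is a placeholder; the real remux, SCTE-35, and Nielsen
// watermarking code lives in its own C sources.
package remux

/*
#include <stddef.h>

// Stand-in for the real C entry point.
static int remux_segment(const unsigned char *data, size_t len) {
    return len > 0 ? 0 : -1;
}
*/
import "C"

import (
	"errors"
	"unsafe"
)

// Remux hands the segment bytes to the C layer and reports failure.
func Remux(data []byte) error {
	if len(data) == 0 {
		return errors.New("empty segment")
	}
	if rc := C.remux_segment((*C.uchar)(unsafe.Pointer(&data[0])), C.size_t(len(data))); rc != 0 {
		return errors.New("remux failed")
	}
	return nil
}
```

The Go layer stays responsible for HTTP handling and orchestration, while the C layer does the byte-level video work.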

When designing our system, one main concern was the size and growth rate of the permanent datastore that contains all of the playlist and segment information. For each channel, new media playlists and segment metadata for every rendition need to be added approximately every four seconds, since the length of each video segment determines the rate at which new segments arrive. Our design gives each channel its own database within an Amazon Aurora MySQL cluster.

This is feasible because each channel is processed independently, without relying on metadata from other channels. Separate databases also limit the blast radius of a database failure to a single channel rather than causing a widespread, multi-channel outage. The ingest application EC2 backends and the MySQL cluster are both spread across multiple Availability Zones so that resources remain available at all times; each AWS Region is built from multiple Availability Zones precisely to protect data durability and service availability.

To make the service more fault-tolerant, the system uses configuration toggles that allow it to ignore video and/or metadata at the channel level. If one channel’s video or metadata is increasing the latency of the system, that channel can be ignored immediately to prevent ingest delays on other channels. Other vendor- and channel-specific configurations, covering segment publishing timeouts, video metadata precision, and segment synchronization across renditions, let the system tune its behavior at a more granular level.
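
The per-channel toggles might be represented roughly like this (the struct and field names are hypothetical, not our real configuration schema):

```go
// Hypothetical shape of a per-channel configuration holding the toggles
// described above.
package config

import "time"

// ChannelConfig holds vendor- and channel-specific knobs.
type ChannelConfig struct {
	ChannelID         string
	IgnoreVideo       bool          // drop incoming media segments for this channel
	IgnoreMetadata    bool          // drop SCTE-35/metadata messages for this channel
	PublishTimeout    time.Duration // how long to wait on a slow rendition before publishing
	MetadataPrecision int           // precision used when comparing segment timestamps
	SyncRenditions    bool          // require segment alignment across renditions before publishing
}
```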

For additional availability, our system consumes multiple vendor sources for each channel, so that a backup stream can be served to users if the primary stream goes down.

Conclusion

Our live media ingest service is designed to be highly available, to minimize latency, and to provide the best possible experience for our viewers, while still allowing us to scale and add new features in the future. However, the system is subject to inconsistent inputs and conditions, and in Part 3 of this series we’ll discuss the main challenges we faced while developing a live video ingest service, along with our solutions to each.

Attending Grace Hopper this year? Come say hello and join me for an in-depth talk about our live linear video ingest system on Thursday, 27th at 1PM!

Allison Deal is a senior software developer at Hulu, specializing in video encoding and streaming technologies. She works on building and scaling the end-to-end live and on-demand video pipelines, with the ultimate goal of improving the playback experience for all viewers. She has been at Hulu for over three years, with prior stints at Rdio and Boeing, where she worked in Research and Development.

If you’re interested in working on projects like these and powering play at Hulu, see our current job openings here.