Enhancing AV1 playback with Elevator

Raphaël Zumer
Dec 6, 2019 · 6 min read

As early adopters of the AV1 coding standard, we at Vimeo have had to develop solutions to problems as they crop up in order to deliver the best possible viewing experience. One of these solutions is Elevator, a tool that sets the lowest possible correct level parameter in the bitstream of an encoded AV1 file. An accurate level reduces the likelihood of the frame drops and quality degradation that occur when decoders can't make the right decisions while playing back AV1 content, and it helps as many devices as possible process our streams, now and in the future.

Elevator was released in September 2019 and was first presented to the public at Video Dev Days 2019 in Tokyo, Japan. In this post, we provide some technical background on the AV1 format, discuss how Elevator does what it does, look at some use cases, and reflect on lessons learned during its implementation.

Understanding AV1 levels

In AV1 and other coding standards, the level parameter is written to a header early in the encoding process, before the encoder knows which rates the rest of the stream will reach. For encoders, this leaves three strategies for setting a valid level:

  • Always set the level to its maximum possible value (with unconstrained rate parameters)
  • Gather data while encoding, and return to the header to code the correct level at the end (which isn’t possible for partial encoding, such as in a live setting)
  • Pick a level in advance, and enforce its rate constraints during encoding

Encoders may also attempt to estimate the level of a stream in advance with varying success. However, due to unpredictable rate spikes, it’s impossible to produce valid streams every time using this method. For example, Elevator calculates a higher level for the Chimera AV1 sample video than what was likely set by the libaom reference encoder at encode time:

rzumer@gekai:~/dev/vimeo/elevator$ cargo run Chimera-AV1-10bit-4096x2160-29504kbps.ivf -v
Compiling elevator v1.0.0 (/home/rzumer/dev/vimeo/elevator)
Finished dev [unoptimized + debuginfo] target(s) in 1.33s
Running `target/debug/elevator Chimera-AV1-10bit-4096x2160-29504kbps.ivf -v`
Container metadata:
Time scale: 23.976 (24000/1001)
Resolution: 4096x2160

Number of displayed frames: 8929
Maximum header, display, and decode rates in a single temporal unit: 28.971, 23.976, 119.880
Minimum level required to satisfy compressed ratio constraint: 2.0 (0)
Maximum bitrate: 83.608 Mbps
Maximum number of tiles and tile columns found: 1, 1

Sequence context:
Tier: Main
Picture Size: 4096x2160
Display/Decode/Header Rates: 212124516/1060622578/29
Mbps: 83.608
Tiles/Tile Columns: 1/1

Level: 5.0 (12) -> 6.1 (17)
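
The numbers in parentheses in the output above are level indices as they are coded in the bitstream. As a minimal sketch, assuming the mapping given in the AV1 specification (major version = 2 + index ÷ 4, minor version = index mod 4), the following Rust function converts an index into its level name. The name `level_name` is chosen for illustration and is not part of Elevator.

```rust
/// Convert an AV1 level index into its "X.Y" level name.
/// This sketch only maps indices 0 through 23 (levels 2.0 through 7.3)
/// and returns `None` for anything else.
fn level_name(level_idx: u8) -> Option<String> {
    if level_idx > 23 {
        return None;
    }
    let major = 2 + (level_idx >> 2); // four minor versions per major version
    let minor = level_idx & 3;
    Some(format!("{}.{}", major, minor))
}

fn main() {
    // The indices that appear in the Chimera output.
    for idx in [0u8, 12, 17] {
        println!("{} -> {:?}", idx, level_name(idx));
    }
}
```

With this mapping, indices 0, 12, and 17 correspond to levels 2.0, 5.0, and 6.1, which matches the pairs printed by Elevator.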

The encoder we use at Vimeo to produce AV1 content, rav1e, uses the first method of always setting the level to its maximum possible value. While this is the simplest of the three, it's not ideal for future-proof decoder support: we want videos that low-power devices can decode to be correctly detected as such. This is why we developed Elevator separately from rav1e to analyze videos and set the right level parameter post-encode.

Elevator on the job

While we use Elevator only to lower the coded levels of our content at Vimeo, Elevator can also be used more actively to bound the level of encoded videos when the encoder doesn’t support it well, for example by re-encoding content with progressively stricter parameters until the level reported by Elevator is sufficiently low.
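
The re-encoding approach could be sketched as follows. This is an illustration under assumptions, not something Elevator provides: `bound_level` and the `encode_and_measure` closure are hypothetical, and the closure stands in for running an encoder with a given setting, running Elevator on the result, and returning the reported level index.

```rust
/// Try settings from least to most strict and return the first one whose
/// measured level index does not exceed `target_level_idx`.
/// `encode_and_measure` is a hypothetical stand-in for "encode, then analyze".
fn bound_level<S: Copy, F: FnMut(S) -> u8>(
    settings: &[S],
    target_level_idx: u8,
    mut encode_and_measure: F,
) -> Option<(S, u8)> {
    for &setting in settings {
        let level = encode_and_measure(setting);
        if level <= target_level_idx {
            return Some((setting, level));
        }
    }
    None // no setting in the list met the target
}

fn main() {
    // Hypothetical bitrate caps in kbps, ordered from least to most strict.
    let bitrate_caps = [40_000u32, 20_000, 10_000];
    // Fake measurement: pretend that lower caps produce lower levels.
    let fake_measure = |cap: u32| -> u8 {
        if cap > 30_000 { 17 } else if cap > 15_000 { 13 } else { 12 }
    };
    println!("{:?}", bound_level(&bitrate_caps, 13, fake_measure));
}
```

Stopping at the first setting that meets the target keeps as much quality as the target level allows, at the cost of one full encode per attempt.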

Elevator is written in Rust to leverage av1parser, which at the time of development was, to our knowledge, the only standalone parser available for AV1 streams. Because some of the information needed to compute the level parameters is embedded deep in midstream headers, implementing our own parsing logic would have been difficult and time-consuming. With just a few changes merged into av1parser, we were able to offload that work to it. The only bit counting and manipulation needed in Elevator is the logic for setting the corrected level at the end of the process. Given how complex AV1 streams can be, this saved us from writing hundreds to thousands of lines of parsing code.
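
To give an idea of the kind of bit manipulation involved, here is a minimal sketch of overwriting a short bit field at an arbitrary bit offset in a byte buffer. It is not Elevator's actual code: `write_bits` is a hypothetical helper, and the offset used in `main` is made up. In a real stream, the position of the 5-bit level field has to be found by parsing the sequence header.

```rust
/// Overwrite `width` bits (at most 8) of `buf`, starting `bit_offset` bits
/// from the start of the buffer, with the low bits of `value`.
/// Bits are written most significant first, as AV1 headers are coded.
fn write_bits(buf: &mut [u8], bit_offset: usize, width: usize, value: u8) {
    assert!(width <= 8 && bit_offset + width <= buf.len() * 8);
    for i in 0..width {
        let bit = (value >> (width - 1 - i)) & 1;
        let pos = bit_offset + i;
        let mask = 0x80u8 >> (pos % 8);
        if bit == 1 {
            buf[pos / 8] |= mask;
        } else {
            buf[pos / 8] &= !mask;
        }
    }
}

fn main() {
    // Two arbitrary bytes standing in for part of a header.
    let mut header = [0b1010_1010u8, 0b0101_0101];
    // Write level index 12 as a 5-bit field at a made-up offset of 6 bits,
    // so that the field straddles the byte boundary.
    write_bits(&mut header, 6, 5, 12);
    println!("{:08b} {:08b}", header[0], header[1]);
}
```

Because the field is rarely byte-aligned, the write has to preserve the neighboring bits, which is why each bit is masked individually here.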

Despite that, Elevator doesn’t yet support all AV1 videos. Some bitstream features modify the rate calculations in ways that we don’t yet support. We also currently only support parsing files using the IVF container. We encourage anyone interested in the project to contribute bug reports and feature suggestions as issues and patches as pull requests on the project’s GitHub page.

A closer look

The following activity diagram summarizes the analysis loop:

[Image: activity diagram of Elevator's analysis loop]

After processing container data, Elevator enters its main analysis loop: each OBU in the video is parsed, and the relevant data, such as frame sizes and the number of headers, is stored and then collected for each frame when the next temporal delimiter is reached. (The OBU, or Open Bitstream Unit, is the smallest division of data in the AV1 format.)
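
The shape of this loop can be sketched in a few lines of Rust. The `Obu` enum and `UnitStats` struct below are simplified stand-ins invented for this example; they are not the types exposed by av1parser or used by Elevator, and a real parser carries far more detail per OBU.

```rust
/// Simplified, hypothetical view of the OBUs relevant to the loop.
enum Obu {
    TemporalDelimiter,
    FrameHeader { shown: bool },
    TileData { bytes: usize },
}

/// Data accumulated between two temporal delimiters.
#[derive(Debug, Default, PartialEq)]
struct UnitStats {
    headers: u32,
    shown_frames: u32,
    bytes: usize,
}

/// Walk the OBUs in order and emit one `UnitStats` per temporal unit.
fn collect_units(obus: &[Obu]) -> Vec<UnitStats> {
    let mut units = Vec::new();
    let mut current: Option<UnitStats> = None;
    for obu in obus {
        match obu {
            // A delimiter closes the previous unit and opens a new one.
            Obu::TemporalDelimiter => {
                if let Some(done) = current.take() {
                    units.push(done);
                }
                current = Some(UnitStats::default());
            }
            Obu::FrameHeader { shown } => {
                let stats = current.get_or_insert_with(UnitStats::default);
                stats.headers += 1;
                if *shown {
                    stats.shown_frames += 1;
                }
            }
            Obu::TileData { bytes } => {
                current.get_or_insert_with(UnitStats::default).bytes += *bytes;
            }
        }
    }
    // Flush the last unit, which has no trailing delimiter.
    if let Some(done) = current {
        units.push(done);
    }
    units
}

fn main() {
    let stream = [
        Obu::TemporalDelimiter,
        Obu::FrameHeader { shown: false },
        Obu::TileData { bytes: 100 },
        Obu::FrameHeader { shown: true },
        Obu::TileData { bytes: 50 },
        Obu::TemporalDelimiter,
        Obu::FrameHeader { shown: true },
        Obu::TileData { bytes: 30 },
    ];
    for unit in collect_units(&stream) {
        println!("{:?}", unit);
    }
}
```

The per-unit records produced this way are the input that the rate calculations work from.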

The level computed at the end of the analysis phase is based on the maximum rates encountered during the main loop, updated every frame. Some of the rates are defined per frame, but others are defined per second. For the per-second rates, the AV1 specification doesn't define how the rate is to be calculated. Some possible implementations evaluated during development are:

  • Counting per-frame metrics and scaling the result to one second
  • Counting per-second metrics, rounded down to the closest number of frames
  • Counting per-second metrics, rounded up to the closest number of frames

We found that calculating rates from a single frame and scaling the result to one second produced rate spikes that pushed the computed level unrealistically high. We therefore opted to update the rate every frame, but to compute it from a window of frames that makes up one second, rounded up: for example, we use a 24-frame window for a video that runs at 23.976 fps. If the frame rate isn't an integer value, we scale the resulting rate down so as not to overestimate it. In the case of 23.976 fps, we multiply the result by 23.976 ÷ 24.
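
As a rough illustration of the windowed calculation described above, the following sketch computes the maximum per-second rate from a list of per-frame values. The function name and signature are assumptions made for this example and do not mirror Elevator's source. Windows at the very start of the stream, which contain fewer frames than the window size, are treated as if the missing frames were empty.

```rust
/// Maximum per-second rate over a sliding window of ceil(fps) frames,
/// scaled by fps / window so that non-integer frame rates are not overestimated.
fn max_rate_per_second(per_frame: &[f64], fps_num: u32, fps_den: u32) -> f64 {
    assert!(fps_num > 0 && fps_den > 0);
    let fps = fps_num as f64 / fps_den as f64;
    let window = fps.ceil() as usize; // e.g. 24 frames for 23.976 fps
    let scale = fps / window as f64; // e.g. 23.976 / 24
    let mut sum = 0.0;
    let mut max_rate: f64 = 0.0;
    for (i, &value) in per_frame.iter().enumerate() {
        sum += value;
        if i >= window {
            sum -= per_frame[i - window]; // drop the frame that left the window
        }
        max_rate = max_rate.max(sum * scale);
    }
    max_rate
}

fn main() {
    // 48 frames of 1000 units each at 24000/1001 fps.
    let steady = vec![1000.0; 48];
    println!("{:.3}", max_rate_per_second(&steady, 24000, 1001));

    // A single large frame in an otherwise empty stream at 24 fps.
    let mut spiky = vec![0.0; 30];
    spiky[5] = 24.0;
    println!("{:.3}", max_rate_per_second(&spiky, 24, 1));
}
```

In the second case, scaling the single large frame to one second would give 24 × 24 = 576 units per second, while the windowed calculation reports 24, which shows how the window absorbs isolated spikes.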

In our testing, we found that some AV1 videos in the wild (such as the Chimera example mentioned above) have lower coded level values than those calculated by Elevator. If Elevator's analysis is correct, this means that there may be a large number of non-compliant AV1 streams published on other services. Unfortunately, there are no public test sequences available to validate AV1 levels. We hope that, as more hardware decoders are released on the market, interest in correctly setting the level parameter grows, leading to more implementations that can be validated against one another.

Vimeo Engineering Blog
