As early adopters of the AV1 coding standard, we at Vimeo have had to solve problems as they crop up in order to deliver the best possible viewing experience. One of these solutions is Elevator, a tool that sets the lowest correct level parameter in the bitstream of an encoded AV1 file. This reduces the likelihood of the frame drops and quality degradation that occur when decoders can't make the right decisions while playing back AV1 content, and it helps ensure that as many devices as possible can process our streams, now and in the future.
Elevator was released in September 2019 and was first presented to the public at Video Dev Days 2019 in Tokyo, Japan. In this post, we provide some technical background on the AV1 format, discuss how Elevator does what it does, look at some use cases, and reflect on lessons learned during its implementation.
Understanding AV1 levels
A level in an AV1 file is a set of limits on a video stream, most of them rate limits. These can include bit rate, frame rate, header rate, and so on, and they are usually computed per frame or per second; levels also cap the picture size and the number of tiles. This concept has been around since MPEG-2, and it exists in every modern, widely used video codec. A decoder that complies with a given level must be able to decode any stream that complies with that level. Because hardware video decoders on low-power devices have limited decoding capabilities, levels let these devices provide performance guarantees and determine in advance whether they can decode a stream without degrading the user experience.
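As a concrete example of a per-second limit: AV1's display and decode rate limits are expressed in luma samples per second, that is, picture width × height × frames per second, where the decode rate counts every decoded frame, including hidden frames that are decoded but never shown. The minimal Rust sketch below computes such a rate with integer arithmetic. The function names are hypothetical, and rounding up is our assumption; with it, the sketch reproduces the display and decode rates that Elevator reports for the 4096x2160 Chimera sample later in this post.

```rust
/// Divides `num` by `den`, rounding the quotient up (integer arithmetic only).
fn ceil_div(num: u64, den: u64) -> u64 {
    (num + den - 1) / den
}

/// Luma samples per second for `width` x `height` pictures at
/// `frames_num / frames_den` frames per second, rounded up (an assumption).
/// Pass the displayed frame rate to get a display rate, or the decoded
/// frame rate (including hidden frames) to get a decode rate.
fn luma_sample_rate(width: u64, height: u64, frames_num: u64, frames_den: u64) -> u64 {
    ceil_div(width * height * frames_num, frames_den)
}
```

For the Chimera sample, `luma_sample_rate(4096, 2160, 24000, 1001)` returns 212124516 (display rate at 23.976 fps), and `luma_sample_rate(4096, 2160, 120000, 1001)` returns 1060622578 (decode rate at 119.880 decoded frames per second).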
In AV1 and other coding standards, the level parameter is coded early in the bitstream, before the frames it describes (in AV1, in the sequence header). For encoders, this means that there are three strategies available to set the level accurately:
- Always set the level to its maximum possible value (with unconstrained rate parameters)
- Gather data while encoding, and return to the header to code the correct level at the end (which isn’t possible for partial encoding, such as in a live setting)
- Pick a level in advance, and enforce its rate constraints during encoding
Encoders may also attempt to estimate the level of a stream in advance, with varying success. However, because of unpredictable rate spikes, this method can't guarantee a valid stream every time. For example, Elevator calculates a higher level for the Chimera AV1 sample video than the one that was likely set by the libaom reference encoder at encode time:
rzumer@gekai:~/dev/vimeo/elevator$ cargo run Chimera-AV1-10bit-4096x2160-29504kbps.ivf -v
Compiling elevator v1.0.0 (/home/rzumer/dev/vimeo/elevator)
Finished dev [unoptimized + debuginfo] target(s) in 1.33s
Running `target/debug/elevator Chimera-AV1-10bit-4096x2160-29504kbps.ivf -v`
Time scale: 23.976 (24000/1001)
Number of displayed frames: 8929
Maximum header, display, and decode rates in a single temporal unit: 28.971, 23.976, 119.880
Minimum level required to satisfy compressed ratio constraint: 2.0 (0)
Maximum bitrate: 83.608 Mbps
Maximum number of tiles and tile columns found: 1, 1
Picture Size: 4096x2160
Display/Decode/Header Rates: 212124516/1060622578/29
Tiles/Tile Columns: 1/1
Level: 5.0 (12) -> 6.1 (17)
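The numbers in parentheses on the last line are values of `seq_level_idx`, the 5-bit sequence header field that carries the level. The AV1 specification maps index n to level X.Y with X = 2 + (n >> 2) and Y = n & 3, and uses index 31 to signal that no level constraints apply. A minimal Rust sketch of the mapping follows; the function names are illustrative only.

```rust
/// Converts a level written as X.Y (e.g. 5.0) into its seq_level_idx value.
fn level_to_idx(major: u8, minor: u8) -> u8 {
    assert!((2..=7).contains(&major) && minor <= 3, "level out of range");
    (major - 2) * 4 + minor
}

/// Converts a seq_level_idx back into (X, Y). Indices 0 to 23 cover levels
/// 2.0 to 7.3; any other value (including 31, which signals that no level
/// constraints apply) returns None.
fn idx_to_level(idx: u8) -> Option<(u8, u8)> {
    assert!(idx < 32, "seq_level_idx is a 5-bit field");
    match idx {
        0..=23 => Some((2 + (idx >> 2), idx & 3)),
        _ => None,
    }
}
```

With this mapping, level 5.0 is index 12 and level 6.1 is index 17, which matches the output above.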
The encoder we use at Vimeo to produce AV1 content, rav1e, uses the first method: it always sets the level to its maximum possible value. While this is the simplest of the three strategies, it isn't ideal for future-proof decoder support, since we want low-power devices to correctly detect which of our videos they can decode. This is why we developed Elevator separately from rav1e, to analyze videos and set the right level parameter after encoding.
Elevator on the job
Elevator is an open-source command-line application that calculates the parameters associated with level constraints in an AV1 video, determines the lowest valid level that the stream complies with, displays this value, and optionally writes it to the file, either in place or to a new file.
At Vimeo we use Elevator only to lower the coded levels of our content, but it can also be used more actively to bound the level of encoded videos when the encoder doesn't support level targeting well, for example by re-encoding content with progressively stricter parameters until the level reported by Elevator is sufficiently low.
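The re-encoding approach amounts to a simple search. In this sketch, `measure` is a hypothetical callback standing in for "encode with this setting, run Elevator on the result, and read back the level index"; using a list of bitrate caps as the stricter parameter is also an assumption, since any encoder setting that lowers the rates would do.

```rust
/// Hypothetical driver: tries bitrate caps from loosest to strictest and
/// returns the first one whose encode lands at or below `target_idx`.
/// `measure` stands in for encoding with the cap and running Elevator.
fn find_cap_for_level<F>(caps_kbps: &[u32], target_idx: u8, mut measure: F) -> Option<u32>
where
    F: FnMut(u32) -> u8,
{
    caps_kbps
        .iter()
        .copied()
        .find(|&cap| measure(cap) <= target_idx)
}
```

Returning the first passing cap keeps as much bitrate as the target level allows; `None` means even the strictest setting tried was not enough.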
Elevator is written in Rust to leverage av1parser, which at the time of development was, to our knowledge, the only standalone parser available for AV1 streams. Because some of the information needed to compute the level parameters is embedded deep in the midstream headers, implementing our own parsing logic would have been difficult and time-consuming. With just a few changes merged into av1parser, we were able to offload that work to it. The only bit counting and manipulation needed in Elevator is the logic that writes the corrected level at the end of the process. Given how complex AV1 streams can be, this spared us from writing hundreds to thousands of lines of parsing code.
That said, Elevator doesn't yet support all AV1 videos. Some bitstream features modify the rate calculations in ways that we don't yet support, and we currently only support files in the IVF container. We encourage anyone interested in the project to file bug reports and feature suggestions as issues, and to submit patches as pull requests, on the project's GitHub page.
A closer look
Elevator works in two high-level steps: analysis and (optional) patching. Patching is relatively simple, since the level parameter sits early in the sequence header of the AV1 stream. During analysis, we record the location of the sequence header when we reach it, so we don't have to search for it again later. To patch, we count the bits between the start of the sequence header and the level field, then edit the bitstream at that position.
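A minimal sketch of the patching step, assuming the sequence header OBU payload has already been located. It handles only the two simplest layouts: a reduced still-picture header, where `seq_level_idx[0]` starts at bit 5, and a regular header with `timing_info_present_flag` equal to 0, where it starts at bit 24 (after `seq_profile`, `still_picture`, `reduced_still_picture_header`, `timing_info_present_flag`, `initial_display_delay_present_flag`, `operating_points_cnt_minus_1` and `operating_point_idc[0]`). Headers with timing information have a variable-length prefix and are rejected. Fields are read and written most significant bit first, as in the AV1 specification. In a regular header, an in-place overwrite is only valid when the old and new indices are on the same side of 7, because `seq_tier` is coded only when `seq_level_idx` is greater than 7. The helper names are hypothetical; this is not Elevator's code.

```rust
/// Reads `width` bits, most significant bit first, starting at `bit_offset`.
fn read_bits(buf: &[u8], bit_offset: usize, width: usize) -> u32 {
    let mut value = 0u32;
    for i in 0..width {
        let pos = bit_offset + i;
        let bit = (buf[pos / 8] >> (7 - pos % 8)) & 1;
        value = (value << 1) | u32::from(bit);
    }
    value
}

/// Overwrites `width` bits, most significant bit first, starting at `bit_offset`.
fn write_bits(buf: &mut [u8], bit_offset: usize, width: usize, value: u32) {
    for i in 0..width {
        let pos = bit_offset + i;
        let mask = 1u8 << (7 - pos % 8);
        if (value >> (width - 1 - i)) & 1 == 1 {
            buf[pos / 8] |= mask;
        } else {
            buf[pos / 8] &= !mask;
        }
    }
}

/// Bit offset of seq_level_idx[0] in a sequence header payload (simple layouts only).
fn first_level_bit_offset(payload: &[u8]) -> Option<usize> {
    // seq_profile f(3), still_picture f(1), reduced_still_picture_header f(1)
    if read_bits(payload, 4, 1) == 1 {
        return Some(5);
    }
    // timing_info_present_flag f(1): timing_info() is variable-length, so give up.
    if read_bits(payload, 5, 1) == 1 {
        return None;
    }
    // initial_display_delay_present_flag f(1) + operating_points_cnt_minus_1 f(5)
    // + operating_point_idc[0] f(12): 5 + 1 + 1 + 5 + 12 = 24.
    Some(24)
}

/// Writes `new_idx` over seq_level_idx[0] and returns the previous index,
/// or None if the layout is unsupported or the seq_tier bit would appear/disappear.
fn patch_level(payload: &mut [u8], new_idx: u8) -> Option<u8> {
    let offset = first_level_bit_offset(payload)?;
    let old_idx = read_bits(payload, offset, 5) as u8;
    if offset == 24 && (old_idx > 7) != (new_idx > 7) {
        return None;
    }
    write_bits(payload, offset, 5, u32::from(new_idx));
    Some(old_idx)
}
```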
The following activity diagram summarizes the analysis loop:
After processing container data, Elevator enters its main analysis loop: each OBU in the video is parsed, and the relevant data, such as frame sizes and the number of headers, is accumulated and then collected between frames, whenever a temporal delimiter is reached. (As you might already know, the OBU, or Open Bitstream Unit, is the basic unit of data in the AV1 format.)
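The grouping of OBUs into temporal units can be sketched as follows. Each OBU starts with a header byte whose bits 6 to 3 (counting from the least significant bit) hold `obu_type`; the type values used here (2 for a temporal delimiter, 3 for a frame header, 6 for a frame) come from the AV1 specification. The input format, a list of (header byte, OBU size) pairs, and the stats fields are assumptions for illustration, and counting `OBU_FRAME_HEADER` and `OBU_FRAME` OBUs as headers is a simplification. None of this is taken from Elevator or av1parser.

```rust
// OBU type values from the AV1 specification.
const OBU_TEMPORAL_DELIMITER: u8 = 2;
const OBU_FRAME_HEADER: u8 = 3;
const OBU_FRAME: u8 = 6;

/// Extracts obu_type from the first byte of an OBU header.
fn obu_type(header_byte: u8) -> u8 {
    (header_byte >> 3) & 0x0F
}

/// Hypothetical per-temporal-unit statistics.
#[derive(Debug, Default, PartialEq)]
struct TemporalUnitStats {
    headers: u32,
    bytes: usize,
}

/// Groups (header byte, OBU size) pairs into temporal units; a temporal
/// delimiter closes the current unit and opens the next one.
fn collect_temporal_units(obus: &[(u8, usize)]) -> Vec<TemporalUnitStats> {
    let mut units = Vec::new();
    let mut current: Option<TemporalUnitStats> = None;
    for &(header, size) in obus {
        match obu_type(header) {
            OBU_TEMPORAL_DELIMITER => {
                if let Some(done) = current.take() {
                    units.push(done);
                }
                current = Some(TemporalUnitStats::default());
            }
            t => {
                // OBUs before the first temporal delimiter are ignored.
                if let Some(tu) = current.as_mut() {
                    tu.bytes += size;
                    if t == OBU_FRAME_HEADER || t == OBU_FRAME {
                        tu.headers += 1;
                    }
                }
            }
        }
    }
    if let Some(done) = current {
        units.push(done);
    }
    units
}
```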
The level computed at the end of the analysis phase is based on the maximum rates encountered during the main loop, updated every frame. Some of the rates are defined per frame, but others are defined per second, and for the latter the AV1 specification doesn't define how the rate should be calculated. Some possible implementations we evaluated during development are:
- Counting per-frame metrics and scaling the result to one second
- Counting per-second metrics over one second's worth of frames, rounded down to a whole number of frames
- Counting per-second metrics over one second's worth of frames, rounded up to a whole number of frames
We found that scaling per-frame rates up to one second amplified rate spikes and pushed the computed level to unrealistic values, so we opted to update the rate every frame but compute it over a window of frames spanning one second, rounded up: for example, we use a 24-frame window for a video that runs at 23.976 fps. If the frame rate isn't an integer, we scale the resulting rate down so as not to overestimate it; at 23.976 fps, we multiply the result by 23.976 ÷ 24.
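The windowed calculation can be sketched in Rust. It assumes the size of each temporal unit in bits has already been collected and that the frame rate is given as a fraction; `max_bitrate` is a hypothetical name, and the code illustrates the windowing and scaling rather than reproducing Elevator's implementation.

```rust
/// Maximum bitrate over a sliding window of frames, updated every frame.
/// The window spans one second's worth of frames rounded up, and the sum
/// is scaled by fps / window so non-integer frame rates aren't overestimated.
fn max_bitrate(frame_bits: &[u64], fps_num: u64, fps_den: u64) -> f64 {
    let fps = fps_num as f64 / fps_den as f64;
    // ceil(fps): 24 frames for 24000/1001 (23.976) fps.
    let window = ((fps_num + fps_den - 1) / fps_den) as usize;
    // Below 1 for non-integer frame rates, e.g. 23.976 / 24.
    let scale = fps / window as f64;
    let mut sum = 0u64;
    let mut max = 0u64;
    for (i, &bits) in frame_bits.iter().enumerate() {
        sum += bits;
        // Drop the frame that just fell out of the window.
        if i >= window {
            sum -= frame_bits[i - window];
        }
        max = max.max(sum);
    }
    max as f64 * scale
}
```

With a constant 1,000 bits per frame at 24000/1001 fps, the 24-frame window sums to 24,000 bits, which the scale factor brings down to about 23,976 bits per second. A single 50,000-bit frame in a 24 fps stream of 1,000-bit frames yields 73,000 bits per second with this window, whereas scaling that one frame up to a full second would give 1,200,000.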
In our testing, we found that some AV1 videos in the wild (such as the Chimera example mentioned above) have lower coded levels than those calculated by Elevator. If Elevator's analysis is correct, there may be a large number of non-compliant AV1 streams published on other services. Unfortunately, there are no public test sequences available to validate AV1 levels. We hope that, as more hardware decoders reach the market, interest in setting the level parameter correctly will grow, leading to more implementations that can be validated against one another.