AV1 is ready for prime time Part 2: Decoding performance

Ewout ter Hoeven
Oct 10, 2019


Last Wednesday I wrote about how the SVT-AV1 encoder developed by Intel and Netflix is maturing quickly. Today we’re going to look at the other side of the equation, AV1 decoding performance, on both x86 and Arm platforms.

x86

On the PC side of things, AV1 decoding is already very mature. Over the past months a lot of SSSE3 assembly was added to dav1d, increasing its performance on older CPUs (pre-Haswell and pre-Zen).

AVX2 performance, which was already very fast, also increased a little with dav1d 0.5.0.

There’s a new decoder in town, libgav1, created by Google. They say it’s mainly optimized for Arm CPUs. A quick test on x86 shows that even single-threaded performance is worse than both dav1d and libaom in pretty much every scenario.

So discarding libgav1 for x86 for now, let’s see how dav1d 0.5.0 compares to libaom. Single-threaded, dav1d is between 1.5 and 2.5 times faster.

Multi-threaded, dav1d’s lead over libaom increases to between 2.3 and 4.5 times.
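For readers who want to produce comparable "times faster" figures themselves, here is a minimal sketch of how such a ratio can be derived from wall-clock timings. It is not the harness used for the results above. The helper names (`time_command`, `frames_per_second`, `speedup`) are hypothetical, and the command in the demo is a harmless stand-in that you would replace with your own decoder invocation and the real frame count of your test clip.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the best (lowest) wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Discard decoder output so terminal I/O does not skew the timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def frames_per_second(frame_count, seconds):
    """Decoding speed in frames per second for a clip with frame_count frames."""
    return frame_count / seconds


def speedup(fps_candidate, fps_baseline):
    """How many times faster the candidate decoder is than the baseline."""
    return fps_candidate / fps_baseline


if __name__ == "__main__":
    # Stand-in command (assumption): replace with the decoder command line
    # you want to measure, once per decoder being compared.
    placeholder_cmd = [sys.executable, "-c", "pass"]
    elapsed = time_command(placeholder_cmd, runs=1)
    print(f"best run took {elapsed:.3f} s")
```

Taking the best of several runs reduces noise from other processes. Dividing the two resulting frame rates with `speedup` gives the kind of factor quoted above.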

Arm

ARMv8 performance is now a main focus, to enable AV1 software decoding on battery-limited devices. We saw dav1d performance improve steadily over the last year. Single-threaded performance increased by 30 to 50%.

Multi-threaded performance increased even more, by between 80 and 90% since its first release.

So let’s see how the new kid in town is performing. libgav1 is still a lot slower single-threaded than both libaom and dav1d. Multi-threaded it’s approximately on par with libaom, but nowhere near dav1d.

Looking back at dav1d on a few real-world CPUs, we see that low-end CPUs like the Snapdragon 410 still struggle with 1080p decoding. 720p should be very doable though. High-end devices like the Snapdragon 835 can do 1080p decoding at more than 60 fps when spinning up 4 high-performance cores. This has an impact on battery life, but for short sequences it shouldn’t matter that much.

Conclusion

dav1d 0.5.0, which will be released this week, is still the fastest AV1 decoder in town. 1080p AV1 decoding is now possible on almost all x86 CPUs thanks to the added SSSE3 assembly, and 4K is doable for almost all quad-core CPUs. On the Arm side, 720p is possible on low-end devices and 1080p on high-end ones, but energy usage has to be considered.

I hope to be able to test energy usage and battery life soon. At the moment I unfortunately don’t have the equipment for it.
