AV1 is ready for prime time: SVT-AV1 beats x265 and libvpx in quality, bitrate and speed
AV1 encoding is slow. At least, that was the status quo. But with both rav1e (backed by Xiph, Mozilla and Vimeo) and SVT-AV1 (backed by Netflix and Intel) in heavy development, this notion is changing fast.
Today, I’m going to demonstrate that SVT-AV1 can be faster and deliver higher quality at identical bitrate, both at the same time. Together with fast AV1 decoding by dav1d, this makes the AV1 codec ready for broad adoption.
Part 1: Quality
The whole promise of AV1, aside from being free and open-source, was delivering higher quality at the same bitrate. The rav1e developers have built a great tool for comparing video quality, called AreWeCompressedYet?. I submitted runs for the libvpx, x265 and SVT-AV1 encoders, with the first two being the status quo for high quality encoding.
On the X-axis the bitrate is displayed in bits per pixel. So 0.02 means on average 0.02 bits are spent per pixel and 0.1 means 0.1 bits per pixel. For example, for 1080p at 30 fps that works out to roughly 1,25 Mb/s and 6,25 Mb/s respectively.
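The conversion is just bits per pixel times pixels per second. A minimal sketch of that arithmetic (the function name is my own, purely for illustration); the exact figures come out slightly below the rounded ones quoted above:

```python
def bitrate_mbps(bits_per_pixel, width, height, fps):
    """Average bitrate in megabits per second (1 Mb = 1,000,000 bits)."""
    pixels_per_second = width * height * fps
    return bits_per_pixel * pixels_per_second / 1_000_000


# The two example points from the X-axis, for 1080p at 30 fps.
for bpp in (0.02, 0.1):
    print(f"{bpp} bpp at 1080p30 = {bitrate_mbps(bpp, 1920, 1080, 30):.2f} Mb/s")
```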
For x265 and libvpx their highest quality modes were used, veryslow and cpu-used 0 respectively (x265 placebo resulted in worse quality on this test set). SVT-AV1 uses Enc-mode 4 and 6 in the graphs below. SVT-AV1 has both faster and slower modes, ranging all the way from mode 8 to 0.
First are PSNR and MS SSIM, two objective metrics that both calculate the mathematical error between the input and output video stream. These values are displayed on the Y-axis, higher value means higher quality.
Both SVT-AV1 mode 6 (green) and mode 4 (yellow) provide better objective quality than x265 (red) and libvpx (blue), on both the PSNR and MS-SSIM metrics.
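To give a feel for what "mathematical error" means here, this is a minimal sketch of the textbook PSNR definition for 8-bit samples. It is not necessarily how AWCY computes its numbers (real tools work per frame and per plane), and the sample values are made up. MS-SSIM is considerably more involved and is not shown.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of samples")
    # Mean squared error between the original and the encoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no error at all
    return 10 * math.log10(max_value ** 2 / mse)


# Made-up 8-bit sample values, only to show the call.
original = [52, 55, 61, 59]
encoded = [54, 55, 60, 58]
print(f"PSNR: {psnr(original, encoded):.2f} dB")
```

A smaller error gives a higher PSNR, which is why higher is better on the Y-axis.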
Then we have a subjective metric, which should better represent how the user experiences video quality. VMAF was developed by Netflix to better assess quality as perceived by its users.
SVT-AV1’s subjective quality is a little worse than its objective quality. Enc-mode 4 still trumps libvpx and x265, but Enc-mode 6 is a little worse.
Part 2: Speed
To compare speed I spun up some Google Cloud instances to get a fair comparison. Both instances use 16 vCPUs (8 cores, 16 threads) on the Cascade Lake platform with 64 GB DDR4 (SVT-AV1 ran fine on 16 GB btw). This setup should be comparable to a high-end desktop PC with Ryzen 7 3700X or Core i9-9900K.
All encoders were compiled with GCC 8.3.0 with Release configuration. I benchmarked two files for two scenarios: 1250 frames of a 1080p 8-bit 4:2:0 clip representing regular 1080p content, and 250 frames of a 2160p 10-bit 4:2:0 clip representing high-end HDR movie content. Each encoder was run twice and the fastest run was used. Below are the results in frames per second:
As we can see, SVT-AV1 enc-modes 5, 6 and 7 are clearly faster than both libvpx and x265. Also, libaom is very slow even at a fast preset (cpu-used=5).
When normalized, the differences become even clearer.
On 8-bit content (Sintel), enc-mode 4 is 32% faster than libvpx and 4% slower than x265. On 10-bit content (Foodmarket) the differences are even larger.
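Normalizing simply means dividing each encoder's frames per second by that of a baseline encoder. A small sketch of that calculation, with a helper name of my own and placeholder fps values that are not my measurements (those are in the spreadsheet linked at the bottom):

```python
def relative_speed(candidate_fps, baseline_fps):
    """Speed of the candidate as a percentage difference from the baseline."""
    return (candidate_fps / baseline_fps - 1) * 100


# Placeholder numbers, chosen only to show the sign convention.
print(f"{relative_speed(2.64, 2.0):+.0f}%")  # faster than the baseline
print(f"{relative_speed(1.92, 2.0):+.0f}%")  # slower than the baseline
```

A positive result means the candidate is faster than the baseline, a negative one means it is slower.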
Part 3: Diving in deep
So, we now know globally that SVT-AV1 can be faster than both libvpx (VP9) and x265 (H.265) while delivering higher quality at the same bitrate. In this section I compare different SVT-AV1 encoder modes a little more in depth. We will mainly be looking at MS-SSIM for objective quality and VMAF for subjective quality.
In mode 7 SVT-AV1 still needs 3,6% to 9,2% more bits to reach the same MS-SSIM (objective) quality as libvpx. For the same VMAF (subjective) quality this is even higher at 9,5% to 23,4%, depending on resolution.
Compared to x265 the results are a little better: it uses between 1,6% more and 10,9% fewer bits to reach similar MS-SSIM quality, and between 10,4% more and 1,4% fewer for VMAF.
Meanwhile, it’s 6,42 times faster than libvpx and 4,68 times faster than x265.
Mode 6 reduces the bits needed for similar quality by about 4% to 8% compared to Mode 7. Compared to x265, the bitrate needed for equal MS-SSIM is now significantly lower, while compared to libvpx it still needs to be a bit higher.
Mode 6 is 4,60 times faster than libvpx and 3,35 times faster than x265.
Mode 5 provides the first wins over libvpx on the MS-SSIM metric. It’s also almost exclusively better than x265 on all metrics.
Mode 5 is 3,01 times faster than libvpx and 2,20 times faster than x265.
Mode 4 just beats the shit out of x265 with huge double-digit bitrate reductions. On average, for equal PSNR quality it needs 20,1% fewer bits, for equal MS-SSIM quality 19,2% fewer bits and for equal VMAF quality 9,2% fewer bits. Meanwhile it’s only 4% slower.
The gains over libvpx are a little smaller, with 8,8%, 9,0% and 3,1% bitrate reductions on average for equal PSNR, MS-SSIM and VMAF quality respectively. It accomplishes this quality at 32% higher speed however.
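The "percent more or fewer bits for equal quality" figures in this section come from AWCY, which reports them as BD-rate. To illustrate the idea, here is a simplified sketch that compares two rate/quality curves with piecewise-linear interpolation in the log-bitrate domain. It is my own simplification and will not reproduce AWCY's exact numbers; the function names and the data points are made up.

```python
import math


def _log_rate_at(quality, curve):
    """Linearly interpolate log(bitrate) at a given quality on one curve.

    `curve` is a list of (bitrate, quality) points sorted by quality,
    with strictly increasing quality values.
    """
    for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
        if q0 <= quality <= q1:
            t = (quality - q0) / (q1 - q0)
            return math.log(r0) + t * (math.log(r1) - math.log(r0))
    raise ValueError("quality outside the range of the curve")


def bitrate_difference_percent(test, reference, steps=100):
    """Average bitrate change of `test` versus `reference` at equal quality.

    Negative means `test` needs fewer bits for the same quality.
    """
    test = sorted(test, key=lambda p: p[1])
    reference = sorted(reference, key=lambda p: p[1])
    # Only compare over the quality range that both curves cover.
    low = max(test[0][1], reference[0][1])
    high = min(test[-1][1], reference[-1][1])
    if low >= high:
        raise ValueError("curves do not overlap in quality")
    total = 0.0
    for i in range(steps + 1):
        q = low + (high - low) * i / steps
        q = min(max(q, low), high)  # guard against floating-point overshoot
        total += _log_rate_at(q, test) - _log_rate_at(q, reference)
    mean_log_ratio = total / (steps + 1)
    return (math.exp(mean_log_ratio) - 1) * 100


# Made-up (bitrate in kb/s, quality score) points, not AWCY data.
reference_curve = [(1000, 30.0), (2000, 35.0), (4000, 40.0)]
test_curve = [(900, 30.0), (1800, 35.0), (3600, 40.0)]
print(f"{bitrate_difference_percent(test_curve, reference_curve):+.1f}%")
```

Averaging in the log domain treats a saving at low bitrates the same as an equally large relative saving at high bitrates, which is the reason for taking logarithms first.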
While SVT-AV1 is still in heavy development, it has already reached the point where it delivers better quality at identical bitrates and at higher speeds on multi-core machines. Overnight encoding of hour-long videos comes within reach on most high-end machines using enc-mode 4, and even on mid-range machines using enc-mode 6 or higher.
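Whether an encode fits in a night is easy to estimate from the encoder speed in frames per second. A quick sketch, using placeholder speeds rather than my measurements (the measured figures are in the performance spreadsheet under sources):

```python
def encode_hours(video_minutes, video_fps, encoder_fps):
    """Wall-clock hours needed to encode a video at a given encoder speed."""
    total_frames = video_minutes * 60 * video_fps
    return total_frames / encoder_fps / 3600


# Placeholder encoder speeds, not measurements: a 60-minute, 24 fps video.
for encoder_fps in (1.0, 3.0, 10.0):
    print(f"{encoder_fps:>4} fps -> {encode_hours(60, 24, encoder_fps):.1f} hours")
```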
Build instructions can be found in the SVT-AV1 repository. Plugins (patches) for FFmpeg and GStreamer are also available. Pre-compiled 64-bit Windows executables can be found on AppVeyor. The media-autobuild_suite also supports SVT-AV1 for FFmpeg builds with the SVT-AV1 patch.
Sources & data
- These AWCY runs: https://beta.arewecompressedyet.com/?job=x265-veryslow-limited%402019-08-23-c&job=vp9-cpu0%402019-09-03&job=SVT-AV1-enc-mode-7%402019-10-06&job=SVT-AV1-enc-mode-6%402019-10-06&job=SVT-AV1-enc-mode-5%402019-10-06&job=SVT-AV1-enc-mode-4%402019-10-06
- These performance results: https://docs.google.com/spreadsheets/d/1p3PJQMkyhIrXEL6MBBQiwTnWmGyHY_zuw7_XAD0def4/edit#gid=959310253
- These compile and benchmark commands: https://gist.github.com/EwoutH/b908f3527d630326266de0d6e2a953fd
- A Google Compute Engine n2-standard-16 (16 vCPUs, 64 GB memory) Cascade Lake instance running Ubuntu 18.04 LTS