Unlock Video Quality Metrics DIY: Calculate VMAF and PSNR Without Outsourcing!

Sandipan Mondal
Media Cloud Tech
Published in
7 min readMar 5, 2024

After completing the video encoding process, it’s natural to wonder about the user experience on your platform. Fortunately, there is a way to gauge whether the viewing experience will be positive or negative. Netflix pioneered the Video Multimethod Assessment Fusion (VMAF) technique in 2016, providing a reliable method to assess and ensure the quality of video playback.

What is VMAF?

VMAF, or Video Multimethod Assessment Fusion, is a metric used to evaluate the quality of compressed video streams. It was developed by Netflix to address the limitations of existing video quality metrics, such as PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index). VMAF aims to provide a more accurate and perceptually meaningful assessment of video quality.

VMAF represents a significant advancement in video quality assessment, offering nuanced and accurate evaluation compared to traditional metrics. Its integration of machine learning models and adaptability to various content types make it a valuable tool for content creators, streaming services, and researchers in the field of video compression and delivery.

Why VMAF is important?

  1. Precise Perceptual Quality Measurement:

VMAF is designed to closely emulate how humans perceive video quality. It incorporates multiple visual features, including contrast, texture, and motion, resulting in a metric that aligns more accurately with subjective human judgment. This precision is crucial in an era where viewer expectations for high-quality content are paramount.

2. Dynamic Encoding Parameter Selection:

In the realm of video streaming, bandwidth constraints and the diversity of user devices necessitate adaptive encoding strategies. VMAF enables streaming services to dynamically adjust encoding parameters based on content characteristics, viewer preferences, and real-time network conditions. This adaptability ensures an optimal balance between video quality and efficient bandwidth utilization.

3. User Engagement and Retention:

High-quality video content is directly linked to user engagement and retention for streaming platforms. VMAF aids in delivering videos that not only meet technical standards but also captivate audiences with superior perceptual quality. As users increasingly demand immersive and visually appealing experiences, VMAF plays a pivotal role in meeting these expectations.

4. Economic Video Delivery:

Bandwidth and storage costs are significant considerations for content providers. VMAF assists in making encoding decisions that are not solely focused on minimizing file sizes but optimizing for perceptual quality. This approach ensures a cost-effective video delivery process, as platforms can tailor their encoding settings to maintain an acceptable quality level while managing resource consumption efficiently.

5. Adaptive Streaming Precision:

VMAF is a linchpin in the precision of adaptive streaming technologies. By evaluating video quality in real-time, adaptive streaming algorithms can make informed decisions about bitrate adjustments. This dynamic adaptation is crucial for providing a seamless viewing experience, especially in situations where network conditions fluctuate.

6. Objective Benchmarking and Research:

Researchers and developers in the video compression and streaming domain rely on VMAF as a standardized metric for benchmarking. Its adoption as an industry standard facilitates objective comparisons between different encoding techniques and algorithms. This, in turn, drives innovation by providing a common ground for assessing advancements in video quality.

7. Industry Collaboration and Standardization:

The widespread adoption of VMAF by major streaming platforms contributes to industry collaboration and standardization. This shared metric fosters a common language for video quality assessment, enabling content providers and technology developers to communicate and collaborate effectively. This unity promotes advancements in video delivery techniques and technologies.

In essence, VMAF’s relevance is deeply rooted in its ability to bridge the gap between technical metrics and human perception, fostering an environment where video streaming services can deliver high-quality content that meets user expectations, optimizes resource usage, and drives industry innovation.

I hope you can grasp the concept of VMAF and its significance. Now, let’s delve into incorporating it into your workflow and understanding the process.

If the provided video is accessible in various resolutions on your platform, it is essential to generate corresponding source files for each resolution. VMAF calculations must be performed with files of identical resolutions. As an illustration, let’s assume the input file is named “input.mp4,” and it has been encoded in five different resolutions: 360p, 576p, 720p, 900p, and 1080p. The encoded file names are as follows: “input_360p.m3u8,” “input_576p.m3u8,” “input_720p.m3u8,” “input_900p.m3u8,” and “input_1080p.m3u8.” Consequently, five distinct input files need to be created: “360p.mp4,” “576p.mp4,” “720p.mp4,” “900p.mp4,” and “1080p.mp4.”

Step 1:

Ffmpeg command to scale a video:

ffmpeg -i input_video.mp4 -vf "scale=480:270" 270p.mp4 \ 

-vf "scale=640:360" 360p.mp4 \

-vf "scale=1024:576" 576p.mp4 \

-vf "scale=1280:720" 720p.mp4 \

-vf "scale=1600:900" 900p.mp4 \

-vf "scale=1920:1080" 1080p.mp4

Step 2:

Now it’s time to calculate the VMAF score:

# calculating the VMAF 

ffmpeg -allowed_extensions ALL -i 270p.m3u8 -i 270p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_270p.txt ; \
ffmpeg -allowed_extensions ALL -i 360p.m3u8 -i 360p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_360p.txt ; \
ffmpeg -allowed_extensions ALL -i 576p.m3u8 -i 576p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_576p.txt ; \
ffmpeg -allowed_extensions ALL -i 720p.m3u8 -i 720p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_720p.txt ; \
ffmpeg -allowed_extensions ALL -i 900p.m3u8 -i 900p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_900p.txt ; \
ffmpeg -allowed_extensions ALL -i 1080p.m3u8 -i 1080p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_1080p.txt ; \

Explanation:

For each resolution (270p, 360p, 576p, 720p, 900p, 1080p):

  • ffmpeg: Invokes the FFmpeg command-line tool.
  • -allowed_extensions ALL: Specifies that all file extensions are allowed when processing inputs.
  • -i <playlist_file>: Specifies the input .m3u8 playlist file.
  • -i <video_file>: Specifies the input .mp4 video file.
  • -filter_complex libvmaf: Applies the libvmaf filter complex, calculating the VMAF score.
  • -f null: Specifies the null output format (no actual output file is generated).
  • 2>&1: Redirects standard error (stderr) to standard output (stdout).
  • | tee output_<resolution>.txt: Pipes the combined output to the tee command, which writes the output to both the console and a text file named output_<resolution>.txt.
  • ; \: Separates each command, allowing them to be executed sequentially.

The script assesses video quality for each resolution separately, and the results are stored in corresponding text files (output_270p.txt, output_360p.txt, etc.).

If you want to combine them together, just use a shell script.

#!/bin/bash

# generating mp4 files

ffmpeg -i input_video.mp4 -vf "scale=480:270" 270p.mp4 \

-vf "scale=640:360" 360p.mp4 \

-vf "scale=1024:576" 576p.mp4 \

-vf "scale=1280:720" 720p.mp4 \

-vf "scale=1600:900" 900p.mp4 \

-vf "scale=1920:1080" 1080p.mp4

# calculating the VMAF

ffmpeg -allowed_extensions ALL -i 270.m3u8 -i 270p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_270p.txt ; \

ffmpeg -allowed_extensions ALL -i 360.m3u8 -i 360p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_360p.txt ; \

ffmpeg -allowed_extensions ALL -i 576.m3u8 -i 576p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_576p.txt ; \

ffmpeg -allowed_extensions ALL -i 720.m3u8 -i 720p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_720p.txt ; \

ffmpeg -allowed_extensions ALL -i 900.m3u8 -i 900p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_900p.txt ; \

ffmpeg -allowed_extensions ALL -i 1080.m3u8 -i 1080p.mp4 -filter_complex libvmaf -f null - 2>&1 | tee output_1080p.txt

Now, let’s find out the highest VMAF score for your video.

90 and above: Excellent quality

80 to 90: Very good quality

70 to 80: Good quality

60 to 70: Fair quality

Below 60: Poor quality

Now we will talk about another metric which is equally popular in terms of video assessment.

PSNR:

The term peak signal-to-noise ratio (PSNR) is an expression for the ratio between the maximum possible value (power) of a signal and the power of distorting noise that affects the quality of its representation. Because many signals have a very wide dynamic range, (ratio between the largest and smallest possible values of a changeable quantity) the PSNR is usually expressed in terms of the logarithmic decibel scale.

PSNR is most easily defined via the mean squared error (MSE). Given a noise-free m×n monochrome image I and its noisy approximation K, MSE is defined as

The PSNR (in dB) is defined as

Here, MAXI is the maximum possible pixel value of the image. When the pixels are represented using 8 bits per sample, this is 255. More generally, when samples are represented using linear PCM with B bits per sample, MAXI is 2B − 1.

Script to calculate the PSNR value for multi bitrate ladder:

#!/bin/bash 



ffmpeg -allowed_extensions ALL -i 270.m3u8 -i 270p.mp4 -lavfi "ssim;[0:v][1:v]psnr" -f null - 2>&1 | tee 270Poutput.txt ; \

ffmpeg -allowed_extensions ALL -i 360.m3u8 -i 360p.mp4 -lavfi "ssim;[0:v][1:v]psnr" -f null - 2>&1 | tee 360Poutput.txt ; \

ffmpeg -allowed_extensions ALL -i 576.m3u8 -i 576p.mp4 -lavfi "ssim;[0:v][1:v]psnr" -f null - 2>&1 | tee 576Poutput.txt ; \

ffmpeg -allowed_extensions ALL -i 720.m3u8 -i 720p.mp4 -lavfi "ssim;[0:v][1:v]psnr" -f null - 2>&1 | tee 720Poutput.txt ; \

ffmpeg -allowed_extensions ALL -i 900.m3u8 -i 900p.mp4 -lavfi "ssim;[0:v][1:v]psnr" -f null - 2>&1 | tee 900Poutput.txt ; \

ffmpeg -allowed_extensions ALL -i 1080.m3u8 -i 1080p.mp4 -lavfi "ssim;[0:v][1:v]psnr" -f null - 2>&1 | tee 1080p_output.txt ; \

Similar to VMAF, you also need to generate separate MP4 files for each encoded version.

Now, let’s find out the highest PSNR score for your video.

Above 40 dB: Excellent quality

30 to 40 dB: Very good quality

20 to 30 dB: Good quality

Below 20 dB: Poor quality

--

--