Transcoding with FFMPEG

Amine kherchouche
Published in CodeX
3 min read · Mar 24, 2022

Introduction

HTTP Adaptive Streaming (HAS) is widely used by service providers to maximize the quality of experience (QoE). Consider a typical streaming scenario in which a user wants to play a video whose native resolution is S. Two constraints must be respected: the resolution S’ that the client can display, and the available bandwidth, so that the delivered bitrate R stays below the limit (R < Rmax).

The objective is to send the video with the resolution/bitrate combination that best matches the client’s display resolution and bandwidth. As the diagram below illustrates, the source material is encoded several times with different settings of a rate-control mode: CBR, CRF, or CQP, short for Constant BitRate, Constant Rate Factor, and Constant Quantization Parameter, respectively. The encoded versions are stored on a server, and the client requests the version that best meets its requirements.

Figure: Adaptive bitrate streaming system

Transcoding part

Let’s take a closer look at the step where we encode the source with various parameters. The goal is to build a set of bitrate/resolution pairs from which the client can pick a version that stays within its limits. Consider a single scene from an FHD (1920x1080) video source at 50 frames per second. To produce alternative versions, we change several settings: we downscale the sequence from 1080p to 720p and 540p, and we encode each resolution with QPs in the range [21–51] (to keep things simple).
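To make the size of this parameter grid concrete, here is a minimal Python sketch that enumerates one encoding job per (resolution, QP) pair. The resolutions and the QP range come from the text; the function name `encoding_jobs` and the job dictionary layout are hypothetical, chosen only for illustration.

```python
from itertools import product

# Values taken from the text: three resolutions and QPs 21..51 (inclusive).
RESOLUTIONS = {"1080p": (1920, 1080), "720p": (1280, 720), "540p": (960, 540)}
QPS = range(21, 52)


def encoding_jobs(source_stem="VideoSource", fps=50):
    """Return one job description per (resolution, QP) pair (hypothetical layout)."""
    jobs = []
    for (label, (width, height)), qp in product(RESOLUTIONS.items(), QPS):
        jobs.append({
            "name": f"{source_stem}{label}{fps}_{qp}",  # e.g. VideoSource720p50_30
            "width": width,
            "height": height,
            "qp": qp,
        })
    return jobs


jobs = encoding_jobs()
print(len(jobs), "encodes to run, starting with", jobs[0]["name"])
```

Each job would then be turned into one downscale step and one encode step like the ones shown in the rest of this section.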

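Next, I’ll use FFMPEG [1] (an open-source multimedia framework), driven from Python, to demonstrate each part of the transcoding process. The snippets call it through an FFmpeg(inputs=..., outputs=...) wrapper object, which matches the interface of the ffmpy package.
To downscale the sequence, use the command below: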

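from ffmpy import FFmpeg  # assumption: the FFmpeg wrapper class comes from the ffmpy package

# Read the raw 1080p source and write a raw 720p version using Lanczos scaling.
ff = FFmpeg(
    inputs={'VideoSource1080p50.yuv': '-f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 50'},
    outputs={'VideoSource720p50.yuv': '-vf scale=1280x720:flags=lanczos -pix_fmt yuv420p'},
)
ff.run()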

Once the downscaled versions (720p, 540p) have been generated, we encode each of them, along with the native 1080p source, at every QP in our range.

# Encode the raw 720p version with x264 at constant QP 30.
# A raw input has no header, so its format, pixel format, size, and frame rate are given again.
# Assumption: the output uses an .mp4 container so the file holds the H.264 stream, not raw YUV.
ff = FFmpeg(
    inputs={'VideoSource720p50.yuv': '-f rawvideo -pix_fmt yuv420p -s:v 1280x720 -r 50'},
    outputs={'VideoSource720p50_30.mp4': '-c:v libx264 -x264-params qp=30'},
)
ff.run()

The snippet above takes the 720p downscaled version and encodes it with the x264 encoder at QP 30. We run this command for every QP in the range, and repeat the process for the other resolutions. The bitrate of each encoded version can be measured at this stage. We then decode each version, upscale it back to the native resolution with the same scaling command as before (adjusting the inputs, outputs, and target resolution), and compute its PSNR against the reference:

# Decode the 540p, QP 40 encode and upscale it back to raw 1080p YUV.
ff = FFmpeg(
    inputs={'VideoSource540p50_40.mp4': None},
    outputs={'VideoSource540p50_40_to_1080p.yuv': '-vf scale=1920x1080:flags=lanczos -pix_fmt yuv420p'},
)
ff.run()
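The bitrate mentioned above can be estimated from the size of an encoded file and the duration of the clip. The sketch below is one simple way to do it; the helper name `bitrate_kbps` is hypothetical, and the frame count of the scene is assumed to be known. If the encoded stream sits inside a container, the result also includes a small container overhead.

```python
import os


def bitrate_kbps(path, num_frames, fps=50):
    """Average bitrate in kbit/s: file size in bits divided by duration in seconds."""
    duration_s = num_frames / fps
    size_bits = os.path.getsize(path) * 8
    return size_bits / duration_s / 1000
```

For example, a 10-second scene at 50 frames per second has 500 frames, so the call would be bitrate_kbps(encoded_file, num_frames=500), repeated once per encoded version.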

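To calculate the PSNR, we use the following command. Both inputs are raw YUV files, so each one needs its format, pixel format, size, and frame rate:

ffmpeg -y -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 50 -i "VideoSource540p50_40_to_1080p.yuv" -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 50 -i "VideoSource1080p50.yuv" -lavfi psnr="stats_file=PSNR_File" -f null -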
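The stats file holds one line per frame, so it still has to be reduced to a single PSNR value per encoded version. The sketch below assumes each line is a series of space-separated key:value fields that include mse_avg, which is how the psnr filter writes its stats as far as I know; the sample lines are made up and abbreviated, and the helper name `average_psnr` is hypothetical. It averages the per-frame MSE first and then converts to dB, assuming 8-bit video (peak value 255).

```python
import math


def average_psnr(stats_lines, max_value=255):
    """Sequence PSNR in dB, computed from the mean of the per-frame mse_avg values."""
    mses = []
    for line in stats_lines:
        # Split "key:value" tokens into a dictionary for this frame.
        fields = dict(tok.split(":", 1) for tok in line.split() if ":" in tok)
        if "mse_avg" in fields:
            mses.append(float(fields["mse_avg"]))
    if not mses:
        raise ValueError("no mse_avg entries found")
    mean_mse = sum(mses) / len(mses)
    if mean_mse == 0:
        return math.inf  # identical sequences
    return 10 * math.log10(max_value ** 2 / mean_mse)


# Made-up, abbreviated sample lines (real lines carry more fields per frame).
sample = [
    "n:1 mse_avg:1.00 psnr_avg:48.13",
    "n:2 mse_avg:3.00 psnr_avg:43.36",
]
print(round(average_psnr(sample), 2), "dB")
```

With a real run, the lines would come from the file itself, for example by passing open("PSNR_File") to the function.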

Convex hull

Figure: The convex hull of the video source

After computing the bitrate and PSNR of every encoded version at our three resolutions (1080p, 720p, and 540p), we can plot a rate-quality (RQ) curve for each resolution from its rate/quality pairs. The convex hull [2] is the curve that bounds all of the per-resolution curves from above; the points that lie on it are the most efficient rate/quality trade-offs.
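As an illustration of the extraction step, here is a minimal sketch that computes the upper convex hull of a set of (bitrate, PSNR, resolution) points with a monotone-chain scan. The numbers are invented for the example and are not measurements from this video; the function names are hypothetical, and the sketch assumes that no two points share the same bitrate.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_convex_hull(points):
    """points: iterable of (bitrate, psnr, resolution). Returns hull points sorted by bitrate."""
    hull = []
    for p in sorted(points, key=lambda q: (q[0], q[1])):
        # Drop the last hull point while it lies on or below the line to the new point.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    # Keep only the part where quality still improves as the bitrate grows.
    return [p for i, p in enumerate(hull) if i == 0 or p[1] > hull[i - 1][1]]


# Invented (bitrate in kbit/s, PSNR in dB, resolution) points, three per resolution.
rq_points = [
    (1000, 32.0, "1080p"), (3000, 38.0, "1080p"), (6000, 41.5, "1080p"),
    (900, 33.5, "720p"), (1500, 36.0, "720p"), (3500, 38.5, "720p"),
    (300, 28.0, "540p"), (800, 33.0, "540p"), (2000, 35.0, "540p"),
]

hull = upper_convex_hull(rq_points)
for bitrate, psnr, resolution in hull:
    print(bitrate, psnr, resolution)
```

In this toy data the low bitrates are served best by 540p, the middle range by 720p, and the high bitrates by 1080p, which is the typical shape of such a hull.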

Table: Bitrate/resolution pairs

Using the convex hull, we can build the table above, which associates each bitrate with the resolution that performs best at that rate. This table is what resolves the two constraints mentioned earlier: to request the proper version, the client selects the pair that makes the best use of its bandwidth without exceeding its display resolution, and so gets the highest possible quality.
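The client-side choice can be sketched in a few lines of Python. The ladder below is invented for illustration (it is not the table from this article), and the function name `select_version` is hypothetical. The sketch keeps the pairs that respect both constraints (R < Rmax and a resolution no larger than the display) and returns the one with the highest bitrate, which on a convex hull is also the one with the highest quality.

```python
def select_version(ladder, bandwidth_kbps, max_height):
    """Pick the highest-bitrate (bitrate, height) pair that fits both client limits."""
    candidates = [
        (rate, height)
        for rate, height in ladder
        if rate < bandwidth_kbps and height <= max_height
    ]
    return max(candidates, default=None)  # None if nothing fits


# Invented ladder of (bitrate in kbit/s, frame height) pairs, sorted by bitrate.
ladder = [(300, 540), (800, 540), (900, 720), (1500, 720), (3000, 1080), (6000, 1080)]

print(select_version(ladder, bandwidth_kbps=2000, max_height=1080))
print(select_version(ladder, bandwidth_kbps=2000, max_height=540))
```

A real player would also have to decide what to do when no pair fits, for instance by falling back to the lowest rung; that policy is left out here.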

Conclusion

In this short article, we showed how to deliver the best version of a requested source file to the client’s screen by picking the pair that fits the client’s limits. To recap, we use FFMPEG commands to downscale the original file to the desired resolutions, encode each file with a range of QPs, and then upscale all the encoded versions to the native resolution to compute the PSNR against the source file (known as the reference). Finally, we obtain the pairs by drawing the RQ curve for each resolution and extracting the convex hull.

References

[1] https://ffmpeg.org/
[2] https://netflixtechblog.com/per-title-encode-optimization-7e99442b62a2
