The Road to Everywhere — by improving existing transcoding workflows
Minimizing the amount of data sent not only saves CDN storage but also reduces data costs for the viewer and improves accessibility where available bandwidth is lower. In this blog post our video transcoding expert Johan Skaneby takes a closer look at defining video streams primarily by quality, not only by bitrate. The post also includes an interview with Carl Lindqvist from Bonnier Broadcasting in Sweden, who has studied quality-based encoding for some years now.

Compressing and customizing video for different types of production and distribution workflows is a craft that has been essential throughout the past decade of digital TV. The holy grail is knowing how to minimize the amount of data sent without affecting the quality of the material too much. Doing so saves not only CDN storage but also data costs for the customer, and improves accessibility where bandwidth is lower. A generally better experience for the customer becomes a unique selling point for the VOD provider in the end, right?
In a time of new standards, templates and general best practices, it may seem that we already know how to achieve all this. But even as new, more efficient codecs are developed and released, it is still important that we continue to discuss how video compression actually works, to better understand how we can develop, improve and streamline existing environments today.
Now, all of you working in this industry, particularly with the video itself, are of course familiar with how quality, frame size and frame rate relate to bitrate. In the world of OTT we still have different packaging standards such as HLS, HDS, DASH and MSS, but they can all share the same kind of video and audio. The common codec so far is H.264, and there is a standard set of rules for how to prepare, i.e. compress and transcode, the media so that it is supported by all the packagers mentioned above.
One of the most important capabilities of OTT distribution is the way we can adapt the streams depending on bandwidth. To do this we first need to transcode a number of versions of our source material, let's say 6 of them.
The first ones are to be used under lower bandwidth conditions, whereas the others have larger frame sizes and are to be used in better conditions.
The client communicates with the server according to a number of rule sets, and if the bandwidth drops, the client will ask the server to provide a lower bandwidth version of the same material. A switch eventually occurs and the image quality lowers as the new stream becomes available. For this to work flawlessly, the IDR frames in the streams need to be aligned, but that is a topic for another blog post.
Example bitrate ladder:
256 x 144 300 kbps
426 x 240 500 kbps
640 x 360 800 kbps
1024 x 576 2000 kbps
1280 x 720 4000 kbps
1920 x 1080 6000 kbps
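The traditional ladder above can be sketched as a set of ffmpeg/libx264 command lines, one per rendition. This is a minimal illustration, assuming ffmpeg with libx264; the file names and the choice of a VBV buffer at twice the rate are my own assumptions, not part of the original ladder.

```python
# Hypothetical sketch: the fixed-bitrate ladder from the table above,
# expressed as ffmpeg/libx264 argument lists (commands are only built,
# not executed; "source.mp4" and the output names are placeholders).
LADDER = [
    ("256x144", 300),
    ("426x240", 500),
    ("640x360", 800),
    ("1024x576", 2000),
    ("1280x720", 4000),
    ("1920x1080", 6000),
]

def ffmpeg_args(src, size, kbps):
    """Build one rendition's command: a fixed target bitrate per rung."""
    rate = f"{kbps}k"
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-s", size,                   # frame size for this rung
        "-b:v", rate,                 # target bitrate
        "-maxrate", rate,             # cap at the same value
        "-bufsize", f"{2 * kbps}k",   # VBV buffer, here ~2x the rate (assumption)
        f"out_{size}.mp4",
    ]

commands = [ffmpeg_args("source.mp4", size, kbps) for size, kbps in LADDER]
```

Note how every rung pins the encoder to its bitrate regardless of how easy or hard the content is; that is exactly the behaviour questioned below.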
Above is the traditional approach to encoding video: we have a lower bitrate and smaller frame size for you if your bandwidth drops, and if you have great bandwidth we have a full-size frame to present to you.
The problem with this very common approach to video compression is that it assumes the input material stays fairly uniform throughout. But that isn't the case in most programs.
Let me give an example.
An intense ice hockey game is broadcast using the ladder above. The client has an available bandwidth that allows for 1280x720 at 4000 kbps. We see a lot of action and sweeping camera pans, which of course eat compression bits, all of them.

At half time we cut to the studio. This is the opposite kind of situation. The commentators discuss the game in front of a static studio camera. All fine, but a closer look reveals that the encoder is still using almost 4000 kbps to describe this scene.

In this example, which is very much the common way to do this, we are actually using the same amount of bits to describe the studio discussion as we are to describe the action scenes. Something is wrong here, and we are most likely paying CDN costs and data transfer rates everywhere to continue this bitrate-based approach to quality.
Quality Based Encoding
Defining compression using a quality ladder is nothing new, but in most cases you are given the option to choose between quality OR bitrate. Why can't we just use both?
I am going to use the well-known open source encoder x264 to illustrate my idea. Among the options in x264 you will find a quality parameter called CRF (Constant Rate Factor). This is a commonly used rate control mode with a quality range of 0–51. Level 0 is lossless, whereas 51 is compressed to pieces, most likely not viewable.
It is a known fact that levels 18–24 offer a very good balance between bitrate and visual quality. But remember that CRF in itself makes no promises about bitrate. You simply tell the encoder: I want a quality of 18, use whatever bits are necessary to give me this quality.
In the ice hockey studio example, the bitrate at CRF 18 might have dropped to 2000 kbps, but video from the actual game could skyrocket to 5000 or 6000 kbps where needed. We don't know, because we only asked for CRF 18.
So quality-based encoding on its own is not the best way to create our 6 streams, because we have no bitrate control at all.
But there is actually a way to control this behaviour in x264, by combining CRF with a bandwidth cap, or roof, at 4000 kbps. Take a look:
-c:v libx264 -crf 18 -maxrate 4000k -bufsize 8000k
We are now asking the encoder to target quality 18. If that is possible already at, for example, 2000 kbps, then all is fine; but if we would need more than 4000 kbps to reach this quality, the encoder will adjust the CRF to a higher value so that the stream never requires more than 4000 kbps.
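The capped-CRF command can be sketched the same way. Again a minimal illustration, assuming ffmpeg with libx264; the file names are placeholders, and only the three x264 options shown above are taken from the text.

```python
# Hypothetical sketch: capped CRF, i.e. ask for quality (CRF 18) and
# put a roof on the peak bitrate, matching the x264 options above.
# The command is only built, not executed; file names are placeholders.
def capped_crf_args(src, crf=18, maxrate_kbps=4000):
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-crf", str(crf),                     # target quality, not a bitrate
        "-maxrate", f"{maxrate_kbps}k",       # never exceed this rate
        "-bufsize", f"{2 * maxrate_kbps}k",   # VBV buffer, 2x the cap as above
        "out_crf.mp4",
    ]

args = capped_crf_args("source.mp4")
```

Easy scenes (the studio) land well below the cap; only hard scenes (the game) push up against the 4000 kbps ceiling.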
To illustrate this I have provided below examples of exactly the same media file encoded with the two techniques described. The media starts with an interview in the first part and cuts to action scenes in the last part.


Above is a very simple way of implementing a much more efficient encode. The drop from 3333 kbps to 2582 kbps also gives you a hint of how much storage can be saved at the same perceived video quality.
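As a back-of-the-envelope check on those two averages, the storage per hour of video follows directly from the bitrate:

```python
# Storage arithmetic for the measured averages above:
# 3333 kbps (fixed bitrate) vs 2582 kbps (capped CRF).
def gb_per_hour(kbps):
    """Storage for one hour of video at a given average bitrate."""
    return kbps * 1000 / 8 * 3600 / 1e9  # kbit/s -> GB per hour

before = gb_per_hour(3333)   # roughly 1.5 GB/hour
after = gb_per_hour(2582)    # roughly 1.16 GB/hour
saving = 1 - 2582 / 3333     # roughly 22.5% less storage and CDN traffic
```

For this one rendition that is about a fifth of the storage and egress cost, for the same perceived quality.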
There will also be a need to address the bitrate specified by the packager creating the manifests. We do not want the packager to calculate an average bitrate here, but rather to keep the value 4000 kbit/s as the average bitrate in the manifest. This can be achieved with some, but not all, packagers on the market.
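In HLS terms this means declaring the configured cap in the master playlist rather than a measured average. A hypothetical sketch, assuming an HLS packager whose variant entries we can write ourselves (the resolution and URI are placeholders; BANDWIDTH and AVERAGE-BANDWIDTH are the standard EXT-X-STREAM-INF attributes):

```python
# Hypothetical sketch: write an HLS master-playlist entry where the
# declared average bandwidth equals the configured cap (4000 kbps),
# instead of letting the packager measure the stream's real average.
def variant_entry(cap_kbps, resolution, uri):
    bps = cap_kbps * 1000  # HLS declares bandwidth in bits per second
    return (
        f"#EXT-X-STREAM-INF:BANDWIDTH={bps},"
        f"AVERAGE-BANDWIDTH={bps},RESOLUTION={resolution}\n{uri}"
    )

line = variant_entry(4000, "1280x720", "720p/index.m3u8")
```

The player then budgets against the roof we configured, which is the only rate the capped-CRF stream is guaranteed never to exceed.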
Many of you have of course also read the Netflix documents on the per-title encoding technique. That is a related approach with even more advantages, but I believe that simply using the technique above can take us a long way at a very low integration cost.
Also give this technique a thought when you are deciding how to archive. Some of your content needs more bits, whereas other content needs less; make your decisions based on quality, not bitrate alone.
Carl Lindqvist, Solution Architect at Bonnier Broadcasting in Stockholm, has studied this quality based encode technique for some years now.

What made you interested in this quality based encode technique from the beginning?
- Well, the general rule has always been to use VBR to make use of all the bits needed to store video, and to use CBR for video streaming. However, I could not stop thinking about all those bits that were wasted in a video stream just because of the players' inability to handle different frame sizes. So I started to investigate the options available for combining quality with a max rate. The technique most transcoders offered was only two-pass encoding, which relates more to the total number of bytes available to distribute the video at a fixed quality. Not the criteria I was looking for. I was looking for a quality definition in combination with a bitrate ceiling, in order to more efficiently control how the streams get encoded based on input complexity.
Why is it important? Where do you see the benefits?
- Using this quality-based encoding strategy not only saves a lot of bandwidth for the customer (which is important on mobile networks), it also saves us a lot of money, since we pay for the gigabytes we stream as well as the associated CDN costs. I mean, why distribute a slow-moving animated film at 8 Mbit/s, for example, when 2 Mbit/s is most likely all that is needed for that part of the scene? With this technique, mobile users also have a better chance of accessing a higher quality video stream at the currently available bandwidth.
Can you share any numbers on the bits and bytes saved here?
- It is very dependent on the input material. With a full-on action movie we might hit the top allowed bitrate more often. But for a typical feature film I can see savings of up to 30–40% compared to our previous profiles, which of course is quite a lot.
So if you want to try this technique “at home” — how do you balance the right quality value against the target bitrate?
- This needs to be tested, of course. Setting too low a CRF value (= high quality) sets the encoder's quality ambitions too high, causing the video bitrate to constantly hit the max bitrate ceiling. Setting too high a CRF value (= low quality) makes the encoder accept too low a quality in general.
Is this the perfect approach or are we missing anything?
- This quality-based encoding strategy requires the packager to be adjusted accordingly. We do not want the packager to automatically calculate an overall average bandwidth based on the actual bitrates found in the stream. Instead, I specify the average and max bandwidth as the same value in the manifest file, to make sure that the stream is delivered based on the max rate we have specified as a ceiling in our profiles.
What is your experience so far from real-life tests, compared to standard bitrate-based encodes?
- In our field tests we have found this technique to work according to our expectations and we are very satisfied with the results.
There are of course more details of this quality-based encoding technique to be discussed. But my goal with this blog post is to make you aware of these possibilities and hopefully start some internal discussions on improving current workflows and costs. I also want to mention that more and more commercial transcoding solutions are implementing this technique. The CRF mode found in x264 is most likely already available in all commercial transcoding solutions that include a licensed x264 component with the advanced options exposed. Telestream's Vantage and Capella's Cambria are two examples. Other solutions with their own quality algorithms are, to my knowledge, Harmonic's EyeQ and Beamr.
More products offering this technology are coming to market, but none that are officially announced at the time of writing. Still, do not hesitate to discuss this topic with the product team of your own transcoding provider. There might be interesting news around the corner.
Johan Skaneby
Video Specialist, Eyevinn Technology
