Smart Buffering — and the two types of player configurations

Stefan Kaiser
Zattoo’s Tech Blog
Jun 24, 2021 · 9 min read

The term end-to-end latency already implies that there is more than one end involved when it comes to latency reduction. In fact, low latency always involves two sides. There is of course the server-side, which was discussed in the previous blog post. But to leverage the full potential of low latency while keeping playback stable, you also need to understand the playback side and tweak it accordingly.
Our way of doing low latency was to reduce the fragment length. After making the server-side change, we also looked at the impact on the client-side. This revealed two categories of players; more precisely, two categories of player configurations. Depending on the type of player configuration, we had to make different adjustments to maintain the lower latency.
But lowering the latency at the cost of the player buffer always trades against playback stability. So we thought about that situation and came up with what we call Smart Buffering to overcome it.

Two types of player configurations

You wouldn’t think that there are fundamental differences between player configurations for adaptive streaming. Especially if you are only changing something on the server-side, would you expect a huge difference in playback behaviour between player A and player B if you didn’t touch the player configuration at all?
Well, you will definitely learn that this happens if you change the fragment length significantly. In our case, some players suddenly operated on buffer levels less than half the size of before the server-side change. Our overall strategy for latency reduction likes that, but our aim for stable playback doesn’t like that it happened “uncontrolled”, without us explicitly reducing the buffer size.

Before going deeper, let’s quickly define what we mean by “the buffer” in this article. By the player buffer, we mean the forward buffer: the amount of media a player is supposed to keep in a ready-to-play state while playing a stream. We are not talking about the stream startup buffer, which is typically smaller than the forward buffer and can be configured independently on some players.

Player configuration with number of fragments

The first type of player configuration is specifying the buffer level with a number of fragments. If your fragment length is changing, this will directly impact your buffer levels.

Let’s put down a quick example of specifying player buffers with number of fragments:

+-------------------------------+--------------+
| Forward buffer configuration: | 3 fragments |
| Fragment length: | 4 seconds |
| Effective buffer: | 12 seconds |
+-------------------------------+--------------+

What happens after changing the fragment length to, say, 1.6 seconds?

+-------------------------------+--------------+
| Forward buffer configuration: | 3 fragments |
| Fragment length: | 1.6 seconds |
| Effective buffer: | 4.8 seconds |
+-------------------------------+--------------+

Wondering where the 1.6 seconds come from? Please follow up on another recent blog post, Definitive guide for picking a fragment length.

This cuts your buffer by more than half! And you didn’t even change anything on the client-side.

Player configuration with seconds

The second type of player configuration expects, in contrast to the first one, a value in seconds. It’s pretty clear what happens, but for the sake of completeness let’s write it down as well.

Example of specifying player buffers with seconds:

+-------------------------------+--------------+
| Forward buffer configuration: | 12 seconds |
| Fragment length: | 4 seconds |
| Effective buffer: | 12 seconds |
+-------------------------------+--------------+

Let’s change the fragment length again to 1.6 seconds:

+-------------------------------+--------------+
| Forward buffer configuration: | 12 seconds |
| Fragment length: | 1.6 seconds |
| Effective buffer: | 12 seconds |
+-------------------------------+--------------+

Effectively, nothing happened.
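
To make the contrast concrete, here is a minimal sketch of how the effective buffer falls out of the two configuration types. This is our own illustration in Kotlin, not the API of any specific player SDK:

// Minimal sketch: the effective forward buffer under the two configuration types.
fun effectiveBufferFromFragments(fragments: Int, fragmentLengthSec: Double): Double =
    fragments * fragmentLengthSec

fun effectiveBufferFromSeconds(configuredSec: Double): Double =
    configuredSec // independent of the fragment length

fun main() {
    // Type 1: configured in fragments, shrinks with the fragment length.
    println(effectiveBufferFromFragments(3, 4.0)) // effective buffer: 12 seconds
    println(effectiveBufferFromFragments(3, 1.6)) // effective buffer: ~4.8 seconds, more than halved

    // Type 2: configured in seconds, unaffected by the fragment length.
    println(effectiveBufferFromSeconds(12.0))     // effective buffer: 12 seconds, before and after
}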

There we are with two significantly different effects on our latency goal. With the first configuration type, buffer in number of fragments, we are already at low latency on both sides, server-side and client-side. The client-side latency reduction here comes with a somewhat uncontrolled client-side buffer reduction if you are not fully aware of the configuration type of your players. If you want to take it step by step, you need to change the player configuration before making the server-side change. But sometimes that’s not even possible, for example with AVPlayer, where the buffering behaviour is part of the protocol specification and the black-box player implementation.

With the second type, buffer in seconds, we still have high client-side latency. To reduce it, we need to update the player configuration on all affected clients.
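
On ExoPlayer, for example, such a change could look roughly like the sketch below (assuming ExoPlayer 2.x; the values are illustrative, not our production settings):

import android.content.Context
import com.google.android.exoplayer2.DefaultLoadControl
import com.google.android.exoplayer2.SimpleExoPlayer

// Illustrative values: target a 4.8 s forward buffer (3 x 1.6 s fragments)
// instead of ExoPlayer’s much larger defaults.
fun buildLowLatencyPlayer(context: Context): SimpleExoPlayer {
    val loadControl = DefaultLoadControl.Builder()
        .setBufferDurationsMs(
            /* minBufferMs = */ 4_800,
            /* maxBufferMs = */ 4_800,
            /* bufferForPlaybackMs = */ 1_600,
            /* bufferForPlaybackAfterRebufferMs = */ 3_200
        )
        .build()
    return SimpleExoPlayer.Builder(context)
        .setLoadControl(loadControl)
        .build()
}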

To know which clients fall into which category, let’s have a look at what we know from the platforms we support at Zattoo:

+-------------------------------+-----------------------------------------+
| Buffer in number of fragments | Buffer in seconds                       |
+-------------------------------+-----------------------------------------+
| AVPlayer (iOS, tvOS)          | ExoPlayer (Android, FireTV)             |
| STB ABox42                    | Bitmovin Web (Web Browser: Chrome /     |
| Bitmovin Web (Safari)         |   Firefox / Edge, Samsung Tizen, LG)    |
|                               | Microsoft                               |
|                               | Chromecast                              |
|                               | Panasonic                               |
+-------------------------------+-----------------------------------------+

Now we know the different categories of player configurations. We also know what impact a server-side change can have on each platform. And we get the feeling that there needs to be a proper strategy to balance the player differences: a mode that aims for low latency but sets the buffer to a value that still ensures stable playback. The open question now is: what is the right buffer value?

Choosing the right buffer value

Three’s a party?

We need to come up with some number for the buffer settings that will make us happy on the latency part, but also on the part of the stable playback. A classic rule of thumb is that a buffer level of three times the fragment length is typically a good choice — there it is: The recommended buffer configuration with number of fragments.

With that, we are actually on a good track with the players that work with the first type of configuration. If the configuration specifies seconds, we are rather overcommitting on stability after changing the fragment length from 4 seconds to 1.6 seconds. As mentioned before, achieving low latency then requires changes on the client-side.

But is it that simple?

Our goal might be low latency, but this should not be achieved at the cost of playback stability! By reducing the fragment length from 4 seconds to 1.6 seconds, we would reduce the buffer size from 12 seconds to 4.8 seconds.
Before the change, we could, theoretically, cover network outages of almost 12 seconds while keeping the playback running. Now it’s only 4.8 seconds. Full network outages are of course the extreme case, but they make it clear that the time to react to changing network conditions is drastically decreased. Even if you are operating in a managed network environment, there will always be users that experience network drops and bandwidth limitations. To not expose those users to playback issues due to our low buffer settings, we thought about a way of dynamically changing the buffer settings. That’s what we call Smart Buffering.

Smart Buffering

At Zattoo we are operating a Live TV streaming service, where users are typically zapping around to find content. People stick to content while it’s on, but they can also very quickly switch between channels (e.g. during an ad break). The focus of low latency, and the tuned buffer configurations that come with it, is on live content.

To avoid playback issues due to low buffer settings, we want to be able to dynamically change the buffer configuration for a user. Due to the nature of our expected user behaviour on live content, we assume that new streams are requested rather frequently. This allows us to keep the granularity of changing the buffer config at the level of stream requests, without the need to change the configuration while content is playing. This scope limitation significantly reduces the amount of work needed on the one side; on the other side, it avoids running into technical limitations, as some players do not even support on-the-fly configuration changes out of the box.
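
In simplified terms, the flow could look like the following sketch. All names here are hypothetical, not our actual interfaces:

// Hypothetical flow: the buffer value is resolved once per stream request
// (i.e. per channel switch) and stays fixed while the content plays.
interface SmartBuffering { fun recommendForNextStream(): Double }
interface Player { fun load(streamUrl: String) }
fun interface PlayerFactory { fun create(forwardBufferSec: Double): Player }

class StreamStarter(
    private val smartBuffering: SmartBuffering,
    private val players: PlayerFactory
) {
    fun onChannelSwitch(streamUrl: String) {
        val bufferSec = smartBuffering.recommendForNextStream()
        val player = players.create(forwardBufferSec = bufferSec)
        player.load(streamUrl)
        // No reconfiguration happens until the next stream request.
    }
}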

Data needed to make smart decisions

There is basically one playback metric that is ubiquitous when it comes to playback stability: the BUFFERING event. For the user, this event is more annoying than a quality switch or a longer startup time. It is also highly related to the player’s buffer setting, and it is the ultimate metric to look at if you want to make playback more stable under varying network conditions.

That’s the reason we chose the BUFFERING event to be the main lever for triggering buffer configuration changes. We implemented a set of playback telemetry events that can be accessed by the player itself to make smarter decisions.
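
A minimal sketch of such telemetry, with hypothetical names, could look like this:

// Hypothetical telemetry store: the player reports the buffer value it played
// with and the BUFFERING events it saw, and reads them back for the next decision.
data class PlaybackRecord(val bufferSec: Double, val bufferingEvents: Int)

class PlaybackTelemetry {
    private val records = mutableListOf<PlaybackRecord>()

    fun onPlaybackEnded(bufferSec: Double, bufferingEvents: Int) {
        records += PlaybackRecord(bufferSec, bufferingEvents)
    }

    fun lastPlayback(): PlaybackRecord? = records.lastOrNull()
    fun allPlaybacks(): List<PlaybackRecord> = records.toList()
}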

When to change the buffer?

Excellent question! The general assumption is that the user has good network conditions, which means we start with a buffer configuration that is appropriate for low latency streaming. In our case, that is three times our current fragment length of 1.6 seconds = 4.8 seconds. We don’t go lower, as this already satisfies our primary latency goal.

But even with the three fragment lengths, we know that we will have more users with playback issues than before, when the buffer was more than twice as large. As mentioned, we can change the buffer configuration at each stream start. Also, we know about the BUFFERING events that happen for each user.

With that knowledge we introduced three buckets of buffer configuration that reflect the buffer values over different time spans:

  • Last playback — the buffer value used for the last stream configuration
  • Current user session — the average of all buffer values during the current user session
  • Global user device — the average of all buffer values that a user started on this device

Those three buckets can be put into an algorithm to decide the recommended buffer value for the upcoming stream start. Of course, each bucket can be weighted differently.
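
As a sketch, the weighted combination could look like this. The weights are illustrative assumptions, not our production values:

// Hypothetical weighted combination of the three buckets described above.
fun recommendBuffer(
    lastPlaybackSec: Double,   // buffer value of the last playback
    sessionAverageSec: Double, // average over the current user session
    deviceAverageSec: Double,  // average over all playbacks on this device
    wLast: Double = 0.5,       // illustrative weights, summing to 1.0
    wSession: Double = 0.3,
    wDevice: Double = 0.2
): Double =
    wLast * lastPlaybackSec + wSession * sessionAverageSec + wDevice * deviceAverageSec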

In addition to the buffer recommendation based on past knowledge, the decision algorithm takes into account the knowledge about BUFFERING events that happened during the last playback. The simplest way to incorporate this would be:

  • Increase buffer value, if there was at least one BUFFERING event in the last playback session
  • Decrease buffer value, if there was no BUFFERING event in the last playback session

This will make playback more stable if it turns out that the initial low latency setting is too aggressive for the user’s playback conditions.
If the buffer value was already increased before, this approach will also try to reduce it again once playback has stabilised, aiming for the lowest latency that still keeps playback stable. Due to the weighted algorithm over past buffer values, this buffer selection logic will stabilise over time at a value that works well for the user.
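
Putting both parts together, a sketch of the adjustment step could look like this. The step size and the upper bound are illustrative assumptions:

// Hypothetical adjustment: react to BUFFERING events of the last playback,
// clamped between the low latency target and a conservative upper bound.
const val MIN_BUFFER_SEC = 4.8  // 3 x 1.6 s fragments: our low latency target
const val MAX_BUFFER_SEC = 12.0 // illustrative upper bound (the old buffer size)
const val STEP_SEC = 1.6        // illustrative step: one fragment length

fun adjustBuffer(recommendedSec: Double, bufferingEventsLastPlayback: Int): Double {
    val adjusted = if (bufferingEventsLastPlayback > 0)
        recommendedSec + STEP_SEC // playback struggled: buy more stability
    else
        recommendedSec - STEP_SEC // playback was stable: push latency down
    return adjusted.coerceIn(MIN_BUFFER_SEC, MAX_BUFFER_SEC)
}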

Summary

We identified two different types of player configurations and discussed how they react when the fragment length changes. Our overall goal was to reduce the end-to-end latency of live TV streaming on our service. With this article, we showed what’s needed on the client-side to achieve that, and proposed a simple Smart Buffering logic that helps to reduce latency while keeping playback stable.

This is the third part of a blog article trilogy that covers all aspects of our journey to low latency streaming.
The first part is about choosing a fragment length, as it plays a crucial role in our low latency attempts: The definitive guide for picking a fragment length.
The second part describes our general concept for reducing the latency of live streams and goes into detail on the server-side changes we did, as well as how end-to-end latency can be measured in an OTT streaming service: How to go low latency, without special tricks.
