The third and final part of the ABR streaming tutorial focuses on consolidating your newly acquired knowledge from ABR parts 1 and 2. ABR part 3 gives some of the reasoning behind the ABR concepts by discussing the advantages and disadvantages raised in ABR part 2. Having grasped the concepts of ABR is good, but it is only when you can discuss the technology from different viewpoints that you truly understand the subject.
This publication is part of a series of articles describing the principles of the technology behind video streaming. Before reading this third part of ABR streaming principles, you are advised to read through both ABR part 1 and part 2.
While the ABR technology has generally been described from a positive viewpoint throughout this article series, the end of ABR part 2 raised some concerns:
- Storing multiple bitrates
- High over-head traffic in HTTP
- Multiple formats
- High latency (complex subject deserving its own chapter)
These shortcomings will be discussed in the following sections. Understanding the choices made by the industry is important for an adequate knowledge of the subject.
Concern — Storing Multiple Bitrates
Storing multiple bitrates was mentioned as one of the drawbacks of ABR streaming, since it requires multiple profiles of the same video. But we can also look at it from the positive side. The multiple bitrates address the quality levels needed by any device on the market, whether it is a 4” iPhone screen on a 3G network or a 60” Samsung TV connected over a 1Gbps fibre link. Earlier streaming methods targeted only a fraction of today's plethora of devices: IPTV targeted TVs over managed networks, cable targeted TVs connected to the old cable networks, and so on. From that perspective, ABR has consolidated video consumption, offering the best achievable quality on any device and network connection on the open internet using the same streaming infrastructure.
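As a rough illustration of how one stored set of profiles can serve very different devices, the sketch below picks the highest profile that fits a measured throughput. The ladder values, the safety factor, and the function name are assumptions made for this example, not figures from this article series.

```python
# Hypothetical bitrate ladder as (kbit/s, label) pairs, lowest first.
# The values are illustrative only.
LADDER_KBPS = [
    (400, "360p"),
    (1200, "540p"),
    (3000, "720p"),
    (6000, "1080p"),
    (16000, "2160p"),
]


def pick_profile(throughput_kbps, safety=0.8):
    """Return the highest profile whose bitrate fits within safety * throughput.

    Falls back to the lowest profile when nothing fits.
    """
    budget = throughput_kbps * safety
    chosen = LADDER_KBPS[0]
    for profile in LADDER_KBPS:
        if profile[0] <= budget:
            chosen = profile
    return chosen
```

With these assumed numbers, a phone measuring about 2 Mbit/s would land on the 540p profile, while a TV on a fibre link would get the top profile, both served from the same stored ladder.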
Concern — High Over-Head
In the past two articles ABR was described using HTTP/TCP as the transport protocol. Each segment, 2–10s long, needs to be fetched using HTTP/TCP. Could we use a more lightweight protocol to lower the over-head traffic?
During the 1990s, when the internet was first considered for commercial TV and video delivery, HTTP/TCP was regarded as too heavy. Instead, other protocols such as RTP/UDP were developed and came to serve as the foundation for IPTV, the first real commercial infrastructure for TV and video delivery over IP networks. RTP/UDP was fast and lightweight but also unreliable, which required running it over managed IP networks. More about that in the IPTV article.
A few years into the new millennium, HTTP/TCP was again considered for TV and video delivery, mainly because it would open up service delivery to any device connected to the internet, which is a best-effort network. Network capacity also grew rapidly and became high enough for the service.
HTTP/TCP was chosen for a few important reasons:
- HTTP/TCP has built-in transfer control ensuring that all data is received at the client side. Retransmission, congestion control, and data integrity are built into the protocol. Perfect for delivering top quality over a best-effort IP network.
- HTTP is the most widely used application-layer (layer 7) protocol on the internet today, and it runs on top of TCP (layer 4). Running over port 80, it is allowed through most firewalls and proxies. No client needs HTTP/TCP integration efforts; it is already supported natively.
Today ABR is usually streamed over HTTPS, which adds even more over-head. But other technologies are emerging that mitigate the problem, such as SRT (Secure Reliable Transport), which runs over UDP and adds functionality for retransmission among other things, and QUIC, a transport layer protocol designed to reduce connection and transport latency. Both are, however, outside the scope of this article.
Finally, using larger segment sizes in the 10s range would lower the over-head traffic but it would also cause higher latency. More about that in the latency sections below.
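The trade-off between segment length and over-head can be sketched with simple arithmetic. The per-request over-head of 1500 bytes is a placeholder assumption (request and response headers only, ignoring TCP and TLS handshakes), and the function names are made up for this example.

```python
def requests_per_hour(segment_seconds):
    """Number of segment requests a client makes per hour of video."""
    return 3600 / segment_seconds


def overhead_fraction(segment_seconds, bitrate_kbps, overhead_bytes_per_request=1500):
    """Share of all transferred bytes that is per-request over-head.

    overhead_bytes_per_request is an assumed placeholder value.
    """
    payload_bytes = segment_seconds * bitrate_kbps * 1000 / 8
    return overhead_bytes_per_request / (payload_bytes + overhead_bytes_per_request)
```

Under these assumptions a 2s segment needs five times as many requests as a 10s segment (1800 versus 360 per hour), and the over-head share grows as the segment payload shrinks.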
Concern — Multiple Formats
As described in ABR part 2, the main streaming formats used today are closely tied to consumer electronics devices: Apple HLS on Apple products such as iPads and Safari browsers, MPEG-DASH mainly on Android devices, and Microsoft Smooth Streaming (MSS) for some time yet on Microsoft devices and Internet Explorer. To make things worse, DRM, codecs, and player devices add to the complexity, but this will be covered in the Video Formats article.
New technologies and standards have mitigated this problem drastically, basically by switching the order of storage and packaging, which yields on-the-fly repackaging. Content is stored in a common file format on disk and repackaged (and encrypted) on the fly for each client request received. Depending on how many formats are used, on-the-fly repackaging saves at least 50% storage.
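The "at least 50%" figure follows from simple arithmetic if we assume that every pre-packaged format takes roughly the same space as the common file format. That equal-size assumption, and the function name, are simplifications made for this sketch.

```python
def storage_saving(num_formats):
    """Fraction of storage saved by keeping one common copy instead of
    one pre-packaged copy per streaming format (equal sizes assumed)."""
    if num_formats < 1:
        raise ValueError("num_formats must be at least 1")
    return 1 - 1 / num_formats
```

With two formats the saving is 50%, with three formats it is about 67%, and it keeps growing with every additional format that would otherwise be stored.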
More about formats and on the fly repackaging in the Video Formats article.
Latency was described briefly in the previous article, ABR part 2. Let us break down how it actually works and what the industry players are doing to bring down the latency.
In the reasoning below we focus on live streams. A live stream has a live point, which is where the segments are produced. A client cannot download any segments past the live point as they have not yet been produced. VOD content does not have a live point, so this kind of latency does not exist. All VOD segments are produced beforehand, so the client can download them at any bitrate.
Note: In fig 3 the camera should represent an ABR encoder where the segments are produced. The camera icon is there for pedagogical purposes. The additional latency between the camera and the encoder or within the CDN is disregarded in this ABR tutorial.
The common player, including the iOS native player, works by always storing three complete segments in the buffer. The latency depicted in fig 3, represented by the white segments, totals three and a half segments. Using standard 2–10s video segment lengths, the latency ends up at 7–35s between encoding and client playout. By maintaining three complete segments in the buffer, the player has a good amount of time to receive new video segments, decrypt the content, and decode the video.
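The 7–35s range can be reproduced with a one-line calculation. The split into three buffered segments plus half a segment still in production is how fig 3 is read here; the parameter names are assumptions for this sketch.

```python
def live_latency_seconds(segment_seconds, buffered_segments=3.0, in_production=0.5):
    """Latency between encoding and playout, counted in whole segment lengths."""
    return (buffered_segments + in_production) * segment_seconds
```

Calling it with 2s and 10s segments gives 7s and 35s respectively, which matches the range above.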
Lowering Segment Lengths — Saving Up To 28s Latency
One natural way of lowering the latency is of course to lower the segment length. Using a 2s segment length keeps the latency at the lower end of the 7–35s range. This is usually the first and easiest step toward lowering latency, but it comes with the penalty of more over-head traffic and shorter maximum GOP lengths (see the Encoding and Resolutions article).
But why not use even smaller segments than 2s, say 200ms? More about that, and especially why a segment can’t be smaller than a GOP length, in the Chunked Transfer Encoding chapter below.
Lowering the Client Buffers — Saving 2 Segments Length at The Risk Of Quality Loss
Another way of lowering the live latency is to implement clients that keep fewer segments in the buffer. How about a one-segment buffer?
Think of the segments as bricks. When building a wall, laying one brick while the next is being produced could result in waiting time. In the same way, playing one segment while downloading the next certainly brings down latency, but it makes a smooth playout hard to guarantee. The risk depends on the network quality. Transfer speed fluctuations may easily lead to buffering caused by buffer starvation. But with a reliable network we can allow a buffer length of one or two segments.
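The brick analogy can be tried out with a small replay of download times. This is a deliberately simplified model, not how any particular player works: segments are downloaded one after another, the live point is ignored, and the download times are made-up numbers.

```python
def total_stall_seconds(download_times, segment_seconds, start_buffer_segments):
    """Replay per-segment download times and sum up playback stalls.

    Playback starts once start_buffer_segments segments have arrived.
    """
    if not 1 <= start_buffer_segments <= len(download_times):
        raise ValueError("start_buffer_segments out of range")

    # Wall-clock time at which each segment is fully downloaded.
    arrival = []
    clock = 0.0
    for seconds in download_times:
        clock += seconds
        arrival.append(clock)

    play_clock = arrival[start_buffer_segments - 1]
    stalled = 0.0
    for ready_at in arrival:
        if ready_at > play_clock:
            # Buffer starvation: the player waits for the segment.
            stalled += ready_at - play_clock
            play_clock = ready_at
        play_clock += segment_seconds  # play the segment to its end
    return stalled
```

With 2s segments and assumed download times of 1, 1, 1, 5, 1, 1 seconds, a one-segment start buffer stalls for one second when the slow fourth segment arrives late, while a three-segment start buffer absorbs the same hiccup without stalling.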
Client Start Policy — Saving Up To 1 Segment Length Latency
Another way of lowering latency slightly is to implement a well-thought-out client start policy. If the client deliberately waits to start playing until the sixth segment has been completed, it can start playing at the fourth segment instead of the third, securing a maximum latency of three segments. Fig 4 illustrates this, with the blue area representing the waiting time from the session start point (the user presses the play button) to where the stream actually starts playing.
Chunked Transfer Encoding
Chunked transfer encoding is a feature of the HTTP/1.1 protocol that allows files to be transferred piece by piece, and thereby before they are complete. Applied to our ABR video case: instead of holding back the file transfer until the complete segment has been produced, the transfer of the segment file may start as soon as the first set of bytes is ready.
In Chunked Transfer Encoding the segment slices/pieces are called chunks. Fig 5 reuses our perspective from above to illustrate the latency.
Note that 1/3 segment chunks are used for visual purposes. In reality the chunks are much smaller and may be as short as 50ms. This would correspond to 1/40 of a 2s segment.
It is important to understand that small chunks like this require the client to be much more closely synchronized with the encoder, so that it knows when chunks are produced and available for download. There are a few technologies securing this synchronization, but they are outside the scope here.
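For readers curious about what the chunks look like on the wire, here is a minimal sketch of the HTTP/1.1 chunked framing: each chunk is sent as its size in hexadecimal, a line break, the data, and another line break, and a zero-size chunk ends the body. The function names are made up for this example, trailers are ignored, and how an encoder maps video onto chunks is not covered.

```python
def encode_chunked(chunks):
    """Frame byte strings as an HTTP/1.1 chunked message body."""
    body = b""
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would end the body early
        size_line = f"{len(chunk):X}".encode("ascii")
        body += size_line + b"\r\n" + chunk + b"\r\n"
    return body + b"0\r\n\r\n"  # last-chunk marker, no trailers


def decode_chunked(body):
    """Split a chunked message body back into its chunks (trailers ignored)."""
    chunks = []
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Drop any chunk extension after ';' and parse the hex size.
        size = int(body[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return chunks
        chunks.append(body[pos:pos + size])
        pos += size + 2  # skip the data and its trailing line break


# 50ms chunks in a 2s segment, as in the text above.
CHUNKS_PER_SEGMENT = 2000 // 50
```

Because every chunk carries its own size, the sender never has to know the total segment size in advance, which is what allows the transfer to start before the segment is complete.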
“With great power comes great responsibility”
-Voltaire, Spiderman, and other intellectuals
Bringing down the latency below a segment size using Chunked Transfer Encoding naturally puts high stability requirements on the network. Any speed fluctuations or performance degradation will immediately result in buffering or a glitchy viewing experience. Setting the correct chunk size will therefore require sufficient testing and calculations.
Why make it so complex? Why not just have the formats support segments smaller than 2s?
To understand the primary reasons for not using smaller segments we need to understand the basics of video compression. This is further explained in the Encoding and Resolutions article, but I will still use the terminology here to explain. If the following is unclear, it is recommended to read the Encoding and Resolutions article and then revisit this article to fully understand ABR segment lengths.
Going further below 2s segments is unsupported by the formats themselves, but let’s briefly consider the idea of using even smaller segments, say 200ms segments. Remember from the ABR part 1 article how segments are always built up starting with a “complete picture”? These “complete pictures” are known as I-frames, which is the term we’re going to use below.
Without segments starting with I-frames, the entire ABR principle is lost. So smaller segments imply more frequent I-frames, which in turn limits our GOP lengths. A 200ms segment size would simply limit the GOP length to a maximum of 200ms. A small GOP size, meaning more frequent I-frames, gives a lower compression rate, and we would end up with a low quality/bitrate ratio. So when choosing segment lengths, even above 2s, the content compression and GOP lengths must be considered to achieve a good video quality of experience.
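A back-of-the-envelope model shows why short GOPs cost bitrate. It assumes that every GOP consists of one I-frame followed only by P-frames, that an I-frame is eight times the size of a P-frame, and a frame rate of 25 fps. All three are assumptions picked for illustration (B-frames are ignored); real ratios depend heavily on content and encoder.

```python
def relative_bitrate(gop_seconds, fps=25, i_to_p_size_ratio=8.0):
    """Average frame size in P-frame units for a given GOP length.

    One I-frame per GOP, all other frames are P-frames (assumed model).
    """
    frames = max(1, round(gop_seconds * fps))
    return (i_to_p_size_ratio + (frames - 1)) / frames
```

Under these assumptions a 200ms GOP averages 2.4 P-frame units per frame, while a 2s GOP averages 1.14, so the short GOP needs roughly twice the bitrate for the same picture quality.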
Thank you for reading the ABR tutorial articles! If you find this information valuable, be sure to read the other articles about Encoding and Resolutions, Video Formats, IPTV, and much more.
For further reading about ABR I warmly recommend viewing two presentations, both held at last year’s Streaming Tech conference in Stockholm — an annual event hosted by Eyevinn Technology. Hope to see you there in November!
- Presentation by Will Law — Chief Architect at Akamai, describing the technology to bring down ABR latency: https://youtu.be/hMtqWi_OoOU
- Presentation by Anders Cedronius — Emerging Technologies at Net Insight, describing latency through the whole video distribution chain: https://youtu.be/AwR8VWs1PKA