Achieving low latency video streaming

I spent a few hours this weekend testing the preview of end-to-end low-latency live streaming in Wowza Streaming Cloud. The preview is an API-only feature. The technology behind it is the WOWZ protocol, an RTMP-like protocol, combined with WebSockets. According to Wowza, this should deliver end-to-end latency of under three seconds.

I used the Wowza GoCoder mobile app on my iPhone as the live stream source and got end-to-end latency as low as 1.3 seconds. The preview was easy to use, and once you understand the REST-based API it is easy to set up.

I’m glad that I got the chance to access the preview, as low-latency streaming is a hot topic these days. Recently, YouTube launched an ultra-low-latency option that makes it possible to stream video with just a couple of seconds of latency.

Low latency is important for live sports streaming, especially if betting is involved. It has become a top priority for many streaming media broadcasters and content providers. Every step of the streaming solution introduces some latency, and you must keep latency in mind at every step when designing your system. In this post, I will focus mainly on the protocols and techniques used for delivery towards the client.

Different ways to reach lower latency

Achieving lower latency is all about getting content to the client quicker, reducing signaling overhead and reducing read-ahead buffering in the client. There are multiple techniques for reaching low-latency streaming, but most come with drawbacks that need to be considered when choosing a solution.

If you reduce the read-ahead buffer in the client, you reduce the end-to-end latency. But this makes you more sensitive to network glitches: with a smaller read-ahead buffer, playback can stall and rebuffer as soon as data arrives late.
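The trade-off can be illustrated with a small simulation. This is only a sketch under simplified assumptions: the function name `count_stalls`, the 2-second segments and the arrival times are made up for illustration, and a real player's buffer logic is more involved.

```python
from typing import List


def count_stalls(arrival_times: List[float], segment_s: float,
                 start_buffer_segments: int) -> int:
    """Count playback stalls for a given start-up buffer (hypothetical model).

    arrival_times: when each segment is fully received, in seconds, in order.
    segment_s: media duration of each segment.
    start_buffer_segments: segments the player waits for before starting.
    """
    if not 1 <= start_buffer_segments <= len(arrival_times):
        raise ValueError("start_buffer_segments must fit the number of segments")

    # Playback starts once the start-up buffer has been filled.
    play_clock = arrival_times[start_buffer_segments - 1]
    stalls = 0
    for arrival in arrival_times:
        if arrival > play_clock:
            # The segment is not there when the player needs it: rebuffer.
            stalls += 1
            play_clock = arrival
        play_clock += segment_s  # play the segment
    return stalls


# Assumed scenario: 2 s segments, one network glitch delays the fourth segment.
arrivals = [2.0, 4.0, 6.0, 11.0, 11.0, 12.0]
small_buffer_stalls = count_stalls(arrivals, segment_s=2.0, start_buffer_segments=1)
large_buffer_stalls = count_stalls(arrivals, segment_s=2.0, start_buffer_segments=3)
print(small_buffer_stalls, large_buffer_stalls)
```

In this made-up scenario the one-segment buffer stalls during the glitch, while the three-segment buffer plays through it at the price of starting 4 seconds later.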

One way to get content to the client quicker is to reduce the segment size in HTTP streaming, such as HLS or DASH. To switch seamlessly between the quality representations of an adaptive stream, each segment must begin with an I-frame, since the bitrate can only change at segment boundaries. Consequently, smaller segments lower the encoding efficiency, because I-frames need more bits than other frames and shorter segments mean more of them.

Based on this, the recommended segment size for HTTP adaptive streaming is usually around 2 to 4 seconds, which is a good compromise between encoding efficiency and flexibility for stream adaptation to bandwidth changes.
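A rough back-of-the-envelope calculation shows why very short segments get expensive. The sketch below assumes one I-frame per segment, 25 frames per second, and an I-frame that is 8 times the size of any other frame; these numbers and the function name `iframe_overhead` are illustrative assumptions, not measurements from any encoder.

```python
def iframe_overhead(segment_seconds: float, fps: float = 25.0,
                    i_to_p_ratio: float = 8.0) -> float:
    """Extra bits spent per segment, relative to a segment without an I-frame.

    Assumes exactly one I-frame per segment and that every other frame has
    the same size. With n frames per segment the overhead is (ratio - 1) / n.
    """
    frames = segment_seconds * fps
    if frames < 1:
        raise ValueError("a segment must contain at least one frame")
    return (i_to_p_ratio - 1.0) / frames


for seconds in (1, 2, 4, 10):
    print(f"{seconds:>2} s segments: {iframe_overhead(seconds):.1%} overhead")
```

Under these assumptions the overhead halves each time the segment length doubles, so the gain from going beyond a few seconds is small while the cost of going below them grows quickly.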

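One big source of latency is signaling. TCP is the most commonly used transport protocol on the Internet, and its handshakes, acknowledgements and retransmissions add delay. UDP is lighter-weight and therefore offers lower latency, but it provides no delivery guarantee, no retransmission and no congestion control. If you have a network glitch, the packets are simply lost, which shows up as dropped frames. There are initiatives and technologies, such as QUIC and SRT, that aim to add TCP-like reliability on top of UDP.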
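To make the UDP trade-off concrete: because UDP itself does not report lost packets, a protocol built on top of it typically numbers each packet so the receiver can notice gaps and request retransmission. The sketch below uses a made-up packet layout (a 4-byte sequence number followed by the payload) and hypothetical helper names; it is not the wire format of QUIC or SRT.

```python
import struct
from typing import Iterable, List, Tuple

# Hypothetical header: one unsigned 32-bit sequence number, network byte order.
HEADER = struct.Struct("!I")


def make_packet(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with its sequence number."""
    return HEADER.pack(seq) + payload


def parse_packet(packet: bytes) -> Tuple[int, bytes]:
    """Split a packet into (sequence number, payload)."""
    (seq,) = HEADER.unpack_from(packet)
    return seq, packet[HEADER.size:]


def find_missing(packets: Iterable[bytes]) -> List[int]:
    """Return the sequence numbers missing between the lowest and highest seen."""
    seen = {parse_packet(p)[0] for p in packets}
    if not seen:
        return []
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]


# Simulate a glitch: packets 2 and 4 are lost and the rest arrive out of order.
sent = [make_packet(i, b"frame data") for i in range(6)]
received = [sent[3], sent[0], sent[5], sent[1]]
print(find_missing(received))
```

The receiver could send the missing numbers back to the sender as a retransmission request. How long it is willing to wait for the answer is exactly the latency versus reliability choice discussed above.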

Finally, we have WebSockets, which is used by the Wowza low-latency preview mentioned earlier. WebSockets is designed to provide a standardized, two-way, reliable communications channel between a browser and a server. WebSockets is used together with other streaming protocols such as WebRTC, SRT, and Aspera FASP. There are a few disadvantages: caching is harder, and scalability requires different and more expensive infrastructure.

HTTP Adaptive Streaming

With traditional HTTP streaming, such as HLS and DASH, delays for live streaming have been in the range of 30–60 seconds. By reducing the segment length and client read-ahead buffers, we can get that down to below 10 seconds. To achieve even lower latency with HLS and DASH, there are some interesting initiatives, all using similar techniques.
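Where these numbers come from can be seen with a simple latency budget. The sketch below is an approximation, not a measurement: it assumes a segment must be complete before it is published, that the player buffers a fixed number of segments before starting, and it uses made-up values for encoding and delivery time. The function name `estimate_latency` is hypothetical.

```python
def estimate_latency(segment_s: float, buffered_segments: int,
                     encode_s: float = 1.0, delivery_s: float = 0.5) -> float:
    """Rough end-to-end latency in seconds for segmented HTTP streaming.

    encode_s and delivery_s are assumed placeholder values.
    """
    packaging_s = segment_s                      # wait for the segment to complete
    buffering_s = buffered_segments * segment_s  # client read-ahead buffer
    return encode_s + packaging_s + buffering_s + delivery_s


traditional = estimate_latency(segment_s=10.0, buffered_segments=3)
reduced = estimate_latency(segment_s=2.0, buffered_segments=3)
print(f"10 s segments: {traditional} s, 2 s segments: {reduced} s")
```

With three buffered segments, 10-second segments end up at roughly 40 seconds while 2-second segments land just below 10 seconds. The segment duration dominates the budget because it is multiplied by the buffer depth.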

Akamai, Harmonic and others have been experimenting with Common Media Application Format (CMAF) chunks and HTTP chunked transfer encoding. A similar approach for HLS has been used by Periscope, Wowza and others.

HTTP Chunked Transfer Encoding

Chunked transfer encoding is a streaming data transfer mechanism available in HTTP/1.1 (HTTP/2 has its own mechanisms for streaming data). It allows a server to keep a persistent HTTP connection open and send dynamically generated content piece by piece, without knowing the total size in advance. It can be used in combination with CMAF chunks and could also be used with HLS.

The main benefit of chunked transfer encoding in video streaming is that it reduces the segmentation delay. In live streaming, the media is sent in segments that are each a few seconds long, and normally a segment must be complete before it can be delivered. As described above, the segments can’t practically be shorter than 2–4 seconds. With chunked transfer encoding, the client can request a segment that is not yet complete and begin receiving its data while the rest is still being produced.
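On the wire, each HTTP/1.1 chunk is its size in hexadecimal, a CRLF, the data, and another CRLF, and a zero-size chunk ends the body. The sketch below shows that framing with a few placeholder byte strings standing in for CMAF chunks. The helper names are hypothetical, and it ignores trailers and does no error handling, so treat it as an illustration rather than a complete implementation.

```python
from typing import List

LAST_CHUNK = b"0\r\n\r\n"  # zero-size chunk that terminates the body


def encode_chunk(data: bytes) -> bytes:
    """Frame one piece of data as an HTTP/1.1 chunk."""
    if not data:
        # An empty chunk would be read as the end of the body.
        raise ValueError("use LAST_CHUNK to end the body")
    return f"{len(data):X}".encode("ascii") + b"\r\n" + data + b"\r\n"


def decode_chunked(body: bytes) -> List[bytes]:
    """Split a complete chunked body back into its chunks (no trailers)."""
    chunks = []
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Ignore optional chunk extensions after ';'.
        size = int(body[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return chunks
        chunks.append(body[pos:pos + size])
        pos += size + 2  # skip the data and its trailing CRLF


# Placeholder payloads standing in for the CMAF chunks of one segment.
media_chunks = [b"cmaf-chunk-1", b"cmaf-chunk-2", b"cmaf-chunk-3"]

# The server can send each chunk as soon as the encoder has produced it,
# instead of waiting for the whole segment.
wire = b"".join(encode_chunk(c) for c in media_chunks) + LAST_CHUNK
print(decode_chunked(wire) == media_chunks)
```

Because every chunk carries its own length, the server never has to announce the size of the whole segment up front, which is what lets it start the response before the segment is finished.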

Signaling

As stated earlier, signaling overhead is one big source of delay. There are multiple initiatives to reduce the TCP signaling overhead. I have listed two that are interesting to keep an eye on for streaming video.

QUIC (Quick UDP Internet Connections) is a transport layer network protocol designed by Google. QUIC aims to behave as much like a TCP connection as possible, but with much lower latency. QUIC was introduced in the YouTube app in July 2016, and within just four months the level of QUIC traffic on mobile networks rose by 200%.

SRT (Secure Reliable Transport) is an open source video transport protocol that enables the delivery of high-quality, secure, low-latency video across the public Internet. It is backed by the SRT Alliance, founded by Haivision and Wowza. SRT handles, for example, packet loss recovery through advanced low-latency retransmission techniques and network health monitoring between endpoints.

Summary

Achieving low streaming latency comes with a cost. You need to understand those costs before you choose an implementation. You need to think of latency in all steps of the solution: encoding, packaging, delivery, choice of player and buffer management. But for live sports streaming and interactive events, this will be worth the effort when it comes to user experience. And there are ways to achieve acceptable latency, balancing the different techniques, without compromising cost and reliability.

There are other technologies that provide low latency, for example Sye from Net Insight and various WebRTC solutions from, for example, Peer5, Streamroot and Viblast. I will cover those in a separate article.

Magnus Svensson is a Media Solution Consultant and partner at Eyevinn Technology, a Swedish consultancy company specialized in streaming, VR, and gaming.

Follow me on Twitter (@svensson00) for regular updates and news.