Improving SRT Retransmissions — Experiments with Simulated Live Streaming (Part 1)

New retransmission algorithm shows promising results.

Published in Innovation Labs Blog · Mar 14, 2022 · 15 min read

ABSTRACT: This article is the first in a series that will cover technical aspects of the SRT retransmission algorithms. It describes the Automatic Repeat reQuest (ARQ) mechanism in the SRT protocol, and provides test results for a new efficient algorithm that significantly reduces retransmission overhead, improving live streaming over congested networks.

This article was written in co-authorship with Maxim Sharabayko, PhD.

1. Introduction

Secure Reliable Transport (SRT) is a transport protocol for ultra-low (sub-second) latency live video and audio streaming¹, as well as for generic bulk data transfer. SRT is available as an open-source technology, with the code on GitHub and a published Internet Draft. There is a growing community of SRT users, and new features are being added on a regular basis.

In the case of live streaming, the SRT protocol maintains a constant end-to-end latency. This allows the live stream’s signal characteristics to be recreated on the receiver side, reducing the need for buffering. As packets are streamed from source to destination, SRT detects and adapts to real-time network conditions between the two endpoints. It helps compensate for jitter and bandwidth fluctuations due to congestion over noisy networks. SRT offers various error recovery mechanisms for minimizing the packet loss that is typical of Internet connections, of which Automatic Repeat reQuest (ARQ) is the primary method². ARQ is well-suited to streaming video over IP networks as it requires less bandwidth than other error correction methods. With ARQ, when a receiver detects that a packet is missing it sends an alert to the sender requesting retransmission of this missing packet.

As of version v1.4.2 the SRT library provides an option³ to choose between two retransmission algorithms:

an aggressive retransmission algorithm (default until SRT v1.4.4), and
a new efficient retransmission algorithm (introduced in SRT v1.4.2; default since SRT v1.4.4).

In this series of articles we will evaluate the two algorithms, and provide guidelines on how to select the appropriate algorithm for a given use case. In Part 1 we will provide a brief introduction to ARQ and the SRT retransmission algorithms, along with results of lab tests performed using simulated live streaming and network impairments. In Part 2 we will discuss tests conducted with real video streaming and explore the impact of retransmission algorithms on the viewer’s quality of experience.

2. Understanding Automatic Repeat reQuest (ARQ) in SRT

To enable ARQ for data transmission, an SRT sender stores all sent data packets in its buffer, and maintains a list of lost packets.

The SRT sender assigns a sequential Packet Sequence Number to every data packet to be sent. In the case of live streaming, each data packet also receives a timestamp that is used to determine whether a packet can be delivered to the upstream application in time (in which case a retransmission makes sense), or if it is already too late to deliver the packet and it can be dropped to avoid blocking the ongoing transmission.

Dropping Packets

The Too-Late Packet Drop (TLPKTDROP) mechanism allows both sender and receiver to drop any packets that have no chance of being delivered to an upstream application in time (within the specified end-to-end latency). If the TLPKTDROP mechanism determines it is too late to recover a missing packet, the packet is no longer retransmitted by the sender, and may also be dropped by the receiver. The receiver skips missing packets and proceeds with delivering more recently received ones to the upstream application. Refer to the “Timestamp-Based Packet Delivery” section of the Internet Draft for more details.
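The drop decision described above can be sketched as a simple deadline check. This is a minimal illustration, not the SRT library's implementation: the function name and parameters are hypothetical, and details such as the time base, clock drift compensation, and any extra sender-side margin are deliberately left out.

```python
def is_too_late(origin_time_ms: float, now_ms: float, latency_ms: float) -> bool:
    """Return True if a packet can no longer be delivered within the latency budget.

    origin_time_ms: timestamp assigned to the packet by the sender (assumed to be
                    on the same clock as now_ms, which is a simplification).
    latency_ms:     configured end-to-end latency.
    """
    delivery_deadline_ms = origin_time_ms + latency_ms
    return now_ms > delivery_deadline_ms


# A packet stamped at t = 1000 ms with an 80 ms latency budget:
still_recoverable = not is_too_late(1000, 1050, 80)  # deadline is 1080 ms
must_be_dropped = is_too_late(1000, 1090, 80)
```

In this simplified model, a retransmission is only worth scheduling while `is_too_late` returns False for the packet in question.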

Loss Detection

The SRT receiver periodically sends acknowledgement packets (ACKs) that provide information about the data packets it has received. Based on these ACKs, the sender can remove acknowledged packets from its buffer, and delete the associated sequence numbers (if present) from the loss list. Once these packets are removed, their retransmission is no longer possible and presumably not needed.

As a continuous stream of data packets arrives, the SRT receiver detects gaps in the packet sequence numbers. Such a gap indicates that one or several packets were lost during transmission (for simplicity, we ignore the case of possible packet reordering in this discussion). A loss-triggered negative acknowledgement (NAK) is sent immediately by the receiver to notify the sender about such packet losses.

The SRT receiver keeps a record of all lost packets and stores their sequence numbers in a loss list. This information is required by the receiver to regularly send loss reports (called “periodic NAK reports”) in order to notify the sender about the persistent absence of those lost packets.
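To make the receiver-side bookkeeping concrete, here is a minimal sketch of gap detection and loss list maintenance. The class and method names are hypothetical. As in the discussion above, packet reordering is ignored, and so is sequence number wrap-around.

```python
from typing import List, Set


class ReceiverLossTracker:
    """Tracks gaps in packet sequence numbers (no reordering, no wrap-around)."""

    def __init__(self, first_expected_seq: int) -> None:
        self.next_expected = first_expected_seq
        self.loss_list: Set[int] = set()  # sequence numbers still missing

    def on_data_packet(self, seq: int) -> List[int]:
        """Process an arriving packet.

        Returns the sequence numbers that were just detected as lost, i.e. the
        ones that would be reported in an immediate, loss-triggered NAK.
        """
        if seq in self.loss_list:
            # A retransmission filled a previously detected gap.
            self.loss_list.discard(seq)
            return []
        if seq < self.next_expected:
            # Duplicate of a packet that was already received.
            return []
        newly_lost = list(range(self.next_expected, seq))
        self.loss_list.update(newly_lost)
        self.next_expected = seq + 1
        return newly_lost


tracker = ReceiverLossTracker(first_expected_seq=1)
tracker.on_data_packet(1)
tracker.on_data_packet(2)
nak_now = tracker.on_data_packet(5)   # packets 3 and 4 are missing
tracker.on_data_packet(3)             # retransmission of packet 3 arrives
```

After this sequence, `tracker.loss_list` still contains packet 4, which is what a subsequent periodic NAK report would carry.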

Periodic NAK Reports

A periodic NAK report contains a list of all the packets considered lost by the SRT receiver at the time of sending the report. Periodic NAK reports are control packets sent with a period of (RTT + 4xRTTVar) / 2, with a minimum value of 20 ms, where RTT is an exponentially weighted moving average of the round-trip time samples observed at the receiver side (or “smoothed” RTT), and RTTVar is the variance in RTT samples. Refer to the “Round-Trip Time Estimation” section in the Internet Draft for more details.
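The interval formula above translates directly into code. The function and constant names below are our own, chosen for illustration; only the formula and the 20 ms floor come from the text.

```python
MIN_NAK_INTERVAL_MS = 20.0  # lower bound stated above


def periodic_nak_interval_ms(rtt_ms: float, rtt_var_ms: float) -> float:
    """Interval between periodic NAK reports: (RTT + 4 x RTTVar) / 2, at least 20 ms."""
    return max((rtt_ms + 4.0 * rtt_var_ms) / 2.0, MIN_NAK_INTERVAL_MS)


# Smoothed RTT of 100 ms with RTTVar of 5 ms: (100 + 20) / 2 = 60 ms.
long_path_interval = periodic_nak_interval_ms(100.0, 5.0)

# Smoothed RTT of 20 ms with RTTVar of 1 ms: (20 + 4) / 2 = 12 ms, raised to the 20 ms floor.
short_path_interval = periodic_nak_interval_ms(20.0, 1.0)
```

Note that on short paths the 20 ms floor, rather than the RTT-based term, determines how often reports are sent.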

When a data packet reaches an SRT receiver, its reception can be acknowledged by sending an acknowledgement packet (ACK) to the sender. The ACK cannot arrive at the sender earlier than one round-trip time after the SRT sender sent the packet (plus a possible acknowledgement delay). In the case of live streaming, periodic NAK reports are deliberately sent more often (at least twice per RTT). This is done to recover from the potential loss of a NAK (or of a previous periodic NAK report), which would otherwise increase the retransmission delay (the time required for a sender to detect packet loss and schedule a retransmission). Operating at sub-second latencies makes every millisecond important, so increasing the probability of a NAK report reaching the sender increases the chances of loss recovery within end-to-end latency constraints.

The Periodic NAK reports mechanism is turned on by default for the live streaming configuration of SRT.

Packet Retransmission

Upon reception of a negative acknowledgement (either via a loss-triggered NAK or a periodic NAK report) the SRT sender stores the sequence number(s) of the lost packet(s) in the sender’s loss list. It prioritizes retransmission of lost data packets over those to be transmitted for the first time. The pace at which packets are sent is based upon the configured bandwidth limit. The retransmission of a lost packet can be repeated multiple times until the SRT receiver acknowledges its receipt, or until one of the peers decides to drop this packet. If the SRT receiver decides to drop a packet, it issues a regular acknowledgement to prevent the sender from retransmitting. Conversely, if the SRT sender drops a packet it notifies the receiver by sending a Message Drop Request control packet.
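The prioritization of retransmissions over first-time transmissions can be sketched as follows. This is an illustrative model with hypothetical names, not the SRT sender's actual data structures. Pacing against the bandwidth limit and Message Drop Requests are omitted, and an ACK is assumed to acknowledge every sequence number below the value it carries.

```python
from collections import deque
from typing import Deque, Iterable, Optional, Set, Tuple


class SenderScheduler:
    """Chooses the next packet to send: lost packets first, then new ones."""

    def __init__(self) -> None:
        self.loss_list: Set[int] = set()       # reported lost, awaiting retransmission
        self.new_packets: Deque[int] = deque()  # not yet sent for the first time

    def queue_new(self, seq: int) -> None:
        self.new_packets.append(seq)

    def on_nak(self, lost_seqs: Iterable[int]) -> None:
        self.loss_list.update(lost_seqs)

    def on_ack(self, ack_seq: int) -> None:
        # Assumption: everything below ack_seq has been acknowledged.
        self.loss_list = {seq for seq in self.loss_list if seq >= ack_seq}

    def next_packet(self) -> Optional[Tuple[str, int]]:
        if self.loss_list:
            seq = min(self.loss_list)  # oldest lost packet first
            self.loss_list.discard(seq)
            return ("retransmit", seq)
        if self.new_packets:
            return ("original", self.new_packets.popleft())
        return None


scheduler = SenderScheduler()
scheduler.queue_new(10)
scheduler.queue_new(11)
scheduler.on_nak([7, 8])
first = scheduler.next_packet()    # packet 7 is retransmitted before 10 is sent
scheduler.on_ack(9)                # packet 8 was acknowledged (or dropped) meanwhile
second = scheduler.next_packet()   # nothing left to retransmit, so 10 goes out
```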

3. Retransmission Algorithms

3.1. Aggressive Retransmission Algorithm

The original aggressive retransmission algorithm causes the SRT sender to schedule a packet for retransmission each time it receives a negative acknowledgement (NAK).

On a network characterized by low packet loss levels and link capacity high enough to accommodate extra retransmission overhead, this algorithm increases the probability of recovering from packet loss with a minimum delay, and may better suit strict end-to-end latency constraints.

In a live configuration, the goal of the SRT receiver with periodic NAK reports enabled is to increase the chances of notifying the sender about packet loss. Given a NAK packet itself can be lost during transmission, periodic NAK reports are sent at least twice per RTT of the path. This increases the probability of loss recovery and helps to detect packet loss at the sender side as soon as possible. But such an intensive packet retransmission produces an overhead of at least two times the packet loss level in the network.

If link capacity is limited or there is congestion on a path, extra retransmission overhead might be an issue. Excessive retransmission would likely exacerbate any congestion, leading to a situation where regular data packets (including those waiting in the sender’s queue to be transmitted for the first time) would be prevented from reaching their destination. This should be taken into account when configuring SRT so that there is enough bandwidth for it to operate properly, especially if there is competing traffic on a path.

3.2. Efficient Retransmission Algorithm

The new efficient retransmission algorithm optimizes bandwidth usage by producing fewer retransmissions per lost packet. It takes SRT statistics into account to determine if a retransmitted packet is still in flight and could reach the receiver in time, so that some of the NAK reports are ignored by the sender.

With the efficient algorithm, as with its aggressive counterpart, the first retransmission attempt is triggered by the first NAK received by the sender. The SRT sender then ignores all subsequent NAK reports received for a particular packet within a period equal to RTT - 4xRTTVar (calculated from the time of the very first retransmission). This remains true for all ensuing retransmission attempts for the same packet.

The value of 4xRTTVar is subtracted from RTT to address possible decreases in round-trip time during transmission. The algorithm favors retransmitting a packet over not doing so and potentially losing time while waiting for the next periodic NAK report.

The SRT sender is aware that a retransmitted packet needs at least about half of the round-trip time (RTT/2) to reach the receiver, and that an ACK packet sent back by the receiver needs at least about another RTT/2 to reach the sender. It is therefore assumed that any loss report received at the sender side for an already retransmitted packet within one round-trip time of that retransmission is a loss report from the “past state” of the SRT receiver (meaning a loss report sent before the retransmitted packet in flight could reach the receiver).
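The difference between the two algorithms can be illustrated with a small model of the sender's reaction to NAKs. The sketch below is based only on the description in this section. All names are hypothetical, the hold-off is measured from the most recent retransmission of the packet, and clamping the hold-off at zero when 4xRTTVar exceeds RTT is our assumption.

```python
from typing import Dict, Iterable


class EfficientNakFilter:
    """Decides whether a NAK for a given packet should trigger a retransmission."""

    def __init__(self) -> None:
        self._last_retransmit_ms: Dict[int, float] = {}

    def should_retransmit(self, seq: int, now_ms: float,
                          rtt_ms: float, rtt_var_ms: float) -> bool:
        # NAKs arriving within RTT - 4 x RTTVar of the last retransmission are ignored.
        hold_off_ms = max(rtt_ms - 4.0 * rtt_var_ms, 0.0)  # clamp is an assumption
        last_ms = self._last_retransmit_ms.get(seq)
        if last_ms is not None and now_ms - last_ms < hold_off_ms:
            return False  # the previous retransmission may still be in flight
        self._last_retransmit_ms[seq] = now_ms
        return True


def count_retransmissions(nak_times_ms: Iterable[float], rtt_ms: float,
                          rtt_var_ms: float, efficient: bool) -> int:
    """Count retransmissions of one packet for NAKs arriving at the given times."""
    times = list(nak_times_ms)
    if not efficient:
        return len(times)  # aggressive: every NAK triggers a retransmission
    nak_filter = EfficientNakFilter()
    return sum(nak_filter.should_retransmit(42, t, rtt_ms, rtt_var_ms) for t in times)


# NAKs for the same packet reach the sender every 10 ms; RTT = 20 ms, RTTVar = 1 ms.
nak_times = [0, 10, 20, 30, 40]
aggressive_count = count_retransmissions(nak_times, 20, 1, efficient=False)
efficient_count = count_retransmissions(nak_times, 20, 1, efficient=True)
```

With a hold-off of 16 ms, the efficient variant reacts to the NAKs at 0, 20, and 40 ms and ignores those at 10 and 30 ms, giving 3 retransmissions where the aggressive variant produces 5.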

It’s worth noting that the changes were made at the sender side only. The receiver logic remains the same to maintain backward compatibility between different versions of the SRT protocol, and to support receivers at SRT versions prior to v1.4.2.

4. Testing Retransmission Algorithms on Simulated Streams

4.1. Test Setup

Datasets were collected using the laboratory test setup shown in Fig. 1, where Flip (sender host) and Flop (receiver host) are CentOS 7 based machines with the SRT test application installed. Detailed technical characteristics of the machines can be found in Table 2 in Appendix A. A LANforge CT910 network emulator was used to introduce network impairments.

The srt-xtransmit test application on the sender side (A) generated a dummy payload and sent it over an SRT connection through the LANforge emulator to the receiver side (B). The generated constant bitrate (CBR) stream was chosen to be either 5 Mbps or 10 Mbps.

The link capacity was limited to 45 Mbps, while the round-trip time (RTT) was set to 20 ms. The packet loss ratios applied were 5% and 10%. These values were chosen to be high enough to stress test the algorithms. The bidirectional packet loss mode of LANforge was used to emulate packets being lost in both directions (from A to B and from B to A). This was done to account for the worst case scenario where not only the data but also control packets containing meaningful protocol information (like packets lost and acknowledged) can be lost.

The SRT version under evaluation in these tests was v1.4.2. Starting with this version, both retransmission algorithms are available in the SRT library and can be selected with the SRTO_RETRANSMITALGO socket option. Periodic NAK reports were enabled for both algorithms during testing (the SRTO_NAKREPORT socket option is set to “true” by default).

The default value of 1 Gbps was used for the maximum allowed bandwidth, which limits the bandwidth usage by SRT (see the SRTO_MAXBW socket option description). The sender and receiver buffer sizes were set to 1 Gbit on both sender and receiver (see the SRTO_SNDBUF and SRTO_RCVBUF socket option descriptions). These generous limits were chosen so that neither the bandwidth cap nor the buffer sizes would affect the test results.

The datasets with statistics were collected with lib-srt-utils scripts for the following range of SRT latencies: 0.5xRTT, 1xRTT, 1.5xRTT, …, 4.5xRTT, where xRTT means “times RTT”. Thus, with RTT set to 20 ms, the latency range was 10 ms, 20 ms, 30 ms, …, 90 ms.
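For reference, the latency sweep can be generated with a few lines of code. This is only a convenience sketch with a hypothetical helper name. It is not part of lib-srt-utils.

```python
from typing import List


def latency_sweep_ms(rtt_ms: float, start: float = 0.5,
                     stop: float = 4.5, step: float = 0.5) -> List[float]:
    """SRT latencies to test, expressed as multiples of RTT (start..stop inclusive)."""
    count = int(round((stop - start) / step)) + 1
    return [(start + i * step) * rtt_ms for i in range(count)]


latencies = latency_sweep_ms(20)  # 10 ms, 20 ms, ..., 90 ms for RTT = 20 ms
```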

As the tests were performed in the laboratory with simulated streams and in a controlled network environment, the network conditions were not expected to change. Therefore, each experiment was limited to 2 minutes, and each combination of parameters was run once.

4.2. Test Results

Figure 2a illustrates the percentage of retransmission overhead and unrecovered packets depending on SRT latency and retransmission algorithm used when packet loss equals 5% and sending rate is 5 Mbps. Figure 2b is a zoomed view of unrecovered packets when SRT latency is greater than or equal to 1.5xRTT. The blue line indicates results using the aggressive algorithm (when SRTO_RETRANSMITALGO=0). The orange line indicates results using the efficient algorithm (when SRTO_RETRANSMITALGO=1). The remaining graphs for each combination of link capacity, RTT, packet loss, and sending rate under consideration are shown in Appendix B. Link capacity is equal to 45 Mbps and RTT is equal to 20 ms for all the graphs. Data tables can be found in Appendix C.

For each experiment, the metrics are calculated as follows:

Retransmission overhead is the percentage of retransmitted data packets (pktRetransTotal) with respect to original data packets (pktSentUniqueTotal) sent by the SRT sender during the experiment.
Percentage of unrecovered packets is the percentage of packets dropped by the SRT receiver and, as a result, not delivered to the upstream application (pktRcvDropTotal) with respect to original data packets (pktSentUniqueTotal) sent by the SRT sender during the experiment.
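Both metrics defined above are simple ratios of SRT statistics counters. A minimal sketch follows. The function names are ours, and the counter values in the example are made up for illustration and are not taken from the test data.

```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("no original packets were sent")
    return 100.0 * part / whole


def retransmission_overhead_pct(pkt_retrans_total: int,
                                pkt_sent_unique_total: int) -> float:
    """pktRetransTotal relative to pktSentUniqueTotal (both sender-side counters)."""
    return percentage(pkt_retrans_total, pkt_sent_unique_total)


def unrecovered_pct(pkt_rcv_drop_total: int, pkt_sent_unique_total: int) -> float:
    """pktRcvDropTotal (receiver side) relative to pktSentUniqueTotal (sender side)."""
    return percentage(pkt_rcv_drop_total, pkt_sent_unique_total)


# Made-up counters: 50,000 original packets, 2,725 retransmissions, 3 receiver drops.
overhead = retransmission_overhead_pct(2725, 50000)
unrecovered = unrecovered_pct(3, 50000)
```

Note that the two metrics combine counters from different endpoints: the drop counter is read on the receiver, while the number of original packets is read on the sender.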

Fig. 2a: Percentage of retransmission overhead and unrecovered packets for packet loss 5% and sending rate 5 Mbps.

Fig. 2b: Zoomed view of unrecovered packets for packet loss 5% and sending rate 5 Mbps.

From the graphs and data, we can conclude the following:

For all the parameters, there is a significant difference in the number of retransmitted packets between the two algorithms. On average, the aggressive algorithm (SRTO_RETRANSMITALGO=0, blue line) produces 1.71 times more retransmissions than the efficient one (SRTO_RETRANSMITALGO=1, orange line). To be more precise, there are 1.77 times more retransmissions in the case of 5% packet loss (9.66% vs 5.45% retransmitted packets) and 1.65 times more retransmissions in the case of 10% packet loss (19.18% vs 11.61% retransmitted packets). See Table 7 in Appendix C for details. Note: When calculating the average, the data points corresponding to the SRT latencies equal to 0.5xRTT, 1xRTT, and 1.5xRTT were excluded as outliers.
The percentage of unrecovered packets shows a decreasing trend for both algorithms as the SRT latency increases. The level of unrecovered packets is fairly high for the first data points (when SRT latency is equal to 0.5xRTT and 1xRTT), which means that such a small latency value isn’t enough for either algorithm to recover lost packets. However, the percentage of unrecovered packets rapidly decreases from the point where SRT latency equals 1xRTT to where SRT latency is equal to 1.5xRTT. This suggests that configuring such a latency is possible, though the recommended SRT latency for general setup remains 3–4 times RTT.
On average, the number of unrecovered packets is a bit higher for the efficient algorithm, as expected by design. However, in terms of absolute values, the difference is negligible. For example, given an SRT latency of 4xRTT with 5% packet loss only 0.0066% of SRT packets are unrecovered, and only 0.0540% with 10% packet loss. The difference is larger for 1.5xRTT, 2xRTT, and 2.5xRTT latencies in comparison with observations where SRT latency is greater than or equal to 3xRTT. This is also expected as the aggressive algorithm tries to resend a packet each time the sender gets a loss report, while the efficient algorithm might not have enough time to retransmit if a previously retransmitted packet (with the same sequence number) was lost. See Tables 8, 9 in Appendix C for details.
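The retransmission ratios quoted in the first conclusion can be cross-checked from the averaged percentages given there. The snippet below uses only the numbers stated in the text. The per-latency values behind them are in Table 7 and are not reproduced here.

```python
# Average retransmission overhead (%) quoted above, per packet loss ratio.
aggressive = {"5% loss": 9.66, "10% loss": 19.18}
efficient = {"5% loss": 5.45, "10% loss": 11.61}

# How many times more retransmissions the aggressive algorithm produces.
ratios = {case: aggressive[case] / efficient[case] for case in aggressive}
mean_ratio = sum(ratios.values()) / len(ratios)

for case, ratio in ratios.items():
    print(f"{case}: {ratio:.2f}x")
print(f"mean: {mean_ratio:.2f}x")
```

The computed ratios round to 1.77 and 1.65, and their mean rounds to 1.71, in agreement with the figures reported above.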

It is important to note that in this test case rather high values for packet loss were applied to stress test the algorithms. Also, the packet loss was applied in both directions, so that not only data packets (including retransmissions) might be lost, but also control packets with meaningful information such as acknowledged and lost packet numbers. This means that both algorithms will show better performance in the case of lower or unidirectional packet loss⁴. The higher the chance of losing a packet, the larger the latency value that should be used for SRT streaming (not necessarily limited to 4 or 4.5 times RTT as in these tests).

Note that the maximum allowed bandwidth, which limits the bandwidth usage by the SRT protocol, was set to 1 Gbps, and the sizes of the receiver and sender buffers were set to 1 Gbit in these tests. Thus the number of attempts to retransmit a lost packet was not limited in any way for the aggressive algorithm and, as a result, we observe higher levels of retransmissions and lower levels of dropped packets in this case. As the link capacity was configured high enough (45 Mbps) for the sending rates under consideration (5 Mbps and 10 Mbps), excessive retransmissions did not cause any congestion and increased the chances of recovering lost packets. This might not be the case in real networks where channel bandwidth fluctuates over time.

5. Conclusions

The first round of testing on simulated streams has confirmed the initial assumptions:

The new efficient algorithm produces significantly lower retransmission overhead, and therefore uses available bandwidth more efficiently.
The difference in number of unrecovered packets between the two algorithms is insignificant.
In comparison with the efficient retransmission algorithm, the aggressive one provides a slightly higher probability of recovering a loss at the expense of requiring a higher operational bandwidth.

Another series of tests was performed with real video streaming using an encoder/decoder pair as SRT sender and receiver, respectively. It has been subjectively confirmed that the observed insignificant increase in the percentage of unrecovered packets for the new efficient algorithm does not impact the viewer’s quality of experience. The detailed results will be published and discussed separately in Part 2 of this series of articles.

To summarize, the efficient algorithm better fits networks with characteristics that are difficult to predict or simply unknown, such as networks with a high packet loss level, or networks whose capacity is limited and highly variable over short time scales (e.g. wireless or cellular networks). The aggressive algorithm is best suited for high quality networks, or networks where low levels of packet loss and higher link capacity can accommodate extra retransmission overhead.

The following table provides a brief comparison of retransmission algorithms and their metrics.

Table 1: A brief comparison of retransmission algorithms.

Acknowledgments

The authors would like to thank steve_matthews for his assistance in reviewing and editing the article for clarity.

Appendix A. Technical Characteristics of the Test Machines

Table 2: Technical characteristics of the test machines.

Appendix B. Graphs

Fig. 3a: Percentage of retransmission overhead and unrecovered packets for packet loss 5% and sending rate 10 Mbps.

Fig. 3b: Zoomed view of unrecovered packets for packet loss 5% and sending rate 10 Mbps.

Fig. 4a: Percentage of retransmission overhead and unrecovered packets for packet loss 10% and sending rate 5 Mbps.

Fig. 4b: Zoomed view of unrecovered packets for packet loss 10% and sending rate 5 Mbps.

Fig. 5a: Percentage of retransmission overhead and unrecovered packets for packet loss 10% and sending rate 10 Mbps.

Fig. 5b: Zoomed view of unrecovered packets for packet loss 10% and sending rate 10 Mbps.

Appendix C. Tables with Test Results

Table 3: Percentage of retransmission overhead for the aggressive retransmission algorithm (SRTO_RETRANSMITALGO=0).

Table 4: Percentage of retransmission overhead for the efficient retransmission algorithm (SRTO_RETRANSMITALGO=1).

Table 5: Percentage of unrecovered packets for the aggressive retransmission algorithm (SRTO_RETRANSMITALGO=0).

Table 6: Percentage of unrecovered packets for the efficient retransmission algorithm (SRTO_RETRANSMITALGO=1).

Table 7: Percentage of retransmission overhead for the aggressive algorithm (SRTO_RETRANSMITALGO=0) divided by percentage of retransmission overhead for the efficient algorithm (SRTO_RETRANSMITALGO=1).

Table 8: The difference in percentage of unrecovered packets between efficient (SRTO_RETRANSMITALGO=1) and aggressive (SRTO_RETRANSMITALGO=0) retransmission algorithms.

Table 9: The difference in percentage of unrecovered packets between efficient (SRTO_RETRANSMITALGO=1) and aggressive (SRTO_RETRANSMITALGO=0) retransmission algorithms averaged for packet loss.

[1] The term “live streaming” refers to MPEG-TS style continuous data transmission with latency management. Live streaming based on segmentation and transmission of files like in HTTP Live Streaming (HLS) protocol as described in RFC8216 is not part of this use case.

[2] Forward Error Correction (FEC) and Path Redundancy are also used, but are out of scope of the tests described in this article.

[3] See SRTO_RETRANSMITALGO socket option for more details.

[4] This has been confirmed by various studies performed in the laboratory during evaluation of both algorithms. For the sake of brevity, only the results of testing with bidirectional packet loss are presented.