Long Fat Network (LFN) and TCP

Nic Kong
4 min read · Mar 17, 2019

I am not a networking specialist, but clients somehow show great concern about connectivity. Comms quality will never be as good as on a private LAN. Packet loss and delay always come with a leased line, even if you are willing to pay a high price. Why? Let’s see what a leased line is.

A leased line is an apparent point-to-point Ethernet link that uses pseudo-wire encapsulation to transport Ethernet traffic over a tunnel across an MPLS core network.

It may be hard to digest all the words. The word “tunnel” tells you not to bother with how the connectivity is made. The truth, however, is that the connectivity is available when you need it, and the same applies to every other customer sharing the same MPLS core network. The problem gets serious when the endpoints are geographically distant, perhaps thousands of miles apart, because more network devices are interconnected along the path, and a longer physical wire means poorer propagation of the signal.

This brings us to the term long fat network (LFN): “long” in terms of distance and network delay, and “fat” in terms of link bandwidth. The bandwidth here is not the bandwidth you paid for on the leased line; rather, it refers to the bandwidth-delay product, defined as the product of a data link’s capacity and its round-trip delay time. The result, an amount of data, is the maximum amount of data that can be in flight on the network — transmitted but not yet acknowledged. The acknowledgment (ACK) is the key feature of TCP for providing reliable transmission of data. Without an acknowledgment reaching the sender, a packet is counted as lost. The link, therefore, cannot achieve the optimum throughput that the comms vendor quoted to you, and there may be a negative knock-on effect on the transmitted and pending packets. Back to the achievable bandwidth: it is limited by 3 factors.

1. TCP Receive Window

Throughput ≤ RWIN / RTT

where RWIN is the TCP receive window and RTT is the round-trip time

RWIN is the amount of data that a receiver can accept without acknowledging the sender. If the sender has not received an acknowledgment for the first packet it sent, it will stop and wait, and if the wait exceeds a certain limit, it may even retransmit. Without scaling, the window size is limited to 65,535 bytes, which was adequate for links with small RTTs. The sending side should also allocate the same amount of buffer memory as the receiving side for good performance. The easiest way to learn the RTT of the link is to `ping` one endpoint from the other; the command reports the average.
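To see why the unscaled window caps throughput on a long link, the ceiling can be computed directly. A minimal sketch — the 100 ms RTT below is an assumed example value for a long-haul leased line:

```python
def max_throughput_bps(rwin_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput: at most one receive window
    of data can be in flight per round trip."""
    return rwin_bytes * 8 / rtt_seconds

# Default unscaled 65,535-byte window over an assumed 100 ms RTT:
print(max_throughput_bps(65535, 0.100) / 1e6)  # ≈ 5.24 Mbit/s
```

No matter how fat the pipe is, this link cannot exceed roughly 5 Mbit/s until the window grows.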

Actions for tuning:

The TCP window scale option solves the bandwidth utilization issue caused by an insufficient window size, which is limited to 65,535 bytes without scaling. For example, if the receive window is 65,535 bytes and the window scale factor is 3, RWIN becomes 65,535 × 2³ = 524,280 bytes. The maximum factor of 14 shifts the 16-bit window to the left by 14, increasing the window size by a factor of 2¹⁴.
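The scale factor a link actually needs follows from its bandwidth-delay product. A sketch, with an assumed 100 Mbit/s line and 150 ms RTT as example figures:

```python
def window_scale_needed(bandwidth_bps: float, rtt_seconds: float) -> int:
    """Smallest TCP window scale factor (capped at 14) whose scaled
    65,535-byte window covers the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps / 8 * rtt_seconds
    factor = 0
    while 65535 * (2 ** factor) < bdp_bytes and factor < 14:
        factor += 1
    return factor

# Assumed 100 Mbit/s leased line with 150 ms RTT:
# BDP = 100e6 / 8 * 0.150 = 1,875,000 bytes in flight
print(window_scale_needed(100e6, 0.150))  # 5 → window up to 65,535 * 2**5 ≈ 2 MB
```

Modern stacks negotiate this automatically when both ends support the option, but the buffer limits on each host still have to be large enough to hold the scaled window.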

2. Packet Loss

Throughput ≤ (MSS / RTT) × (1 / √Ploss)

where MSS is the maximum segment size and Ploss is the probability of packet loss
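Plugging numbers into this approximation (due to Mathis et al.) shows how sharply even light loss caps a long link. The MSS, RTT, and loss rate below are assumed example values:

```python
import math

def loss_ceiling_bps(mss_bytes: int, rtt_seconds: float, p_loss: float) -> float:
    """Approximate TCP throughput ceiling under random loss:
    rate <= (MSS / RTT) * (1 / sqrt(Ploss))."""
    return (mss_bytes * 8 / rtt_seconds) / math.sqrt(p_loss)

# Assumed example: 1460-byte MSS, 150 ms RTT, 0.01% packet loss
print(loss_ceiling_bps(1460, 0.150, 0.0001) / 1e6)  # ≈ 7.79 Mbit/s
```

Even with only one packet in ten thousand lost, this 150 ms link tops out below 8 Mbit/s per connection, regardless of the line rate you paid for.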

Actions for tuning:

I could say none for this, as MSS, RTT, and Ploss are uncontrollable variables. However, something can be done at the application level, which I will reveal in the next post.

3. Congestion Control

For each TCP connection, TCP maintains a congestion window, limiting the total number of unacknowledged packets that may be in transit end-to-end. The window starts as a small multiple of the MSS in size. Although the initial rate is low, the rate of increase is very rapid: the congestion window grows by one segment for every packet acknowledged, effectively doubling every RTT. The slow-start algorithm keeps increasing the transmission rate until a loss is detected. When a loss event occurs, TCP assumes it is due to network congestion and takes steps to reduce the offered load on the network. These measures depend on the exact TCP congestion avoidance algorithm in use.

Actions for tuning:

a. Initial congestion window

By default, the congestion window starts at 1. Say you set it to 8; that means a sender is ready to transmit 8 packets without acknowledgments, i.e. 3 rounds of RTT are skipped at the beginning. This fast pick-up approach helps diminish the transmission delay caused in the slow-start phase. Short-lived TCP connections benefit the most. For example, an HTTP connection was originally designed for exchanging a single request and response, so most of its data is transmitted in the slow-start phase. Google tested the effectiveness of a larger initial congestion window on its servers, and the improvement in latency is quite obvious. See https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36640.pdf
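The “3 rounds skipped” claim can be checked with a small helper — a sketch assuming pure slow-start doubling, which real stacks approximate:

```python
def rtts_to_reach(target_segments: int, initial_cwnd: int) -> int:
    """Number of RTT rounds of slow-start doubling needed before the
    congestion window reaches the target number of segments."""
    rounds, cwnd = 0, initial_cwnd
    while cwnd < target_segments:
        cwnd *= 2
        rounds += 1
    return rounds

# From a window of 1 it takes 3 doublings (1 -> 2 -> 4 -> 8) to reach 8;
# starting directly at 8 skips all of them:
print(rtts_to_reach(8, 1))  # 3
print(rtts_to_reach(8, 8))  # 0
```

For a response that fits in 8 segments, those 3 saved round trips are most of the connection’s lifetime.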

If we extend this approach to data delivery in the financial markets, it would benefit the latency of the bid/ask messages in the first second after the market opens. The overall latency figures over the day may not be that promising, though, so I guess no one bothers to try.

b. Congestion avoidance algorithm

Different OSs, as well as different versions, have their own default algorithm. From what I know, there is no single one that fits all. Each algorithm was invented for a particular purpose: some are sensitive to packet loss, some are sensitive to latency, and some are designed for general use. So you should investigate the characteristics of your link and look for an algorithm that overcomes its weaknesses.


Nic Kong

A software engineer developing high performance and robust financial applications