Performant Applications in the Cloud: Understanding the Effect of Network Latency on Bandwidth
The CIO says: “Please migrate the application to the cloud. The cloud is fast, it scales, and it’s highly available.”
The analyst says: “It’s moved, and it works great!”
The end-user says: “Ugh, why is this sooooo slow!”
Why do some companies have remarkable success, while others are stuck troubleshooting or playing the blame game?
Quite often, the culprit is network latency. And it’s not your network’s fault. Whether you are migrating to the cloud or building new applications, failing to consider network latency in your infrastructure or software architecture can be the demise of your cool solution.
“So what!? I have a high-speed connection. It’s wicked fast.”
Sorry to burst your bubble. Bandwidth is not speed, and speed is not bandwidth. Bandwidth is best described as how much data can be transferred in a given amount of time. Think of bandwidth as lanes on a highway: more lanes allow more cars to travel at the same time. Speed, you may recall from physics class, is distance ÷ time, so it is more a measure of how fast data travels down the wire (and on our highway, it is 100 km/h). Sticking with the highway analogy, adding more lanes might give you more room to drive 100 km/h, but those extra lanes will not let you go faster than 100 km/h. There is a relationship here, but let’s not confuse the two as being the same thing.
“Ok, so what is network latency, and why should I care?”
Another term for network latency is “round-trip time” (RTT), and it’s measured in milliseconds. It is the time it takes for the source to send a packet to its destination, plus the time for that destination host to respond back to the source. It’s all about the time it takes to successfully deliver packets. Commonly, this information can be captured via the ping command. Below is an example pinging Google’s DNS service in Mountain View (California) from my home PC in Edmonton (Alberta), resulting in a round-trip time of 21ms (latency):
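If you’d rather capture this number from code than from the ping command, here is a minimal Python sketch that times a TCP handshake, which takes roughly one round trip to complete. The target (Google’s public DNS service at 8.8.8.8 on port 443) and the sample count are assumptions for illustration; any reachable TCP service will do.

```python
# Minimal sketch: estimate round-trip time by timing a TCP handshake,
# which takes roughly one round trip to complete.
import socket
import time

def estimate_rtt_ms(host: str, port: int = 443, samples: int = 5) -> float:
    """Return the median TCP connect time, in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # connect() returns once the SYN / SYN-ACK exchange completes
        with socket.create_connection((host, port), timeout=3):
            pass
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return timings[len(timings) // 2]

if __name__ == "__main__":
    # 8.8.8.8 is Google's public DNS service, the same target as the ping above.
    print(f"Approximate RTT: {estimate_rtt_ms('8.8.8.8'):.1f} ms")
```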
Before we can get into what this could mean for your application, let’s do a quick review of the TCP/IP suite’s two most common transport protocols: TCP and UDP.
TCP and UDP Protocol Review
TCP: a connection-oriented protocol for guaranteed packet delivery (think: reliable, with overhead). Think of this like sending a registered letter: you need to confirm the recipient received the letter, so you ask for confirmation of receipt. Also recall that if you want to send a 10MB file, it will need to be broken up into multiple packets (fixed segments of data). This is why TCP is reliable: every packet requires delivery confirmation, and any missing packet will be resent (we can’t lose 1MB of our 10MB file, or it would be corrupt). The time from a server sending a packet to that same server receiving the client’s acknowledgement is the round-trip time. TCP is often used for client/server applications where data delivery must be absolute. Each TCP stream is as follows:
TCP Stream: Packet Sent, Await Response, Receive Response, Packet Sent, Await Response, Receive Response, Packet Sent, etc.…
UDP: a connectionless protocol, meaning the sender transmits packets and does not require a receipt (think: unreliable, without overhead). In short, this is fire and forget, so round-trip time is irrelevant. Packets are sent with zero expectation of confirmation, so the server will keep sending packets without waiting for any response. One common use case is streaming audio/video, like in your virtual meetings. We don’t want those missing packets re-sent, because time has moved on; this is why we sometimes experience splotches, artifacts, or short-term freezing in our video calls. For UDP, latency is much less of an issue because the sender never waits on a response, and when latency does show up, it is experienced as delay or reduced quality rather than slower transfer. Each UDP stream is as follows:
UDP Stream: Packet Sent, Packet Sent, Packet Sent, etc.…
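To make the contrast concrete, here is a minimal loopback sketch in Python. The port numbers are arbitrary choices for illustration, and the TCP “ACK” reply is an application-level stand-in: real TCP acknowledgements happen inside the protocol stack, but the echo makes the round-trip wait visible.

```python
import socket
import threading
import time

HOST, TCP_PORT, UDP_PORT = "127.0.0.1", 50007, 50008  # illustrative values

def tcp_server():
    # Echo an acknowledgement for every message, like a registered letter.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, TCP_PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            while data := conn.recv(1024):
                conn.sendall(b"ACK")

def udp_server():
    # Receive and move on; no reply is ever sent.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as srv:
        srv.bind((HOST, UDP_PORT))
        for _ in range(3):
            srv.recvfrom(1024)

threading.Thread(target=tcp_server, daemon=True).start()
threading.Thread(target=udp_server, daemon=True).start()
time.sleep(0.2)  # give the demo servers a moment to start listening

# TCP stream: packet sent, await response, receive response, repeat.
with socket.create_connection((HOST, TCP_PORT)) as tcp:
    for _ in range(3):
        tcp.sendall(b"packet")
        tcp.recv(1024)  # the sender blocks here for a full round trip

# UDP stream: packet sent, packet sent, packet sent (fire and forget).
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
    for _ in range(3):
        udp.sendto(b"packet", (HOST, UDP_PORT))  # no wait, no receipt
```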
How Latency Affects Network Speed When Using TCP
Ok, buckle up. We’re going to get super technical now. But the takeaway here is that a high round-trip time will significantly reduce how much data you can transmit per second, essentially reducing your usable bandwidth per TCP stream. [Note: the numbers below were calculated using TCP Throughput Calculator.]
Let’s assume we experience 1ms latency (round-trip time) on a 1Gbps connection using a Windows file copy (which uses 1 TCP stream). This results in a theoretical throughput of 1703.94Mbps per stream (more than the link itself can carry, so here the 1Gbps connection is the bottleneck):
Now, let’s assume we experience 25ms latency on our 1Gbps connection using a Windows file copy. We now have a theoretical max throughput of 68.16Mbps:
Taking it one step further, let’s assume we experience 85ms latency on our 1Gbps connection; the Windows file copy now results in a theoretical throughput of 20.05Mbps:
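If you want to sanity-check those figures yourself, the arithmetic is simply window size divided by round-trip time. Below is a minimal Python sketch; the 208KB window is an assumption that happens to reproduce the calculator’s numbers, and your real window will vary with OS tuning and TCP window scaling.

```python
# Theoretical per-stream TCP throughput = window size / round-trip time.
# The 208 KB window below is an assumption chosen to match the calculator.
WINDOW_BYTES = 208 * 1024

def max_throughput_mbps(rtt_ms: float) -> float:
    """Best-case throughput for one TCP stream, in megabits per second."""
    return (WINDOW_BYTES * 8) / (rtt_ms / 1000) / 1_000_000

for rtt in (1, 25, 85):
    print(f"{rtt:>3} ms RTT -> {max_throughput_mbps(rtt):8.2f} Mbps per stream")

# Output:
#   1 ms RTT ->  1703.94 Mbps per stream  (capped at 1000 Mbps by the link)
#  25 ms RTT ->    68.16 Mbps per stream
#  85 ms RTT ->    20.05 Mbps per stream
```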
Putting this into the context of user experience, these are the theoretical times it would take to complete a Windows-based copy-paste operation (Data Transfer Calculator):
“My brain hurts, what is happening here?”
One way to conceptualize this is sending a 1MB file with a packet size of 64KB. That 1MB file would be broken into 16 packets. If we send one packet at a time, with a 1ms delay between packets:
Packet, wait 1ms, packet, wait 1ms, packet, wait 1ms, etc.
One could conclude that the time it will take is 16 (packets) × 1ms = 16ms.
But, if there is an 85ms delay between packets, this looks quite a bit different:
Packet, wait 85ms, packet, wait 85ms, packet, wait 85ms, etc.
You could conclude that the time it will take is 16 (packets) × 85ms = 1,360ms.
“Well, that escalated quickly.”
Indeed it did. Latency literally increases the wait time between packets as your source host awaits confirmation of receipt, slowing down your overall data transfer.
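For completeness, here is the same back-of-the-envelope math as a tiny Python sketch, using the 1MB file and 64KB packet size from the example above.

```python
# Back-of-the-envelope: a 1 MB file sent as 64 KB packets, one at a time,
# waiting a full round trip after each packet before sending the next.
FILE_SIZE_KB = 1024   # 1 MB
PACKET_SIZE_KB = 64

packets = FILE_SIZE_KB // PACKET_SIZE_KB  # 16 packets

for rtt_ms in (1, 85):
    print(f"{packets} packets x {rtt_ms}ms RTT = {packets * rtt_ms}ms of waiting")

# Output:
# 16 packets x 1ms RTT = 16ms of waiting
# 16 packets x 85ms RTT = 1360ms of waiting
```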
Solutions
1. Using Multiple TCP Streams
We’ve established that latency reduces usable bandwidth per TCP stream. However, many modern applications can use multiple streams to increase aggregate bandwidth. Web browsers will send multiple GET requests, one for each object on a web page, rather than one GET request for an entire page. File downloads can also use multiple streams, sometimes from multiple sources (e.g., torrents). For example, if one TCP stream is limited to 20Mbps, using 4 TCP streams will aggregate to 80Mbps, and 50 streams will take advantage of the entire 1Gbps connection! Think parallel vs. serial. Below is a graphical representation of multiple streams using available bandwidth.
Many legacy applications cannot control how many TCP streams they use and are stuck with only one. For example, when using Windows file copy (copy/paste in Windows Explorer), one stream is all there is. However, using cloud-savvy tooling (such as the AWS CLI for file copies) can significantly increase throughput; the sketch below illustrates the general multi-stream idea.
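Here is a minimal Python sketch of that idea: downloading a file over several TCP streams by splitting it into byte ranges, each fetched on its own connection. The URL is a placeholder, and it assumes the server reports Content-Length and supports HTTP Range requests.

```python
# Minimal sketch of multi-stream downloading: split the file into byte ranges
# and fetch each range over its own TCP connection, in parallel.
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://example.com/big-file.bin"  # placeholder URL for illustration
STREAMS = 4

def fetch_range(start: int, end: int) -> bytes:
    req = urllib.request.Request(URL, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:  # one TCP connection per range
        return resp.read()

def parallel_download() -> bytes:
    # Ask the server for the total size, then split it into equal chunks.
    head = urllib.request.Request(URL, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])
    chunk = size // STREAMS
    ranges = [(i * chunk, size - 1 if i == STREAMS - 1 else (i + 1) * chunk - 1)
              for i in range(STREAMS)]
    # Each worker opens its own connection, so the per-stream latency penalty
    # is paid in parallel instead of serially.
    with ThreadPoolExecutor(max_workers=STREAMS) as pool:
        parts = pool.map(lambda r: fetch_range(*r), ranges)
    return b"".join(parts)

if __name__ == "__main__":
    print(f"Downloaded {len(parallel_download())} bytes over {STREAMS} streams")
```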
2. Proximity of Client and Server
Data center placement matters. To reduce the effect of latency on expected performance, data centers should be located as close to the user as possible. For web applications, content delivery networks (CDNs) can spread out your static objects so there is less reliance on your front-end server.
For client/server applications that cannot take advantage of multiple streams, the only answer is ensuring that latency is minimized. For example, the latency between AWS’s US-West-2 (Oregon) and Canada-Central (Montreal) regions is approximately 60ms. Note that the latency between those regions is considered acceptable by conventional standards (<100ms), but not necessarily by user experience standards (which is why this post was written). Live AWS latency numbers can be accessed here, while Azure’s posted numbers are here.
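If you want a rough feel for which region is closest to your users, a sketch like the one below times a TCP handshake against each region’s public EC2 endpoint (assuming the standard ec2.<region>.amazonaws.com hostname pattern). It is an approximation, not a benchmark.

```python
# Rough comparison of client-to-region latency by timing a TCP handshake
# against each region's public EC2 endpoint (one handshake ~ one round trip).
import socket
import time

REGION_ENDPOINTS = {
    "us-west-2 (Oregon)": "ec2.us-west-2.amazonaws.com",
    "ca-central-1 (Montreal)": "ec2.ca-central-1.amazonaws.com",
}

def connect_time_ms(host: str, port: int = 443) -> float:
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=5):
        pass  # connect() returns after roughly one round trip
    return (time.perf_counter() - start) * 1000

for region, host in REGION_ENDPOINTS.items():
    print(f"{region:<25} ~{connect_time_ms(host):.0f} ms")
```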
Conclusion
Ignoring network latency in your architecture can lead to frustrated users, analysts, and leadership. Traditional high-data-transfer client/server applications often bear the brunt of it, which is why many lift-and-shift projects yield underwhelming results. Some companies are fortunate to have locally accessible cloud providers (ahem… Vancouver, Toronto, Montreal, and Quebec City), but those stuck in between are yearning for more (I’m eagerly waiting for AWS Calgary to launch).
Not all applications are created equal, so remember that what performs on your LAN may not necessarily perform when hosted in the cloud. For those of you who are developing new applications, steer away from traditional monolithic applications and consider modern cloud practices. Ensure your stack can use asynchronous data transfer to reduce the impact of latency, wherever possible.
Ultimately, remember that the most important standard is the user experience standard, as our users judge what is performant, and what is not.
I hope you found this post insightful — let me know if you did!
Connect with me at: https://ca.linkedin.com/in/cstogowski