It’s lost on most people that today’s internet runs on algorithms and protocols that were designed back in the 80s. You know, back when things like “message boards” were the most exciting thing going on.
But that’s not today’s internet. Today’s internet is distributed across 7 billion people, on devices and network connectivity ranging from blazing 1 Gbit/sec wired connections to wireless ones that… well… aren’t.
And that raises the question of whether today’s internet requires an upgrade to the algorithms we designed 30 years ago.
This is where TCP BBR comes in. It’s a TCP congestion control algorithm built for the congestion of the modern internet.
TCP : Sharing is caring.
TCP tries to balance two needs: being fast (transmitting data quickly) and being fair (sharing the pipe among multiple users), with much more weight on being fair.
Most TCP implementations use a backoff algorithm that results in you getting about ½ the bandwidth of the pipe (anything more than that tends to crowd out other people). The downside is that even if you’re the only person using the pipe, TCP generally under-utilizes your network connection by a significant amount (as much as 50%).
This is the main problem with TCP: its focus on fair network sharing results in wasted bandwidth.
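The backoff behavior described above can be sketched with a toy AIMD (additive-increase, multiplicative-decrease) loop, the scheme classic loss-based TCP variants like Reno use. This is an illustrative model, not real TCP; the link capacity and RTT count are made-up numbers:

```python
# Toy AIMD (additive-increase, multiplicative-decrease) simulation.
# Illustrative only -- real TCP (Reno/CUBIC/etc.) is far more involved.

def simulate_aimd(capacity=100, rtts=10_000):
    """Return average link utilization for a single AIMD flow.

    capacity: link capacity in packets per RTT (made-up number)
    rtts:     number of round trips to simulate
    """
    cwnd = 1
    delivered = 0
    for _ in range(rtts):
        delivered += min(cwnd, capacity)  # can't deliver more than the pipe holds
        if cwnd > capacity:               # overflow -> packet loss detected
            cwnd = cwnd // 2              # multiplicative decrease (halve the window)
        else:
            cwnd += 1                     # additive increase (one packet per RTT)
    return delivered / (capacity * rtts)

print(f"average utilization: {simulate_aimd():.0%}")
```

Even with no competing traffic, the sawtooth between “halve” and “grow back” leaves a chunk of the pipe idle, which is the waste the paragraph above is describing.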
TCP BBR : The important bits
You can read the paper for more details, but the gist is that BBR is a congestion control technology which:
Is designed to respond to actual congestion, rather than packet loss. Loss-based algorithms treat every dropped packet as a sign of congestion; BBR instead models the network to send at the available bandwidth. Per the paper, BBR is 2,700x faster than previous TCPs on a 10 Gb, 100 ms link with 1% loss.
Is focused on improving network performance when the network isn’t very good. TCP BBR more accurately balances fairness and utilization, resulting in better download speeds over the same network. It’s most noticeable in situations where the network is bad (and it doesn’t hurt you if you’re on a squeaky-clean network).
Doesn’t require the client to implement BBR. This one is the magic pixie dust. Prior improvements like QUIC require both client and server to support the new protocol; BBR only needs to run on the server side. This is especially relevant in the developing world, where users are on older mobile platforms with limited bandwidth, or in areas where websites & services haven’t made the switch yet.
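To see why loss-based congestion control falls apart on the 10 Gb, 100 ms, 1% loss link mentioned above, we can plug those numbers into the well-known Mathis approximation for loss-based TCP throughput (throughput ≈ C · MSS / (RTT · √p)). The 1460-byte MSS is an assumption (a typical Ethernet value), and this is a back-of-the-envelope sketch, not the BBR team’s methodology:

```python
from math import sqrt

# Mathis et al. approximation for steady-state loss-based TCP throughput:
#   throughput ~= C * MSS / (RTT * sqrt(p)),  with C ~= 1.22
mss_bits = 1460 * 8   # assumed typical Ethernet MSS of 1460 bytes, in bits
rtt = 0.100           # 100 ms round-trip time (the link from the paper's claim)
loss = 0.01           # 1% packet loss
C = 1.22              # constant from the Mathis model

throughput_bps = C * mss_bits / (rtt * sqrt(loss))
print(f"loss-based TCP ceiling: {throughput_bps / 1e6:.1f} Mbit/s "
      f"on a 10,000 Mbit/s link")
# -> loss-based TCP ceiling: 1.4 Mbit/s on a 10,000 Mbit/s link
```

In other words, a loss-based sender is stuck around a megabit on a ten-gigabit pipe, which is why an algorithm that doesn’t treat every lost packet as congestion has thousands-fold headroom on a link like this.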
Taking BBR for a spin
That all sounds fine and dandy, but let’s kick the tires and see how good this technology really is. To test things, I set up two VMs in two different regions and ran a quick iperf test to check their baseline performance:
[ 4] 0.0–10.0 sec 9.97 GBytes 8.55 Gbits/sec
To simulate the conditions where BBR is most useful (high packet loss), we execute the following tc command, which drops packets at a given percentage:
sudo tc qdisc add dev eth0 root netem loss 1.5%
When the existing VMs connect, we see performance drop significantly (which we’d expect):
[ 3] 0.0–10.3 sec 1.10 GBytes 921 Mbits/sec
Then we turned on BBR, on the server side only, using the following command:
sysctl -w net.ipv4.tcp_congestion_control=bbr
(Note: this only works if you’re on a 4.10+ Linux kernel; otherwise, you’ll need to do these steps.)
We ran the same iperf test, and:
[ 3] 0.0–10.0 sec 2.90 GBytes 2.49 Gbits/sec
We see roughly 2.7x the bandwidth of the lossy connection, just for setting a simple flag on the server.
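For the record, the improvement works out from the two lossy-link iperf numbers reported above:

```python
# Throughput from the two iperf runs on the 1.5%-loss link (Gbit/s).
lossy_baseline = 0.921   # 921 Mbit/s with the default congestion control
with_bbr = 2.49          # 2.49 Gbit/s with BBR enabled on the server only

speedup = with_bbr / lossy_baseline
print(f"BBR speedup on the lossy link: {speedup:.1f}x")
# -> BBR speedup on the lossy link: 2.7x
```

Note that this still falls short of the loss-free 8.55 Gbit/sec baseline; BBR recovers a big chunk of the bandwidth the loss took away, not all of it.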
Where to use TCP BBR?
Here’s the best part: if you’re building your technology on Google Cloud Platform, then you’ve already got BBR running. Google Cloud Platform’s front ends, load balancers, and managed services already have TCP BBR enabled, so any of your clients in bad-network areas will have already seen a performance improvement.
However, if you’re exposing a VM endpoint directly to a user (say, for a VPN), then you’re going to need to either enable TCP BBR yourself (by setting the flag if your kernel supports BBR, or compiling the upgrade yourself), or put the endpoint behind a GFE interface (for more on this, see my previous tests of HTTP Load balancing).
Now, most of the benefit will be seen on client-facing endpoints in bad-network areas. So if you flip this on between your VMs, don’t be alarmed if things aren’t significantly better. That being said, there’s no downside to using it, even in well-connected areas.
My suggestion? Turn it on, tell your users you’ve made massive performance improvements, and go focus on more important things ;)