What is a congestion control algorithm?
You can imagine an Internet connection as a water pipe with a certain capacity. Every service you connect to is trying to pour some fluid into the pipe. But if too much data is pushed in, the pipe starts leaking and no one is happy. Congestion control algorithms are programs running on Internet servers that slow down data transmission when the pipe gets full.
What did the CMU group find about Google’s new Congestion Control Algorithm, BBR?
In one experiment, we saw a sender using BBR consume 40% of the link while sharing the pipe with 16 other services, each of which was using a traditional algorithm. Each of the other 16 got less than 4% of the link. We developed a mathematical model showing that BBR will always take the same fixed share of the pipe, no matter how many competing services are using it.
Traditional algorithms try to take less and less of the pipe as more services have to share it. If there are 10 services, each should aim to take 10% of the pipe; if there are 4 services, each should aim to take 25%, and so on. But BBR’s share stays fixed no matter how many other services are sharing the pipe, and this causes unfairness.
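The arithmetic above can be sketched in a few lines of code. This is a toy illustration of our simplified model, not Google’s BBR implementation: the 40% figure is the share we observed in the 16-competitor experiment and is used here as an illustrative constant, not a universal value.

```python
def fair_share(num_flows):
    """Ideal equal split: every flow gets 1/N of the pipe."""
    return 1.0 / num_flows

def modeled_shares(num_loss_based, bbr_fraction=0.40):
    """Simplified model of the unfairness we observed: one BBR flow
    keeps a fixed fraction of the link, and the loss-based flows
    split whatever is left equally. The 0.40 default is illustrative,
    taken from our 16-competitor experiment."""
    leftover = 1.0 - bbr_fraction
    return bbr_fraction, leftover / num_loss_based

bbr, each_other = modeled_shares(16)
print(f"BBR flow:        {bbr:.0%} of the link")         # 40%
print(f"Each competitor: {each_other:.1%} of the link")  # 3.8%, i.e. < 4%
print(f"Ideal split:     {fair_share(17):.1%} each")     # 5.9%
```

With 16 traditional flows alongside one BBR flow, a truly fair split would give every flow about 5.9%, but in the model the traditional flows are squeezed down to under 4% each.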
What does this mean for users?
When you’re in a place with a lot of network congestion, like a coffee shop with lots of users, or in a remote area with limited capacity, the network might not have enough capacity for everyone. In these cases, congestion control algorithms will determine each user’s share. Unfairness between congestion control algorithms can mean high-definition video for some services but standard quality for others, or fast download times for some services but slower ones for others.
Do you think Google is prioritizing their services over others?
Probably not: Google has made the BBR code public, so any company can use it. Our measurement studies show that many different providers are trying it out. Researchers at Verizon recently published an article describing how they tried BBR out on their own services.
Ensuring that different congestion control algorithms are fair to each other is a hard problem, and I’m worried that the kind of problem we saw with BBR is the “tip of the iceberg.” Many companies don’t release the details of their algorithms and may use proprietary approaches we don’t know about. It’s harder to study whether these new approaches are fair to other services because we don’t have their code.
Why didn’t Google figure this out before they deployed BBR?
The state-of-the-art approach to analyzing whether a new algorithm is fair is simply to try it out and see what happens. Google did publish graphs showing that BBR looked fair in the scenarios they tested. Unfortunately, running experiments is not the best way to understand an algorithm comprehensively. You might run one experiment with one user on a 100 Mbps network, then another with 2 users on a 40 Mbps network… every time you want to change the network, the number of users, and so on, you have to run a new experiment to see what happens. You’ll never run all possible combinations of all possible networks.
We studied BBR using mathematical formulas combined with experiments. Mathematical formulas allow us to get a bigger picture of how BBR will react, even for scenarios where we haven’t run an experiment.
Have other researchers studied BBR?
Yes they have! And we’re not the first to be worried about BBR’s fairness. Geoff Huston showed that in datacenters, BBR can actually take the whole link and “starve out” services using other algorithms: those services get almost zero share of the link. The Google team tells me that they have fixed this problem in the newest version of BBR, “v2,” which is in alpha release now. I’m hoping that they’ll make a similar change to fix the problem we found as well.
Where can I learn more?