Scaling at Scale: I was told there would be no math

Rob Gindes
USA TODAY NETWORK
4 min read · Apr 9, 2018

Over the course of last year, the USA TODAY NETWORK Platform Engineering group moved much of our cloud footprint into Docker containers, scheduled in Kubernetes clusters. Today, we run more than 3,000 containers in production — front-end and back-end web applications, internal tools and services and more. This is part of an ongoing series of lessons learned about running production Kubernetes at scale.

Kubernetes and Docker are, at their core, tools we use to do two things: simplify the management of our systems and improve those systems’ efficiency, especially cost-efficiency. One major way we’ve been able to run more efficiently is when we have to scale.

Changing the scaling paradigm

In the past, much of our infrastructure ran on Amazon VMs configured with Chef. To scale, we'd do some basic math: (A) find the threshold where my application or my application server starts to break, (B) find out how long it takes to spin up and configure a new VM, and (C) subtract (B) from (A), which in practice meant kicking off a scale-out early enough that the new VM was ready before we hit the breaking point.
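As a rough illustration (a hedged sketch with made-up numbers and names, not our actual tooling), that old calculation boils down to asking whether usage will cross the breaking point before a new VM could come online:

```python
# Sketch of the pre-Kubernetes scaling math (illustrative numbers only).

BREAK_THRESHOLD = 0.85      # (A) CPU fraction where the app server starts to break
VM_SPINUP_MINUTES = 10.0    # (B) time to provision and Chef-configure a new VM

def should_scale_out(current_usage: float, growth_per_minute: float) -> bool:
    """Scale out if usage will hit the breaking point before a new VM is ready."""
    projected = current_usage + growth_per_minute * VM_SPINUP_MINUTES
    return projected >= BREAK_THRESHOLD

# Example: at 60% CPU and climbing 3% per minute, a 10-minute VM build
# means we have to start scaling now.
print(should_scale_out(current_usage=0.60, growth_per_minute=0.03))  # True
```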

Because we now run a relatively large-scale Kubernetes system, we can provision a giant pool of resources for our applications to use as scaling actions are needed. Because we're scaling container images (not re-provisioning servers), and because we're scaling into a pool of resources we already have (not provisioning new VMs), Kubernetes has dramatically decreased the amount of time it takes to scale.

So the lesson is that Kubernetes massively improved how efficiently we scale. In the past, we had to be 10 minutes ahead of any potential issue with resource constraints. Today, we can afford to work only minutes or even seconds ahead, which means less overprovisioning and less money wasted on resources we don't need.

So, case closed, right?

The truth about abstraction layers

Kubernetes and Docker are abstraction layers. We often lean on abstraction layers as crutches, assuming we've papered over a difficult-to-manage system and magically made it more user-friendly. A lot of engineers don't want to admit that this is a fallacy. Sure, we can interact with an easier, friendlier system on a daily basis, but abstraction layers are a double-edged sword: they don't absolve you from having to learn, understand and manage the systems that still live underneath.

This case is no different, because we haven’t fixed our scaling issues by moving to Kubernetes. We now have the aforementioned giant pool of resources to scale containers up and down, but what happens when we have a massive scaling event? We’re covered for the daily needs of 10 more vCPUs or 50 more GB of RAM here and there, but what about when we need 100 more vCPUs or when load is out of control? What if we need more resources than our pool has?

That scenario makes it obvious why we can’t forget about the systems lying underneath Kubernetes, and we also can’t forget our old scaling math. So can we solve this by scaling our Kubernetes workers the same way we previously scaled VMs?

Unfortunately, we can't — the math is actually more complicated now than it was before. As mentioned above, we used to scale based on resource usage, but Kubernetes also introduces the concept of resource limits. Each of our Kubernetes pods now carries those two numbers: our containers are using some amount of resources, X, but we've set limits telling those containers they can use up to some ceiling, Y.
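A minimal sketch of the two numbers, and of what a usage-only scaling check looks like (all values and names here are hypothetical, not our production setup):

```python
# Sketch: each pod carries two numbers; this check only looks at the first one.
# All values are hypothetical.

pods = [
    # (X: observed vCPU usage, Y: vCPU limit)
    (0.3, 2.0),
    (0.5, 2.0),
    (1.2, 4.0),
]

POOL_CAPACITY_VCPUS = 8.0
USAGE_THRESHOLD = 0.80

def scale_on_usage_only(pods):
    """Add capacity only when X (actual usage) runs hot; Y (limits) never enters into it."""
    used = sum(x for x, _ in pods)
    return used / POOL_CAPACITY_VCPUS >= USAGE_THRESHOLD

print(scale_on_usage_only(pods))  # False: by usage alone, the pool looks barely used
```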

Initially, we scaled only on X — just like with VMs, you need more resources, you get more — but all of our pods started failing. Why?

Solving the new math

Kubernetes has solved many of our autoscaling problems, but it has also created a unique new one. Sure, we have a pool of available resources that our containers can scale into, but because of this concept of limits, in some ways those resources are already spoken for.

You may be asking why our limits are different from our requests. This is because we're essentially gaming Kubernetes into running leaner. We could solve our problem by setting all of our limits and requests to the same values, but we're able to run a smaller footprint (and thus spend less money) by setting requests as low as possible and then letting our containers steal resources as needed — banking on the fact that, at any given time, overall resource usage generally averages out.
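In other words, we're betting on overcommit. A sketch of that bet, again with hypothetical numbers:

```python
# Sketch of the overcommit bet (hypothetical values).

NODE_CAPACITY_VCPUS = 16.0

# Each pod: (request, limit) in vCPUs. Requests are set low; limits are generous.
pods = [(0.5, 2.0)] * 20

requested = sum(req for req, _ in pods)      # 10.0 -> the scheduler happily packs them on
limit_ceiling = sum(lim for _, lim in pods)  # 40.0 -> 2.5x what the node can actually give

print(f"requests/capacity: {requested / NODE_CAPACITY_VCPUS:.2f}")      # 0.62
print(f"limits/capacity:   {limit_ceiling / NODE_CAPACITY_VCPUS:.2f}")  # 2.50
# The bet: actual usage averages out well below the limit ceiling most of the time.
```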

Sometimes, of course, it doesn't, so we need to adjust our math. Not only should we scale our worker pool when we're using a lot of resources, we should also scale when we've asked for a lot of resources. For that reason, the ultimate solution is that we've brought back our old math and updated it. Now, instead of monitoring resource usage per VM, we're monitoring the resource limits set across the aggregation of our containers — and when we go above a threshold, we scale the worker pool.
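A sketch of that updated trigger (the thresholds, capacity and function name are illustrative, not our production values):

```python
# Sketch: scale the worker pool on aggregate limits, not just aggregate usage.
# Thresholds and capacity are illustrative.

WORKER_POOL_CAPACITY_VCPUS = 400.0
USAGE_THRESHOLD = 0.80   # still scale if actual usage runs hot
LIMITS_THRESHOLD = 0.90  # also scale once most of the pool is spoken for by limits

def should_add_workers(total_usage_vcpus: float, total_limits_vcpus: float) -> bool:
    usage_hot = total_usage_vcpus / WORKER_POOL_CAPACITY_VCPUS >= USAGE_THRESHOLD
    limits_hot = total_limits_vcpus / WORKER_POOL_CAPACITY_VCPUS >= LIMITS_THRESHOLD
    return usage_hot or limits_hot

# Usage looks comfortable, but nearly all of the pool has been promised away:
print(should_add_workers(total_usage_vcpus=150.0, total_limits_vcpus=380.0))  # True
```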

In all, scaling in Kubernetes is like any other abstraction in technology; it’s great that it solves problems, but we always have to keep in mind that it’s bringing new problems to solve as well.

Rob Gindes
USA TODAY NETWORK

I manage the SRE team at Gannett. We make stuff perform better.