Kubernetes Container Resource Requirements — Part 2: CPU
In part 1, we looked at what Kubernetes’ (K8s) requests and limits resource requirements mean, plus the meaning of memory within the Docker container runtime. In this post, we’ll cover the meaning of CPU within K8s, what requests and limits mean in this context, plus look at how to determine an appropriate configuration for your component.
The Meaning of CPU
So, what exactly does CPU mean in K8s speak? A CPU is equivalent to exactly one of the CPUs presented by a node’s operating system, regardless of whether this presented CPU maps to a physical core, a hyper-thread of a physical core, or an EC2 vCPU (which is itself a hyper-thread of a physical core).
In contrast to memory, CPU is compressible in K8s parlance, meaning it can be throttled. When you specify the CPU limits of a container, you’re actually limiting its CPU time across all of the node’s CPUs, not setting the affinity of the container to a specific CPU or set of CPUs. This means that even if you specify a limit less than the total number of CPUs on a node, your container will still see (and use) all of the node’s CPUs — it’s just the time that’s limited.
For example, specifying CPU limits of 4 on a node with 8 CPUs means the container can use the equivalent of 4 CPUs’ worth of CPU time, but spread across all 8 CPUs. For a single container running on a dedicated node, the maximum permissible CPU usage across all node CPUs will be 50% in this example.
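The arithmetic behind that 50% figure can be sketched in a few lines of Python. The function name is made up for illustration, and it assumes a single container with nothing else competing for the node.

```python
def max_node_utilisation(cpu_limit, node_cpus):
    """Ceiling on whole-node CPU utilisation (0.0 to 1.0) for one container.

    The limit caps CPU *time*, not which CPUs are used, so the ceiling is
    simply the limit divided by the number of CPUs the node presents.
    """
    if cpu_limit <= 0 or node_cpus <= 0:
        raise ValueError("cpu_limit and node_cpus must be positive")
    # A limit larger than the node cannot push utilisation past 100%.
    return min(cpu_limit / node_cpus, 1.0)


# The example from the text: limits 4 on a node presenting 8 CPUs.
print(f"{max_node_utilisation(4, 8):.0%}")  # prints 50%
```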
So, how does this translate to Docker? K8s controls CPU limits by passing the cpu-period and cpu-quota options to Docker. cpu-period is set to 100000µs (100ms) by default, and denotes the period in which container CPU utilisation is tracked. cpu-quota is the total amount of CPU time that a container can use in each cpu-period. Both settings control the kernel’s Completely Fair Scheduler (CFS). Here’s how different K8s CPU limits translate to Docker configuration:
- limits 1: cpu-quota=100000
- limits 4: cpu-quota=400000
- limits 0.5: cpu-quota=50000
limits 1 means 100% of an equivalent 1 CPU can be used every 100ms, limits 4 means 400% of an equivalent 1 CPU (i.e. the equivalent of 4 CPUs) can be used every 100ms, and so on. Don’t forget, this is spread across all CPUs. Because of the way CFS quotas work, any container that uses up its quota within a given period is throttled and will not be allowed to run again until the next period begins. This means you may notice seemingly inexplicable pauses, particularly if your component is CPU bound and latency sensitive.
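To make the quota arithmetic and the pausing effect concrete, here is a minimal Python sketch. The function names are hypothetical, the 100ms period is the value quoted above, and the throttling model is deliberately simplified: it assumes every busy thread runs flat out on its own CPU from the start of each period.

```python
CPU_PERIOD_US = 100_000  # cpu-period from the text: 100ms, in microseconds


def cpu_quota_us(cpu_limit, period_us=CPU_PERIOD_US):
    """CPU time (microseconds) a container may consume in each period."""
    return int(round(cpu_limit * period_us))


def runnable_window_ms(cpu_limit, busy_threads, period_us=CPU_PERIOD_US):
    """Wall-clock milliseconds per period before the quota is exhausted.

    Simplified model: busy_threads threads each burn one CPU continuously,
    so the quota drains busy_threads times faster than wall-clock time.
    """
    quota = cpu_quota_us(cpu_limit, period_us)
    window_us = min(quota / busy_threads, period_us)
    return window_us / 1000


for limit in (1, 4, 0.5):
    print(f"limits {limit}: cpu-quota={cpu_quota_us(limit)}")

# limits 4 with 8 busy threads: the quota is gone after 50ms of wall-clock
# time, so the container sits throttled for the remaining 50ms of the period.
print(runnable_window_ms(4, 8))  # prints 50.0
```

In this model a container only avoids throttling when its number of busy threads is no greater than its limit, which is one reason heavily multi-threaded components can pause even when average CPU usage looks modest.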
As with memory, you specify how many CPUs (or fractions of CPUs) you require for your component by specifying its requests. This is taken into consideration when scheduling your component, and K8s will not allow a node to be oversubscribed in terms of CPU (cumulative container requests cannot exceed the number of presented CPUs on that worker node). You are guaranteed this number of equivalent CPUs, but what happens when the worker node is under excessive load, for example in the case of 100% CPU utilisation and/or excessive load averages? Here, the CPU scheduling priority of a container is determined by the requests configuration value, which is multiplied by 1024 and passed to Docker as the cpu-shares option. This is purely a weight; if all containers running on the node have an equal weight, then all containers will have equal CPU scheduling priority in the case of excessive load (when there are no spare CPU cycles). If one container has a higher weight than the others, it will have a greater CPU scheduling priority and effectively get more CPU time than the others under excessive load.
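The weighting can be illustrated with a rough Python sketch. The container names and helper functions are invented for the example, and it assumes the simplest possible contention model: every container is runnable all the time, none has a limit, and the scheduler divides the node strictly in proportion to cpu-shares.

```python
def cpu_shares(cpu_request):
    """cpu-shares weight for a CPU request, per the text: requests x 1024."""
    return int(cpu_request * 1024)


def contended_split(requests, node_cpus):
    """Equivalent CPUs each container receives when the node is saturated.

    requests maps a (hypothetical) container name to its CPU request.
    Assumes every container wants more CPU than it is given, so time is
    divided purely by weight.
    """
    shares = {name: cpu_shares(req) for name, req in requests.items()}
    total = sum(shares.values())
    if total == 0:
        raise ValueError("at least one container needs a non-zero request")
    return {name: node_cpus * s / total for name, s in shares.items()}


# Three always-busy containers on a 4-CPU node.
print(contended_split({"api": 1, "worker": 0.5, "batch": 0.5}, 4))
# prints {'api': 2.0, 'worker': 1.0, 'batch': 1.0}
```

Each container ends up with at least its request, and the spare capacity is handed out in the same 2:1:1 ratio as the weights.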
Guaranteed or Burstable?
As covered in part 1, a Burstable QoS configuration means additional CPU time can be opportunistically scavenged by your container, provided it’s not being used elsewhere. This potentially allows more efficient use of underlying resources at the cost of greater unpredictability. For example, a CPU-bound component’s latency may be inexplicably affected by transient co-location of other containers on the same worker node: noisy neighbours. In the worst case, enough Burstable containers on a given worker node cause excessive load, spoiling it for everyone.
Again, if you’re new to K8s, you’re best off starting with the Guaranteed QoS class, which ensures predictability, by setting limits the same as requests. As you better understand the resource utilisation characteristics of your stack and perhaps find you’re overprovisioned in terms of CPU, you may consider introducing Burstable containers to get more bang for your buck and potentially even lower your overall footprint.
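As a quick reminder of how requests and limits map to a QoS class, here is a simplified Python sketch. It is not K8s code: it looks at a single container only, the function name is made up, and it compares quantities as plain strings (so "500m" and "0.5" would wrongly count as different).

```python
def qos_class(resources):
    """Classify one container's resources stanza into a QoS class.

    resources mirrors the shape of a container's resources block, e.g.
    {"requests": {"cpu": "1"}, "limits": {"cpu": "2"}}.
    """
    requests = dict(resources.get("requests", {}))
    limits = resources.get("limits", {})
    if not requests and not limits:
        return "BestEffort"
    # When only a limit is given, K8s defaults the request to the limit.
    for name, value in limits.items():
        requests.setdefault(name, value)
    guaranteed = all(
        name in limits and limits[name] == requests.get(name)
        for name in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


same = {"requests": {"cpu": "2", "memory": "1Gi"},
        "limits": {"cpu": "2", "memory": "1Gi"}}
burst = {"requests": {"cpu": "1", "memory": "1Gi"},
         "limits": {"cpu": "2", "memory": "1Gi"}}
print(qos_class(same), qos_class(burst))  # prints Guaranteed Burstable
```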
How Many CPUs?
How many CPUs should you allocate for your container? Unfortunately there’s no one-size-fits-all answer here and it’s dependent on the characteristics of your component, acceptable performance, pod placement strategies, cloud instance types, cost and so on. Sorry.
But if you’ve got decent instrumentation and know your component like the back of your hand, you could try out different configurations within a performance test environment, or even canary in production. You’ll probably be searching for the sweet spot that balances cost and performance, but at Hotels.com we’ve found container throttling to be a big factor for our CPU-intensive components running on Linux. We generally use Prometheus to determine useful container throttling statistics, but another geeky way is to jump into the container itself and check out the cgroup CPU accounting stats in its cpu.stat file (on a cgroup v1 host this is typically /sys/fs/cgroup/cpu,cpuacct/cpu.stat).
This provides you with the total number of scheduler periods, the total number of times the container was throttled and the cumulative throttle time in nanoseconds.
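If you’d rather turn those raw counters into something readable, a small Python sketch like the following could help. It assumes the cgroup v1 field names nr_periods, nr_throttled and throttled_time (cgroup v2 reports throttled_usec instead), and the sample values are invented purely for illustration.

```python
def parse_cpu_stat(text):
    """Parse 'name value' lines from a cgroup cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_summary(stats):
    """Fraction of periods throttled and total throttle time in seconds."""
    periods = stats.get("nr_periods", 0)
    throttled = stats.get("nr_throttled", 0)
    return {
        "throttled_ratio": throttled / periods if periods else 0.0,
        # cgroup v1 reports throttled_time in nanoseconds.
        "throttled_seconds": stats.get("throttled_time", 0) / 1e9,
    }


# Made-up sample; inside a container you would read the real file instead,
# e.g. open("/sys/fs/cgroup/cpu,cpuacct/cpu.stat").read() on cgroup v1.
SAMPLE = "nr_periods 1000\nnr_throttled 250\nthrottled_time 5000000000\n"
print(throttle_summary(parse_cpu_stat(SAMPLE)))
```

A high throttled ratio for a latency-sensitive component is a hint that its limits are too low for the way it uses CPU, even if its average utilisation looks fine.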
You’re hopefully now aware of container resource requirements, QoS classes and what memory and CPU mean in the context of K8s and Docker. The key takeaway here is to ensure you understand the resource utilisation characteristics of your component and configure its requests and limits appropriately such that you’re making the best use of cluster resources and allowing it to work in harmony with other components.
Be sure to follow The Hotels.com Technology Blog for more K8s-related goodness!