Kubernetes Container Resource Requirements — Part 2: CPU

CPU Requests, Limits, Guaranteed or Burstable?

The Meaning of CPU

So, what exactly does CPU mean in K8s-speak? One CPU is equivalent to exactly one of the CPUs presented by a node's operating system, regardless of whether this presented CPU maps to a physical core, a hyper-thread of a physical core, or an EC2 vCPU (which, on most instance types, is itself a hyper-thread of a physical core).


In contrast to memory, CPU is compressible in K8s parlance, meaning a container that hits its CPU limit is throttled rather than killed. When you specify a container's CPU limit, you're actually limiting its CPU time across all of the node's CPUs, not pinning the container to a specific CPU or set of CPUs. This means that even if you specify a limit lower than the total number of CPUs on a node, your container will still see (and use) all of the node's CPUs; it's only the time that's limited. Under the hood, the limit is multiplied by the default CFS period of 100000 microseconds (100ms) and passed to Docker as the cpu-quota option, for example:

  • limits 4: cpu-quota=400000
  • limits 0.5: cpu-quota=50000
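To make the "it's just the time that's limited" point concrete, here's a minimal Python sketch of the arithmetic. The helper names are hypothetical, and it assumes the default 100ms CFS period. The second function is an idealised model: it assumes each busy thread runs flat out on its own CPU, so a multi-threaded container burns through its quota faster than wall-clock time and is then throttled for the rest of the period.

```python
# Assumption: the default CFS period of 100ms, expressed in microseconds.
CFS_PERIOD_US = 100_000


def cpu_limit_to_quota(limit_cores: float, period_us: int = CFS_PERIOD_US) -> int:
    """CFS quota (microseconds of CPU time per period) for a CPU limit in cores."""
    if limit_cores <= 0:
        raise ValueError("CPU limit must be positive")
    return round(limit_cores * period_us)


def wall_ms_before_throttled(limit_cores: float, busy_threads: int,
                             period_us: int = CFS_PERIOD_US) -> float:
    """Idealised wall-clock time (ms) per period before the container is throttled.

    Assumes `busy_threads` threads each run flat out on their own CPU, so the
    quota is consumed `busy_threads` times faster than wall-clock time.
    """
    if busy_threads < 1:
        raise ValueError("need at least one busy thread")
    quota_us = cpu_limit_to_quota(limit_cores, period_us)
    # The container can never run for longer than the period itself.
    return min(quota_us / busy_threads, period_us) / 1000


print(cpu_limit_to_quota(4))    # 400000
print(cpu_limit_to_quota(0.5))  # 50000
print(wall_ms_before_throttled(0.5, busy_threads=4))  # 12.5
```

In other words, a container limited to 0.5 CPUs that runs four busy threads on four CPUs exhausts its 50ms of quota after roughly 12.5ms of wall-clock time and then sits throttled for the remaining ~87.5ms of each period.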


As with memory, you specify how many CPUs (or fractions of CPUs) your component requires via its requests. This value is taken into account when scheduling your component, and K8s will not allow a node to be oversubscribed in terms of CPU: the cumulative requests of the containers on a worker node cannot exceed that node's allocatable CPU, which is at most the number of CPUs it presents. You are guaranteed this number of equivalent CPUs, but what happens when the worker node is under excessive load, for example at 100% CPU utilisation and/or with excessive load averages?

In that case, the CPU scheduling priority of a container is determined by its requests value, which is multiplied by 1024 and passed to Docker as the cpu-shares option. This is purely a relative weight: if all containers running on the node have an equal weight, they all get equal CPU scheduling priority under excessive load (when there are no spare CPU cycles). If one container has a higher weight than the others, it has a greater CPU scheduling priority and effectively gets more CPU time than the others under excessive load.
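A minimal Python sketch of both ideas follows: the scheduler's "requests must fit" check and how cpu-shares divide a saturated node. The function names are hypothetical, and the contention model is idealised. It assumes every container is CPU-bound, can use all of the node's CPUs and has no limit, so CPU time is split purely in proportion to the shares. The minimum weight of 2 reflects the smallest cpu.shares value the kernel accepts under cgroup v1.

```python
from typing import Dict, List

MIN_SHARES = 2  # smallest cpu.shares value the Linux kernel accepts (cgroup v1)


def cpu_shares(request_cores: float) -> int:
    """Weight passed to Docker as cpu-shares: the CPU request multiplied by 1024."""
    return max(MIN_SHARES, round(request_cores * 1024))


def fits_on_node(requests: List[float], allocatable_cores: float) -> bool:
    """Scheduler-style check: the sum of CPU requests must not exceed allocatable CPU.

    Values are converted to whole millicores to avoid floating-point rounding surprises.
    """
    requested_m = sum(round(r * 1000) for r in requests)
    return requested_m <= round(allocatable_cores * 1000)


def contended_split(requests: Dict[str, float], node_cpus: float) -> Dict[str, float]:
    """Idealised CPU (in cores) each container receives when the node is saturated.

    Assumes every container is CPU-bound, can use all CPUs and has no limit,
    so CPU time is divided purely in proportion to cpu-shares.
    """
    shares = {name: cpu_shares(req) for name, req in requests.items()}
    total = sum(shares.values())
    return {name: node_cpus * s / total for name, s in shares.items()}


print(fits_on_node([1, 1, 2], allocatable_cores=4))          # True
print(contended_split({"a": 0.5, "b": 1.5}, node_cpus=4))    # {'a': 1.0, 'b': 3.0}
```

Note that in the second example both containers get more than they requested, but always in the 1:3 ratio of their requests: the shares only decide how a contended node is carved up, not an absolute amount.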

Guaranteed or Burstable?

As covered in part 1, a Burstable QoS configuration (for CPU, requests set lower than limits, or no limit at all) means additional CPU time can be opportunistically scavenged by your container, provided it's not being used elsewhere. This allows potentially more efficient use of underlying resources at the cost of greater unpredictability: for example, a CPU-bound component's latency may be inexplicably affected by the transient co-location of other containers on the same worker node (noisy neighbours). In a near worst-case scenario, enough burstable containers on a given worker node cause excessive load, spoiling it for everyone.
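As a reminder of which configuration lands in which class, here's a simplified Python sketch of the QoS rules. The Container dataclass is a hypothetical stand-in rather than any Kubernetes client type, and resource amounts are plain numbers instead of K8s quantity strings such as "500m". It encodes the documented behaviour that an omitted request defaults to the limit, so setting limits alone yields Guaranteed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    """Hypothetical stand-in for a container spec; values are plain numbers."""
    requests: Dict[str, float] = field(default_factory=dict)
    limits: Dict[str, float] = field(default_factory=dict)


TRACKED = ("cpu", "memory")


def qos_class(containers: List[Container]) -> str:
    # BestEffort: no container sets any CPU or memory request or limit.
    if not any(r in c.requests or r in c.limits
               for c in containers for r in TRACKED):
        return "BestEffort"
    for c in containers:
        # An omitted request defaults to the limit when a limit is set.
        effective = {**c.limits, **c.requests}
        for r in TRACKED:
            # Guaranteed needs a limit for every resource, with request == limit.
            if r not in c.limits or effective[r] != c.limits[r]:
                return "Burstable"
    return "Guaranteed"


print(qos_class([Container(limits={"cpu": 1, "memory": 512})]))   # Guaranteed
print(qos_class([Container(requests={"cpu": 0.5, "memory": 512},
                           limits={"cpu": 1, "memory": 512})]))   # Burstable
print(qos_class([Container()]))                                   # BestEffort
```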

How Many CPUs?

How many CPUs should you allocate for your container? Unfortunately there’s no one-size-fits-all answer here and it’s dependent on the characteristics of your component, acceptable performance, pod placement strategies, cloud instance types, cost and so on. Sorry.

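Whatever number you arrive at, keep an eye on whether your container is actually being throttled against its limit. On a node using cgroup v1, the CFS bandwidth statistics (nr_periods, nr_throttled and throttled_time, the latter in nanoseconds) can be read from inside the container:

cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat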
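If you'd rather turn those counters into a number, here's a minimal Python sketch that parses cpu.stat text and reports the fraction of CFS periods in which the container was throttled. The helper names are hypothetical and the sample values are invented for illustration. It also assumes that on cgroup v2 the equivalent counter is throttled_usec (in microseconds); in practice you would pass it the contents of the cpu.stat file rather than a literal string.

```python
from typing import Dict, Tuple


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup cpu.stat file into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttling_summary(stats: Dict[str, int]) -> Tuple[float, float]:
    """Return (fraction of CFS periods throttled, total throttled seconds)."""
    periods = stats.get("nr_periods", 0)
    ratio = stats.get("nr_throttled", 0) / periods if periods else 0.0
    if "throttled_time" in stats:   # cgroup v1: nanoseconds
        seconds = stats["throttled_time"] / 1e9
    else:                           # cgroup v2: throttled_usec, microseconds
        seconds = stats.get("throttled_usec", 0) / 1e6
    return ratio, seconds


# Invented sample values, for illustration only.
sample = "nr_periods 2000\nnr_throttled 500\nthrottled_time 12500000000\n"
print(throttling_summary(parse_cpu_stat(sample)))  # (0.25, 12.5)
```

A ratio that stays high while the component is busy suggests the limit is tight relative to its bursts; whether that matters depends on the latency you're prepared to accept.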

Wrapping Up

You’re hopefully now aware of container resource requirements, QoS classes and what memory and CPU mean in the context of K8s and Docker. The key takeaway here is to ensure you understand the resource utilisation characteristics of your component and configure its requests and limits appropriately such that you’re making the best use of cluster resources and allowing it to work in harmony with other components.


