Are Kubernetes CPU limits bad?

Vitor Falcao
Published in Inside SumUp
Apr 19, 2022

Whether or not to define CPU limits for your pods is a controversial topic. Let's talk a little about it.

Defining resource requests and limits is not easy for complex applications; it requires a lot of testing and monitoring. Sometimes, though, we can make educated guesses and refine them over time, which is closer to what real-world constraints allow.

The issue is that most people don't fully understand how Kubernetes requests and limits work under the hood, and that's fine: we don't have time to become proficient in every little piece of Kubernetes and the other tools we use. That's why I'm writing this post. It focuses on CPU resources; I'll leave memory for another time.

Why people say CPU limits are bad

Let's start with an example with two pods, A and B. Pod A always has low CPU usage, between 100m and 200m (m stands for millicores, so 100m is a tenth of a CPU), while B is an application that constantly receives requests and keeps its CPU usage medium to high, between 300m and 500m. Let's also suppose our node has 1000m available for pods.

This is fine because the worst case is 700m (A at 200m + B at 500m), which leaves 300m to spare. The problem is that during any unexpected event, things could go wrong. Let's consider some examples:
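The headroom arithmetic is simple, but writing it down makes the assumption explicit: the worst case is the sum of each pod's peak. A minimal sketch using the ranges from the example; the variable names are mine.

```python
# Observed CPU usage ranges from the example, in millicores: (low, high).
usage_ranges_m = {"A": (100, 200), "B": (300, 500)}
NODE_ALLOCATABLE_M = 1000  # CPU the node has available for pods

# Best case: every pod at its low end. Worst case: every pod at its peak.
best_case_m = sum(low for low, _high in usage_ranges_m.values())
worst_case_m = sum(high for _low, high in usage_ranges_m.values())
headroom_m = NODE_ALLOCATABLE_M - worst_case_m

print(f"best: {best_case_m}m, worst: {worst_case_m}m, headroom: {headroom_m}m")
```

Any unexpected event that pushes the combined usage past those 300m of headroom is where the three scenarios below start to differ.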

  1. Neither A nor B has requests or limits. If pod B starts using lots of CPU time, say 980m, then A is going to starve with only 20m.
  2. We set a CPU limit of 500m for B. Now A won't starve, because B is capped. Great, but if A is consuming 100m and B is consuming 500m, then 400m sits unused even though B could be using it during these unexpected events.
  3. Instead of setting limits on B, let's set requests on A. Now A requests 200m of CPU and B has no limits. When B starts using lots of CPU, A won't be affected, because it is guaranteed to always have its 200m.

The second example shows how setting limits can waste resources that could easily be used. The third example is not ideal yet, since we should set requests for both pods, but the idea is to show that limits can waste resources: if you set proper requests for every pod, no one is going to starve, so you don't need limits.
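To make examples 2 and 3 concrete, here is a simplified steady-state model: the node's CPU is split in proportion to each pod's weight, and no pod gets more than its demand or its limit. This is my own sketch, not how the kernel literally schedules threads. The weights are assumptions drawn from cgroup v1: a 200m request becomes 204 `cpu.shares`, and a pod with no request gets the minimum of 2.

```python
import math

def allocate(capacity_m, pods):
    """Split capacity_m millicores among pods in proportion to their weights,
    never giving a pod more than min(demand, limit). A simplified model of how
    CFS shares (from requests) and CFS quota (from limits) interact."""
    # The most each pod can use: its demand, capped by its limit if it has one.
    want = {n: min(p["demand"], p.get("limit", math.inf)) for n, p in pods.items()}
    alloc = {}
    active = set(pods)
    remaining = float(capacity_m)
    while active:
        total_weight = sum(pods[n]["weight"] for n in active)
        fair = {n: remaining * pods[n]["weight"] / total_weight for n in active}
        # Pods that need no more than their weighted fair share are fully served.
        satisfied = {n for n in active if want[n] <= fair[n]}
        if not satisfied:
            # Everyone left wants more than its share: split what remains by weight.
            alloc.update(fair)
            break
        for n in satisfied:
            alloc[n] = want[n]
            remaining -= want[n]
        active -= satisfied
    return alloc


# Example 2: no requests (minimum weight 2 each), B limited to 500m.
scenario_2 = allocate(1000, {
    "A": {"demand": 100, "weight": 2},
    "B": {"demand": 980, "weight": 2, "limit": 500},
})
idle_2 = 1000 - sum(scenario_2.values())  # 400m left unused

# Example 3: A requests 200m (weight 204), B has no request and no limit.
scenario_3 = allocate(1000, {
    "A": {"demand": 200, "weight": 204},
    "B": {"demand": 980, "weight": 2},
})  # A keeps its 200m, B gets the other 800m
```

In example 2 the limit leaves 400m idle while B still wants more; in example 3 A keeps what it needs and B soaks up everything else.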

Always set CPU requests. This is the baseline and it is the only thing you can count on.

- Tim Hockin, Kubernetes Maintainer

One misconception is that CPU requests don't guarantee resources. According to the Kubernetes documentation, however:

Pods are guaranteed to get the amount of CPU they request, they may or may not get additional CPU time (depending on the other jobs running).
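Where does that guarantee come from? On cgroup v1 nodes, the kubelet turns a CPU request into a `cpu.shares` weight, and the kernel's scheduler divides busy CPUs in proportion to those weights. The sketch below follows the same arithmetic as the kubelet's `MilliCPUToShares` helper; the Python names are my own, and it assumes the two pods are the only ones competing for CPU.

```python
MIN_SHARES = 2          # floor the kubelet applies to cpu.shares
SHARES_PER_CPU = 1024   # cgroup v1 weight that corresponds to one full CPU
MILLICPU_PER_CPU = 1000

def millicpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores to cgroup v1 cpu.shares,
    following the same arithmetic as the kubelet's MilliCPUToShares."""
    if milli_cpu == 0:
        return MIN_SHARES  # no request: the pod still gets the minimum weight
    shares = milli_cpu * SHARES_PER_CPU // MILLICPU_PER_CPU
    return max(shares, MIN_SHARES)


# Two pods that together request the whole 1000m node.
a_shares = millicpu_to_shares(200)  # 204
b_shares = millicpu_to_shares(800)  # 819
# If both are busy, A's slice of the node is proportional to its shares.
a_slice_m = 1000 * a_shares / (a_shares + b_shares)  # about 199.4m
```

Integer rounding costs a fraction of a millicore, but A effectively gets what it asked for. On cgroup v2 nodes the shares are further converted to `cpu.weight`, which keeps the proportions only approximately.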

How to define the CPU requests

I believe every cluster will have spare CPU cycles, and that the probability of every pod spiking at the same time is low. With this in mind, we can make better decisions.
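To put a rough number on "low", we can treat each pod's spikes as independent events. That independence is a strong assumption: pods behind the same traffic source often spike together, which makes the real probability higher. The pod count and spike rate below are hypothetical.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n pods spike at the same moment,
    assuming each pod spikes independently with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 10 pods, each spiking 5% of the time.
p_all = prob_at_least(10, 10, 0.05)   # 0.05**10, about 1e-13
p_three = prob_at_least(3, 10, 0.05)  # about 0.0115
```

All ten spiking at once is practically impossible under these assumptions, but three at once happens about 1% of the time, so the spare cycles still need to cover a few simultaneous spikes.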

Setting requests is a cost-versus-performance trade-off. Set them too high and it gets expensive, though your pod will always have plenty of resources. Set them too low and it's cheap, but you won't meet your SLOs. What about the average? It's still cheap, but you could easily miss your SLOs.

You should always set requests, as they are the only thing your pod can count on. If your pod needs extra cycles to perform as expected, your requests are too low. If you set them too high by sizing for the worst case, it's going to be expensive, and even without limits you are still wasting resources.
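One way to see the trade-off is to compare candidate requests on a set of usage samples. The samples below are hypothetical, and using a high percentile as a middle ground between the average and the worst case is a common heuristic, not something this post prescribes.

```python
import math
from statistics import mean

def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical per-minute CPU usage of pod B, in millicores.
usage_m = [310, 320, 350, 360, 380, 400, 420, 450, 480, 500]

average_m = mean(usage_m)                     # 397: cheap, short on busy minutes
p90_m = nearest_rank_percentile(usage_m, 90)  # 480: covers 9 of 10 minutes
worst_m = max(usage_m)                        # 500: safest, most expensive
```

With a request at the average, B would need extra cycles in half of these minutes, which is exactly the "your requests are too low" signal.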

When to use CPU limits

If you are wondering when you should use limits, take a look at Natan's tweet: they are your friend during stress tests.

Set CPU requests equal to CPU limits, then run your stress tests. The goal is to find the worst-case scenario and then tune the cost-versus-performance trade-off to your needs.
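Before a stress test it can help to check that a container really has its CPU request equal to its CPU limit. A minimal sketch that reads a container spec as a plain dict shaped like the pod spec's `resources.requests.cpu` and `resources.limits.cpu` fields; the function names are hypothetical, and the quantity parser only handles the `m` suffix and plain decimal cores.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as "500m", "0.5" or "2" to millicores.
    Only handles the 'm' suffix and plain decimal cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def ready_for_stress_test(container: dict) -> bool:
    """True if the container's effective CPU request equals its CPU limit."""
    resources = container.get("resources", {})
    limit = resources.get("limits", {}).get("cpu")
    # Kubernetes defaults a missing request to the limit when only a limit is set.
    request = resources.get("requests", {}).get("cpu", limit)
    if request is None or limit is None:
        return False
    return parse_cpu(request) == parse_cpu(limit)


container_b = {
    "name": "b",
    "resources": {"requests": {"cpu": "500m"}, "limits": {"cpu": "0.5"}},
}
print(ready_for_stress_test(container_b))  # True
```

Comparing parsed millicores rather than raw strings matters here, since "500m" and "0.5" are the same amount of CPU.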

Limits and throttling

Removing CPU limits helps us avoid wasting resources, but during my research I found another issue with limits. It's related to the Linux Completely Fair Scheduler (CFS), kernel bugs, and the CFS quota Kubernetes uses to enforce CPU limits.
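For context, a CPU limit becomes a CFS quota: a budget of CPU time per scheduling period (100 ms by default). A multi-threaded container can spend that budget early in the period and then sit throttled for the rest of it, even if its average usage looks fine. The sketch follows the arithmetic of the kubelet's `MilliCPUToQuota` helper; the throttling model is my own simplification that assumes each busy thread has its own core, and it does not model the kernel bugs mentioned above.

```python
CFS_PERIOD_US = 100_000  # default CFS period the kubelet uses: 100 ms
MIN_QUOTA_US = 1_000     # the kubelet never sets a quota below 1 ms

def millicpu_to_quota(milli_cpu: int, period_us: int = CFS_PERIOD_US) -> int:
    """Convert a CPU limit in millicores to cpu.cfs_quota_us, following the
    kubelet's MilliCPUToQuota arithmetic. 0 means no quota is applied."""
    if milli_cpu == 0:
        return 0
    return max(milli_cpu * period_us // 1000, MIN_QUOTA_US)


def run_vs_throttled(milli_cpu: int, busy_threads: int, period_us: int = CFS_PERIOD_US):
    """Wall-clock microseconds a container runs and then sits throttled within
    one period, assuming busy_threads threads each have a core and want to run
    for the whole period."""
    quota = millicpu_to_quota(milli_cpu, period_us)
    if quota == 0:
        return period_us, 0  # no limit, never throttled
    run_us = min(period_us, quota / busy_threads)
    return run_us, period_us - run_us


# A 500m limit gives 50 ms of CPU time per 100 ms period.
# With 4 busy threads that budget is spent after 12.5 ms of wall time,
# and the container is throttled for the remaining 87.5 ms.
print(run_vs_throttled(500, 4))  # (12500.0, 87500.0)
```

A request that needs 20 ms of wall time on those four threads would therefore stall for most of a period, which is the kind of latency spike people report when they blame CPU limits.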

This is not the main focus of this post, so I'll leave some links for you to decide whether it still applies to your scenario. I also recommend reading some of the discussions on the subject, like this one. Check the references below for all links.
