Kubernetes Resources: Requests and Limits
After a lot of effort, you have an operational Kubernetes cluster up and running, be it a development or a production one. Or perhaps you are a developer feeling great because you just deployed your app to a Kubernetes cluster successfully. In both cases, the next inevitable question you'll be haunted by is: "How is my application performing?" While the answer to this question might lead to a range of discussions across several areas, the most fundamental of them concerns compute resources, which are primarily `cpu` and `memory`.
Whether the cluster is deployed on-premise or in the cloud, whether a given container can be scheduled at all essentially comes down to the resources available on the corresponding node(s). Moreover, even a healthy, running container may suffer poor performance, or fail outright, as the load on those resources keeps building. Hence, it is of paramount importance to stay in control of the resources your application may consume. That is precisely where requests and limits come into the picture.
What options do I have?
By now you might have already figured out that there are two specs (`requests` and `limits`) that can be set for each resource type: `cpu` and `memory`.
As the word suggests, `requests` is the minimum baseline a container in a pod asks for in order to be scheduled onto a given node, whereas `limits` refers to a hard cap beyond which the relevant pod can be subject to throttling or even eviction.
Beyond cluster-level policy definitions, the expected behaviour after resource allocation is determined by Quality of Service (QoS) classes. Without digging deep into the details, they can loosely be generalised as: Guaranteed (every container has `requests` and `limits` set and equal), Burstable (at least one container has `requests` or `limits` set, but the Guaranteed criteria are not met), and BestEffort (no `requests` or `limits` set at all).
Understandably, the QoS class should correspond to how critical a particular app is with regard to the overall ecosystem in the cluster. A later section covers a few use cases for this.
Another vital point to understand while setting resources is the units of `cpu` and `memory`. In Kubernetes, CPU is denoted in CPU units, where 1 CPU unit refers to 1 AWS virtual CPU or 1 hyperthread on a bare-metal Intel processor with Hyperthreading. In this world of microservices, though, a fraction of a CPU may be allocated to a container; for instance, an expression like `0.2`, which is equivalent to `200m` (i.e. 200 millicores), is valid. Memory is measured in bytes, with the flexibility of expressing it as a plain or fixed-point number followed by a decimal suffix like `K` or `M`, or a binary one like `Ki` or `Mi` (so `350M` is 350 × 10⁶ bytes, while `350Mi` is 350 × 2²⁰ bytes).
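A typical Guaranteed-class `resources` block (values purely illustrative, not Fairfax's actual figures) might look like this:

```yaml
application:
  resources:
    requests:
      cpu: 250m      # 0.25 CPU units, i.e. 250 millicores
      memory: 512Mi  # 512 * 2^20 bytes
    limits:
      cpu: 250m      # equal to requests for every resource,
      memory: 512Mi  # which is what earns the Guaranteed QoS class
```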
A Fairfax Journey
The Kubernetes journey at Fairfax Media has been one of continuous evolution. We transitioned from setting no resources at all to resourcifying every app within the cluster. A few challenges led us to make this deliberate shift:
- Noisy neighbours causing unintended behaviour in other apps, in an environment where the same node's resources are shared across many pods.
- Poor stacking, as the scheduler lacked any definitive reference point for what `resources` to allocate to a given application.
- Poor optimisation of node resources, with pods queueing to be scheduled onto an overused node while nothing got scheduled onto a relatively under-utilised one.
Use Cases
To gain some control over this wild voyage, our key strength has been observability, with Prometheus and Grafana as our primary metrics and monitoring tools. Though each Fairfax application is unique in its function, we have been able to classify them based on where they sit in the overall ecosystem.
For example, using Prometheus metrics exported to a Grafana dashboard, we looked at 30 days' worth of `cpu` and `memory` consumption for one of our critical apps.
Due to the critical nature of this particular application, it needed to be prioritised over others in resource allocation. So we settled on the Guaranteed QoS class, and the `requests` and `limits` values were set similar to the below.
```yaml
application:
  resources:
    requests:
      cpu: 150m
      memory: 350Mi
    limits:
      cpu: 150m
      memory: 350Mi
```
Here is a different context, where other applications are relatively less dependent on the one in question, and its resource consumption varied widely based on inbound traffic.
Noticeably, per-pod CPU usage ranges from approximately 100 to 500 millicores, and memory usage from `250Mi` to `1Gi`. We decided to place this app under the Burstable QoS class and settled on the following `resources` values.
```yaml
application:
  resources:
    requests:
      cpu: 100m
      memory: 250Mi
    limits:
      cpu: 500m
      memory: 1Gi
```
Trade-offs
Please note that, in both cases, there have been a few trade-offs:
- We started out very generous with the Guaranteed QoS class, because critical apps cannot be compromised under any circumstances, such as a node resource deficiency.
- In the first use case, the thresholds (i.e. `cpu: 150m`, `memory: 350Mi`) were set considerably higher than the actual resource consumption observed.
- For the Burstable ones, `requests` were set as low as manageable, so that when node resources are over-stretched, the relevant pods do not wait in the queue to be scheduled.
- Moreover, while determining `limits`, a few CPU spikes (> `500m`) were discounted, as the idea behind allocating `resources` is to arrive at an evenly distributed load scenario.
Takeaways
If we look back and reflect, we are now better protected against application failure due to a sudden memory leak or runaway CPU. Node resources are better optimised because scheduling is more efficient. It has significantly reduced deployment time, which is apparent in some of our more complex applications. Above all, pods now get their fair share of resources, which also helps with cost optimisation at a high level.
The most valuable learning so far has been that, performance- and load-wise, not all applications behave the same way all the time. So even after arriving at this stage, there are apps that require continuous tuning and re-tuning of `resources` based on how they play out in reality at any given point in time. This part has been particularly easy for us since, at Fairfax, we abide by "You build it, you own it."
What’s Next?
We've started using an admission controller in the cluster to safeguard against deploying apps, whether existing or new, without `resources` set. Currently, we are setting ourselves up for Horizontal Pod Autoscaling (HPA) to ensure resources are optimised and scaled in a bottom-up manner (i.e. pod → node) rather than a top-down one.
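The post does not name the admission controller in use, but one built-in option for this kind of safeguard is Kubernetes' `LimitRange` admission plugin, which applies per-namespace defaults and bounds to containers that omit `resources`; a sketch, with all names and values illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: apps            # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:        # filled in when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:               # filled in when a container omits limits
        cpu: 500m
        memory: 512Mi
```

Note that `LimitRange` fills in missing values rather than rejecting the deployment outright; strictly rejecting pods that omit `resources` would instead call for a validating admission webhook or a policy engine.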