Kubernetes Resources: Requests and Limits
After a lot of effort, you have an operational Kubernetes cluster up and running, be it a development or a production one. Or perhaps you are a developer feeling great because you just deployed your app to a Kubernetes cluster successfully. In both cases, the next inevitable question you'll be haunted by is: "How is my application performing?" While the answer to this question might lead to a range of discussions across several areas, the most fundamental of them concerns compute resources, which are primarily `cpu` and `memory`.
Whether the cluster is deployed on-premise or in the cloud, whether a given container can be scheduled at all essentially comes down to the resources available on the corresponding node(s). Moreover, even a healthy, running container may suffer poor performance, or fail outright, as the load on those resources keeps building. Hence, it is of paramount importance to stay in control of the resources your application may consume. That is precisely where requests and limits come into the picture.
What options do I have?
By now you might have already figured out that there are two specs (`requests` and `limits`) that can be set for each resource type: `cpu` and `memory`.
As the word suggests, `requests` is the minimum baseline a container in a pod asks for in order to be scheduled onto a given node, whereas `limits` refers to a hard cap beyond which the relevant pod can be subject to throttling or even eviction.
Beyond cluster-level policy definitions, the expected behaviour after resource allocation is determined by Quality of Service (QoS) classes. Without digging deep into the details, they can loosely be generalised as: Guaranteed (every container has `requests` and `limits` set and equal), Burstable (at least one container has `requests` or `limits` set, but the Guaranteed criteria are not met), and BestEffort (no `requests` or `limits` set at all).
Understandably, the QoS class should correspond to how critical a particular app is with regard to the overall ecosystem in the cluster. A later section covers a few use cases for this.
Another vital point to understand while setting resources is the units of `cpu` and `memory`. In Kubernetes, CPU is denoted in CPU units, where 1 CPU unit refers to 1 AWS virtual CPU or 1 hyperthread on a bare-metal Intel processor with Hyperthreading. In this world of microservices, though, a fraction of a CPU may be allocated to a container; for instance, an expression like `0.2`, which is equivalent to `200m` (i.e. 200 millicores), is valid. Memory is measured in bytes, with the flexibility of expressing it as a plain or fixed-point number followed by a decimal suffix like `K` or `M`, or a binary one like `Ki` or `Mi` (so `350M` is 350 × 10⁶ bytes, while `350Mi` is 350 × 2²⁰ bytes).
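A typical Guaranteed-class `resources` block (values purely illustrative, not Fairfax's actual figures) might look like this:

```yaml
application:
  resources:
    requests:
      cpu: 250m      # 0.25 CPU units, i.e. 250 millicores
      memory: 512Mi  # 512 * 2^20 bytes
    limits:
      cpu: 250m      # equal to requests for every resource,
      memory: 512Mi  # which is what earns the Guaranteed QoS class
```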
A Fairfax Journey
The Kubernetes journey at Fairfax Media has been one of continuous evolution. We transitioned from setting no resources at all to resourcifying every app within the cluster. A few challenges led us to make this deliberate shift:
- Noisy neighbours causing unintended behaviour in other apps, in an environment where the same node's resources are shared across many pods.
- Poor stacking, as the scheduler lacked any definitive reference point for what `resources` to allocate to a given application.
- Poor optimisation of node resources, with pods queueing to be scheduled onto an overused node while nothing got scheduled onto a relatively under-utilised one.
Use Cases
To gain some control over this wild voyage, our key strength has been observability, with Prometheus and Grafana as our primary metrics and monitoring tools. Though each Fairfax application is unique in its function, we have been able to classify them based on where they sit in the overall ecosystem.
For example, using Prometheus metrics exported to a Grafana dashboard, we looked at 30 days' worth of `cpu` and `memory` consumption for one of our critical apps.
Due to the critical nature of this particular application, it needed to be prioritised over others in resource allocation. So we settled on the Guaranteed QoS class, and the `requests` and `limits` values were set similar to the below.
```yaml
application:
  resources:
    requests:
      cpu: 150m
      memory: 350Mi
    limits:
      cpu: 150m
      memory: 350Mi
```
Here is a different context, where other applications are relatively less dependent on the one in question, and its resource consumption varied widely based on inbound traffic.
Noticeably, per-pod CPU usage ranges from approximately 100 to 500 millicores, and memory usage from `250Mi` to `1Gi`. We decided to place this app under the Burstable QoS class and settled on the following `resources` values.
```yaml
application:
  resources:
    requests:
      cpu: 100m
      memory: 250Mi
    limits:
      cpu: 500m
      memory: 1Gi
```
Trade-offs
Please note that, in both cases, there have been a few trade-offs:
- We started out very generous with the Guaranteed QoS class, because critical apps cannot be compromised under any circumstances, such as a node resource deficiency.
- In the first use case, the thresholds (i.e. `cpu: 150m`, `memory: 350Mi`) were set considerably higher than the actual resource consumption observed.
- For the Burstable ones, `requests` were set as low as manageable, so that when node resources are over-stretched, the relevant pods do not wait in the queue to be scheduled.
- Moreover, while determining `limits`, a few CPU spikes (> `500m`) were discounted, as the idea behind allocating `resources` is to arrive at an evenly distributed load scenario.
Takeaways
If we look back and reflect, we are now better protected against application failure due to a sudden memory leak or runaway CPU. Node resources are better optimised because scheduling is more efficient. It has significantly reduced deployment time, which is apparent in some of our more complex applications. Above all, pods now get their fair share of resources, which also helps with cost optimisation at a high level.
The most valuable learning so far has been that, performance- and load-wise, not all applications behave the same way all the time. So even after arriving at this stage, there are apps that require continuous tuning and re-tuning of `resources` based on how they play out in reality at any given point in time. This part has been particularly easy for us since, at Fairfax, we abide by "You build it, you own it."
What’s Next?
We've started using an admission controller in the cluster to safeguard against deploying apps, whether existing or new, without `resources` set. Currently, we are setting ourselves up for Horizontal Pod Autoscaling (HPA) to ensure resources are optimised and scaled in a bottom-up manner (i.e. pod → node) rather than a top-down one.
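The post does not name the admission controller in use, but one built-in option for this kind of safeguard is Kubernetes' `LimitRange` admission plugin, which applies per-namespace defaults and bounds to containers that omit `resources`; a sketch, with all names and values illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: apps            # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:        # filled in when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:               # filled in when a container omits limits
        cpu: 500m
        memory: 512Mi
```

Note that `LimitRange` fills in missing values rather than rejecting the deployment outright; strictly rejecting pods that omit `resources` would instead call for a validating admission webhook or a policy engine.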