Kubernetes cost optimisation
GKE as a use case.
In this article I would like to talk about resource over-provisioning in Kubernetes, and some ideas to improve utilisation, save money, and end up with a well-architected workload.
If you are just starting with Kubernetes, my first suggestion is to publish new apps with small resource requests/limits and increase them when needed ( never go the other way, because you will never decrease 😉 )
Now let’s talk about an existing Kubernetes cluster with a huge workload that you want to start analysing. Where to start? What are the factors? …etc
My rule is: go with the simplest changes, the ones showing a quick effect, then go harder 😊
########################### Pods level ###########################
1. Resources request/limit
kubectl top pod
NAME CPU(cores) MEMORY(bytes)
app-6755b7b67d-4jnnj 168m 1620Mi
app-6755b7b67d-pl644 173m 1720Mi
app-6755b7b67d-psqqf 137m 1674Mi
kubectl describe pod app-6755b7b67d-4jnnj
Limits:
  memory: 4000Mi
Requests:
  cpu: 2
  memory: 4000Mi
Compare your actual utilisation with your configured requests over a period of time, and ask: do you really need to request that much? Keep in mind that Kubernetes reserves what you request, not what you actually use, so choose your requests carefully. You can use VPA for resource suggestions, or just monitor over time; and if you are using GKE, this feature is now available out of the box.
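If you want suggestions without watching dashboards by hand, a VPA in recommendation-only mode is enough. A minimal sketch, assuming the VPA CRD is installed (it is on GKE) and reusing the Deployment name `app` from the example above:

```yaml
# VPA in recommendation-only mode ("Off"): it computes request
# suggestions but never evicts or restarts pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  updatePolicy:
    updateMode: "Off"   # suggest only, never apply
```

You can then read the suggested requests from `kubectl describe vpa app-vpa` and compare them with what you configured.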
( In the GKE console chart: blue is usage, green is the suggestion. )
2. HPA min replica
We always prefer to be safe than sorry, so we go with a high min replica count.
Ex:
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
app Deployment/app 12%/50% 6 12 6 300d
We are running with a min replica count of 6 pods and usage is at 12 percent. That means if we go down to 3 pods, usage will be around 24%; even fewer than 3 and we are still fine.
Checking your min replicas is a good place to start.
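As a sketch, this is what lowering the floor for the HPA above looks like; the names and numbers are taken from the example, so adjust them to your own workload:

```yaml
# Same HPA as in the example, with the floor lowered from 6 to 3.
# At 12% usage across 6 pods, 3 pods land around 24%, still well
# under the 50% target, so the HPA has room before it scales up.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 3    # was 6
  maxReplicas: 12
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```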
3. Target threshold
Review the threshold at which you decide to scale, and which metrics it depends on. Keep in mind that you can use custom metrics; it’s not only memory and CPU. And you need to treat every app separately: a threshold is not build once, run anywhere ( that’s containers 😒 )
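For illustration, a sketch of an HPA driven by a per-pod custom metric instead of CPU. The metric name `http_requests_per_second` and the target value are assumptions, and a metrics adapter (for example the Prometheus adapter) must expose the metric to the custom metrics API:

```yaml
# HPA scaling on a custom per-pod metric rather than CPU/memory.
# Assumes a metrics adapter serves "http_requests_per_second".
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"   # assumed capacity per pod
```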
########################## Cluster level ##########################
An important point to start with is looking into the amount of resources in your cluster, and it’s important to distinguish between the following metrics.
You can depend on your monitoring to collect this, or if you are using GKE, the cost optimisation and observability features can provide this data out of the box.
Allocatable: the amount of resources available for pods in the cluster ( node capacity minus system reservations )
Requested: the amount of resources your workload requested ( the sum of all pods’ requests )
Used: your actual utilisation.
- Allocatable − Requested = the amount of resources in your cluster that wasn’t requested by you. Why is it there?! Imagine you have nodes with 4 cores and 32GB of RAM each, and you deploy 3 pods requesting 1 core and 16GB per pod. The first node takes 2 pods and is full on memory, so you need another node for the 3rd pod.
first node: 2 cores requested, 32GB requested ( 2 cores allocatable but not requested )
second node: 1 core requested, 16GB requested ( 3 cores and 16GB allocatable but not requested )
wasted resources, as the ratio of unrequested to allocatable:
cpu: 5/8 = 62.5%
memory: 16/64 = 25%
You can use GKE NAP ( node auto-provisioning ) to allocate nodes that fit your requests, so you can maximise the requested/allocatable ratio as much as possible.
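As a sketch, NAP can be enabled with `gcloud container clusters update CLUSTER --enable-autoprovisioning --autoprovisioning-config-file nap.yaml`, where the config file bounds what NAP may provision cluster-wide. The limits below are assumptions; size them to your own workload:

```yaml
# nap.yaml: cluster-wide bounds for node auto-provisioning.
# NAP then picks node shapes that fit the pending pods' requests.
resourceLimits:
- resourceType: cpu
  minimum: 4
  maximum: 64
- resourceType: memory
  minimum: 16
  maximum: 256
```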
- Requested − Used = the amount of resources you think you need but is actually wasted ( only solvable by adjusting your requests; VPA may help )
- Allocatable − Used = the full amount of wasted resources ( fixing the previous two gaps will fix this one )
####### Analyse autoscaler events
Almost everyone runs Kubernetes with the cluster autoscaler enabled to add/remove nodes as needed; here we are going to talk about the removing part.
Info: the autoscaler moves pods from one node to another when there is a possibility to remove a node that is running with low utilisation and there is room on other nodes to move the pods into.
Given the above, there are cases that prevent the autoscaler from doing this, leaving you with low utilisation and wasted resources. How do you find these cases? Analyse your autoscaler events.
Ex:
- “scale.down.error.failed.to.evict.pods”
- “no.scale.down.node.pod.has.local.storage”
- “no.scale.down.node.pod.not.enough.pdb”
Analyse the events, check which problems you are suffering from, and then they are easy to solve.
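Two common fixes, sketched below with assumed names: a PodDisruptionBudget that still allows evictions ( for `not.enough.pdb`, which usually means the budget allows zero disruptions ), and the safe-to-evict annotation for pods whose local storage is genuinely disposable ( for `pod.has.local.storage` ):

```yaml
# A PDB that protects the app but still lets the autoscaler
# evict one pod at a time during scale-down.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: app
---
# Only safe if the pod's emptyDir data is disposable.
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
  - name: app
    image: nginx   # hypothetical image
```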
####################### Important points to address #######################
When you run your cluster with strong resource optimisation and don’t have extra resources available at any time, you need to be able to autoscale fast; that includes adding a new node, pulling the docker image, and starting your application.
If adding a new node takes a long time, you may consider over-provisioning to save time. That doesn’t necessarily mean wasting resources: the spare capacity can run internal apps, or apps that don’t mind interruption.
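One common over-provisioning pattern ( a sketch; all names and sizes here are assumptions ) is a low-priority placeholder deployment of pause containers. They hold spare capacity; when a real pod needs room, the scheduler preempts them immediately, and the autoscaler brings up a new node for them in the background:

```yaml
# Negative priority: these pods lose to any normal workload.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Placeholder capacity pods, preempted by real workloads"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:        # size = the headroom you want ready
            cpu: "1"
            memory: 2Gi
```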
If pulling docker images takes long, you can check GKE image streaming or kube-fledged to cache images.
A special use case, not needed by everyone: scaling down GKE clusters during off-peak hours.
Use case: you are running with a minimum of 90 pods, and that’s the optimal number for you during the day, except from midnight to 4AM. It’s hard to run with a minimum lower than 90, because at 4AM you face a sudden increase in traffic that overloads your pods and drives up latency and error rates before you are able to scale up. Facing this? This is for you.
Here you can create a CronJob that sets the min number of replicas to 90 at 4AM and to a lower value at midnight; this way you can save money without affecting performance.
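A sketch of the midnight job; the 4AM job is identical with schedule `0 4 * * *` and minReplicas set back to 90. The ServiceAccount name `hpa-patcher` and the lower value of 30 are assumptions, and the ServiceAccount needs RBAC permission to patch HorizontalPodAutoscalers:

```yaml
# Runs at midnight and lowers the HPA floor for the quiet hours.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-off-peak
spec:
  schedule: "0 0 * * *"      # midnight
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-patcher
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl
            command:
            - kubectl
            - patch
            - hpa
            - app
            - --patch
            - '{"spec":{"minReplicas":30}}'
```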
See you ✌️