Karpenter: why slow is better
Introduction
Karpenter is a brilliant tool for autoscaling the nodes in your Kubernetes clusters. I love how it is configured: simple affinity labels define how and when to scale up.
There is a but: scaling fast often means scaling too much.
Really?
Yes. Let me explain my case scenario. As a rule, we destroy development and test environments when not required to save on costs (the cloud is expensive!). We bring down the EKS cluster every evening and weekend and we bring it back up in the morning ready to be used by the developers.
A simplified example of our affinity rules looks like this:
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: karpenter.sh/capacity-type
            operator: In
            values:
              - spot
The problem is that we have hundreds of microservices, all using similar selectors, and they all get deployed at the same time: when the cluster is started up in the morning.
However, Karpenter operates based on the current state of the cluster and the available resources. It does not consider future resource requirements or the fact that some of the microservices may have short-lived workloads. As a result, Karpenter ends up over-provisioning EC2 instances, leading to an inefficient use of resources and increased costs.
Karpenter’s consolidation will eventually address this by packing workloads onto fewer nodes, but that takes time, and in the meantime we pay for capacity we didn’t need.
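Consolidation can also be tuned so it kicks in sooner. A minimal sketch of a NodePool with aggressive consolidation might look like the following; the field names follow the karpenter.sh v1 NodePool API and the name and values are illustrative, so check them against the version you actually run:

```yaml
# Sketch: a Karpenter NodePool tuned to consolidate quickly.
# Field names per the karpenter.sh v1 API; verify against your version.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
  disruption:
    # Reclaim both empty and underutilised nodes...
    consolidationPolicy: WhenEmptyOrUnderutilized
    # ...and start shortly after pods churn rather than waiting.
    consolidateAfter: 1m
```

Even tuned this way, consolidation is reactive: it can only shrink the cluster after the over-provisioning has already happened.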
Is there a solution?
The most obvious one is not to use Karpenter in development environments at all. But we do need it at least in our Staging environment, which is production-like, because we need to know whether our changes will behave the same way in production.
The second option is to bring the resources up in small batches, giving Karpenter time to do the calculations required to arrive at an optimal number of worker nodes.
Fortunately, we never deploy anything that has not been automated first. That’s why we can split the build process into small parts and take advantage of Terraform’s -target option to start the applications up gradually.
terraform apply -target=helm_release.my-app
sleep 300
terraform apply -target=helm_release.my-other-app
The above snippet runs in a CI/CD pipeline scheduled for 7 am. As you can see, it simply waits 5 minutes before starting up the second application.
The difference is that we give Karpenter enough time to bring up the EC2 instances without overwhelming it. The example above is deliberately simple, not exactly how we do it, but it demonstrates the point.
In reality, we run a few more health checks, and instead of a fixed sleep we use:
kubectl wait --timeout=600s \
  --for=condition=Ready pod \
  -n NAMESPACE -l app=my-app
Conclusion
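Putting the two together, a minimal sketch of the batch-and-wait loop might look like this. The app names and namespace are placeholders, not our real configuration, and the DRY_RUN switch is added here only so the sketch can be exercised without a cluster:

```shell
#!/usr/bin/env bash
# Batch-and-wait start-up sketch: deploy apps one wave at a time so
# Karpenter can right-size the cluster between waves.
# APPS and NAMESPACE are illustrative placeholders.
set -euo pipefail

APPS=("my-app" "my-other-app")   # deploy order: busiest services first
NAMESPACE="dev"
DRY_RUN="${DRY_RUN:-true}"       # set DRY_RUN=false for a real run

run() {
  # In dry-run mode just print the command; otherwise execute it.
  if [ "$DRY_RUN" = "true" ]; then echo "+ $*"; else "$@"; fi
}

deploy_all() {
  local app
  for app in "${APPS[@]}"; do
    # One Terraform-managed Helm release per wave.
    run terraform apply -auto-approve -target="helm_release.${app}"
    # Block until the app's pods are Ready before the next wave,
    # giving Karpenter time to provision just enough capacity.
    run kubectl wait --timeout=600s --for=condition=Ready pod \
      -n "$NAMESPACE" -l "app=${app}"
  done
}

deploy_all
```

Replacing the fixed sleep with kubectl wait means each wave starts as soon as the previous one is actually healthy, rather than after an arbitrary delay.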
The cloud remains very expensive: we still see savings of up to 70% when customers migrate out to their own dedicated data centres. The thing is, cloud providers make it very easy for us to spend money. Karpenter and cluster-autoscaler are brilliant, and I love them, but managed poorly they lead to higher costs.
As ever, get in touch if you need help.