Kubernetes — Cost optimisation and savings on AWS

Aaron Pejakovic
Published in ELMO Software
7 min read · Dec 18, 2023

Around four years ago, the ELMO Infrastructure team began its Kubernetes journey, building out multiple production clusters across multiple AWS regions and multiple AWS accounts. Since then we have been able to migrate almost all of our applications into Kubernetes from a variety of places, including Amazon ECS, AWS OpsWorks and datacenters. One of the biggest challenges we faced, and I'm sure everyone faces, is ensuring that we didn't blow out the AWS bill with our Kubernetes costs. The idea is to run the cheapest but highest-performing cluster possible... it's important not to compromise performance for cost.

There are multiple concepts to consider when it comes to the cost optimisation of a Kubernetes cluster. In this article, we will cover the four main ones that we used at ELMO to bring our costs down and save up to 60-70% on our Kubernetes costs. These are:

  • Autoscaling (down and up)
  • Sleepy Time
  • AWS Spot Instances
  • Right-sizing cluster and applications

One benefit of Kubernetes being an open-source CNCF project is the huge number of open-source tools in the community that can help us manage our cluster and its cost. Below are some of the tools that we used to bring our costs down.

Autoscaling

Autoscaling is the process of scaling the cluster or application up and down to meet current demand. Why is autoscaling important for cost savings? Because it means that at any given time we are running the correct number of pods for our current application traffic: during peak hours we have a larger number of pods to handle the higher traffic, and in off-peak hours we can scale down to a minimal number of pods. In Kubernetes, this pod-level scaling is typically handled by the Horizontal Pod Autoscaler (HPA).
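
As a minimal sketch (the deployment name and thresholds here are illustrative, not our production values), an HPA that scales a deployment between 2 and 20 replicas based on average CPU utilisation looks something like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # hypothetical deployment name
  minReplicas: 2         # floor for off-peak hours
  maxReplicas: 20        # ceiling for peak traffic
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests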

Karpenter is a component that automatically adjusts the size of a Kubernetes cluster based on the current scheduling needs of that cluster. As our application traffic starts to drop and our pods scale down, Karpenter recognises this and starts to scale back our EC2 instances, reducing our spend. See the graph below to see it in action:

Total instances over time

You can see the number of instances drop during the low-traffic night-time hours and then increase during peak hours. This is all due to the HPA scaling the applications up and down and Karpenter responding by scaling our EC2 instances.
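
On the node side, a sketch of a Karpenter NodePool (using the v1beta1 API; the name and instance types are illustrative) that consolidates under-utilised nodes and can launch both spot and on-demand capacity might look like this:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized   # remove or replace under-utilised nodes
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]      # allow spot with on-demand as fallback
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.xlarge"]

With both capacity types allowed, Karpenter favours spot and can fall back to on-demand when spot capacity is unavailable, which also covers the fallback point discussed in the spot section later.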

Sleepy Time

All development and infrastructure teams stop working at some point in the day, which means development and staging environments sit idle and unused. So what we can do is put all our applications to sleep (scale replicas to 0) during the hours that we are not actively using our development environments.

We have been able to achieve 'sleepy time' at ELMO using Kube Downscaler. Once it is deployed into the cluster, it can be utilised in several different ways:

Deployment annotation — Only scale this specific deployment:

kubectl annotate deploy nginx 'downscaler/uptime=Mon-Fri 09:00-17:00 Australia/Sydney'

Namespace annotation — Scale everything in this namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: foo
  labels:
    name: test
  annotations:
    downscaler/uptime: Mon-Sun 07:30-18:00 AEST

Environment Variable — Scale everything in the cluster:

DEFAULT_UPTIME="Mon-Sun 07:30-18:00 Australia/Sydney"
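
This value is set as an environment variable on the downscaler's own Deployment. As a sketch (the image name and tag depend on where you pull the downscaler from), the container spec would include something like:

containers:
  - name: kube-downscaler
    image: hjacobs/kube-downscaler   # hypothetical image reference; use your registry's
    env:
      - name: DEFAULT_UPTIME
        value: "Mon-Sun 07:30-18:00 Australia/Sydney"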

To see cost savings, Kube Downscaler needs to be used in conjunction with Karpenter, mentioned in the previous section. The reason for this is that the downscaler only sets the replicas of the workloads to 0, which leaves a number of under-utilised nodes in the cluster. It is then Karpenter's job to calculate which nodes can be terminated and removed from the cluster. Below is a Grafana graph of total instances; we can see the dips in instances at night and on the weekend:

Total instances over time

Based on the above image we can do a rough calculation of how much we save: all instances are AWS m5.xlarge spot instances, and on average we scale down 10 instances for an 8-hour period each day.

m5.xlarge (spot) per hour ≈ $0.07

$0.07 × 10 instances × 8 hours × 30 days ≈ $170 a month in savings

AWS Spot Instances

A Spot Instance is an AWS EC2 instance that uses spare EC2 capacity available for less than the On-Demand price. A spot instance can be reclaimed by AWS at any time, so it is perfect for applications that are stateless and fault-tolerant. AWS spot can save around 60-70% on your EC2 bill. For example, the on-demand cost for an m5.xlarge is about $0.19 per hour, while the spot cost is around $0.07.

Using AWS spot instances in staging and dev environments is perfect and almost a must for cost savings. We usually don't care if there is some disruption in our development environments.

You can also run spot instances in production environments, however you need to take some extra precautions. Below are some of the things that you should have in place before running spot in production:

  • AWS Node Termination Handler — handles the graceful draining of nodes when the 2-minute spot interruption warning is sent.
  • Multiple replicas utilising topologySpreadConstraints — run multiple replicas and use topologySpreadConstraints to spread those replicas across nodes, so that when a spot node is taken away by AWS your application can continue to run (see the sketch after this list).
  • On-demand fallback — if there is no spot capacity, fall back to on-demand EC2 instances.
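
Here is a minimal sketch of the second point (the deployment name, labels and image are illustrative): a Deployment whose replicas are spread across nodes so a single spot interruption cannot take them all out at once:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                           # keep per-node replica counts within 1 of each other
          topologyKey: kubernetes.io/hostname  # spread across individual nodes
          whenUnsatisfiable: ScheduleAnyway    # prefer spreading, but don't block scheduling
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.25
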
Total spot instances

Above is a graph of the total spot instances we run in our staging Kubernetes cluster at ELMO. On average we are running 22 m5.xlarge spot instances. The price of this per month is roughly $1,100 on spot (22 instances × $0.07 × ~730 hours), whereas on on-demand instances it would be around $3,000 (22 × $0.19 × ~730 hours). As you can see, it is a huge cost benefit to run on AWS spot instances.

Right-sizing cluster and applications

It is very easy to make the mistake of using the largest AWS EC2 instances and assigning very large resource requests to our applications because we think this will yield the best application performance. However, this is not actually the case. We want monitoring tools in place so that we can continually track our applications' CPU and memory usage and adjust our requests accordingly. It is also important to have the correct monitoring to ensure there isn't resource wastage due to EC2 instance sizes.

The first tool that we use at ELMO to assist with application right-sizing is kube-prometheus-stack. It deploys Prometheus, Grafana, Alertmanager and Thanos, which together form the industry standard for Kubernetes monitoring, and it comes with a large number of out-of-the-box dashboards for monitoring application usage. See below:

Under-utilised pod

Above is a perfect example of a pod that has been assigned 1.5 CPU but is using around 0.03 on average. Based on this information we could set the requests to a much smaller value, freeing up cluster capacity for other workloads.
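
As a sketch of what that adjustment might look like (the values are illustrative, not a recommendation for this particular workload), the container's resource requests can be brought down to something closer to observed usage, with headroom:

resources:
  requests:
    cpu: 100m        # ~3x the observed 0.03 CPU average, leaving headroom for spikes
    memory: 256Mi
  limits:
    memory: 256Mi    # many teams set only a memory limit, since CPU is a compressible resource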

The second tool we use heavily at ELMO is Kubecost. Kubecost provides real-time cost visibility and insights for teams using Kubernetes, helping you continuously reduce your cloud costs. It includes a recommendation tool that looks at the usage data of your applications over a given period and gives you right-sizing recommendations.

Summary

While I have only listed a few activities above, they are a good starting place to get your cost-optimisation journey on the right track. There is a wealth of resources in the open-source community, and it's important to seek these out and find what works best for you or your organisation. At ELMO we are continually looking for new tech and methods to drive our costs even further down.
