How to save money with Kubernetes in AWS

Are we taking advantage of these tools?

Published in Jeff Tech, Mar 30, 2020

Kubernetes on AWS is one of the most common setups in cloud computing and distributed systems. However, we’re probably not as efficient as we could be. We can always do better! Here is a tip to improve our infrastructure.

Previous state

I work at JEFF, a company that has developed a super app of daily services. When I started working here there were around 30 employees; now, only 2 years later, we have around 700. This is crazy!

At a certain point we found ourselves running a lot of EC2 instances. Our bill kept growing and we wanted to stop that rise.

What did we do?

We made several EC2 reservations (Reserved Instances).

What are EC2 Reservations?

If you know that you will use a certain number of instances over a 1-year or 3-year period, you can reserve those instances at a discount. With a 1-year term you can save around 30–40%, depending on whether you pay upfront or not.

We reserved EC2 instances until our coverage (the percentage of instances covered by a reservation) was around 90%, leaving only 10% on demand. Cool, we cut the bill!

The problem

This approach is not bad at all. But if you are growing continuously, or you can’t predict how many compute resources you will need, it becomes a problem. For example, we never got close to a good percentage of coverage. Furthermore, depending on the reservation, you are married to an EC2 instance type for 1 or 3 years!
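To make the growth problem concrete, here is a small sketch of how coverage erodes when the fleet grows while the reservation stays fixed. The instance counts are hypothetical and are not JEFF's real numbers.

```python
def coverage(reserved: int, running: int) -> float:
    """Fraction of running instances that are covered by a reservation."""
    if running <= 0:
        raise ValueError("running must be positive")
    # Reservations beyond the running count cover nothing extra.
    return min(reserved, running) / running


# Hypothetical fleet: 27 instances were reserved when 30 were running.
reserved = 27
for running in (30, 45, 60, 90):
    print(f"{running:>3} running -> {coverage(reserved, running):.0%} covered")
```

A reservation sized for today's fleet covers a smaller and smaller share of tomorrow's, and the uncovered share is billed at the on-demand price.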

Light

AWS has Spot Instances. Wait, what is a “Spot Instance”?

Imagine a snapshot of an AWS server at a given moment. AWS does not rent out the entire server; each customer rents only a portion of it. This partitioning leaves free space on the server, and you can rent that spare capacity.

With Spot Instances you save close to 70% compared to on-demand! In addition, you don’t need to estimate how many instances you will need. So you save 70% instead of 30–40%, and the discount applies to all your instances!
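As a rough back-of-the-envelope comparison, the sketch below computes the overall saving against an all on-demand bill. It assumes every instance costs the same, takes 35% as the midpoint of the 30–40% reservation discount, and uses the 90% coverage figure mentioned earlier; real bills depend on instance types and current Spot prices.

```python
def blended_saving(covered_share: float, discount: float) -> float:
    """Overall saving vs. all on-demand when only part of the fleet is discounted."""
    if not (0.0 <= covered_share <= 1.0 and 0.0 <= discount <= 1.0):
        raise ValueError("covered_share and discount must be between 0 and 1")
    return covered_share * discount


# Assumed inputs: 90% of the fleet reserved at a 35% discount,
# versus the whole fleet on Spot at a 70% discount.
reserved_saving = blended_saving(covered_share=0.90, discount=0.35)
spot_saving = blended_saving(covered_share=1.00, discount=0.70)

print(f"Reserved Instances: {reserved_saving:.1%} off the on-demand bill")
print(f"Spot Instances:     {spot_saving:.1%} off the on-demand bill")
```

Under these assumptions the reservation saves a bit over 30% overall, while Spot saves about 70%, because the Spot discount is both larger and applied to the whole fleet.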

Wait Wait Wait

Where is the catch? Well, there are consequences. If AWS needs this “free space” back, it will destroy your innocent Spot Instance mercilessly.

The solution

We had a highly available (HA) architecture built on Kubernetes and stateless microservices. What does that mean? I wrote another post a year and a half ago (From Docker to self-healing infrastructure) that explains some of these Kubernetes concepts. Basically, our architecture is designed for failure: if a failure occurs, people using our services can keep using them without any impact.

In other words, if AWS destroys an instance in our cluster, nothing bad happens! Spot Instances are made for us!

How did we do it?

We use kops to provision our clusters. With it, we can define our compute node pool as Spot Instances. But how do we handle a node termination?

There is a project called kube-spot-termination-notice-handler. Thanks to it, when a Spot Instance is marked for termination (AWS gives you a two-minute warning before it reclaims the instance), the node is drained: all applications running on it are evicted, so Kubernetes has to reschedule them on other nodes.

In addition, this project can detach the instance that is about to be terminated from its Auto Scaling group, so the group launches a new Spot Instance to maintain the number of nodes it needs.
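To illustrate the idea (this is not the project's actual code), here is a minimal sketch of a termination watcher. It polls the EC2 instance metadata endpoint for a Spot interruption notice and drains the node with kubectl when one appears. It assumes the metadata service answers plain requests (IMDSv1) and that kubectl is installed and authorized on the node; `watch` and the other function names are made up for this example.

```python
import json
import subprocess
import time
import urllib.error
import urllib.request

# EC2 instance metadata path that carries the Spot interruption notice.
NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def fetch_instance_action(url=NOTICE_URL, timeout=2.0):
    """Return the notice body, or None while no interruption is scheduled (HTTP 404)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def parse_notice(body):
    """Return the announced time if the body says stop/terminate, else None."""
    if not body:
        return None
    data = json.loads(body)
    if data.get("action") in ("terminate", "stop"):
        return data.get("time")
    return None


def drain_command(node_name):
    """Build the kubectl command that evicts the pods from the node."""
    return ["kubectl", "drain", node_name, "--ignore-daemonsets", "--force"]


def watch(node_name, fetch=fetch_instance_action, run=subprocess.run, poll_seconds=5.0):
    """Poll until a notice appears, drain the node, and return the announced time."""
    while True:
        when = parse_notice(fetch())
        if when is not None:
            print(f"Interruption scheduled for {when}; draining {node_name}")
            run(drain_command(node_name), check=True)
            return when
        time.sleep(poll_seconds)
```

On a real node you would call something like `watch("my-node-name")` from a pod that runs on every Spot node. The `fetch` and `run` parameters are there only so the logic can be tried without AWS or a cluster.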
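The detach step can be sketched with the AWS Auto Scaling `DetachInstances` API. This is an illustration, not the project's own implementation. It assumes `autoscaling` is a boto3 Auto Scaling client (for example `boto3.client("autoscaling")`); the function name, instance ID, and group name are placeholders.

```python
def detach_for_replacement(autoscaling, instance_id, group_name):
    """Detach a doomed instance so the Auto Scaling group launches a replacement.

    `autoscaling` is expected to behave like boto3.client("autoscaling").
    """
    return autoscaling.detach_instances(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group_name,
        # Keep the desired capacity unchanged: the group then sees one node
        # missing and starts a new instance to fill the gap.
        ShouldDecrementDesiredCapacity=False,
    )


# Hypothetical usage on a machine with AWS credentials and boto3 installed:
#   import boto3
#   client = boto3.client("autoscaling")
#   detach_for_replacement(client, "i-0123456789abcdef0", "nodes.example-cluster")
```

Leaving the desired capacity untouched is the important design choice here: if it were decremented, the group would simply shrink instead of replacing the node.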

I hope you find it useful,

Jeff SRE Team
