A quick approach for cost monitoring on Kubernetes

How to get cost insights about the projects running on Kubernetes without introducing any extra components

Lorenzo Arribas
7 min read · Jan 17, 2021


This article presents an approach to approximate the costs a specific project/team generates on Kubernetes, based on the ratio of resources allocated to them vs. the total resources in the cluster.
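As a rough illustration of the ratio-based idea, here is a minimal Python sketch. It assumes that CPU and memory are the only resources taken into account and that both carry equal weight by default. The `Resources` type, the `approximate_cost` function, and the `cpu_weight` parameter are hypothetical names introduced for this example and are not necessarily the exact formula used in our setup. The numbers in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """CPU (in cores) and memory (in GiB) for a project or a whole cluster."""
    cpu_cores: float
    memory_gib: float


def approximate_cost(allocated: Resources, cluster_total: Resources,
                     cluster_cost: float, cpu_weight: float = 0.5) -> float:
    """Attribute a part of the total cluster cost to one project.

    The project's share is a weighted average of its CPU ratio and its
    memory ratio. `cpu_weight` is an assumption of this sketch: 0.5 means
    CPU and memory contribute equally to the share.
    """
    if cluster_total.cpu_cores <= 0 or cluster_total.memory_gib <= 0:
        raise ValueError("cluster totals must be positive")
    if not 0.0 <= cpu_weight <= 1.0:
        raise ValueError("cpu_weight must be between 0 and 1")

    cpu_ratio = allocated.cpu_cores / cluster_total.cpu_cores
    memory_ratio = allocated.memory_gib / cluster_total.memory_gib
    share = cpu_weight * cpu_ratio + (1.0 - cpu_weight) * memory_ratio
    return share * cluster_cost


# Made-up numbers: a project holding 16 of 64 cores and 32 of 256 GiB
# in a cluster that costs 1000 (in any currency) per month.
project = Resources(cpu_cores=16, memory_gib=32)
cluster = Resources(cpu_cores=64, memory_gib=256)
print(approximate_cost(project, cluster, cluster_cost=1000.0))  # 187.5
```

In this example the project holds 25% of the CPU and 12.5% of the memory, so with equal weights it is attributed 18.75% of the cluster cost.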

Our Kubernetes cluster is deployed using AWS EKS, and we use DataDog as a monitoring solution. The screenshots and pieces of code that illustrate this article are based on those technologies, but the general principles and formulas apply to any other stack.

The problem of cost visibility on Kubernetes

We recently migrated all of Glovo’s Machine Learning services to Kubernetes. Before that, each of our services ran in its own dedicated cluster of EC2 instances (cloud VMs).

One of the results we expected to achieve with this transition was a significant cost reduction. With Kubernetes, all our projects (big and small) would be able to share the same underlying infrastructure and thus leave less room for idle resources.

Figure: One cluster per project and one replica per machine vs. a shared cluster where replicas of multiple projects share the same machines

Before the transition to Kubernetes, each team could go to the AWS Cost Explorer and see how much money each of their projects was spending. Naturally, we wanted to maintain that level of visibility.

The challenge with Kubernetes is that, because different pods share a pool of resources, attributing a part of the total cost to a specific project (or team, or environment, …) is not trivial. At the moment, the built-in budgeting tools of most cloud platforms I know (AWS in our case) do not provide a straightforward way to explore these costs either, so we were left with two alternatives:

  1. Install a third-party component to monitor costs (we considered Kubecost and CloudZero)
  2. Approximate the cost based on the metrics we were already processing

We briefly considered option (1), but decided to hold off on adopting a solution that would require extra maintenance and/or procurement.

What follows is our implementation of option (2): A low-hanging-fruit approach to cost monitoring on…


Lorenzo Arribas

Staff Software Engineer at Glovo. I specialize in event-driven architectures and Machine Learning Operations. My website is https://larribas.me