It’s been six months since we started helping teams optimize their Kubernetes infrastructure. During this time, we have started every new client relationship by helping their team gain clear visibility into their resource and spend allocation. Last year, we wrote an early post on using Grafana and Prometheus to accomplish this. Today, we want to share how our process has evolved and present the open source solution we have created to support this improved approach!
First, let’s quickly go over why we always start with resource/cost allocation before helping teams optimize their resources. We do it because 1) it directly uncovers common patterns that create overspending on infrastructure assets, not to mention other undesirable issues within a Kubernetes cluster, and 2) it helps teams prioritize where to focus their optimization efforts. The root causes of these negative patterns have ranged from the mundane (abandoned deployments) to the startling (bitcoin mining malware). More on these common pitfalls in a future post!
Our initial approach
We initially used a simple set of Grafana dashboards (mentioned in the post above) to give teams cost allocation visibility. We quickly ran into limitations with these tools. To name a few, these dashboards couldn’t easily show cost allocation by service/deployment/label, didn’t integrate with dynamic cloud billing data, and didn’t easily support viewing data across multiple clusters. In response, we created the Kubecost product to give teams real-time, dynamic cost allocation for major cloud providers and on-prem clusters. This week, we are publicly launching and open sourcing the core model used to generate this cost allocation data. It’s now available on GitHub.
Kubecost enables teams to view the following with an install that takes only minutes:
- Real-time cost allocation by all key k8s concepts, e.g. spend by namespace, deployment, service, daemonset, pod, container, job, etc.
- Cost allocation by configurable labels to measure spend by owner, team, department, product, etc.
- Dynamic asset pricing enabled by integrations with AWS and GCP billing APIs, with estimates available for Azure
- Cost allocation metrics for CPU, GPU, memory, and storage
- Out-of-cluster cloud costs tied back to their owner, e.g. S3 buckets and RDS instances allocated to a pod/deployment
- Export of billing data back to Prometheus for further analysis
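To make the idea of allocating spend by namespace or label concrete, here is a minimal sketch in Python. It is not the Kubecost model itself: the `PodUsage` record, the `allocate` helper, and the hourly prices are all hypothetical and chosen only for illustration. A real setup would take prices from cloud billing data and usage from cluster metrics.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical hourly unit prices (USD), for illustration only.
# Real prices would come from a cloud billing API or an on-prem price sheet.
PRICES = {"cpu_core": 0.03, "ram_gb": 0.004, "gpu": 0.95, "storage_gb": 0.0001}


@dataclass
class PodUsage:
    """Hypothetical record of what one pod was allocated over a time window."""
    name: str
    namespace: str
    labels: dict = field(default_factory=dict)
    cpu_cores: float = 0.0
    ram_gb: float = 0.0
    gpus: float = 0.0
    storage_gb: float = 0.0
    hours: float = 1.0


def pod_cost(pod, prices=PRICES):
    """Cost of one pod: sum of (resource amount * hourly price) * hours."""
    hourly = (
        pod.cpu_cores * prices["cpu_core"]
        + pod.ram_gb * prices["ram_gb"]
        + pod.gpus * prices["gpu"]
        + pod.storage_gb * prices["storage_gb"]
    )
    return hourly * pod.hours


def allocate(pods, key, prices=PRICES):
    """Group pod costs by an arbitrary key (namespace, label value, ...)."""
    totals = defaultdict(float)
    for pod in pods:
        totals[key(pod)] += pod_cost(pod, prices)
    return dict(totals)


pods = [
    PodUsage("api", "web", {"team": "frontend"}, cpu_cores=2, ram_gb=4, hours=10),
    PodUsage("trainer", "ml", {"team": "research"}, cpu_cores=1, ram_gb=2, gpus=1, hours=10),
    PodUsage("cache", "web", {}, cpu_cores=0.5, ram_gb=1, storage_gb=100, hours=10),
]

# Spend by namespace, and by a configurable "team" label.
by_namespace = allocate(pods, key=lambda p: p.namespace)
by_team = allocate(pods, key=lambda p: p.labels.get("team", "unlabeled"))
```

Because the grouping key is just a function, the same routine covers namespace, deployment, or any label such as owner or department. Pods without the label fall into an "unlabeled" bucket, which is often a useful signal in its own right.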
Why do we believe it improves on standalone Grafana/Prometheus?
Don’t get us wrong: we remain huge fans of Grafana and Prometheus, and Kubecost today is built with deep integrations to both. We do believe Kubecost improves on them as a standalone solution for cost monitoring. That’s because it takes you from estimates to real-time, accurate cost data pulled directly from cloud providers, with highly customizable pricing inputs for on-prem clusters. It also enables meaningful new views such as cost by service (shown in the following graph).
The core Kubecost allocation model is open source (Apache 2) and can now be found on GitHub. You can deploy it as a pod directly on your cluster if you want to run the model yourself or make modifications. You can also install the full Kubecost product (with associated dashboards) via a single Helm install from our website.
Cost allocation is just the beginning for Kubecost — it’s our goal to make capacity planning and cost management tools more accessible to teams of all sizes. We’re starting with this cost allocation model because we believe it is the first step to enabling powerful optimizations and process changes.