On cost optimization in Kubernetes

anthony bushong
Google Cloud - Community
5 min read · Jul 7, 2023
Containers, containers, and more containers — by Guillaume Bolduc on Unsplash

Our team at Google has published the State of Kubernetes Cost Optimization report, a quantitative analysis of cluster performance in cost optimization across a large-scale set of anonymized Kubernetes clusters. Download the report to read our methodology and key findings in full.

Kubernetes is not magic.

I’ve worked with teams to run Kubernetes in production for the better part of six years now, and one thing is clear to me:

The decisions Kubernetes makes around scheduling, autoscaling, eviction, and more, are only as good as the resource requests you set for your workloads.

# A Kubernetes YAML file truncated to show container resource requests

...
resources:
  requests:
    cpu: 100m
    memory: 1Gi
...
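
For context, these fields live under each container in a Pod template. Here is a minimal, hypothetical Deployment carrying the same requests — the name, labels, and image are placeholders, not from the report:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25        # placeholder image
          resources:
            requests:
              cpu: 100m            # 0.1 of a CPU core
              memory: 1Gi
```

The scheduler uses these per-container requests to decide which Node has room for the Pod.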

Ensuring requests are set for any workload that needs even a minimum level of reliability is the genesis of any cost optimization journey in Kubernetes.

Let’s talk about why.

In our inaugural State of Kubernetes Cost Optimization report, our team measured the performance of clusters against four “golden signals” specific to Kubernetes cost optimization.

The four “golden signals” of Kubernetes cost optimization

But in order to actually measure your own performance against these signals, the first and most important thing we recommend is that you set requests for your workloads.

Three of the four golden signals — workload rightsizing, demand-based downscaling, and cluster bin packing — either depend directly on requests being set for your workloads or become more accurate when they are.

In our analysis, the most surprising observation was this: there were way more workloads with no requests set than we expected.

Why was this troubling to us?

If all you care about is cost, setting no requests might look attractive on the surface. Even the smallest cluster with only a few nodes could successfully schedule dozens of Pods that have no requests set.

But of course, you care about more than just cost. What good is optimizing a Kubernetes cluster if the workloads that run on it are unreliable?

Kelsey Hightower famously made an analogy comparing Kubernetes to Tetris, a classic video game in which you fit falling blocks of different shapes together.

In this analogy, the blocks in Tetris are likened to workloads admitted to a cluster, so let’s run with that. If the blocks are workloads, then the shapes of these blocks are the requests you set for workloads in Kubernetes.

Which raises the question: how can you play Tetris if your blocks have no shape? Can you?

Running workloads with no requests is like playing Tetris with no blocks — by Ravi Palwe on Unsplash

The shapes of these blocks give players information that they can use to play Tetris. In the same way, setting requests for workloads gives Kubernetes information it uses to run them appropriately.

I mentioned how setting appropriate resource requests helps you and Kubernetes optimize for the “golden signals” of Kubernetes cost optimization. But just as important, setting them signals to Kubernetes that your workloads require a higher baseline of reliability.

As you may know, how you set requests for containers in a Kubernetes workload confers upon that workload a Quality of Service (QoS) class, which has direct reliability implications for that workload.

The different QoS classes in Kubernetes

When you don’t have any requests set, workloads get assigned the BestEffort QoS class. As Pods fill up a Node, resources like memory can get scarce, putting the Node under pressure. In times of node-pressure, BestEffort Pods are very likely the first to be killed, either through eviction from the kubelet or via the Linux oom_killer.

Burstable Pods that set memory requests below their limits but consume far more memory than they request are also often marked for termination in times of node-pressure.
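
As a sketch, the `resources` stanzas below map to each QoS class — container values here are hypothetical, not taken from the report:

```yaml
# Guaranteed: every container sets requests equal to limits
# for both CPU and memory
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 512Mi
---
# Burstable: at least one container sets a request,
# but requests are below limits (or limits are omitted)
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 1Gi
---
# BestEffort: no container in the Pod sets any requests or limits
resources: {}
```

Under node-pressure, the kubelet evicts in roughly that reverse order: BestEffort first, then Burstable Pods exceeding their requests, with Guaranteed Pods last.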

This can cause a disruption of service for end users interacting with these Pods. It can also be a nightmare to debug or perform root cause analysis.

More statuses, more problems in times of node-pressure in Kubernetes

All this because requests were not set!

Do you know how many workloads in your production clusters are unknowingly leaving requests unset?

If not, we at Google have a couple of tools that can quickly help you identify them in the short term and work with their owners to set requests:

  • For users in GKE, an out-of-the-box dashboard template to identify workloads at risk in BestEffort and Burstable QoS classes
  • For anyone running Kubernetes anywhere, a simple CLI to list containers not setting requests for CPU, Memory, or both

Long term, the answer to this may be two-fold.

One, building guardrails into your platform, such as admission controllers, where policies around requests can be codified — for example, requiring folks to explicitly opt in to run a workload with no requests.
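
Kubernetes also ships a built-in guardrail in this spirit: the LimitRanger admission controller applies per-namespace LimitRange defaults, injecting requests into containers that omit them. A minimal sketch — the name, namespace, and values are placeholders:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-requests   # hypothetical name
  namespace: team-a                  # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:    # injected when a container sets no requests
        cpu: 100m
        memory: 256Mi
      default:           # injected when a container sets no limits
        cpu: "1"
        memory: 1Gi
```

Injected defaults keep Pods out of the BestEffort class, though they are no substitute for rightsized, workload-specific requests.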

Two, building a culture across platform owners and application developers where these ramifications are understood and avoided well before production.
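
If you would rather make the policy explicit than inject defaults, policy engines such as Kyverno or OPA Gatekeeper can reject Pods that omit requests at admission time. A hedged sketch using Kyverno's pattern syntax — the policy name and message are placeholders, and you should check it against your Kyverno version before enforcing:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests        # hypothetical policy name
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-container-requests
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory requests are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"       # any non-empty value
                    memory: "?*"
```

Running such a policy in audit mode first is a low-risk way to surface offenders before turning enforcement on.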

The world of containers if we all set requests — by Timelab on Unsplash

I will concede — not all the problems of the world are solved once you set requests.

Recommendations for requests that come from Vertical Pod Autoscaling are not available for all workloads. You might need additional observability to help make requests more accurate. Performing workload rightsizing is a continuous task.
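
Where the Vertical Pod Autoscaler is installed, a low-risk starting point is recommendation-only mode: the VPA computes suggested requests without mutating anything. A sketch, assuming the VPA CRDs are present and with a placeholder target:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa                # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # hypothetical target workload
  updatePolicy:
    updateMode: "Off"          # recommendations only, no automatic updates
```

The recommendations then appear in the object's status, where you can review them before adjusting requests by hand.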

But making sure you have requests set is the best start you can have in the journey to running cost optimized clusters without compromising workload reliability.

Make sure to discuss these concepts with your teams and check out these resources on cost optimization in GKE:

  • A solution guide on best practices for running cost-optimized Kubernetes applications on GKE
  • A solution guide on rightsizing your workloads at scale in GKE
  • A demo video on rightsizing your workloads at scale in GKE
  • A demo video on using the Google Cloud console for GKE Optimization
  • An interactive tutorial to get set up in GKE with sample workloads

To read more about our methodology, key findings, and recommendations, download the full State of Kubernetes Cost Optimization report.

A special thank you to authors Fernando Rubbo, Kent Hua, Billy Lamberti, Melissa Kendall, and Li Pan for their inspiration in writing this blog.

