Continuous Cost Optimization for Kubernetes

Stephen Atwell · Published in Armory · Aug 10, 2022 · 8 min read

Before I came to Armory, I spent several years building products to provide transparency into IT costs. For many businesses, IT costs are a black hole: they make up a significant percentage of a company's total expenses, but are hard to understand and control. Modern cloud services make it easy to turn capacity off when it is not needed, which allows companies to optimize their costs in real time and receive real savings for those optimizations. Ten years ago, if a company shrank an over-provisioned VM, it saved $0 immediately: it was still paying for the physical server hosting the VM and the datacenter space in which it was located. At best, this optimization avoided buying the next server and delayed building the next datacenter. With modern cloud services, when unused capacity is shut down, the cost savings are immediate.

This article shows how you can integrate Kubecost with Armory's new Continuous Deployments-as-a-Service offering, Project Borealis. It provides an example GitHub Actions workflow that deploys Kubecost cost recommendations using Project Borealis. The Project Borealis configuration ensures integration tests pass in a staging environment before deploying to production, and leverages a canary strategy during the production deployment to ensure that the application is healthy at the new sizing. Together, these solutions allow you to safely optimize the footprint of your Kubernetes cluster with every code commit.

In addition to Project Borealis, Armory also provides Armory Enterprise, an enterprise version of Spinnaker that can also be configured to deploy Kubecost sizing recommendations.

What is Kubecost

Kubecost provides cost transparency, cost optimization, and cost governance for Kubernetes clusters. It allocates the total cost of a cluster across the applications that the cluster supports. This allocated view lets application owners understand and control their portion of the shared Kubernetes costs.

Kubecost also provides right-sizing recommendations for both the pods within the cluster and the cluster itself. These recommendations allow you to optimize your application and infrastructure footprint, and thus your costs. This post leverages the application right-sizing recommendations to optimize the resource requests of a workload running within a cluster.

What is Project Borealis

Update (June 7, 2022):

Project Borealis is now generally available and called Armory Continuous Deployments as a Service. You can give it a try by signing up for free.

Project Borealis is Armory's latest continuous deployment offering. It delivers intelligent deployment-as-a-service and supports advanced deployment strategies. By automating code deployment across all environments, Project Borealis removes demands on developers while also reducing the risk of service disruptions due to change failures. It does this by seamlessly integrating pre-production verification tasks with advanced deployment strategies in production, which mitigates risk by providing deployment flexibility while limiting blast radius.

A simple command-line interface is available to invoke Project Borealis, which allows any CI tool or shell script to use it easily. In addition to the CLI, a purpose-built GitHub Action and a Spinnaker plugin are also available.
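For example, a minimal invocation from a shell might look like the following sketch; the exact command names and flags can vary by CLI version, so check `armory --help` before relying on them:

```shell
# Log in to Armory's cloud, then start a deployment from a config file.
# Command names and flags are assumptions based on the CLI at the time of
# writing; verify them against `armory --help` for your installed version.
armory login
armory deploy start -f deploy-automated-2-env.yml
```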

This blog provides an example of deploying the new application resource requests to a staging environment, running a set of integration tests from GitHub Actions, and then, if those tests pass, leveraging a canary deployment to two production clusters to ensure the production application remains healthy during the resize.

The Detailed Integration

This section walks through the details of how these tools work together to deliver continuous cost optimization. Here is a quick demo showing the functionality you can get out of this setup.

In addition to offering continuous deployment, Project Borealis can also simplify cost allocation within Kubecost. Kubecost's cost allocation feature allows you to allocate the cost of your Kubernetes cluster out by the applications it supports. Implementing this functionality requires you to apply labels or annotations to your application, identifying the application supported by the workload. Armory Project Borealis automatically injects annotations for some of the Kubecost best-practice fields (application name and environment name), so leveraging Project Borealis makes it easier to allocate costs against these concepts.
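For illustration, a workload labeled for allocation might look like the sketch below. These label keys are common Kubecost conventions rather than the annotations Project Borealis injects, so match them to your own Kubecost label mappings:

```yaml
# Illustrative only: labels Kubecost can use to allocate cost by application
# and environment. Key names must match your Kubecost label mappings.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: potato-facts
  labels:
    app: potato-facts      # application name, used for allocation
    env: staging           # environment name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: potato-facts
  template:
    metadata:
      labels:
        app: potato-facts
        env: staging
    spec:
      containers:
        - name: potato-facts
          image: example/potato-facts:latest   # hypothetical image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```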

The GitHub Actions Configuration

To integrate Kubecost and Armory Project Borealis, you can leverage a GitHub Actions configuration that will:

  • call this Kubecost API
  • leverage jq to extract the sizing recommendations from the response
  • patch them into the application’s Kubernetes manifest
  • deploy the updated manifest using Armory Project Borealis

A complete configuration for this can be found here. Let’s discuss how each step works.

Checkout Code

This step checks out my Git repository, which contains the Kubernetes manifest that is currently running for our application. We will patch this manifest and then deploy it in later steps. We will also read the Project Borealis deployment configuration from Git.

Query Kubecost for Sizing Recommendation

This step performs a REST request against the Kubecost API. It reads the Kubecost hostname from a GitHub Actions secret named ‘KUBECOST_HOST’; set this secret to the hostname of your Kubecost instance. The request filters to a particular namespace and container in order to retrieve the needed size; update the filters to match your application.
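A sketch of this step follows; the endpoint path and query-parameter names are assumptions about Kubecost's request-sizing API, so verify them against the API documentation for your Kubecost version:

```yaml
# Sketch of the Kubecost query step. The endpoint and parameter names are
# assumptions; check the savings/requestSizing API docs for your version.
- name: Query Kubecost for Sizing Recommendation
  run: |
    curl -sG "http://${{ secrets.KUBECOST_HOST }}/model/savings/requestSizing" \
      --data-urlencode "filterNamespaces=potato-facts" \
      --data-urlencode "filterContainers=potato-facts" \
      -o recommendation.json
```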

Extract CPU and RAM recommendations

These steps take the API response from Kubecost and leverage jq to extract the CPU and RAM size recommendations. jq is a command-line tool for transforming and extracting data from JSON. You should change the application’s container name from its value of ‘potato-facts’ to match your application.
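The extraction might look like the following sketch; the jq paths assume a particular response shape, so inspect the JSON your Kubecost instance actually returns and adjust accordingly:

```yaml
# Sketch of extracting recommendations with jq. The response shape
# (.Recommendations[].recommendedRequest) is an assumption; adjust the
# paths to match the JSON returned by your Kubecost version.
- name: Extract CPU and RAM recommendations
  id: sizing
  run: |
    CPU=$(jq -r '.Recommendations[] | select(.containerName == "potato-facts") | .recommendedRequest.cpu' recommendation.json)
    RAM=$(jq -r '.Recommendations[] | select(.containerName == "potato-facts") | .recommendedRequest.memory' recommendation.json)
    echo "cpu=$CPU" >> "$GITHUB_OUTPUT"
    echo "ram=$RAM" >> "$GITHUB_OUTPUT"
```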

Update CPU/RAM Limit/Request

These four steps update the CPU and memory requests, and the CPU and memory limits, in my Kubernetes manifest to match the recommendations from Kubecost. As given, the script adjusts both requests and limits to the same value. If you are only applying requests or limits, and not both, you can remove the other steps from the script.
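A sketch that collapses those four steps into a single one using mikefarah's yq (v4) might look like this; the manifest path and container index are placeholders for your application:

```yaml
# Sketch of patching requests and limits with yq v4. The manifest path
# and the container index [0] are placeholders.
- name: Update CPU/RAM Limit/Request
  env:
    CPU: ${{ steps.sizing.outputs.cpu }}
    RAM: ${{ steps.sizing.outputs.ram }}
  run: |
    yq -i '.spec.template.spec.containers[0].resources.requests.cpu = strenv(CPU)' manifests/potato-facts.yml
    yq -i '.spec.template.spec.containers[0].resources.requests.memory = strenv(RAM)' manifests/potato-facts.yml
    yq -i '.spec.template.spec.containers[0].resources.limits.cpu = strenv(CPU)' manifests/potato-facts.yml
    yq -i '.spec.template.spec.containers[0].resources.limits.memory = strenv(RAM)' manifests/potato-facts.yml
```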

Deploy Changes

This step invokes the Project Borealis GitHub Action to start my deployment. It reads a set of credentials to use when deploying from the following GHA secrets: BOREALIS_CREDENTIAL_ID and BOREALIS_CREDENTIAL_SECRET. It reads the Project Borealis deployment configuration from the path ‘/deploy-automated-2-env.yml’ within my Git repo.
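The step might look like the following sketch; the action name and input keys are assumptions from memory, so consult the action's README for the exact values:

```yaml
# Sketch of invoking the Project Borealis GitHub Action. The action name
# and input keys are assumptions; check the action's README.
- name: Deploy Changes
  uses: armory/cli-deploy-action@main
  with:
    clientId: ${{ secrets.BOREALIS_CREDENTIAL_ID }}
    clientSecret: ${{ secrets.BOREALIS_CREDENTIAL_SECRET }}
    path-to-file: /deploy-automated-2-env.yml
```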

The Project Borealis Configuration

Here is the complete Project Borealis configuration used. The following describes how its definition works:

application

This is the name of the application that you are deploying. It ensures that only one deployment of the application runs against a given target at a time. The Project Borealis UI also labels jobs by application. Update it to match the name of your application.

targets:

This section describes the application targets to which you deploy.

Update these fields for your application:

account — the name of the account (deployment target) you want to update. Update it to match your account names.

namespace — the namespace to which your application will deploy. Update it with your namespace’s name.

The rest of the configuration should work as-is; a sketch of this section follows the list below. It states:

  1. Staging uses a rolling update strategy, and will run integration tests after deployment.
  2. Production uses a canary strategy, and will only run after staging is deployed and its tests pass.
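Here is a minimal sketch of the application, targets, and manifests sections, with placeholder account and namespace names. The post deploys to two production clusters, but only one is shown for brevity, and field names follow Armory's deployment-config shape at the time of writing:

```yaml
# Sketch of the top of the deployment config. Account and namespace values
# are placeholders; the constraint makes production wait on staging.
application: potato-facts
targets:
  staging:
    account: staging-cluster            # placeholder account name
    namespace: potato-facts
    strategy: rolling
    constraints:
      afterDeployment:
        - runWebhook:
            name: integration-tests     # webhook defined later in the file
  prod:
    account: prod-cluster               # placeholder account name
    namespace: potato-facts
    strategy: mycanary
    constraints:
      dependsOn: ["staging"]            # deploy only after staging succeeds
manifests:
  - path: manifests/potato-facts.yml    # placeholder path in the repo
```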

manifests:

This provides the path to the Kubernetes manifest or manifests that you wish to deploy. Change it to the path of your manifests within your GitHub repo.

strategies.rolling

This defines the strategy our staging environment uses to deploy. It is effectively a rolling update, immediately sending 100% of traffic to the new application version.
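Expressed in the deployment config, that might look like the following sketch:

```yaml
# Sketch: a "rolling" strategy expressed as a single canary step that
# immediately shifts 100% of pods to the new version.
strategies:
  rolling:
    canary:
      steps:
        - setWeight:
            weight: 100
```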

strategies.mycanary

This defines the strategy used when deploying to our production environment. This canary strategy varies the number of pods running the new versus old version of the service. The strategy starts by spinning up 25% of the pods on the new version. It then runs an automated canary analysis to ensure CPU and memory are both healthy. Assuming everything is healthy, traffic increases to 50%. The strategy again checks the application’s health before finally sending 100% of traffic to the new version. If the application is ever unhealthy during the strategy, the change is automatically rolled back.
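A sketch of that sequence might look like this; the analysis intervals mirror the 7-second and 21-second checks described below, and exact field names may vary by product version:

```yaml
# Sketch of the production canary: 25% -> analyze -> 50% -> analyze -> 100%.
# Query names reference the analysis section defined elsewhere in the file.
strategies:
  mycanary:
    canary:
      steps:
        - setWeight:
            weight: 25
        - analysis:
            interval: 7
            units: seconds
            numberOfJudgmentRuns: 1
            queries:
              - avgCPUUsage
              - avgMemoryUsage
        - setWeight:
            weight: 50
        - analysis:
            interval: 21
            units: seconds
            numberOfJudgmentRuns: 1
            queries:
              - avgCPUUsage
              - avgMemoryUsage
        - setWeight:
            weight: 100
```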

If your application runs a number of pods that is not divisible by 4, you may wish to update these weights to better match your application. Otherwise, Project Borealis rounds to the closest whole pod.

The automated canary analysis runs two queries: one that checks CPU usage and another that checks RAM usage. The first check happens over 7 seconds; the second runs over 21 seconds. Depending on your risk tolerance, you may wish to increase the duration of this automated analysis period. The deployment rolls back if the specified metrics exceed their configured thresholds.

analysis.defaultMetricProviderName

This is the name you gave the metric provider you configured for Armory Project Borealis. Change its value to match the name you set when enabling Borealis to use your metric provider.

analysis.queries

These are the queries used for canary analysis during the deployment. They are written to work against Prometheus. The queries leverage the default context variable {{armory.replicaSetName}} to filter to just the newly deployed version of the application, and should work without modification as long as your Prometheus was installed with the following Helm chart argument set:

--set "kube-state-metrics.metricAnnotationsAllowList[0]=pods=[*]"

Depending on how much variability your application’s CPU and memory usage have, you may want to adjust the upperLimit. This defines a threshold that, if exceeded, triggers a rollback.
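A sketch of the analysis section follows. The provider name and thresholds are placeholders, and the PromQL is illustrative: it joins cAdvisor metrics against kube_pod_annotations (exported thanks to the Helm argument above) to filter on an Armory-injected replica-set annotation, whose exact label name may differ in your cluster:

```yaml
# Sketch of the analysis section. Provider name and limits are placeholders;
# the annotation label in the PromQL join is an assumption.
analysis:
  defaultMetricProviderName: my-prometheus   # placeholder provider name
  queries:
    - name: avgCPUUsage
      upperLimit: 1000          # illustrative threshold; tune for your app
      lowerLimit: 0
      queryTemplate: >-
        avg(container_cpu_usage_seconds_total{container="potato-facts"}
        * on (pod) group_left ()
        kube_pod_annotations{annotation_deploy_armory_io_replica_set="{{armory.replicaSetName}}"})
    - name: avgMemoryUsage
      upperLimit: 500000000     # illustrative threshold in bytes
      lowerLimit: 0
      queryTemplate: >-
        avg(container_memory_working_set_bytes{container="potato-facts"}
        * on (pod) group_left ()
        kube_pod_annotations{annotation_deploy_armory_io_replica_set="{{armory.replicaSetName}}"})
```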

webhooks

This defines a webhook that triggers a GitHub Actions workflow. You must configure the following fields:

uriTemplate — It is configured to read the GitHub org and repository to run against from two secrets created in the Armory Project Borealis Secrets UI. You can either add secrets for these or place the correct values directly in the URL.

secrets.github_token — You should create a new secret in the Project Borealis Secrets UI named ‘github_token’ that contains a personal access token that can invoke your GitHub workflow. Because this is a secret, you should leverage the secrets manager for it rather than hardcoding it into the YAML file.

event_type — The event_type is called out in the bodyTemplate. It must match the event type declared within your GitHub Actions workflow itself.
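Putting those fields together, the webhook definition might look like the following sketch; the field names and templating syntax are assumptions based on the shapes described above, so verify them against Armory's documentation:

```yaml
# Sketch of the webhook definition. Org/repo come from secrets created in
# the Borealis Secrets UI; the event_type must match the repository_dispatch
# type your test workflow listens for.
webhooks:
  - name: integration-tests
    type: github
    method: POST
    uriTemplate: https://api.github.com/repos/{{secrets.github_org}}/{{secrets.github_repo}}/dispatches
    networkMode: direct
    headers:
      - key: Authorization
        value: token {{secrets.github_token}}
      - key: Content-Type
        value: application/json
    bodyTemplate:
      inline: |
        {
          "event_type": "integration-tests",
          "client_payload": {
            "callbackUri": "{{armory.callbackUri}}"
          }
        }
    retryCount: 3
```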

Wrapping your Existing Integration Test GitHub Action

Start by copying your existing integration test suite, ensuring that its GitHub Actions workflow is triggered on a repository_dispatch event and that the event’s name matches the event_type you pass in your webhook.

At the end of your GitHub Actions workflow, add steps to authenticate against Armory’s cloud and invoke the callback. If your integration tests pass, call the callback indicating success; if they fail, call it to indicate failure. A simple GitHub Actions workflow that is triggered on a webhook and calls back with success when triggered can be found here.
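A sketch of such a wrapped workflow follows. The test script is a placeholder, and the token endpoint, audience, and callback payload shape are assumptions, so check them against Armory's documentation:

```yaml
# Sketch of a test workflow triggered by the webhook above. The callback
# endpoint and payload shape are assumptions; check Armory's webhook docs.
name: integration-tests
on:
  repository_dispatch:
    types: [integration-tests]   # must match the event_type in bodyTemplate
jobs:
  test-and-callback:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run integration tests
        run: ./run-integration-tests.sh   # placeholder for your test suite
      - name: Call back with the test result
        if: always()
        run: |
          # Authenticate against Armory's cloud, then report success/failure.
          # Token endpoint and audience are assumptions; see Armory's docs.
          TOKEN=$(curl -s https://auth.cloud.armory.io/oauth/token \
            -d grant_type=client_credentials \
            -d client_id=${{ secrets.BOREALIS_CREDENTIAL_ID }} \
            -d client_secret=${{ secrets.BOREALIS_CREDENTIAL_SECRET }} \
            -d audience=https://api.cloud.armory.io | jq -r .access_token)
          SUCCESS=$([ "${{ job.status }}" = "success" ] && echo true || echo false)
          curl -s -X POST "${{ github.event.client_payload.callbackUri }}" \
            -H "Authorization: Bearer $TOKEN" \
            -H "Content-Type: application/json" \
            -d "{\"success\": $SUCCESS, \"mdMessage\": \"Integration tests finished\"}"
```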

Next Steps

This blog has shown you how you can leverage Armory Project Borealis and Kubecost to continuously and safely optimize your Kubernetes costs. If you’d like to try it, Kubecost offers a community edition. Armory Project Borealis is still in early access, but reach out and we’ll get you access.

Update (June 7, 2022):

Project Borealis is now generally available and called Armory Continuous Deployments as a Service. You can give it a try by signing up for free.

Once you set this up, you have a GitHub Actions workflow that reads your sizing recommendations from Kubecost and deploys them using a Project Borealis pipeline to ensure your application remains healthy during the resize. Now you need to decide how to trigger it. GitHub Actions allows you to easily choose whether this workflow runs after every commit, runs on a schedule, or is triggered manually when a user clicks a button.

Final Thoughts

If your applications’ resource requests don’t reflect their actual needs, you can end up spending more than necessary on Kubernetes. Kubecost provides automated sizing recommendations that can help you right-size your containers. However, modifying your production application introduces risk. This blog has shown how you can easily leverage Armory Project Borealis to connect your existing automation into a fully automated deployment pipeline that can deploy Kubecost recommendations. This same pipeline can automate your other application changes while decreasing the risk of pushing any change to production.


Stephen Atwell develops products to improve the life of technologists. Currently, his focus is on simplifying complex deployment patterns at Armory.