Reduce the cost of running your AKS cluster by 70% or more by leveraging Azure Spot VMs

Martin Gjoshevski · Microsoft Azure
8 min read · Jul 29, 2022

There are many ways to optimise and cut costs when running your AKS cluster. I would not encourage you to scroll to the bottom right away, but don’t miss the list at the end of this blog post.

In this article, we’ll focus on using Azure Spot Virtual Machines in your AKS cluster. By simply utilising Spot VMs in your AKS architecture, you can realise savings of more than 70%.

What are Spot instances?

Using Azure Spot Virtual Machines allows you to take advantage of Azure's unused capacity at a significant cost saving. At any point when Azure needs the capacity back, the Azure infrastructure will evict Azure Spot Virtual Machines. Therefore, Azure Spot Virtual Machines are great for workloads that can handle interruptions like batch processing jobs, dev/test environments, large compute workloads and more.

The available capacity can vary based on size, region, time of day, and more. When deploying Azure Spot Virtual Machines, Azure will allocate the VMs if capacity is available, but there is no SLA and no high-availability guarantee for these VMs. At any point when Azure needs the capacity back, the Azure infrastructure will evict Azure Spot Virtual Machines with 30 seconds' notice.
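Because evictions come with only 30 seconds' notice, interruptible workloads typically watch the Azure Instance Metadata Service Scheduled Events endpoint from inside the VM so they can react to a pending eviction, which surfaces as a Preempt event. A minimal sketch (must be run from the VM itself; the endpoint is only reachable there):

```shell
# Query the Azure Scheduled Events endpoint from inside the VM.
# A pending Spot eviction shows up as an event with EventType "Preempt".
curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01" \
  | grep -o '"EventType":"[^"]*"'
```

A graceful-shutdown hook can poll this endpoint and start draining work when a Preempt event appears.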

Is my workload able to run on Azure Spot (with interruptions)?

Workloads that are ideally suited to run on Spot VMs:

  • Batch jobs
  • Workloads that can sustain and/or recover from interruptions
  • Development and test environments
  • Stateless applications that can leverage Spot to scale out
  • Short-lived jobs that can efficiently run again if the VM is evicted

The good thing is that most projects I have encountered recently follow the microservice design principles and best practices, such as externalising state and designing for fault tolerance, which makes the majority of projects good candidates for utilising spot instances.

If you think your workload is not a good candidate for Spot instances, there’s a big chance you will have to rethink your architecture and the design principles you have applied. Things fail all the time! If you have built your applications to recover from failures quickly, then besides the side effect of not having to worry whether your application is up and running, you can switch to Spot VMs and see some significant cost savings.

Essential Azure Spot and Kubernetes Concepts you need to know

Eviction Rates in Azure

This is quite an important metric to understand. VMs can be evicted based on capacity or on the max price you set. To see how often you can expect your VMs to be evicted, the Azure portal shows historical pricing and eviction rates per size in a region: select View pricing history and compare prices in different regions to see a table or graph of pricing for a specific size.

The lower the Eviction Rate percentage, the lower the chances of your VMs being evicted.

Spot Price in Azure

Pricing for Spot instances is variable, based on region and SKU. For more information, see pricing for Linux and Windows.

With variable pricing, you have the option to set a max price in US dollars (USD), using up to five decimal places. The price for the instance will be the current Spot price or the price for a standard (on-demand) instance, whichever is less, as long as there’s capacity and quota available. For example, a value of 0.98765 would be a max price of $0.98765 USD per hour. If you set the max price to -1, the instance won’t be evicted based on price.

AKS Eviction Policy

When we set up our AKS cluster and Spot node pools, we have the option to set the eviction-policy parameter.

This parameter can hold two values:

  • Delete
  • Deallocate

When you set the eviction policy to Delete, nodes in the underlying scale set of the node pool are deleted when they’re evicted.

When you set the eviction policy to Deallocate, nodes in the underlying scale set are set to the stopped-deallocated state upon eviction. Nodes in the stopped-deallocated state count against your compute quota and can cause cluster scaling or upgrading issues.

The priority and eviction-policy values can only be set during node pool creation. Those values can’t be updated later.
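For example, a Spot node pool with the Delete eviction policy and no price cap can be added with the Azure CLI (the resource group, cluster, and pool names here are placeholders):

```shell
# Add a Spot node pool to an existing AKS cluster.
# --priority and --eviction-policy cannot be changed after creation.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5 \
  --no-wait
```

With --spot-max-price -1, you pay the current Spot price and are only evicted for capacity, never for price.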

Taints, Tolerations and Affinity

The Kubernetes scheduler uses Taints and Tolerations to decide which workloads can run on which nodes.

In AKS, nodes that are part of a Spot node pool have the kubernetes.azure.com/scalesetpriority=spot:NoSchedule taint applied to them.

To allow your pods to be scheduled on a Spot node, you must apply a matching toleration.

spec:
  containers:
  - name: spot-example
  tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "kubernetes.azure.com/scalesetpriority"
            operator: In
            values:
            - "spot"
  ...

In the example above, you can see the toleration applied; besides the toleration, the node affinity property is also set, which requires these pods to be scheduled on Spot nodes.
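Note that requiredDuringSchedulingIgnoredDuringExecution is a hard constraint: pods with it will only land on Spot nodes. If you would rather prefer Spot nodes while still allowing a fallback to regular nodes when no Spot capacity is available, a soft preference can be expressed instead — a sketch:

```yaml
affinity:
  nodeAffinity:
    # Soft preference: the scheduler favours Spot nodes but can
    # still place the pod elsewhere when no Spot capacity exists.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: "kubernetes.azure.com/scalesetpriority"
          operator: In
          values:
          - "spot"
```

The toleration is still required either way; without it, the Spot node taint blocks scheduling entirely.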

Architectural considerations

Node selection

Selecting the ideal node size boils down to your workload type. If you are running an application that does heavy processing, you should choose a VM with adequate CPU; on the other hand, if you are running a database, extra memory is recommended.

Avoid placing different kinds of workloads on the same VM type, as that often means having underutilised VMs or even suffering a performance impact.

VM sizes — Azure Virtual Machines | Microsoft Docs

The above setup can be optimised as none of the VMs is fully utilised. By selecting the right VM size, we can even reduce the number of VMs and recognise significant savings.

Please check the Microsoft Azure VM selector, which is a great starting point for finding the right VM size.

Node pool design

In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped into node pools. These node pools contain the underlying VMs that run your applications. The initial number of nodes and their size (SKU) is defined when you create an AKS cluster, which makes a system node pool. You can create additional user node pools to support applications with different compute or storage demands.

When running AKS with Azure Spot instances, besides creating node pools that match the resource requirements of your workloads, it’s also recommended to consider what happens if certain VM types start getting evicted due to a specific event or high demand.

To avoid a situation where, for example, all the CPU-optimised VMs in a single node pool get evicted simultaneously, you should diversify your selection of VMs and reduce the risk.

In the diagram above, we can see that two node pools for CPU-optimized VMs are used, and the same logic is applied to the memory-optimized node pools.

To put things in perspective, you might configure your compute-intensive workloads to use a node pool of F4s VMs and create a secondary node pool with F8s VMs as an alternative. To keep the right balance, you should take advantage of the Cluster Autoscaler and the HPA, explained below.
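As a sketch with the Azure CLI (the resource group, cluster, and pool names are illustrative), the two compute-optimised Spot pools could be created like this:

```shell
# Two Spot node pools with different compute-optimised sizes,
# so a capacity event on one size does not take out all nodes at once.
az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster \
  --name spotf4 --node-vm-size Standard_F4s_v2 \
  --priority Spot --eviction-policy Delete --spot-max-price -1

az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster \
  --name spotf8 --node-vm-size Standard_F8s_v2 \
  --priority Spot --eviction-policy Delete --spot-max-price -1
```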

Using the AKS cluster autoscaler and the HorizontalPodAutoscaler (HPA) together

Assessing the performance and availability requirements in advance and manually deploying nodes accordingly is an option. Still, there are good tools that will remove a lot of the burden of anticipating and predicting all of the possible scenarios.

a) Horizontal Pod Autoscaling

The horizontal pod autoscaler uses the Metrics Server in a Kubernetes cluster to monitor the resource demand of pods. If an application needs more resources, the number of pods automatically increases to meet the demand.

More: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
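As a quick illustration (the deployment name is hypothetical, matching the manifest above), an HPA can be created imperatively with kubectl:

```shell
# Keep between 2 and 10 replicas, targeting 70% average CPU utilisation.
kubectl autoscale deployment spot-example --cpu-percent=70 --min=2 --max=10
```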

b) Cluster autoscaler

The cluster autoscaler watches for pods that can’t be scheduled on nodes because of resource constraints. The cluster then automatically increases the number of nodes.

The Horizontal Pod Autoscaler, if properly configured, will maintain the optimal number of pods (scale out and scale in) based on your configuration.

If you require more flexibility and want to scale on different types of events unavailable in the HPA, please check the Kubernetes Event-driven Autoscaling (KEDA) project.

As we saw above, the HPA works at the pod level. To ensure that there are nodes available where the pods can be scheduled, we should leverage the Cluster Autoscaler. If enabled, it will dynamically balance the number of nodes (VMs) based on parameters like time intervals between scale events, resource thresholds, and more. For more information on what parameters the cluster autoscaler uses, see Using the autoscaler profile.
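Enabling the cluster autoscaler on an existing node pool is a one-liner with the Azure CLI (names are placeholders):

```shell
# Let AKS scale this Spot pool between 0 and 10 nodes on demand.
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name spotpool \
  --enable-cluster-autoscaler \
  --min-count 0 \
  --max-count 10
```

A min count of 0 lets the Spot pool scale to zero when no tolerating pods are pending, which pairs well with the cost techniques listed at the end.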

Combined, the horizontal pod autoscaler focuses on running the number of pods required to meet application demand, while the cluster autoscaler focuses on running the number of nodes necessary to support the scheduled pods.

Summa Summarum

With Azure Spot, you can save a lot on your Azure bill. It will undoubtedly be an exciting day when you see your AKS bill cut by at least half.

But an even more important and exciting aspect is the process of preparing your workloads to run on Spot. The fact that you don’t get the standard SLA and must be prepared for and anticipate interruptions will drive you to apply good design practices and create a resilient and scalable architecture.

Other cost reduction techniques:

a) Right-sizing of the VMs you are using:
- Virtual machine instances — Microsoft Azure Well-Architected Framework
b) Purchase Reservations for predictable and stable workloads:
- Reservations | Microsoft Azure
c) Start and stop the cluster during business hours:
- Start and Stop an Azure Kubernetes Service (AKS) — Azure Kubernetes Service | Microsoft Docs
d) Release unused resources:
- dolevshor/azure-orphan-resources: Centralize orphan resources in Azure
e) Use the AKS Cluster Autoscaler and enable scale to zero:
- Use the cluster autoscaler in Azure Kubernetes Service (AKS) — Azure Kubernetes Service | Microsoft Docs


Architect and Builder. (Eng @ Microsoft Azure, ex-AWS) — Opinions and observations expressed in this blog post are my own.