Cost Optimization: Azure Kubernetes Service, Part I

Chaskarshailesh
Published in Javarevisited
5 min read · Oct 28, 2023

Cost optimization is about understanding your different configuration options and recommended best practices to reduce unnecessary expenses and improve operational efficiencies.

For Azure Kubernetes Service (AKS) cost optimization, let's differentiate between the cost of cluster resources and the cost of workload resources.

  1. Cluster resources are a shared responsibility between the cluster admin and their resource provider.
  2. Workload resources are the domain of a developer. Azure Kubernetes Service has considerations and recommendations for both of these roles.

Recommendations:

  1. Use the AKS cluster preset configurations, especially to pick sensible defaults for a Dev/Test or Production environment.

  2. Consider using Azure Spot VMs for workloads that can handle interruptions, early terminations, and evictions. Workloads such as batch processing jobs, development and testing environments, and large compute workloads may be good candidates to schedule on a spot node pool. Using spot VMs for nodes in your AKS cluster lets you take advantage of unused capacity in Azure at significant cost savings.

  3. Enforce resource quotas at the namespace level. These quotas are defined on a namespace and can be used to set quotas on compute resources, storage resources, and object counts. Once a quota on compute resources is defined, every pod created in the namespace must declare requests or limits for those resources in its pod specification, or it may be rejected.

  4. Sign up for Azure Reservations. If you have planned capacity properly, and your workload is predictable and will exist for an extended period of time, sign up for Azure Reserved Instances to further reduce your resource costs.
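Recommendation 3 can be sketched as a manifest. The snippet below writes a ResourceQuota for a hypothetical `dev` namespace to a local file. The quota name, namespace, file name, and every limit value are illustrative assumptions, not figures taken from the cluster in this post.

```shell
# Write an example ResourceQuota manifest (all names and values are placeholders).
QUOTA_FILE=dev-quota.yaml

cat > "$QUOTA_FILE" <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "2"          # compute: total CPU requests in the namespace
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    requests.storage: 50Gi     # storage: total storage requested by PVCs
    pods: "20"                 # object count: maximum number of pods
EOF

echo "Wrote $QUOTA_FILE"

# To apply it against a cluster you are connected to:
#   kubectl apply -f dev-quota.yaml
#   kubectl describe resourcequota dev-quota --namespace dev
```

The manifest covers the three quota kinds mentioned above: compute, storage, and object counts.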

There are many more recommendations; I have picked the top four.

Let's understand nodes and node pools in an Azure Kubernetes cluster:

  1. First, let's understand that a node uses CPU and memory for processes such as the kubelet, daemons such as kube-proxy, and the operating system. Kubernetes also reserves memory for the eviction threshold, which is used to evict workloads when there isn't enough memory left on the node. Let's call these reserved resources.
  2. Now let's calculate how many pods can be provisioned on an AKS node with 2 cores and 7 GiB of memory (Standard_DS2_v2).

Reference: https://learnk8s.io/kubernetes-instance-calculator#what-s-the-maximum-number-of-pods-in-aks-

With a memory efficiency of 73% and a CPU efficiency of 100%, we can schedule 17 pods.

Formulas:

EFFICIENCY_CPU = TOTAL_CPU_PODS / (TOTAL_CPU - TOTAL_CPU_RESERVED)

EFFICIENCY_MEMORY = TOTAL_MEMORY_PODS / (TOTAL_MEMORY - TOTAL_MEMORY_RESERVED)

As per the AKS limits, at most 250 pods can be scheduled on a node.
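To make the efficiency formulas concrete, here is a minimal shell sketch. The `efficiency` helper is a hypothetical name, and the input figures are made-up placeholders chosen for easy arithmetic, not the actual reservations AKS applies to any VM size.

```shell
# efficiency USED TOTAL RESERVED
# Prints USED / (TOTAL - RESERVED) as a whole-number percentage.
efficiency() {
  awk -v used="$1" -v total="$2" -v reserved="$3" \
    'BEGIN { printf "%.0f\n", 100 * used / (total - reserved) }'
}

# Placeholder figures (assumptions): CPU in millicores, memory in MiB.
cpu_total=2000;   cpu_reserved=200;   cpu_pods=1800
mem_total=10000;  mem_reserved=2000;  mem_pods=5840

echo "CPU efficiency:    $(efficiency "$cpu_pods" "$cpu_total" "$cpu_reserved")%"   # 100%
echo "Memory efficiency: $(efficiency "$mem_pods" "$mem_total" "$mem_reserved")%"   # 73%
```

The denominator is the allocatable capacity, that is, what remains once the reserved resources are subtracted, so an efficiency of 100% means the pods use everything the node can actually offer.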

  3. Next, let's discuss node pools. Azure Kubernetes Service allows you to create different node pools to match specific workloads to the nodes running in each node pool. Matching workloads to nodes enables you to plan compute consumption and optimize cost.

A node pool describes a group of nodes with the same configuration in an AKS cluster. These nodes contain the underlying VMs that run your applications. You can create two types of node pools on an AKS-managed Kubernetes cluster:

  • System node pools: host the critical system pods, such as CoreDNS and metrics-server, that keep your cluster running. They allow only Linux as the node OS and run only Linux-based workloads. For production environments, the recommended node count for a system node pool is a minimum of three nodes.
  • User node pools: support your workloads, and you can specify Windows or Linux as the node operating system.
  • Spot node pools: a special case of a user node pool, backed by a spot virtual machine scale set. AKS supports spot VMs when you:
    a. Need to create user node pools (a spot node pool cannot be a system node pool).
    b. Want the cost benefits offered by virtual machine scale set support for Azure spot VMs.

Let's explore cost optimization by provisioning an Azure Kubernetes cluster:

Step 1: Let's create a resource group

# Names used throughout this walkthrough
REGION_NAME=centralindia
RESOURCE_GROUP=rg-letsailtogether
AKS_CLUSTER_NAME=letsailtogether-aks

# Take the last non-preview Kubernetes version listed for the region
VERSION=$(az aks get-versions --location "$REGION_NAME" --query "values[?isPreview==null].version | [-1]" --output tsv)
az group create --name "$RESOURCE_GROUP" --location "$REGION_NAME"

Step 2: Let's create the AKS cluster

az aks create \
  --resource-group "$RESOURCE_GROUP" \
  --name "$AKS_CLUSTER_NAME" \
  --location "$REGION_NAME" \
  --kubernetes-version "$VERSION" \
  --node-count 2 \
  --load-balancer-sku standard \
  --vm-set-type VirtualMachineScaleSets \
  --generate-ssh-keys

az aks nodepool list --resource-group "$RESOURCE_GROUP" --cluster-name "$AKS_CLUSTER_NAME" -o table

Screenshot: two system nodes running the system pods.

Screenshot: the workload running on the system nodes.

Step 3: Let's create a user node pool

# Linux node pool names must be lowercase alphanumeric and at most 12 characters
az aks nodepool add --resource-group "$RESOURCE_GROUP" --cluster-name "$AKS_CLUSTER_NAME" --name userpool001 --mode User --node-count 2
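Node pool names are easy to get wrong, so it can help to check a name before calling `az aks nodepool add`. The sketch below assumes the documented rule for Linux node pools: the name starts with a lowercase letter, contains only lowercase letters and digits, and is 1 to 12 characters long. The `valid_linux_pool_name` helper and the sample names are hypothetical.

```shell
# Succeeds if NAME satisfies the Linux node pool naming rule assumed above.
valid_linux_pool_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9]{0,11}$'
}

for name in spotnodepool userpool001 averylongpoolname User-Pool; do
  if valid_linux_pool_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: rejected"
  fi
done
```

Windows node pools have a shorter length limit, which this check does not cover.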

Step 4: Let's create a spot node pool

az aks nodepool add \
  --resource-group "$RESOURCE_GROUP" \
  --cluster-name "$AKS_CLUSTER_NAME" \
  --name spotnodepool \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3 \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-vm-size Standard_DS2_v2 \
  --no-wait

I had to request additional quota to add more nodes to the spot node pool.

The support request was approved instantly.

Six nodes were provisioned once the quota request was approved.
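A quick way to anticipate a quota request like this is to estimate how many vCPUs the pool can consume at its maximum size. The sketch below uses the values from the spot node pool command (a maximum of 3 nodes of Standard_DS2_v2, which has 2 vCPUs each). The `vcpus_needed` helper is a hypothetical name.

```shell
# vcpus_needed NODES VCPUS_PER_NODE
# Prints the total number of vCPUs the pool can consume.
vcpus_needed() {
  echo $(( $1 * $2 ))
}

SPOT_MAX_NODES=3      # --max-count used for the spot node pool
VCPUS_PER_NODE=2      # Standard_DS2_v2 has 2 vCPUs

echo "Spot pool needs up to $(vcpus_needed "$SPOT_MAX_NODES" "$VCPUS_PER_NODE") vCPUs of quota"

# To compare against current regional usage and limits (needs a logged-in az CLI):
#   az vm list-usage --location "$REGION_NAME" --output table
```

If the estimate exceeds the remaining limit shown by `az vm list-usage`, a quota increase will be needed before the pool can scale to its maximum.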

That's it for this post. It covered some background on cost optimization on AKS and how to provision three different types of node pools in AKS.

In the next post, I will try to cover how we can optimize cost further by:

a. Using the cluster autoscaler to scale up and down with demand.

b. Scheduling pods on spot nodes.

c. Applying policies to restrict CPU and memory utilization.

Let's stay connected and let's sail together!


I am a Site Reliability Engineer and aspiring Cloud Solutions Architect, further exploring the horizon into MLOps.