Karpenter Magic: Deployments Splitting Spot & On-Demand

Shubham Tanwar
Deutsche Telekom Digital Labs
7 min read · Sep 10, 2024

In the world of cloud-native applications, managing costs without compromising on performance is crucial. Enter Karpenter, a Kubernetes cluster autoscaler built by AWS, designed to improve availability, reliability, and cost-efficiency of your Kubernetes clusters. One of Karpenter’s most powerful features is the ability to split your workloads between Spot and On-Demand instances, allowing you to optimize both cost and reliability.

In this blog, we’ll explore how to install and configure Karpenter to achieve a 50/50 split between Spot and On-Demand instances for your Kubernetes workloads. We’ll walk through a detailed installation script, discuss the nuances of configuring node pools, and show you how to use topology spreads to achieve different splits.

Why Karpenter?

Karpenter dynamically provisions right-sized compute in response to changing application load in your Kubernetes clusters. Unlike the Kubernetes Cluster Autoscaler, it launches nodes directly (via EC2 Fleet) instead of resizing pre-defined node groups, which lets it scale quickly and mix and match instance types and purchase options (like Spot Instances) to optimize your workloads.

Step 1: Installing Karpenter

The first step in leveraging Karpenter’s powerful autoscaling capabilities is installing it in your EKS cluster. Below is a script to deploy Karpenter using Helm. This script handles everything from creating the necessary IAM roles and Fargate profiles to deploying Karpenter itself.

#!/bin/bash
### This script deploys Karpenter with an existing node role via Helm.
### Disclaimer
# This script is intended for quick Karpenter deployment. Changes should be thoroughly reviewed and tested.
# The script is tested with the use of an existing node role but has an option to create a new node role.
# If a new node role is needed, modifications are required in several places.
# Always test changes in a non-production environment before execution.
### Overview
# 1. Create Fargate Profile with IAM role for the pod.
# 2. Create Fargate Profile for kube-system core-dns and patch core-dns to run on Fargate.
# 3. Create Karpenter Controller IAM role (with additional KMS key permissions).
# 4. Optionally create Karpenter node role and update aws-auth config (disabled by default to use an existing node role).
# 5. Create SQS queue for handling interruptions.
# 6. Deploy Karpenter via Helm.
# 7. Add tags to subnets and security group.
# 8. Create NodePool and NodeClass.
# 9. Scale down cluster-autoscaler to 0.
### Configuration
# Placeholders below must be hyphenated tokens (no spaces): they are embedded in IAM role and queue names.
export ENVIRONMENT_TAG="environment-name" # e.g. dev, staging, prod
export PROJECT_TAG="project-name"
export REGION_SOT="region-name"
export CLUSTER_NAME="${PROJECT_TAG}-${ENVIRONMENT_TAG}-${REGION_SOT}-eks"
export SQS_QUEUE_NAME="${PROJECT_TAG}-${ENVIRONMENT_TAG}-${REGION_SOT}-eks-karpenter-sqs"
export SQS_QUEUE_ACCESS_POLICY="${PROJECT_TAG}-${ENVIRONMENT_TAG}-${REGION_SOT}-eks-karpentercontroller-sqs-policy"
export FARGATE_ROLE_NAME="${PROJECT_TAG}-${ENVIRONMENT_TAG}-${REGION_SOT}-eks-fargate-execution-role"
export KARPENTER_CONTROLLER_POLICY="${PROJECT_TAG}-${ENVIRONMENT_TAG}-${REGION_SOT}-eks-karpentercontroller-policy"
export KARPENTER_CONTROLLER_ROLE="${PROJECT_TAG}-${ENVIRONMENT_TAG}-${REGION_SOT}-eks-karpentercontroller-role"
export KARPENTER_EXISTING_NODE_ROLE="name-of-existing-node-role"
export KMS_KEY_ID="Key-ID-XXXX-XXXX" # Replace with your actual KMS key ID
export EKS_NODE_NAME="existing-node-group-name"
export EKS_NODE_SG1="sg-id"
export EKS_PRI_SUBNET1="subnet-id"
export EKS_PRI_SUBNET2="subnet-id"
export EKS_PRI_SUBNET3="subnet-id"
# Namespace and Versioning
export KARPENTER_NAMESPACE="karpenter"
export KARPENTER_VERSION="0.37.0"
export K8S_VERSION="1.28"
export AWS_DEFAULT_REGION="us-west-2"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export TEMPOUT="$(mktemp)"
export AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
### Fargate Role Creation ###
cat > trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks-fargate-pods.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role \
--role-name "${FARGATE_ROLE_NAME}" \
--assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy \
--role-name "${FARGATE_ROLE_NAME}" \
--policy-arn arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy
### Fargate Profiles Creation ###
# Karpenter Fargate Profile
aws eks create-fargate-profile \
--fargate-profile-name karpenter \
--region "${AWS_DEFAULT_REGION}" \
--cluster-name "${CLUSTER_NAME}" \
--pod-execution-role-arn arn:aws:iam::${AWS_ACCOUNT_ID}:role/${FARGATE_ROLE_NAME} \
--selectors '[{"namespace": "karpenter"}]' \
--subnets "[\"${EKS_PRI_SUBNET1}\",\"${EKS_PRI_SUBNET2}\",\"${EKS_PRI_SUBNET3}\"]"
# Kube-System Fargate Profile
aws eks create-fargate-profile \
--fargate-profile-name kube-system \
--region "${AWS_DEFAULT_REGION}" \
--cluster-name "${CLUSTER_NAME}" \
--pod-execution-role-arn arn:aws:iam::${AWS_ACCOUNT_ID}:role/${FARGATE_ROLE_NAME} \
--selectors '[{"namespace": "kube-system", "labels": {"k8s-app": "kube-dns"}}]' \
--subnets "[\"${EKS_PRI_SUBNET1}\",\"${EKS_PRI_SUBNET2}\",\"${EKS_PRI_SUBNET3}\"]"
# Patch CoreDNS to run on Fargate
kubectl patch deployment coredns \
-n kube-system \
--type json \
-p='[{"op": "replace", "path": "/spec/template/metadata/annotations/eks.amazonaws.com~1compute-type", "value": "fargate"}]'
### Karpenter Controller Role Creation ###
export OIDC_ENDPOINT="$(aws eks describe-cluster --name ${CLUSTER_NAME} --query 'cluster.identity.oidc.issuer' --output text | sed 's|https://||')"
cat <<EOF > karpenter-controller-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_ENDPOINT}:aud": "sts.amazonaws.com",
          "${OIDC_ENDPOINT}:sub": "system:serviceaccount:${KARPENTER_NAMESPACE}:karpenter"
        }
      }
    }
  ]
}
EOF
aws iam create-role \
--role-name "${KARPENTER_CONTROLLER_ROLE}" \
--assume-role-policy-document file://"karpenter-controller-trust-policy.json"
# The SQS statement lets the controller consume interruption events from the queue created below.
cat <<EOF > karpenter-controller-policy.json
{
  "Statement": [
    {
      "Action": [
        "ssm:GetParameter",
        "iam:PassRole",
        "iam:GetInstanceProfile",
        "iam:CreateInstanceProfile",
        "iam:TagInstanceProfile",
        "iam:AddRoleToInstanceProfile",
        "ec2:DescribeImages",
        "ec2:RunInstances",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeAvailabilityZones",
        "ec2:DeleteLaunchTemplate",
        "ec2:CreateTags",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ec2:DescribeSpotPriceHistory",
        "pricing:GetProducts",
        "eks:DescribeCluster"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "Karpenter"
    },
    {
      "Action": "ec2:TerminateInstances",
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "ConditionalEC2Termination"
    },
    {
      "Action": [
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:ReceiveMessage"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:sqs:${AWS_DEFAULT_REGION}:${AWS_ACCOUNT_ID}:${SQS_QUEUE_NAME}",
      "Sid": "AllowInterruptionQueueActions"
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:CreateGrant",
        "kms:Decrypt",
        "kms:DescribeKey",
        "kms:Encrypt",
        "kms:GenerateDataKey",
        "kms:GenerateDataKeyPair",
        "kms:GenerateDataKeyPairWithoutPlaintext",
        "kms:GenerateDataKeyWithoutPlaintext",
        "kms:ReEncryptFrom",
        "kms:ReEncryptTo"
      ],
      "Resource": [
        "arn:aws:kms:${AWS_DEFAULT_REGION}:${AWS_ACCOUNT_ID}:key/${KMS_KEY_ID}"
      ]
    }
  ],
  "Version": "2012-10-17"
}
EOF
aws iam create-policy \
--policy-name "${KARPENTER_CONTROLLER_POLICY}" \
--policy-document file://"karpenter-controller-policy.json"
aws iam attach-role-policy \
--policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/${KARPENTER_CONTROLLER_POLICY} \
--role-name "${KARPENTER_CONTROLLER_ROLE}"
rm karpenter-controller-trust-policy.json karpenter-controller-policy.json
### Optionally Create Karpenter Node Role ###
# Uncomment if a new node role is needed.
# Note: a node role needs an EC2 trust policy ("ec2.amazonaws.com" principal), not the Fargate
# trust policy above, and also AmazonEKSWorkerNodePolicy and AmazonEKS_CNI_Policy attached.
# aws iam create-role \
# --role-name ${KARPENTER_EXISTING_NODE_ROLE} \
# --assume-role-policy-document file://trust-policy.json
#
# aws iam attach-role-policy \
# --role-name ${KARPENTER_EXISTING_NODE_ROLE} \
# --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
#
# aws iam attach-role-policy \
# --role-name ${KARPENTER_EXISTING_NODE_ROLE} \
# --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
### Create SQS Queue for Interruption Handling ###
aws sqs create-queue \
--queue-name "${SQS_QUEUE_NAME}"
# Queue policy allowing EC2 interruption notifications to be delivered to this queue.
# Built via heredoc so the shell variables actually expand inside the JSON.
cat <<EOF > queue-attributes.json
{
  "Policy": "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"AWS\":\"*\"},\"Action\":\"sqs:SendMessage\",\"Resource\":\"arn:aws:sqs:${AWS_DEFAULT_REGION}:${AWS_ACCOUNT_ID}:${SQS_QUEUE_NAME}\",\"Condition\":{\"ArnEquals\":{\"aws:SourceArn\":\"arn:aws:ec2:${AWS_DEFAULT_REGION}:${AWS_ACCOUNT_ID}:instance/*\"}}}]}"
}
EOF
aws sqs set-queue-attributes \
--queue-url "$(aws sqs get-queue-url --queue-name ${SQS_QUEUE_NAME} --query QueueUrl --output text)" \
--attributes file://queue-attributes.json
rm queue-attributes.json
### Deploy Karpenter with Helm ###
# Karpenter charts from v0.32 onward are published to the public ECR OCI registry
# (not the legacy charts.karpenter.sh repo) and use the settings.* values.
# The node role is wired up via the EC2NodeClass below, so no instance profile is set here.
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
--namespace ${KARPENTER_NAMESPACE} \
--create-namespace \
--version ${KARPENTER_VERSION} \
--set settings.clusterName=${CLUSTER_NAME} \
--set settings.clusterEndpoint=$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text) \
--set settings.interruptionQueue=${SQS_QUEUE_NAME} \
--set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::${AWS_ACCOUNT_ID}:role/${KARPENTER_CONTROLLER_ROLE}
### Tagging Subnets and Security Groups for Karpenter ###
aws ec2 create-tags \
--resources ${EKS_PRI_SUBNET1} ${EKS_PRI_SUBNET2} ${EKS_PRI_SUBNET3} \
--tags Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}
aws ec2 create-tags \
--resources ${EKS_NODE_SG1} \
--tags Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}

# Create NodeClass (the discovery selectors must match the karpenter.sh/discovery tags applied above)
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "${KARPENTER_EXISTING_NODE_ROLE}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  amiSelectorTerms:
    - id: "${AMD_AMI_ID}"
  tags:
    project: "${PROJECT_TAG}"
    environment: "${ENVIRONMENT_TAG}"
    managed: karpenter
    resource: eks
    resourcetype: private
    backup: "false"
    osversion: al2
    dataclassification: internal
    component: eks
    vertical: application
    natco: all
    Name: "${EKS_NODE_NAME}"
EOF
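### Scale Down Cluster Autoscaler ###
# Assumes cluster-autoscaler runs as a Deployment named "cluster-autoscaler" in kube-system;
# adjust the name/namespace to match your installation before running.
kubectl scale deployment cluster-autoscaler -n kube-system --replicas=0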

echo "Karpenter installation completed."
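
Once the script completes, it is worth confirming the controller came up healthy before moving on. The selector below assumes the app.kubernetes.io/name=karpenter label the Helm chart applies by default:

kubectl get pods -n karpenter -o wide
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50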

Step 2: Configuring Node Pools

Once Karpenter is installed, the next step is to configure node pools that define the type of instances to be used. We’ll create two node pools, one for Spot Instances and one for On-Demand Instances.

NodePool for On-Demand:

Note the custom capacity-spread requirement in both pools: Karpenter stamps one of the listed values onto each node it launches as a node label, and those label values become the spread domains that the topology spread constraints in Step 3 divide pods across. The On-Demand pool exposes five domains (1-od through 5-od) and the Spot pool five more (1-s through 5-s).

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: nodepool-on-demand
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m5.2xlarge", "m5.4xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-west-2a
            - us-west-2b
            - us-west-2c
        - key: capacity-spread
          operator: In
          values:
            - "1-od"
            - "2-od"
            - "3-od"
            - "4-od"
            - "5-od"
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 500
  disruption:
    budgets:
      - nodes: 10%
    consolidationPolicy: WhenUnderutilized
    expireAfter: 72h

NodePool for Spot:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: nodepool-spot
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m5.2xlarge", "m5.4xlarge", "m5a.large", "m5a.xlarge", "m5a.2xlarge", "m5a.4xlarge", "c5.large", "c5.xlarge", "c5.2xlarge", "c5.4xlarge", "c5a.large", "c5a.xlarge", "c5a.2xlarge", "c5a.4xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-west-2a
            - us-west-2b
            - us-west-2c
        - key: capacity-spread
          operator: In
          values:
            - "1-s"
            - "2-s"
            - "3-s"
            - "4-s"
            - "5-s"
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 500
  disruption:
    budgets:
      - nodes: 10%
    consolidationPolicy: WhenUnderutilized
    expireAfter: 72h
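
Save the two manifests and apply them (the file names here are just examples), then confirm Karpenter registered the pools. Nodes and nodeclaims will only appear once pending pods demand capacity:

kubectl apply -f nodepool-on-demand.yaml -f nodepool-spot.yaml
kubectl get nodepools
kubectl get nodeclaims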

Step 3: Adjusting Topology Spreads

To distribute your deployment's pods across Spot and On-Demand instances in the ratio you want, use Kubernetes' topologySpreadConstraints with capacity-spread as the topology key. Two details matter: the constraint needs a labelSelector that matches your own pods (without it the scheduler has nothing to count), and with maxSkew: 1 plus whenUnsatisfiable: DoNotSchedule, pods are forced to spread evenly across whichever capacity-spread domains they are allowed to land on. Restricting that set of domains with node affinity is what produces the different splits:

50/50 OD to Spot split. All ten domains (five per pool) are allowed, so an even spread lands half the pods on each capacity type:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: capacity-spread
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app # placeholder: match your Deployment's pod labels
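
For context, here is a minimal sketch of a Deployment showing where these fields sit; the name, labels, replica count, image, and resource requests are all placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: capacity-spread # domains come from the node label Karpenter applies
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: app
          image: nginx:1.27 # placeholder image
          resources:
            requests:
              cpu: 500m
              memory: 512Mi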

40/60 OD to Spot split. Node affinity restricts the pods to two On-Demand domains and three Spot domains, so an even spread across the five domains puts 2/5 of the pods on On-Demand:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: capacity-spread
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app # placeholder
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: capacity-spread
              operator: In
              values:
                - 1-od
                - 2-od
                - 1-s
                - 2-s
                - 3-s

80/20 OD to Spot split. Four On-Demand domains and a single Spot domain put 4/5 of the pods on On-Demand:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: capacity-spread
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app # placeholder
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: capacity-spread
              operator: In
              values:
                - 1-od
                - 2-od
                - 3-od
                - 4-od
                - 1-s

By replacing the traditional autoscaler with Karpenter, we have dramatically improved our Kubernetes scaling performance. Previously, adding a new node would take around 5 minutes, followed by an additional minute to schedule the pod on the node, leading to a total pod scheduling time of 6 minutes if no nodes were available. Now, with Karpenter, we have reduced this entire process to just 2 minutes — 1 minute to add a new node and 1 minute for pod scheduling. This has enabled us to scale more aggressively while maintaining a cost-efficient solution. By utilizing spot instances alongside on-demand instances, we’ve achieved a 60% cost reduction. Our deployment strategy, which schedules 50% of pods on on-demand instances and 50% on spot instances, has also made our system more resilient to disruptions.
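
To check how pods actually land, inspect the node labels (karpenter.sh/capacity-type is set by Karpenter on every node it provisions) and pod placement; the app label here is the placeholder from the examples above:

kubectl get nodes -L karpenter.sh/capacity-type,capacity-spread
kubectl get pods -l app=my-app -o wide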

Conclusion

Karpenter offers a flexible and powerful way to manage your Kubernetes workloads, enabling you to balance cost and performance by splitting your deployments between Spot and On-Demand instances. By following the steps outlined in this blog, you can efficiently configure Karpenter to meet your workload requirements while optimizing your cloud costs.

Acknowledgments

A sincere thank you to Abhishek Srivastava, Akshay Sharma, Randhir Thakur, and Tarun Kumawat for playing pivotal roles in guiding this work and presenting the problem statements. Abhishek’s leadership, combined with the invaluable support of Akshay, Randhir, and Tarun, enriched the content with their expertise.
