Cost-Efficient Kubernetes Setup in AWS using EKS with Karpenter and Fargate

Tom Jose
kotaicode
Published in
8 min readMay 27, 2024

Introduction

Karpenter is an open-source Kubernetes cluster autoscaler designed to optimize the provisioning and scaling of compute resources. It dynamically adjusts the number of nodes in a Kubernetes cluster based on the workloads and resource demands, aiming to ensure efficient utilization and cost-effectiveness. Here are some key features and functionalities of Karpenter:

  1. Dynamic Scaling: Karpenter can rapidly scale the number of nodes up or down in response to changing workload demands, helping maintain application performance and availability.
  2. Resource Optimization: It intelligently selects the appropriate instance types and sizes to match the specific requirements of the workloads, optimizing both performance and cost.
  3. Workload-Aware Scheduling: It considers the specific needs and characteristics of the workloads when making scaling decisions, ensuring that the resources are allocated efficiently.
  4. Cost Efficiency: By dynamically adjusting the cluster size and selecting the most cost-effective resources, Karpenter helps reduce the overall operational costs associated with running Kubernetes clusters.

Why you should use fargate profile for running karpenter pods?

Using Fargate profiles for running Karpenter pods can offer several advantages

  • Simplified infrastructure management: With Fargate, there’s no need to create and manage EC2 instances or node groups to run your Kubernetes pods. This reduces the complexity associated with managing the underlying infrastructure.
  • Resource Optimization: Fargate automatically provisions the required compute resources for your pods, ensuring optimal use of resources without the need for pre-configured node groups.
  • Quick Configuration: Setting up Fargate profiles for your Kubernetes cluster is straightforward and can be done through the AWS Management Console, AWS CLI, or AWS SDKs.
  • Automated Infrastructure Updates: Fargate handles all updates to the underlying infrastructure automatically. This includes security patches, OS updates, and other maintenance tasks, ensuring your environment is always up-to-date.
  • Reduced Downtime: Because Fargate manages updates without requiring manual intervention, there is less risk of downtime due to improperly applied updates or missed maintenance windows.

Architecture

Karpenter pods are running on a Fargate profile, which allows them to dynamically manage the underlying nodes according to the load and specific requirements of your applications. This setup ensures that the infrastructure scales efficiently, automatically provisioning and deprovisioning resources as needed to handle varying workloads. By leveraging Fargate, Karpenter can focus on optimizing resource utilization and maintaining performance without the need for manual intervention in node management.

Prerequisites:

  1. AWS account with appropriate permissions.
  2. AWS EKS cluster
  3. AWS CLI
  4. Private subnets, without private subnet you can’t create fargate profile.
  5. Helm

Create fargate profile in the eks cluster
Navigate to EKS cluster and create a fargate profile.

create fargate profile

Give a name to your fargate profile. If you do not see a pod execution role listed, click on “Create a role in IAM console” and follow the prompts to create a new role specifically for pod execution.

create IAM role for pod execution

use highlighted use case and create pod execution role.

By adding a namespace, any pods deployed within it will automatically utilize our Fargate profile for execution.

then create fargate profile.

Configure AWS policies and Roles.

Set following variables.

KARPENTER_NAMESPACE=karpenter
CLUSTER_NAME=<your cluster name>

Set other variables from your cluster configuration.

AWS_PARTITION="aws" # if you are not using standard partitions, you may need to configure to aws-cn / aws-us-gov
AWS_REGION="$(aws configure list | grep region | tr -s " " | cut -d" " -f3)"
OIDC_ENDPOINT="$(aws eks describe-cluster --name "${CLUSTER_NAME}" \
--query "cluster.identity.oidc.issuer" --output text)"
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' \
--output text)
K8S_VERSION=1.28 # enter your kubernetes version
ARM_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-arm64/recommended/image_id --query Parameter.Value --output text)"
AMD_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id --query Parameter.Value --output text)"
GPU_AMI_ID="$(aws ssm get-parameter --name /aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2-gpu/recommended/image_id --query Parameter.Value --output text)"

Create karpenter node role

echo '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}' > node-trust-policy.json

aws iam create-role --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
--assume-role-policy-document file://node-trust-policy.json

Attach required policies to the role

aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
--policy-arn "arn:${AWS_PARTITION}:iam::aws:policy/AmazonEKSWorkerNodePolicy"

aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
--policy-arn "arn:${AWS_PARTITION}:iam::aws:policy/AmazonEKS_CNI_Policy"

aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
--policy-arn "arn:${AWS_PARTITION}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"

aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
--policy-arn "arn:${AWS_PARTITION}:iam::aws:policy/AmazonSSMManagedInstanceCore"
karpenter node role

Now we need to create an IAM role that the Karpenter controller will use to provision new instances. The controller will be using IAM Roles for Service Accounts (IRSA) which requires an OIDC endpoint.

cat << EOF > controller-trust-policy.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_ENDPOINT#*//}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${OIDC_ENDPOINT#*//}:aud": "sts.amazonaws.com",
"${OIDC_ENDPOINT#*//}:sub": "system:serviceaccount:${KARPENTER_NAMESPACE}:karpenter"
}
}
}
]
}
EOF

aws iam create-role --role-name "KarpenterControllerRole-${CLUSTER_NAME}" \
--assume-role-policy-document file://controller-trust-policy.json

cat << EOF > controller-policy.json
{
"Statement": [
{
"Action": [
"ssm:GetParameter",
"ec2:DescribeImages",
"ec2:RunInstances",
"ec2:DescribeSubnets",
"ec2:DescribeSecurityGroups",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeInstances",
"ec2:DescribeInstanceTypes",
"ec2:DescribeInstanceTypeOfferings",
"ec2:DescribeAvailabilityZones",
"ec2:DeleteLaunchTemplate",
"ec2:CreateTags",
"ec2:CreateLaunchTemplate",
"ec2:CreateFleet",
"ec2:DescribeSpotPriceHistory",
"pricing:GetProducts"
],
"Effect": "Allow",
"Resource": "*",
"Sid": "Karpenter"
},
{
"Action": "ec2:TerminateInstances",
"Condition": {
"StringLike": {
"ec2:ResourceTag/karpenter.sh/nodepool": "*"
}
},
"Effect": "Allow",
"Resource": "*",
"Sid": "ConditionalEC2Termination"
},
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}",
"Sid": "PassNodeIAMRole"
},
{
"Effect": "Allow",
"Action": "eks:DescribeCluster",
"Resource": "arn:${AWS_PARTITION}:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${CLUSTER_NAME}",
"Sid": "EKSClusterEndpointLookup"
},
{
"Sid": "AllowScopedInstanceProfileCreationActions",
"Effect": "Allow",
"Resource": "*",
"Action": [
"iam:CreateInstanceProfile"
],
"Condition": {
"StringEquals": {
"aws:RequestTag/kubernetes.io/cluster/${CLUSTER_NAME}": "owned",
"aws:RequestTag/topology.kubernetes.io/region": "${AWS_REGION}"
},
"StringLike": {
"aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
}
}
},
{
"Sid": "AllowScopedInstanceProfileTagActions",
"Effect": "Allow",
"Resource": "*",
"Action": [
"iam:TagInstanceProfile"
],
"Condition": {
"StringEquals": {
"aws:ResourceTag/kubernetes.io/cluster/${CLUSTER_NAME}": "owned",
"aws:ResourceTag/topology.kubernetes.io/region": "${AWS_REGION}",
"aws:RequestTag/kubernetes.io/cluster/${CLUSTER_NAME}": "owned",
"aws:RequestTag/topology.kubernetes.io/region": "${AWS_REGION}"
},
"StringLike": {
"aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*",
"aws:RequestTag/karpenter.k8s.aws/ec2nodeclass": "*"
}
}
},
{
"Sid": "AllowScopedInstanceProfileActions",
"Effect": "Allow",
"Resource": "*",
"Action": [
"iam:AddRoleToInstanceProfile",
"iam:RemoveRoleFromInstanceProfile",
"iam:DeleteInstanceProfile"
],
"Condition": {
"StringEquals": {
"aws:ResourceTag/kubernetes.io/cluster/${CLUSTER_NAME}": "owned",
"aws:ResourceTag/topology.kubernetes.io/region": "${AWS_REGION}"
},
"StringLike": {
"aws:ResourceTag/karpenter.k8s.aws/ec2nodeclass": "*"
}
}
},
{
"Sid": "AllowInstanceProfileReadActions",
"Effect": "Allow",
"Resource": "*",
"Action": "iam:GetInstanceProfile"
}
],
"Version": "2012-10-17"
}
EOF

aws iam put-role-policy --role-name "KarpenterControllerRole-${CLUSTER_NAME}" \
--policy-name "KarpenterControllerPolicy-${CLUSTER_NAME}" \
--policy-document file://controller-policy.json
karpenter controller role

Update aws-auth ConfigMap

We need to allow nodes that are using the node IAM role we just created to join the cluster. To do that we have to modify the aws-auth ConfigMap in the cluster.

kubectl edit configmap aws-auth -n kube-system

You will need to add following section to the configmap (do not change any existing configuration, just add this to the groups section). Replace the ${AWS_PARTITION} variable with the account partition, ${AWS_ACCOUNT_ID} variable with your account ID, and ${CLUSTER_NAME}.
Note: Do not replace the {{EC2PrivateDNSName}}

- groups:
- system:bootstrappers
- system:nodes
rolearn: arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}
username: system:node:{{EC2PrivateDNSName}}

Deploy Karpenter

First set the Karpenter release you want to deploy.
export KARPENTER_VERSION=”0.36.2"

We can now generate a full Karpenter deployment yaml from the Helm chart.

helm template karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" \
--set "settings.clusterName=${CLUSTER_NAME}" \
--set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi > karpenter.yaml

Remove node affinity from the Deployment, we are running karpenter pods on fargate so no need to set node affinity.

affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: karpenter.sh/nodepool
operator: DoesNotExist
- key: eks.amazonaws.com/nodegroup
operator: In
values:
- ${NODEGROUP}
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- topologyKey: "kubernetes.io/hostname"

# Remove this part from deployment

Create karpenter namespace in k8s cluster
kubectl create namespace karpenter

we are creating CRDs and deployments in the karpenter namespace. Please update all references to use the karpenter namespace.

namespace: karpenter 

Now create karpenter deployments and CRDs

kubectl create -f \
"https://raw.githubusercontent.com/aws/karpenter-provider-aws/v${KARPENTER_VERSION}/pkg/apis/crds/karpenter.sh_nodepools.yaml"
kubectl create -f \
"https://raw.githubusercontent.com/aws/karpenter-provider-aws/v${KARPENTER_VERSION}/pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml"
kubectl create -f \
"https://raw.githubusercontent.com/aws/karpenter-provider-aws/v${KARPENTER_VERSION}/pkg/apis/crds/karpenter.sh_nodeclaims.yaml"
kubectl apply -f karpenter.yaml

Now, you’ll notice Karpenter pods running.

karpenter running

If anything goes wrong, just check the logs for errors.
kubectl logs <name of the pod>

Create default NodePool
We need to create a default NodePool so Karpenter knows what types of nodes we want for unscheduled workloads

cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: default
spec:
template:
spec:
requirements:
- key: kubernetes.io/arch
operator: In
values: ["amd64"]
- key: kubernetes.io/os
operator: In
values: ["linux"]
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
- key: karpenter.k8s.aws/instance-category
operator: In
values: ["c", "m", "r"]
- key: karpenter.k8s.aws/instance-generation
operator: Gt
values: ["2"]
nodeClassRef:
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
name: default
limits:
cpu: 1000
disruption:
consolidationPolicy: WhenUnderutilized
expireAfter: 720h # 30 * 24h = 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
name: default
spec:
amiFamily: AL2 # Amazon Linux 2
role: "KarpenterNodeRole-${CLUSTER_NAME}" # replace with your cluster name
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
amiSelectorTerms:
- id: "${ARM_AMI_ID}"
- id: "${AMD_AMI_ID}"
# - id: "${GPU_AMI_ID}" # <- GPU Optimized AMD AMI
# - name: "amazon-eks-node-${K8S_VERSION}-*" # <- automatically upgrade when a new AL2 EKS Optimized AMI is released. This is unsafe for production workloads. Validate AMIs in lower environments before deploying them to production.
EOF

Remove Cluster Auto Scaler
Now that karpenter is running we can disable the cluster autoscaler if you have any. To do that we will scale the number of replicas to zero.

kubectl scale deploy/cluster-autoscaler -n kube-system --replicas=0

To get rid of the instances that were added from the node group we can scale our nodegroup down to a minimum size to support Karpenter and other critical services.

Note: If your workloads do not have pod disruption budgets set, the following command will cause workloads to be unavailable.

If you have a single multi-AZ node group, we suggest a minimum of 2 instances.

aws eks update-nodegroup-config --cluster-name "${CLUSTER_NAME}" \
--nodegroup-name "${NODEGROUP}" \
--scaling-config "minSize=2,maxSize=2,desiredSize=2"

Or, if you have multiple single-AZ node groups, we suggest a minimum of 1 instance each.

for NODEGROUP in $(aws eks list-nodegroups --cluster-name "${CLUSTER_NAME}" \
--query 'nodegroups' --output text); do aws eks update-nodegroup-config --cluster-name "${CLUSTER_NAME}" \
--nodegroup-name "${NODEGROUP}" \
--scaling-config "minSize=1,maxSize=1,desiredSize=1"
done

Now remove your managed nodegroups if you have any, karpenter will automatically create new nodes instead of that.

You can use eks-node-viewer tool for visualizing dynamic node usage within a cluster.

Reference

--

--