Optimizing AWS Costs: EKS Dynamic Scaling using EventBridge and Lambda

Damindu Bandara
Published in ADL Blog
5 min read · May 13, 2024

Introduction

Amazon EKS offers flexibility in deploying Kubernetes applications, with EC2 instances being a popular choice for node hosting. This guide focuses on optimizing costs by automating the scaling down of EKS clusters during off-peak hours. By leveraging AWS Lambda Functions and AWS EventBridge, you can efficiently reduce EC2 computing expenses without sacrificing operational efficiency.

Understanding the Components

AWS EventBridge:

  • Central hub for event-driven architecture.
  • Integrates various AWS services and custom applications.
  • Captures events from AWS services, SaaS apps, and custom sources.

AWS Lambda:

  • Serverless computing service.
  • Executes code in response to triggers.
  • Supports multiple programming languages.
  • Executes custom logic in response to events received from EventBridge.

Amazon Elastic Kubernetes Service (EKS):

  • Managed Kubernetes service on AWS.
  • Simplifies deployment, management, and scaling of containerized applications.
  • Eliminates the need to manage underlying infrastructure.
  • Enables efficient resource utilization and scaling based on demand.

Architecture Diagram

Steps to follow:

Step 1 : Configure an AWS IAM policy for EKS Cluster
Step 2 : Create an IAM role with the new policy for the Lambda functions
Step 3 : Create Lambda Functions for Scale Up and Scale Down
Step 4 : Create EventBridge scheduler for Scale Down
Step 5 : Create EventBridge scheduler for Scale Up

Step 1 : Configure an AWS IAM policy for EKS Cluster

First, we need to create an IAM policy for the EKS cluster. It allows the ListNodegroups, UpdateNodegroupConfig, and DescribeNodegroup actions.

In IAM Console -> Access Management -> Policies -> Create Policy

On the JSON tab, enter the following policy document:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "eks:ListNodegroups",
                "eks:UpdateNodegroupConfig",
                "eks:DescribeNodegroup"
            ],
            "Resource": [
                "arn:aws:eks:CLUSTER_REGION:ACCOUNT_ID:cluster/CLUSTER_NAME",
                "arn:aws:eks:CLUSTER_REGION:ACCOUNT_ID:nodegroup/CLUSTER_NAME/*/*"
            ]
        }
    ]
}

(Replace CLUSTER_REGION, ACCOUNT_ID, and CLUSTER_NAME in the above policy with your cluster's details.)
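If you prefer to script this step rather than use the console, the same policy document can be assembled in Python and passed to boto3. This is only a sketch: the helper name build_eks_scaling_policy and the policy name in the commented call are illustrative, and the create_policy call assumes your credentials allow iam:CreatePolicy.

```python
def build_eks_scaling_policy(region, account_id, cluster_name):
    """Assemble the IAM policy document for EKS node group scaling."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "eks:ListNodegroups",
                    "eks:UpdateNodegroupConfig",
                    "eks:DescribeNodegroup",
                ],
                "Resource": [
                    f"arn:aws:eks:{region}:{account_id}:cluster/{cluster_name}",
                    f"arn:aws:eks:{region}:{account_id}:nodegroup/{cluster_name}/*/*",
                ],
            }
        ],
    }

# To actually create the policy (requires iam:CreatePolicy permissions):
# import json, boto3
# iam = boto3.client("iam")
# iam.create_policy(
#     PolicyName="eks-nodegroup-scaling",
#     PolicyDocument=json.dumps(
#         build_eks_scaling_policy("us-east-1", "123456789012", "my-cluster")
#     ),
# )
```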

Step 2 : Create an IAM role with the new policy for the Lambda functions

In IAM Console -> Access Management -> Roles-> Create Role

Select Trusted entity type as AWS Service and Use case as Lambda.

Next, under permissions, select the policy that you created in the previous step.

Next, review and enter a role name of your preference.
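This step can also be scripted. The sketch below shows the trust policy the role needs so the Lambda service can assume it; the role name and policy ARN in the commented calls are placeholders, and the calls assume iam:CreateRole and iam:AttachRolePolicy permissions.

```python
# Trust policy that lets the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# To create the role and attach the policy from Step 1:
# import json, boto3
# iam = boto3.client("iam")
# iam.create_role(
#     RoleName="eks-scaling-lambda-role",  # any name you prefer
#     AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
# )
# iam.attach_role_policy(
#     RoleName="eks-scaling-lambda-role",
#     PolicyArn="arn:aws:iam::ACCOUNT_ID:policy/eks-nodegroup-scaling",
# )
```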

Step 3 : Create Lambda Functions for Scale Up and Scale Down

Step 3.1 : Create Lambda Functions for Scale Up

In Lambda -> Create Function

Under Permissions, select Use an existing role and choose the role that you created in the previous step.

Next, add the Python code below in the code section and deploy it.

import boto3

def describe_nodegroup(client, cluster_name, nodegroup_name):
    return client.describe_nodegroup(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name
    )['nodegroup']['scalingConfig']

def update_nodegroup_config(client, cluster_name, nodegroup_name, scaling_config):
    client.update_nodegroup_config(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name,
        scalingConfig=scaling_config
    )

def update_nodegroup_sizes(eks, cluster_name, nodegroup_sizes):
    for nodegroup_name, size in nodegroup_sizes.items():
        current_size = describe_nodegroup(eks, cluster_name, nodegroup_name)
        if size != current_size['desiredSize']:
            update_nodegroup_config(eks, cluster_name, nodegroup_name, {'desiredSize': size})
            print(f"Updated desired size for node group {nodegroup_name} to {size}")
        else:
            print(f"Desired size is already {size} for node group {nodegroup_name}")

def update_nodegroup_limits(eks, cluster_name, nodegroup_limits, key):
    for nodegroup_name, limit in nodegroup_limits.items():
        current_limit = describe_nodegroup(eks, cluster_name, nodegroup_name)
        if limit != current_limit[key]:
            update_nodegroup_config(eks, cluster_name, nodegroup_name, {key: limit})
            print(f"Updated {key} for node group {nodegroup_name} to {limit}")
        else:
            print(f"{key.capitalize()} is already {limit} for node group {nodegroup_name}")

def lambda_handler(event, context):
    eks = boto3.client('eks')
    cluster_name = "CLUSTER_NAME"

    nodegroup_sizes = {
        'NODE_GROUP_1': ACTUAL_SIZE,
        'NODE_GROUP_2': ACTUAL_SIZE,
        'NODE_GROUP_3': ACTUAL_SIZE
    }

    nodegroup_minsizes = {
        'NODE_GROUP_1': MIN_SIZE,
        'NODE_GROUP_2': MIN_SIZE,
        'NODE_GROUP_3': MIN_SIZE
    }

    nodegroup_maxsizes = {
        'NODE_GROUP_1': MAX_SIZE,
        'NODE_GROUP_2': MAX_SIZE,
        'NODE_GROUP_3': MAX_SIZE
    }

    # Raise maxSize first, then desiredSize, then minSize, so the
    # desired size never falls outside the [minSize, maxSize] range.
    update_nodegroup_limits(eks, cluster_name, nodegroup_maxsizes, 'maxSize')
    update_nodegroup_sizes(eks, cluster_name, nodegroup_sizes)
    update_nodegroup_limits(eks, cluster_name, nodegroup_minsizes, 'minSize')

(Replace CLUSTER_NAME, the node group names, and the ACTUAL_SIZE, MIN_SIZE, and MAX_SIZE values in the above code with your cluster's details.)
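The order of the three update calls matters: EKS rejects a scalingConfig whose desiredSize falls outside the [minSize, maxSize] range, which is why maxSize is raised first and minSize updated last. A small illustrative helper (not part of boto3; the function name is an assumption for this sketch) can guard against invalid combinations before calling the API:

```python
def validate_scaling_config(min_size, desired_size, max_size):
    """Reject combinations the EKS API would refuse:
    it requires 0 <= minSize <= desiredSize <= maxSize."""
    if not (0 <= min_size <= desired_size <= max_size):
        raise ValueError(
            f"invalid scaling config: min={min_size}, "
            f"desired={desired_size}, max={max_size}"
        )
    return {"minSize": min_size, "desiredSize": desired_size, "maxSize": max_size}
```

You could call this on each node group's values before passing the result to update_nodegroup_config, so a bad placeholder fails fast instead of mid-run.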

Step 3.2 : Create Lambda Functions for Scale Down

As with the scale-up function, create the scale-down function using the Python code below.

import boto3

def lambda_handler(event, context):
    eks = boto3.client("eks")
    cluster_name = "CLUSTER_NAME"
    nodegroup_names = ["NODE_GROUP_1", "NODE_GROUP_2", "NODE_GROUP_3"]
    new_desiredSize = 0
    new_minSize = 0
    new_maxSize = 1  # EKS requires maxSize >= 1, even when scaled to zero nodes

    # Update scaling configuration for all node groups
    for nodegroup_name in nodegroup_names:
        response = eks.update_nodegroup_config(
            clusterName=cluster_name,
            nodegroupName=nodegroup_name,
            scalingConfig={
                "desiredSize": new_desiredSize,
                "minSize": new_minSize,
                "maxSize": new_maxSize
            }
        )
        # Print response if needed for debugging
        # print(response)

(Replace CLUSTER_NAME and the node group names in the above code with your cluster's details.)

Step 4 : Create EventBridge scheduler for Scale Down

In EventBridge -> Scheduler -> Create Schedule

Next, give the schedule a name and a pattern. You can create a cron-based schedule to scale down the EKS node groups.

Next select the target as Lambda.

In the Invoke section select the scale down function that you want to trigger.

Next Review and create a schedule.

Step 5 : Create EventBridge scheduler for Scale Up

As with the scale-down scheduler, create a new schedule for the scale-up function.
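Both schedules can also be created with boto3's EventBridge Scheduler client. The sketch below only assembles the create_schedule parameters; the schedule names, cron times, and ARNs are placeholder assumptions you would replace, and the actual calls are commented out because they need an execution role the scheduler can assume with lambda:InvokeFunction permission.

```python
def build_schedule_params(name, cron_expression, lambda_arn, role_arn):
    """Assemble keyword arguments for scheduler.create_schedule()."""
    return {
        "Name": name,
        "ScheduleExpression": f"cron({cron_expression})",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {"Arn": lambda_arn, "RoleArn": role_arn},
    }

# Example: scale down at 20:00 and back up at 06:00 UTC on weekdays.
scale_down = build_schedule_params(
    "eks-scale-down", "0 20 ? * MON-FRI *",
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:eks-scale-down",
    "arn:aws:iam::ACCOUNT_ID:role/scheduler-invoke-role",
)
scale_up = build_schedule_params(
    "eks-scale-up", "0 6 ? * MON-FRI *",
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:eks-scale-up",
    "arn:aws:iam::ACCOUNT_ID:role/scheduler-invoke-role",
)

# To create the schedules:
# import boto3
# scheduler = boto3.client("scheduler")
# scheduler.create_schedule(**scale_down)
# scheduler.create_schedule(**scale_up)
```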

Conclusion

This article discussed the scaling process for Amazon Elastic Kubernetes Service (EKS) clusters, focusing specifically on those running on EC2 instances. It outlined the cost structure associated with running EKS clusters and proposed a strategy to reduce costs by scaling down non-production clusters during off-peak hours, such as at night, using AWS Lambda functions with Python scripts and AWS EventBridge to automate the scale-down process.
