AWS EKS: Automatically deploy and manage 100 clusters

Aniket Kharpatil
5 min read · Jul 24, 2023

Deploying multiple Amazon EKS clusters can be a tedious and error-prone process if done manually. In this blog post, we will see how to automate the deployment of 100 EKS clusters at once using Infrastructure as Code tools like AWS CloudFormation and Terraform.

🔹Creating IAM roles

We will first need to create IAM roles for the EKS cluster and nodes. These roles will allow the cluster and nodes to make AWS API calls.

🔸For the EKS cluster, create an IAM role and attach the following AWS managed policies:

  • AmazonEKSClusterPolicy
  • AmazonEKSServicePolicy

🔸For the node instances, create another IAM role and attach the following AWS managed policies:

  • AmazonEC2ContainerRegistryReadOnly
  • AmazonEKSWorkerNodePolicy
  • AmazonEKS_CNI_Policy
  • AmazonSSMManagedInstanceCore

You can create these roles using the AWS CLI or the AWS Console. Make sure to note down the ARNs of these roles; we will need them later.
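If you prefer to keep everything in code, the two roles can be sketched in Terraform roughly as follows (the role names here are illustrative; note that each role also needs a trust policy allowing the EKS service or EC2 service to assume it):

```hcl
# Sketch: IAM role for the EKS control plane, assumable by the EKS service
resource "aws_iam_role" "eks_cluster_role" {
  name = "eks-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "eks.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  role       = aws_iam_role.eks_cluster_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

# Sketch: IAM role for the worker nodes, assumable by EC2 instances
resource "aws_iam_role" "eks_node_role" {
  name = "eks-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Attach each required managed policy to the node role
resource "aws_iam_role_policy_attachment" "node_policies" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
  ])

  role       = aws_iam_role.eks_node_role.name
  policy_arn = each.value
}
```

The role ARNs are then available as `aws_iam_role.eks_cluster_role.arn` and `aws_iam_role.eks_node_role.arn`.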

🔹Creating a VPC template

Each EKS cluster needs its own VPC and subnets. We will create a Terraform template to provision a VPC for each cluster. The template will accept parameters like:

  • ClusterName
  • VpcCidr
  • PublicSubnetCidr
  • PrivateSubnetCidr

The template will create:

  • A VPC
  • Two public subnets
  • Two private subnets

You can create a single CloudFormation template or Terraform “.tf” file with these resources and reuse it for all 100 clusters by passing different parameter values for each cluster.

# Create VPC
resource "aws_vpc" "main-vpc" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "main-vpc"
  }
}

# Create public and private subnets

# Create public subnet in AZ1
resource "aws_subnet" "public-subnet-az1" {
  vpc_id            = aws_vpc.main-vpc.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"

  tags = {
    Name = "public-subnet-az1"
  }
}

# Create public subnet in AZ2
resource "aws_subnet" "public-subnet-az2" {
  vpc_id            = aws_vpc.main-vpc.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "us-east-1b"

  tags = {
    Name = "public-subnet-az2"
  }
}

# Create private subnet in AZ1
resource "aws_subnet" "private-subnet-az1" {
  vpc_id            = aws_vpc.main-vpc.id
  cidr_block        = "10.0.3.0/24"
  availability_zone = "us-east-1a"

  tags = {
    Name = "private-subnet-az1"
  }
}

# Create private subnet in AZ2
resource "aws_subnet" "private-subnet-az2" {
  vpc_id            = aws_vpc.main-vpc.id
  cidr_block        = "10.0.4.0/24"
  availability_zone = "us-east-1b"

  tags = {
    Name = "private-subnet-az2"
  }
}

# Create Internet Gateway
resource "aws_internet_gateway" "main-igw" {
  vpc_id = aws_vpc.main-vpc.id

  tags = {
    Name = "main-igw"
  }
}

# Create public route table
resource "aws_route_table" "public-rtb" {
  vpc_id = aws_vpc.main-vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main-igw.id
  }

  tags = {
    Name = "public-rtb"
  }
}

# Associate public route table with public subnets
resource "aws_route_table_association" "public-rtba-az1" {
  subnet_id      = aws_subnet.public-subnet-az1.id
  route_table_id = aws_route_table.public-rtb.id
}

resource "aws_route_table_association" "public-rtba-az2" {
  subnet_id      = aws_subnet.public-subnet-az2.id
  route_table_id = aws_route_table.public-rtb.id
}

# Create and associate private route tables

# Create private route table for AZ1
resource "aws_route_table" "private-rtb-az1" {
  vpc_id = aws_vpc.main-vpc.id

  tags = {
    Name = "private-rtb-az1"
  }
}

# Create private route table for AZ2
resource "aws_route_table" "private-rtb-az2" {
  vpc_id = aws_vpc.main-vpc.id

  tags = {
    Name = "private-rtb-az2"
  }
}

# Associate private route table 1 with AZ1 private subnet
resource "aws_route_table_association" "private-rtba-az1" {
  subnet_id      = aws_subnet.private-subnet-az1.id
  route_table_id = aws_route_table.private-rtb-az1.id
}

# Associate private route table 2 with AZ2 private subnet
resource "aws_route_table_association" "private-rtba-az2" {
  subnet_id      = aws_subnet.private-subnet-az2.id
  route_table_id = aws_route_table.private-rtb-az2.id
}
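The values above are hard-coded for a single cluster. To make the template reusable across 100 clusters, the literals can be replaced with the parameters listed earlier; a minimal sketch (variable names are illustrative):

```hcl
variable "cluster_name" {
  type = string
}

variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

variable "public_subnet_cidrs" {
  type    = list(string)
  default = ["10.0.1.0/24", "10.0.2.0/24"]
}

# Parameterized VPC: the name tag and CIDR come from variables
resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr

  tags = {
    Name = "${var.cluster_name}-vpc"
  }
}

# One public subnet per CIDR, spread across two AZs
resource "aws_subnet" "public" {
  count             = length(var.public_subnet_cidrs)
  vpc_id            = aws_vpc.this.id
  cidr_block        = var.public_subnet_cidrs[count.index]
  availability_zone = ["us-east-1a", "us-east-1b"][count.index]

  tags = {
    Name = "${var.cluster_name}-public-${count.index + 1}"
  }
}
```

The private subnets, gateway, and route tables can be parameterized the same way.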

🔹Deploying the clusters

Now that we have the IAM roles and VPC template ready, we can deploy the actual EKS clusters. We have two options here:

Option 1: Using CloudFormation

We can create a CloudFormation stack for each cluster that does the following:

  • Creates the VPC resources using our template
  • Creates the EKS cluster passing in the cluster IAM role ARN
  • Creates a node group passing in the node IAM role ARN

Option 2: Using Terraform

We define two variables to accept the IAM role ARNs:

  • cluster_iam_role_arn
  • node_iam_role_arn

We then create the EKS cluster resource, passing the cluster_iam_role_arn variable to the role_arn property.

Next, we create the node group resource and pass the node_iam_role_arn variable to the node_role_arn property.

Here is a more complete Terraform template to create an EKS cluster and node group:

  variable "cluster_name" {
  type    = string
  default = "my-cluster"
}

variable "cluster_version" {
  type    = string
  default = "1.27"
}

variable "cluster_iam_role_arn" {
  type = string
}

variable "node_iam_role_arn" {
  type = string
}

# EKS Cluster (note: subnet_ids must go inside a vpc_config block)
resource "aws_eks_cluster" "my_cluster" {
  name     = var.cluster_name
  version  = var.cluster_version
  role_arn = var.cluster_iam_role_arn

  vpc_config {
    subnet_ids = [aws_subnet.public-subnet-az1.id, aws_subnet.public-subnet-az2.id]
  }
}

# Node Group
resource "aws_eks_node_group" "node" {
  cluster_name    = aws_eks_cluster.my_cluster.name
  node_group_name = "node-group"
  node_role_arn   = var.node_iam_role_arn
  subnet_ids      = [aws_subnet.private-subnet-az1.id, aws_subnet.private-subnet-az2.id]
  instance_types  = ["t3.medium"]

  scaling_config {
    desired_size = 2
    max_size     = 4
    min_size     = 1
  }
}

This template does the following:

  • Accepts variables for cluster name, version, and IAM role ARNs
  • Creates an EKS cluster resource, passing the cluster IAM role
  • Creates a node group resource, passing:
      • Node IAM role ARN
      • Subnet IDs
      • Instance types
      • Scaling configuration with min/max nodes

You can deploy this template by running:

  terraform init
terraform plan \
  -var cluster_iam_role_arn=arn:CLUSTER_ARN \
  -var node_iam_role_arn=arn:NODE_ARN
terraform apply -auto-approve

🔹Deploying Add-ons

Once the clusters are up, we can deploy various add-ons to each cluster to enable features like:

  • AWS Load Balancer Controller
  • Cluster Autoscaler
  • ExternalDNS
  • Calico (for network policies)
  • Kubernetes Dashboard
  • Container Insights

We can deploy these add-ons using Helm charts or plain Kubernetes manifests. Note that the add-ons must be installed on each cluster individually.
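As one illustration, an add-on like the AWS Load Balancer Controller can be installed from Terraform itself via the Helm provider, so it rolls out alongside each cluster. This is a sketch: the chart repository and name are the ones AWS publishes in eks-charts, and the Helm provider must be pointed at each cluster's endpoint:

```hcl
# Authenticate the Helm provider against the cluster we just created
data "aws_eks_cluster_auth" "my_cluster" {
  name = aws_eks_cluster.my_cluster.name
}

provider "helm" {
  kubernetes {
    host                   = aws_eks_cluster.my_cluster.endpoint
    cluster_ca_certificate = base64decode(aws_eks_cluster.my_cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.my_cluster.token
  }
}

# Install the AWS Load Balancer Controller chart into kube-system
resource "helm_release" "aws_load_balancer_controller" {
  name       = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-load-balancer-controller"
  namespace  = "kube-system"

  set {
    name  = "clusterName"
    value = aws_eks_cluster.my_cluster.name
  }
}
```

The same `helm_release` pattern works for Cluster Autoscaler, ExternalDNS, and the other add-ons listed above.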

🔹Automating cluster creation

To deploy 100 clusters at once, we will need to automate the cluster creation process.

We can wrap the cluster resources in a Terraform module and use the `count` meta-argument to create as many clusters as we want:

module "eks_cluster" {
  source = "../modules/eks-cluster"

  count = 100
  name  = "cluster-${count.index}"
}

Terraform then iterates 100 times, generating a unique cluster name and deploying the required resources for each iteration.
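If the clusters need distinct parameters (non-overlapping CIDRs, different versions, and so on), `for_each` over a map is often easier to manage than `count`. A sketch, assuming the module accepts `name` and `vpc_cidr` variables:

```hcl
locals {
  clusters = {
    for i in range(100) : "cluster-${i}" => {
      vpc_cidr = "10.${i}.0.0/16" # a non-overlapping /16 per cluster
    }
  }
}

module "eks_cluster" {
  source   = "../modules/eks-cluster"
  for_each = local.clusters

  name     = each.key
  vpc_cidr = each.value.vpc_cidr
}
```

With `for_each`, removing one cluster from the map destroys only that cluster, whereas removing an element from the middle of a `count` list shifts the indexes of every cluster after it.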

🔹Cleaning up

When you no longer need the 100 clusters, you can run terraform destroy to clean up all the resources. This ensures nothing is left dangling in your AWS account.

terraform destroy -auto-approve

Using Infrastructure as Code to provision 100 EKS clusters at once helps ensure consistency, repeatability and auditability of your cluster deployments at scale. It also makes future cluster deployments or updates much simpler by reusing the same templates and scripts.

I hope this blog post gave you a good high-level overview of deploying multiple Amazon EKS clusters in an automated and scalable manner. Let me know if you have any other questions!


Aniket Kharpatil

Final-year Computer Engineering student with experience in cloud computing and mobile app development.