Deploying a Production-Ready Amazon EKS Cluster using Terraform
Deploying a production-ready Amazon EKS cluster often requires significant time and effort: creating the cluster and node groups, deploying add-ons, and configuring additional security groups.
In this article, we will use Terraform to create an Amazon EKS cluster along with the required add-ons.
The Terraform template creates the following:
· Required subnets and route tables
· An EKS cluster with managed node groups
· Cluster Autoscaler
· ALB Ingress Controller for L4-L7 application load balancing
· EFS CSI driver for application persistent storage
· Prometheus and Grafana for monitoring
Proposed Deployment Architecture:
Prerequisites:
We need the following resources created as a one-time configuration.
1. Create the EKS Cluster VPC
2. Configure VPC
a. Create the default public subnet
b. Create Internet Gateway and update the default route table
c. Create the NAT gateway
d. Create the private subnet route table and update its routes
3. Configure VPC Peering
a. EKS VPC to Tools VPC
b. EKS VPC to Shared Services VPC
4. Update the EKS VPC’s default route table with the routes for accessing the Tools and Shared Services VPC
5. Update the EKS VPC’s private route table with the routes for accessing the Tools and Shared Services VPC
6. Create a new SSH key pair or use the existing key pair
7. Create an S3 bucket for storing the terraform state backups
8. Get the EKS worker node AMI-Id from the official Ubuntu on Amazon Elastic Kubernetes Service (EKS) website — https://cloud-images.ubuntu.com/docs/aws/eks/
a. Select the AMI ID based on the EKS version and region; this AMI will be used to provision the EKS cluster worker nodes
9. Create the required IAM policy to be used by the application pods through a service account
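A minimal sketch of the last step, the IAM policy that the application pods will later assume through a service account. The policy name matches the `ApplicationsAccess` variable used later in this project, but the permissions shown here are placeholders; grant only what your applications actually need:

```hcl
# Sketch only: the S3 permissions below are placeholders, not the
# actual permissions used in this project.
resource "aws_iam_policy" "applications_access" {
  name        = "ApplicationsAccess"
  description = "Permissions granted to application pods through a service account"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:ListBucket"]
      Resource = "*"
    }]
  })
}
```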
Terraform modules used from the open-source community
1. EKS — https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/18.29.0
2. AWS EKS Cluster-Autoscaler — https://github.com/DNXLabs/terraform-aws-eks-cluster-autoscaler
3. AWS ALB Ingress Controller — https://github.com/kubernetes-sigs/aws-load-balancer-controller
4. AWS EKS EFS CSI Driver — https://github.com/DNXLabs/terraform-aws-eks-efs-csi-driver
5. AWS EKS Grafana Prometheus — https://github.com/DNXLabs/terraform-aws-eks-grafana-prometheus
Custom terraform modules and other resources:
1. Subnets — Used to create required Private and Public Subnets for launching the EKS cluster
2. IAM — Used to create the required IAM roles and Policies for automating the deployment of add-ons using terraform Kubernetes provider
Note: All the public modules listed above are downloaded and used as local modules in this project.
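Because the public modules are vendored locally, they are referenced by a relative path instead of a registry source. A sketch of how the cluster-autoscaler module might be wired up (the folder name and the output names on `module.eks` are assumptions that depend on how you vendored the modules):

```hcl
# Illustrative: the DNXLabs cluster-autoscaler module vendored under ./modules.
# The module.eks output names shown here are assumptions.
module "eks-cluster-autoscaler" {
  source = "./modules/terraform-aws-eks-cluster-autoscaler"

  enabled                          = true
  cluster_name                     = module.eks.cluster_id
  cluster_identity_oidc_issuer     = module.eks.cluster_oidc_issuer_url
  cluster_identity_oidc_issuer_arn = module.eks.oidc_provider_arn
  aws_region                       = var.AWS_REGION
}
```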
Terraform files and folder structure:
The following screenshot shows the folder structure used in our environment. The GitHub link to the entire project is available in a later section of this blog.
main.tf
provider "aws" {
  region = var.AWS_REGION
}

resource "random_id" "worker-nodes" {
  keepers = {
    # Generate a new id each time we switch to a new AMI id
    ami_id = var.ami_id
  }
  byte_length = 4
}

locals {
  name            = var.cluster_name
  cluster_version = var.cluster_version
  region          = var.AWS_REGION
}
# Subnet module
module "subnets" {
  source                = "./modules/subnets"
  vpc_id                = var.vpc_id
  route_table_id        = var.route_table_id
  private_subnet_1_cidr = var.private_subnet_1_cidr
  private_subnet_2_cidr = var.private_subnet_2_cidr
  private_subnet_3_cidr = var.private_subnet_3_cidr
  public_subnet_1_cidr  = var.public_subnet_1_cidr
  public_subnet_2_cidr  = var.public_subnet_2_cidr
  public_subnet_3_cidr  = var.public_subnet_3_cidr
  cluster_name          = var.cluster_name
  tag_shutdown          = var.tag_shutdown
  tag_poc               = var.tag_poc
  tag_customer          = var.tag_customer
  tag_env               = var.tag_env
  tag_platform_type     = var.tag_platform_type
}
################################################################################
# EKS Module
################################################################################
module "eks" {
  source          = "./modules/eks"
  cluster_name    = local.name
  cluster_version = local.cluster_version
  vpc_id          = var.vpc_id
  subnets         = module.subnets.private_subnets

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = false
  cluster_create_security_group   = false
  cluster_security_group_id       = aws_security_group.controlplane.id
  manage_aws_auth                 = false
  enable_irsa                     = true

  node_groups = {
    node_group1 = {
      name_prefix             = var.node_group1_prefix
      desired_capacity        = 1
      max_capacity            = 2
      min_capacity            = 1
      disk_size               = 150
      disk_type               = "gp2"
      launch_template_id      = aws_launch_template.worker-node-group-1.id
      launch_template_version = aws_launch_template.worker-node-group-1.default_version
      instance_types          = ["t2.xlarge"]
      capacity_type           = "ON_DEMAND"
      bootstrap_env = {
        CONTAINER_RUNTIME = "containerd"
        USE_MAX_PODS      = false
      }
      additional_tags = {
        "node-Application" = true
        Name               = "${var.node_group1_prefix}${random_id.worker-nodes.hex}"
        Description        = "eks cluster node group created by terraform"
        SHUTDOWN           = var.tag_shutdown
        POC                = var.tag_poc
        CUSTOMER           = var.tag_customer
        ENV                = var.tag_env
        PLATFORM_TYPE      = var.tag_platform_type
      }
    }
    node_group2 = {
      name_prefix             = var.node_group2_prefix
      desired_capacity        = 1
      max_capacity            = 2
      min_capacity            = 1
      disk_size               = 150
      disk_type               = "gp2"
      launch_template_id      = aws_launch_template.worker-node-group-2.id
      launch_template_version = aws_launch_template.worker-node-group-2.default_version
      instance_types          = ["t2.xlarge"]
      capacity_type           = "ON_DEMAND"
      bootstrap_env = {
        CONTAINER_RUNTIME = "containerd"
        USE_MAX_PODS      = false
      }
      additional_tags = {
        "node-Application" = true
        Name               = "${var.node_group2_prefix}${random_id.worker-nodes.hex}"
        Description        = "eks cluster node group created by terraform"
        SHUTDOWN           = var.tag_shutdown
        POC                = var.tag_poc
        CUSTOMER           = var.tag_customer
        ENV                = var.tag_env
        PLATFORM_TYPE      = var.tag_platform_type
      }
    }
  }

  tags = {
    Name          = local.name
    Description   = "eks cluster created by terraform"
    SHUTDOWN      = var.tag_shutdown
    POC           = var.tag_poc
    CUSTOMER      = var.tag_customer
    ENV           = var.tag_env
    PLATFORM_TYPE = var.tag_platform_type
  }
}
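main.tf references a control-plane security group (`aws_security_group.controlplane`) and per-node-group launch templates that are defined elsewhere in the project. A simplified sketch of what those resources could look like; the ingress CIDR is a placeholder, and the launch template settings are assumptions based on the variables this project defines:

```hcl
# Sketch: additional security group attached to the EKS control plane.
# The ingress CIDR below is a placeholder for your VPC/peered networks.
resource "aws_security_group" "controlplane" {
  name_prefix = "eks-controlplane-"
  vpc_id      = var.vpc_id

  ingress {
    description = "HTTPS to the API server from worker and peered networks"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["172.16.0.0/16"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Sketch: launch template for node group 1, using the Ubuntu EKS AMI
# and SSH key pair from the prerequisites.
resource "aws_launch_template" "worker-node-group-1" {
  name_prefix            = var.node_group1_prefix
  image_id               = var.ami_id
  key_name               = var.ssh_access_key
  update_default_version = true
}
```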
variables.tf
variable "AWS_REGION" {
  default = "us-west-2"
}
variable "vpc_id" {
  type    = string
  default = ""
}
variable "tools_vpc_id" {
  default = ""
}
variable "platform_vpc_id" {
  default = ""
}
variable "ami_id" {
  default = ""
}
variable "ssh_access_key" {
  default = ""
}
variable "route_table_id" {
  default = ""
}
variable "node_group1_prefix" {
  default = "workers-APP1-"
}
variable "node_group2_prefix" {
  default = "workers-APP2-"
}
variable "cluster_name" {
  default = "eks-dev"
}
variable "cluster_version" {
  default = "1.20"
}
variable "private_subnet_1_cidr" {
  default = "172.16.2.0/24"
}
variable "private_subnet_2_cidr" {
  default = "172.16.3.0/24"
}
variable "private_subnet_3_cidr" {
  default = "172.16.4.0/24"
}
variable "public_subnet_1_cidr" {
  default = "172.16.5.0/24"
}
variable "public_subnet_2_cidr" {
  default = "172.16.6.0/24"
}
variable "public_subnet_3_cidr" {
  default = "172.16.7.0/24"
}
# Tags
variable "tag_shutdown" {
  default = "Never"
}
variable "tag_poc" {
  default = "test@example.com"
}
variable "tag_customer" {
  default = "Internal"
}
variable "tag_env" {
  default = "DEV"
}
variable "tag_platform_type" {
  default = "APP"
}
variable "namespace" {
  default = "application"
}
variable "service_account_name" {
  default = "ekssa"
}
variable "ApplicationsAccess_arn" {
  default = "arn:aws:iam::XXXXXXXXXXX:policy/ApplicationsAccess"
}
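The `namespace`, `service_account_name`, and `ApplicationsAccess_arn` variables feed the custom IAM module, which connects the policy to the application pods via IRSA. A hedged sketch of the Kubernetes side of that wiring; the role name in the annotation is hypothetical and would come from the IAM module in practice:

```hcl
# Illustrative: service account annotated with an IRSA role so pods in
# var.namespace can assume the ApplicationsAccess permissions.
# The role ARN below is a placeholder produced by the custom IAM module.
resource "kubernetes_service_account" "app" {
  metadata {
    name      = var.service_account_name
    namespace = var.namespace
    annotations = {
      "eks.amazonaws.com/role-arn" = "arn:aws:iam::XXXXXXXXXXX:role/eks-app-irsa-role"
    }
  }
}
```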
backend.tf
Update this file to store the Terraform state securely in the S3 bucket.
AWS_Region = region where the EKS cluster will be provisioned
Bucket = S3 bucket name created in the previous step for storing the Terraform state
Key = terraform/eks-cluster-state-cluster-<<environment name>>
e.g., terraform/eks-cluster-state-cluster-app-dev
terraform {
  backend "s3" {
    bucket = "eks-cluster-tf-state-backup"
    key    = "terraform/eks-cluster-state-cluster-dev"
    region = "us-west-2"
  }
}
Executing Terraform
After updating the input files above, execute the following Terraform commands in sequence.
The terraform init command initializes the provider and the other modules used in this automation.
terraform init
The terraform plan command performs a dry run of the template and shows the changes that will be made to the infrastructure.
terraform plan
The terraform apply command applies the changes to the infrastructure.
terraform apply
Enter "yes" to accept the changes and wait for the operation to complete.
Note: The provisioning will take up to 30 minutes. Please watch the console logs for the progress.
Accessing provisioned EKS cluster:
Set up the kubeconfig on the bastion host with the following command so you can run kubectl against the cluster:
aws eks update-kubeconfig --region <<region name>> --name <<cluster_name>>
Example: aws eks update-kubeconfig --region us-west-2 --name eks-dev
GitHub Repository: