Deploying a native Kubernetes Cluster on AWS using Terraform

Michael Hannecke
Bluetuple.ai
Aug 22, 2023

Kubernetes Cluster Setup from Scratch on AWS with Terraform for Learning and Testing

Kubernetes is a powerful container orchestration platform that can be used to deploy, scale, and manage containerized applications. However, setting up Kubernetes can be a complex task. In this blog post, we will walk you through the process of setting up an unmanaged Kubernetes cluster on AWS using Terraform.

Terraform is an Infrastructure as Code (IaC) tool that allows you to automate the creation, deployment, and management of infrastructure resources. In this case, we will use Terraform to create a Kubernetes cluster on AWS.

What are the benefits of setting up Kubernetes from scratch?

There are some benefits to setting up Kubernetes from scratch. First, it gives you complete control over the cluster. You can choose the underlying infrastructure, the networking configuration, and the security settings. Second, it is a good learning experience, especially if you’re planning to undergo official Kubernetes certification like CKA or CKAD. By setting up Kubernetes from scratch, you will gain a deeper understanding of how the platform works.

What are the prerequisites for this tutorial?

To follow this tutorial, you will need the following:

  • An AWS account with sufficient rights to configure VPCs, security groups, route tables, and EC2 instances
  • Terraform CLI installed on your device
  • SSH configured on your device. For this tutorial I assume your public SSH key is in ‘~/.ssh/id_rsa.pub’ (see the command right after this list if you still need to create one)
  • A basic understanding of Kubernetes, AWS, and Terraform will be helpful, as will basic Linux shell knowledge.
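If you don’t have an SSH key pair yet, you can generate one at the default path assumed above:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa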

This tutorial is divided into several steps. Each step has detailed instructions on how to complete the task. I’d recommend following all steps in the given order to ensure the cluster comes up as planned.

Disclaimer

This tutorial is for learning and testing purposes only. It is not an ideal setup for production workloads. For production or even commercial use cases, it is recommended to use a managed Kubernetes environment.

Furthermore, running this infrastructure on AWS will incur costs on your AWS account, so make sure to destroy everything once you’re done with testing.

For this tutorial we’ll keep all scripts in a dedicated folder, so running a ‘terraform destroy’ in this folder will delete everything you’ve set up while following this tutorial.

Always check before you go to avoid unnecessary costs.

Let’s dive in

Source code on github: https://github.com/bluetuple/terraform-aws

1. Terraform Initialization

To enable Terraform to manage AWS infrastructure for you, you should have the AWS CLI installed and configured on your device by running

aws configure

in a terminal. Alternatively, at least the following environment variables should be set:

export AWS_ACCESS_KEY_ID=<your-aws-access-key>
export AWS_SECRET_ACCESS_KEY=<your-aws-secret>
export AWS_DEFAULT_REGION=<your-preferred-region>
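To verify that Terraform will be able to authenticate, you can check which identity the configured credentials resolve to:

aws sts get-caller-identity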

Create a new subfolder and place all upcoming files in this folder.

main.tf

First, create a file named main.tf and place the AWS provider configuration in it:

#main.tf

# Declare the AWS provider configuration
provider "aws" {
  # Specify the AWS region using a variable
  region = var.k8-region
}
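Optionally (not required for this tutorial), you can pin the Terraform and AWS provider versions in main.tf so later releases don’t change behavior unexpectedly; the version constraints below are just an example:

terraform {
  required_version = ">= 1.3.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}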

variables.tf

Next, create a file for all variable definitions. We will use variables for the region the infrastructure will be deployed to (“k8-region”), the CIDR blocks for our VPC and subnet (“k8-vpc-cidr”, “k8-subnet-cidr”), and the external IP from which we will allow access to the cluster, which should be the external IP of your environment (“external_ip”). You can get this info easily with:

curl ipinfo.io
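The security group later expects CIDR notation, so append /32 to the single address returned; the address below is just a placeholder:

curl -s ipinfo.io/ip
# e.g. 203.0.113.10  ->  set external_ip = "203.0.113.10/32"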

Furthermore, we use a variable to define the instance type for the master and worker nodes (“instance_type”). Last but not least, there is a variable for the number of deployed worker nodes (“workers-count”).

The variables file should look like this:

#variables.tf

# Define a variable for the default region where the infrastructure will be provisioned
variable "k8-region" {
  type        = string
  description = "Default Region"
}

# Define a variable for the CIDR block of the main VPC
variable "k8-vpc-cidr" {
  type        = string
  description = "CIDR block of main VPC"
}

# Define a variable for the CIDR block of the first subnet
variable "k8-subnet-cidr" {
  type        = string
  description = "CIDR block of the first subnet"
}

# Define a variable for the external IP range
variable "external_ip" {
  type        = string
  description = "Our external IP"
  default     = "0.0.0.0/0" # Default value set to allow all IPs
}

# Define a variable for the instance type used for both Kubernetes nodes and master
variable "instance_type" {
  type        = string
  description = "Instance type for Kubernetes nodes and master"
  default     = "t2.micro" # Default instance type set to t2.micro
}

# Define a variable for the number of Kubernetes worker nodes
variable "workers-count" {
  type        = number
  default     = 2
  description = "Number of Kubernetes worker nodes"
}

terraform.tfvars

To initialize the variables above, create a file terraform.tfvars and fill in values that fit your setup:

#terraform.tfvars
k8-region      = "<your-region>"
k8-vpc-cidr    = "10.0.0.0/16"
k8-subnet-cidr = "10.0.1.0/24"
external_ip    = "<your-external-ip>/32"
instance_type  = "t2.medium"
workers-count  = 2

output.tf

To interact with the Kubernetes nodes via SSH, we need to know the assigned IP addresses. Terraform can output these values after the infrastructure has been deployed to AWS. Create a file named output.tf and place the following code in it:

# output.tf
# Define an output to display the public IP address of the Kubernetes master node
output "Kubernetes-Master-Node-Public-IP" {
  value = aws_instance.k8-master.public_ip # Retrieve the public IP of the Kubernetes master node
}

# Define an output to display a map of worker node IDs to their public IP addresses
output "Kubernetes-Worker-nodes-Public-IP" {
  value = {
    for instance in aws_instance.k8-node : # Loop through each worker node instance
    instance.id => instance.public_ip      # Create a map entry: instance ID => public IP
  }
}
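Once the deployment in step 4 has finished, these values can be re-printed at any time without re-applying:

terraform output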

2. Network Configuration

We will now define the base network environment. For our use case we will place all nodes within one subnet. To keep the setup simple, the following rules will be implemented:

  • All worker nodes and the master will be within the same subnet.
  • All VMs can communicate with each other without restriction. For more security you should limit communication to the ports Kubernetes actually requires (a sketch of such rules follows the securitygroups.tf listing below), but for this setup we will skip that.
  • All VMs can reach the internet for updates and downloads of required software packages. This should also be locked down in any production environment, but that is out of scope here.
  • The VMs will be reachable via SSH from your external IP address only. For a more secure approach, all VMs should be placed in a private subnet and only be reachable via a bastion host or similar, but again, for the sake of our test environment I’ll keep it simpler.

networks.tf

Create a file named networks.tf and copy the following code into it:

# networks.tf
# Create the main public VPC for Kubernetes
resource "aws_vpc" "vpc_kubernetes" {
  cidr_block           = var.k8-vpc-cidr # Define the IP address range for the VPC
  enable_dns_support   = true            # Enable DNS support for the VPC
  enable_dns_hostnames = true            # Enable DNS hostnames for the VPC

  tags = {
    Name = "kubernetes-vpc" # Assign a name tag to the VPC
  }
}

# Create an Internet Gateway for VPC connectivity
resource "aws_internet_gateway" "igw-kubernetes" {
  vpc_id = aws_vpc.vpc_kubernetes.id # Attach the Internet Gateway to the Kubernetes VPC

  tags = {
    Name = "kubernetes-vpc-igw" # Assign a name tag to the Internet Gateway
  }
}

# Get a list of available Availability Zones in the VPC region
data "aws_availability_zones" "azs" {
  state = "available"
}

# Create a subnet in the VPC's first Availability Zone
resource "aws_subnet" "subnet1" {
  availability_zone = element(data.aws_availability_zones.azs.names, 0) # Use the first AZ in the region
  vpc_id            = aws_vpc.vpc_kubernetes.id                         # Attach to the Kubernetes VPC
  cidr_block        = var.k8-subnet-cidr                                # Define the subnet's IP range
}

# Create a route table for internet access
resource "aws_route_table" "kubernetes-internet-route" {
  vpc_id = aws_vpc.vpc_kubernetes.id # Associate the route table with the Kubernetes VPC

  route {
    cidr_block = "0.0.0.0/0"                            # Route all traffic to the internet
    gateway_id = aws_internet_gateway.igw-kubernetes.id # Use the Internet Gateway for the route
  }

  lifecycle {
    ignore_changes = all # Ignore changes in the route table's lifecycle
  }

  tags = {
    Name = "KubernetesRouteTable" # Assign a name tag to the route table
  }
}

# Associate the route table with the VPC's main route table
resource "aws_main_route_table_association" "kubernetes-set-rt-to-vpc" {
  vpc_id         = aws_vpc.vpc_kubernetes.id                    # Associate with the Kubernetes VPC
  route_table_id = aws_route_table.kubernetes-internet-route.id # Use the previously created route table
}

This definition will create a couple of resources:

  • Resource "aws_vpc" "vpc_kubernetes"
    The main VPC.
  • Resource "aws_internet_gateway" "igw-kubernetes"
    An internet gateway in our VPC to enable communication with the internet.
  • Resource "aws_subnet" "subnet1"
    This creates the subnet for all of our workloads. As this subnet will be created in an availability zone within our region, we’re using a data source ("aws_availability_zones") that provides a list of available AZs; the subnet is then created in the first AZ.
  • Resource "aws_route_table" "kubernetes-internet-route"
    This route table tells the nodes how to reach the internet.
  • Resource "aws_main_route_table_association" "kubernetes-set-rt-to-vpc"
    The route table has to be associated with the VPC to become active.

securitygroups.tf

After the base network definition is done, we have to provide the needed firewall rules, which in AWS are called security groups. Create a file securitygroups.tf with the following content:

# securitygroups.tf
# Create an AWS security group for the Kubernetes cluster
resource "aws_security_group" "k8cluster-sg" {
  name        = "k8cluster-sg"
  description = "Allows incoming SSH and outgoing to all ports"
  vpc_id      = aws_vpc.vpc_kubernetes.id

  # Allow all outgoing traffic to the internet
  egress {
    description = "Allows all ports to the internet"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow incoming ICMP traffic from within the VPC CIDR block
  ingress {
    description = "ICMP"
    from_port   = -1
    to_port     = -1
    protocol    = "icmp"
    cidr_blocks = [var.k8-vpc-cidr]
  }

  # Allow incoming TCP traffic from within the VPC CIDR block
  ingress {
    description = "TCP internal"
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = [var.k8-vpc-cidr]
  }

  # Allow incoming SSH traffic from the specified external IP address
  ingress {
    description = "Allow SSH"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.external_ip]
    # Uncomment the line below to allow SSH from any IP (not recommended for security)
    # cidr_blocks = ["0.0.0.0/0"]
  }
}
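For reference only: if you later want to replace the broad internal TCP rule, the documented Kubernetes ports can be opened individually inside the same security group. A minimal sketch of two such rules (not used in this tutorial) could look like this; etcd (2379-2380) and the NodePort range (30000-32767) would need similar entries:

  # Sketch only - tighter alternative to the broad "TCP internal" rule above
  ingress {
    description = "Kubernetes API server"
    from_port   = 6443
    to_port     = 6443
    protocol    = "tcp"
    cidr_blocks = [var.k8-vpc-cidr]
  }

  ingress {
    description = "Kubelet API"
    from_port   = 10250
    to_port     = 10250
    protocol    = "tcp"
    cidr_blocks = [var.k8-vpc-cidr]
  }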

3. Master and Worker Nodes

instances.tf

The stage is set now, time to bring in the actors. Create a file instances.tf in which the master and worker nodes are defined as follows:

# instances.tf
# Create an AWS SSH key pair for authentication
resource "aws_key_pair" "ssh_key" {
  key_name   = "kubernetes-key"          # Key pair name
  public_key = file("~/.ssh/id_rsa.pub") # Use the public key from the local SSH keypair
}

# Provision the Kubernetes master node
resource "aws_instance" "k8-master" {
  ami                         = "ami-020b33a9e86370158"              # Ubuntu 20.04 LTS AMI ID
  instance_type               = var.instance_type                    # Instance type specified in variables
  key_name                    = aws_key_pair.ssh_key.key_name        # Use the AWS SSH key
  associate_public_ip_address = true                                 # Associate a public IP
  vpc_security_group_ids      = [aws_security_group.k8cluster-sg.id] # Use the Kubernetes security group
  subnet_id                   = aws_subnet.subnet1.id                # Use the specified subnet

  # Use a startup script for configuring the master node
  user_data = file("startup-master.sh")

  tags = {
    Name = "kubernetes-master"
  }

  depends_on = [aws_main_route_table_association.kubernetes-set-rt-to-vpc]
}

# Provision Kubernetes worker nodes
resource "aws_instance" "k8-node" {
  count                       = var.workers-count                    # Create the specified number of worker nodes
  ami                         = "ami-020b33a9e86370158"              # Ubuntu 20.04 LTS AMI ID
  instance_type               = var.instance_type                    # Instance type specified in variables
  key_name                    = aws_key_pair.ssh_key.key_name        # Use the AWS SSH key
  associate_public_ip_address = true                                 # Associate a public IP
  vpc_security_group_ids      = [aws_security_group.k8cluster-sg.id] # Use the Kubernetes security group
  subnet_id                   = aws_subnet.subnet1.id                # Use the specified subnet

  # Use a startup script for configuring the worker nodes
  user_data = file("startup-worker.sh")

  tags = {
    Name = join("-", ["kubernetes-node", count.index + 1]) # Create unique names for each node
  }

  # Ensure required network settings are deployed beforehand
  depends_on = [aws_main_route_table_association.kubernetes-set-rt-to-vpc]
}

This file consists of three parts:

  • First, Terraform creates a key pair for the instances based on your public SSH key.
  • Second, the definition of the master node. For this test environment, the same instance type is used for both the master and the worker nodes.
  • Third, the deployment of multiple worker nodes, the number of which is defined by the corresponding variable.

Important for the deployment is the AWS AMI image that is used. For this tutorial we’re using Ubuntu 20.04 LTS as the base image. You can look up a valid AMI ID for your region in the AWS console or via the Ubuntu AMI locator. Be careful to adapt it as needed; the AMI ID in the script above may NOT be available in your selected region.

There are other ways to identify a valid AMI ID at runtime; maybe I’ll write a dedicated post about that approach later on. For now we’ll go with the more ‘manual’ approach, but a small sketch is shown below.
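As a sketch (not wired into instances.tf above), a data source can resolve the most recent Canonical Ubuntu 20.04 AMI at plan time; the instance resources would then reference data.aws_ami.ubuntu.id instead of the hard-coded ID:

# Sketch: resolve the latest Ubuntu 20.04 LTS AMI instead of hard-coding the ID
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account ID

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}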

The last part is the definition of the workers. The count argument here tells Terraform to set up the number of workers you’ve defined in the terraform.tfvars file.

startup-master.sh

To install the designated master node we have to provide a startup script, which will be executed once the VM starts up for the first time. Provide a file called “startup-master.sh” with the following content. Be aware that the name must match the filename provided in the resource definition for the master node above (user_data = file(“startup-master.sh”))!

#!/bin/bash
# startup-master.sh

# Load necessary kernel modules for containerd
cat << EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Configure kernel networking requirements for Kubernetes
cat << EOF | sudo tee /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sudo sysctl --system

# Update package information and install containerd
sudo apt-get update && sudo apt-get install -y containerd

# Generate and set containerd configuration
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml

# Restart containerd with the new configuration
sudo systemctl restart containerd

# Disable swap to meet Kubernetes requirements
sudo swapoff -a
sudo apt-get update && sudo apt-get install -y apt-transport-https curl

# Add Kubernetes apt repository and GPG key
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmour -o /usr/share/keyrings/kubernetes.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/kubernetes.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list

# Update package information and install specific Kubernetes components
sudo apt-get update && sudo apt-get install -y kubelet=1.28.0-00 kubeadm=1.28.0-00 kubectl=1.28.0-00

# Prevent automatic updates for Kubernetes components
sudo apt-mark hold kubelet kubeadm kubectl

# Initialize the Kubernetes control plane with specified settings
sudo kubeadm init --pod-network-cidr 192.168.0.0/16 --kubernetes-version 1.28.0

# Configure kubeconfig for the current user
mkdir -p /home/ubuntu/.kube >> /var/log/startup.log 2>&1
sudo cp -i /etc/kubernetes/admin.conf /home/ubuntu/.kube/config >> /var/log/startup.log 2>&1
sudo chown $(id -u ubuntu):$(id -g ubuntu) /home/ubuntu/.kube/config >> /var/log/startup.log 2>&1

# Install Calico network plugin for pod networking
sudo -u ubuntu /usr/bin/kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml >> /tmp/calico-setup.log

# Generate join command for worker nodes
sudo kubeadm token create --print-join-command >> /tmp/join-command.txt

This script will configure the master VM for Kubernetes, initialize the control plane, and install a network plugin. We’ll go with the Calico plugin.

To install the Calico plugin, the setup script calls:

kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml

It would be possible to run this manually on the master node as part of normal Kubernetes administration, but I would like to automate that step a bit further.

The worker nodes require a slightly different startup script. Create a file named “startup-worker.sh”. Again, ensure that the name exactly matches the user_data definition in the worker node resource in instances.tf.

startup-worker.sh

#!/bin/bash
# startup-worker.sh

# Load necessary kernel modules for containerd
cat << EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Configure kernel networking requirements for Kubernetes
cat << EOF | sudo tee /etc/sysctl.d/99-kubernetes.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sudo sysctl --system

# Update package information and install containerd
sudo apt-get update && sudo apt-get install -y containerd

# Generate and set containerd configuration
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml

# Restart containerd with the new configuration
sudo systemctl restart containerd

# Disable swap to meet Kubernetes requirements
sudo swapoff -a
sudo apt-get update && sudo apt-get install -y apt-transport-https curl

# Add Kubernetes apt repository and GPG key
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmour -o /usr/share/keyrings/kubernetes.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/kubernetes.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list

# Update package information and install specific Kubernetes components
sudo apt-get update && sudo apt-get install -y kubelet=1.28.0-00 kubeadm=1.28.0-00 kubectl=1.28.0-00

# Prevent automatic updates for Kubernetes components
sudo apt-mark hold kubelet kubeadm kubectl

4. Startup

With everything above in place, we are now ready to ignite the rocket.

Run a

terraform init

to allow Terraform to download the required provider. For this tutorial we store the state file locally; if you want to use a remote state, see my post about remote state configuration. A minimal backend sketch is shown below.
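For reference, a remote state on S3 is declared with a backend block like this sketch; the bucket and key are placeholders, and the bucket has to exist before you run terraform init:

# Sketch: optional S3 backend for remote state (bucket/key/region are placeholders)
terraform {
  backend "s3" {
    bucket = "my-terraform-state-bucket"
    key    = "kubernetes-cluster/terraform.tfstate"
    region = "eu-central-1"
  }
}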

Next, run “terraform fmt” and “terraform validate” to normalize the formatting and ensure that the configuration is valid. If everything is exactly as provided above there should be no errors, and we can now start the deployment:

terraform plan
terraform apply

This will take a couple of minutes. After Terraform has finished the deployment, it will output the IP addresses of the deployed instances in the terminal.
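The output names match those defined in output.tf, so you can, for example, feed the master’s public IP straight into SSH using terraform output -raw:

ssh ubuntu@$(terraform output -raw Kubernetes-Master-Node-Public-IP)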

5. Add the worker nodes to the cluster

You should now have a master node and one or more worker nodes up and running, but there are still some additional steps required to join the worker nodes to the Kubernetes cluster.

Wait a couple of minutes, as the installation inside the nodes takes a while (two to three minutes should be sufficient), and then connect via SSH to your master node:

ssh ubuntu@<IP-master-node>

Once you are logged in to the master node, you can verify the installation.

We can now check whether the Calico plugin was deployed successfully (it may take a bit for the pods to start up); soon there should be two Calico pods, and the CoreDNS pods should be in the Running state as well:
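A quick way to check is to list the system pods on the master node; the exact pod name suffixes will differ:

kubectl get pods -n kube-system

# Roughly expected once everything is up:
# calico-kube-controllers-...   1/1   Running
# calico-node-...               1/1   Running
# coredns-...                   1/1   Running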

Now we have to get the parameters to join the nodes to our Kubernetes cluster. On the master node, run

sudo kubeadm token create --print-join-command

Copy the output of this command to the clipboard; we have to execute exactly this command on each worker node.

Log in via SSH to each worker node, using the IP addresses provided by Terraform, and run the copied command:
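For example, with placeholders for the token and hash (use the exact command printed on your master node, and prefix it with sudo on the worker):

ssh ubuntu@<IP-worker-node>

sudo kubeadm join <IP-master-node>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>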

Back on the master node, check that each node has successfully been added:

kubectl get nodes

# for more detailed information:

kubectl get pod -A

Now you have a functional Kubernetes cluster ready for your container deployments.
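As a quick smoke test you could, for example, run a small nginx deployment and check that its pods are scheduled onto the worker nodes:

kubectl create deployment nginx --image=nginx --replicas=2
kubectl get pods -o wide

# Clean up the test deployment afterwards
kubectl delete deployment nginx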

Well Done!

6. Terraform destroy

Once you’re done with testing, do not forget to run a

terraform destroy

in the folder holding all of the above configuration. This will destroy all nodes, the network configuration, and the firewall setup, so you avoid additional costs!

7. Conclusion

Congratulations on setting up your unmanaged Kubernetes cluster on AWS using Terraform! This gives you full control over your cluster configuration and management. To ensure the security and performance of your cluster, it is important to keep your cluster and its components updated. You may also need to consider additional aspects like logging, monitoring, high availability, and backup in a production environment.

I hope this tutorial has been helpful. For more information on Kubernetes, please visit the following resources:

Kubernetes documentation: https://kubernetes.io/docs/home/

Terraform documentation: https://www.terraform.io/docs/

AWS documentation on Kubernetes: https://docs.aws.amazon.com/eks/latest/userguide/

Happy container orchestration! 🚀

Time to get a coffee.
