Deploying application infrastructure to dynamically scale and handle high traffic with AWS & Terraform

Aalok Trivedi
15 min read · Apr 7, 2023

Intro

Infrastructure as Code (IaC) is increasingly becoming a necessity for lightning-quick resource deployment, easy maintenance, and high flexibility and modularity. Terraform, one of the leading open-source IaC tools, allows for cloud-agnostic infrastructure configuration with ease. In this article, I’ll show you just how easy it is to use Terraform and AWS to configure application infrastructure that responds to high demand.

Scenario

Our e-commerce web application, Shoppr, is having a major sales event in a couple of weeks. The engineering team wants to be prepared for the anticipated increase in traffic and use Terraform to restructure parts of the infrastructure to manage AWS resource configurations.

The application needs to be highly available and handle times of high AND low traffic without a ton of server overhead and maintenance.

Prior reading: If you haven’t already, I highly recommend reviewing my last article about deploying a Jenkins server with Terraform, as I go over many of the basic Terraform concepts in more detail. Many of the steps we’ll take are very similar and repeatable.

What we’re building

  1. A remote backend for Terraform to store state in S3.
  2. A VPC with two (2) public and two (2) private subnets in two (2) availability zones.
  3. An Application Load Balancer (ALB) deployed in the public subnets that distributes traffic to the ASG.
  4. An Auto Scaling Group (ASG) that deploys web server EC2 instances in the private subnets and dynamically scales based on demand.

Prerequisites

  1. An AWS account with IAM user access.
  2. Foundational knowledge of Linux systems and commands.
  3. Foundational knowledge of AWS resources, such as VPCs, subnets, security groups, Auto Scaling Groups, S3, etc…
  4. Foundational knowledge of Terraform basics.
  5. Access to a command line tool.
  6. An IDE, such as VS Code or Cloud9.

GitHub repo

Here is a link to my repo if you want to follow along ➡️

🚀 🚀 🚀 🚀 🚀 Let’s get started!

Set up a remote backend

By default, Terraform uses a local backend and stores state in our working directory. This can cause issues when working with team members because they won’t have access to the most current state file. Instead, we can create a remote backend with S3 and DynamoDB so Terraform can store and lock state remotely and securely in the cloud.

I’ve detailed how to create a remote backend in a previous article, so I highly recommend taking these steps before moving forward.

Configure the providers

After the remote backend has been configured, let’s configure Terraform and establish the providers we’ll need (AWS).

All of our variables will be stored in a variables.tf file.

#--variables.tf--

# environment vars
#---------------------------------------
variable "environment" {
  type    = string
  default = "Dev"
}

variable "aws_region" {
  type    = string
  default = "us-east-1"
}

# naming vars
#---------------------------------------
variable "app_name" {
  type        = string
  description = "app name prefix for naming"
  default     = "shoppr"
}

In a providers.tf file, we’ll configure the backend, required providers, and AWS provider. We’ll also get all the Availability Zones in our region (us-east-1) with a data block.

#--providers.tf--

terraform {
  backend "s3" {
    bucket         = "shoppr-remote-backend"
    key            = "global/s3/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "shoppr-tfstate-locking"
    encrypt        = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "4.60.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "3.4.3"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      App         = var.app_name
      Environment = var.environment
      Terraform   = "True"
    }
  }
}

# Retrieve the list of AZs in the current AWS region
data "aws_availability_zones" "available" {}
data "aws_region" "current" {}

Initialize the directory

We need to run terraform init to initialize the backend and providers, and prepare the working directory for Terraform.

NOTE: For organization purposes, we’ll put all the variables in a single variables.tf file, and all the resources in their own .tf files (vpc.tf, subnets.tf, route_tables.tf, etc…).

Create the network base

What we‘ll need

  1. A VPC.
  2. Two (2) public and two (2) private subnets.
  3. An Internet Gateway.
  4. A NAT Gateway.
  5. A public and private route table.

VPC

#--variables.tf--

# vpc vars
#----------------------------------------
variable "vpc_cidr" {
  type        = string
  description = "VPC cidr block"
  default     = "10.0.0.0/16"
}

variable "enable_dns_hostnames" {
  type        = bool
  description = "enable dns hostnames"
  default     = true
}

#--vpc.tf--

# Define the VPC
#----------------------------------------------------
resource "aws_vpc" "vpc" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = var.enable_dns_hostnames

  tags = {
    Name = "${var.app_name}-vpc"
  }
}

Subnets

#--variables.tf--

# public subnet vars
#----------------------------------------
variable "public_subnets" {
  default = {
    "public-subnet-1" = 0
    "public-subnet-2" = 1
  }
}

variable "auto_ipv4" {
  type        = bool
  description = "enable auto-assign ipv4"
  default     = true
}

# private subnet vars
#----------------------------------------
variable "private_subnets" {
  default = {
    "private-subnet-1" = 0
    "private-subnet-2" = 1
  }
}

#--subnets.tf--

# deploy subnets
#----------------------------------------------------
# deploy the public subnets
resource "aws_subnet" "public_subnets" {
  for_each          = var.public_subnets
  vpc_id            = aws_vpc.vpc.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, each.value + 100)
  availability_zone = tolist(data.aws_availability_zones.available.names)[each.value]

  map_public_ip_on_launch = var.auto_ipv4

  tags = {
    Name   = "${var.app_name}-${each.key}"
    Subnet = "Public"
  }
}

# deploy the private subnets
resource "aws_subnet" "private_subnets" {
  for_each          = var.private_subnets
  vpc_id            = aws_vpc.vpc.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, each.value + 1)
  availability_zone = tolist(data.aws_availability_zones.available.names)[each.value]

  tags = {
    Name   = "${var.app_name}-${each.key}"
    Subnet = "Private"
  }
}

We’ll iterate over the subnet vars with the for_each meta-argument so we can create multiple resources from a single declaration.

The cidrsubnet() function dynamically generates the CIDR blocks within the VPC CIDR, and we can use the AZ data from the earlier data block to place them in their respective AZs.

  • public-subnet-1: 10.0.100.0/24 / private-subnet-1: 10.0.1.0/24 → us-east-1a
  • public-subnet-2: 10.0.101.0/24 / private-subnet-2: 10.0.2.0/24 → us-east-1b
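To see the arithmetic behind those ranges, here’s a hypothetical locals block (illustration only, not part of the project — try the same expressions in terraform console): 8 new bits turn the /16 into /24s, and the netnum selects which /24.

```hcl
# Illustration: what cidrsubnet() evaluates to for our subnet vars
locals {
  demo_public_1  = cidrsubnet("10.0.0.0/16", 8, 0 + 100) # "10.0.100.0/24"
  demo_public_2  = cidrsubnet("10.0.0.0/16", 8, 1 + 100) # "10.0.101.0/24"
  demo_private_1 = cidrsubnet("10.0.0.0/16", 8, 0 + 1)   # "10.0.1.0/24"
  demo_private_2 = cidrsubnet("10.0.0.0/16", 8, 1 + 1)   # "10.0.2.0/24"
}
```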

Plan & apply

Let’s terraform plan and terraform apply what we have so far.

VPC
Public subnets
Private subnets

Internet gateway (IGW) & NAT gateway

We’ll create an IGW and NAT so instances in the public and private subnets can access the internet.

#--gateways.tf--

# Create Internet Gateway
#----------------------------------------------------
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.vpc.id

  tags = {
    Name = "${var.app_name}-igw"
  }
}

The NAT gateway needs an Elastic IP, so we’ll create an aws_eip resource and pass its allocation ID to the NAT gateway. We’ll also place the gateway in one of the public subnets.

#--gateways.tf--

# Create NAT Gateway
#----------------------------------------------------

# Create elastic IP
resource "aws_eip" "eip" {
  vpc = true
}

resource "aws_nat_gateway" "nat_gw" {
  depends_on    = [aws_internet_gateway.igw]
  allocation_id = aws_eip.eip.id
  subnet_id     = aws_subnet.public_subnets["public-subnet-1"].id

  tags = {
    Name = "${var.app_name}-nat"
  }
}

Public and private route tables

Now let’s create our route tables and route associations. We’ll need a public and private route table and associate them with the public and private subnets.

We’ll use the VPC’s default route table as the public_rt and add a route to direct all traffic to the IGW.
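The route tables (and later security groups) reference a var.all_traffic variable as a catch-all CIDR, which hasn’t appeared yet; a declaration along these lines in variables.tf is assumed:

```hcl
# catch-all CIDR used by routes and security group rules
variable "all_traffic" {
  type        = string
  description = "CIDR block for all traffic"
  default     = "0.0.0.0/0"
}
```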

#--route_tables.tf--

# Edit default route table for public subnets
#----------------------------------------------------
# Public rt
resource "aws_default_route_table" "public_rt" {
  default_route_table_id = aws_vpc.vpc.default_route_table_id

  route {
    cidr_block = var.all_traffic
    gateway_id = aws_internet_gateway.igw.id
  }

  tags = {
    Name = "${var.app_name}-public-rt"
  }
}

# Public rt assoc.
resource "aws_route_table_association" "public" {
  depends_on     = [aws_subnet.public_subnets]
  for_each       = aws_subnet.public_subnets
  route_table_id = aws_default_route_table.public_rt.id
  subnet_id      = each.value.id
}

We’ll create a new private_rt and add a route to direct all traffic to the NAT.

#--route_tables.tf--

# Route table for private subnets
#----------------------------------------------------
# Private rt
resource "aws_route_table" "private_rt" {
  vpc_id = aws_vpc.vpc.id

  route {
    cidr_block     = var.all_traffic
    nat_gateway_id = aws_nat_gateway.nat_gw.id
  }

  tags = {
    Name = "${var.app_name}-private-rt"
  }
}

# Private rt assoc.
resource "aws_route_table_association" "private" {
  depends_on     = [aws_subnet.private_subnets]
  for_each       = aws_subnet.private_subnets
  route_table_id = aws_route_table.private_rt.id
  subnet_id      = each.value.id
}

Plan & apply

Let’s plan and apply route tables and associations.

Public and private route tables.
Public and private route table associations.

Create the security groups

We’ll create three security groups to allow proper inbound/outbound traffic:

  1. One to allow for SSH.
  2. One to allow inbound HTTP/HTTPS traffic (we’ll attach this to the ALB).
  3. One to allow inbound traffic from the ALB (we’ll attach this to the ASG instances).

SSH security group

#--variables.tf--

# security group vars
#----------------------------------------
variable "ssh_sg" {
  type        = map(any)
  description = "ssh security group vars"
  default = {
    "cidr_block"     = "0.0.0.0/0"
    "timeout_delete" = "2m"
  }
}

#--security_groups.tf--

# ssh security group
#----------------------------------------------------
resource "aws_security_group" "ssh_sg" {
  name        = "${var.app_name}-ssh-sg"
  description = "Allow ssh"
  vpc_id      = aws_vpc.vpc.id

  ingress {
    description = "ssh from IP"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.ssh_sg["cidr_block"]]
  }

  lifecycle {
    create_before_destroy = true
  }

  timeouts {
    delete = var.ssh_sg["timeout_delete"]
  }

  tags = {
    Name = "${var.app_name}-ssh-sg"
  }
}

Because other resources depend heavily on security groups, AWS will not let us delete a group that still has associations. This can cause deletion issues when Terraform tries to change or replace security groups. The create_before_destroy lifecycle argument tells Terraform to create the new/updated group before destroying the old one, so any associations with resources are never broken. For more on this topic, visit the official documentation.

Disclaimer: Generally, you want to lock down SSH access to an authorized CIDR range or your local machine’s "<public_IP>/32", but for demo purposes, we’ll leave it open to “anywhere” (0.0.0.0/0).

Web access security group

We need to allow inbound HTTP/HTTPS traffic to the ALB.

#--variables.tf--

variable "web_access_sg" {
  type        = map(any)
  description = "web access security group vars"
  default = {
    "timeout_delete" = "2m"
  }
}

#--security_groups.tf--

# Web access security group
#----------------------------------------------------
resource "aws_security_group" "web_access_sg" {
  name        = "${var.app_name}-web-access-sg"
  description = "Allow http/https traffic to alb"
  vpc_id      = aws_vpc.vpc.id

  # http
  ingress {
    description = "http"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = [var.all_traffic]
  }

  # https
  ingress {
    description = "https"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.all_traffic]
  }

  # all outbound traffic
  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = [var.all_traffic]
    ipv6_cidr_blocks = ["::/0"]
  }

  lifecycle {
    create_before_destroy = true
  }

  timeouts {
    delete = var.web_access_sg["timeout_delete"]
  }

  tags = {
    Name = "${var.app_name}-web-access-sg"
  }
}

ASG instance security group

We need to allow inbound HTTP traffic from the ALB to our instances. Instead of a cidr_blocks source, we’ll use security_groups and pass in the web_access_sg ID.
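The timeouts block below reads from a var.alb_access_sg map that isn’t declared earlier; a declaration matching the other security group vars is assumed:

```hcl
variable "alb_access_sg" {
  type        = map(any)
  description = "alb access security group vars"
  default = {
    "timeout_delete" = "2m"
  }
}
```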

#--security_groups.tf--

# ASG instance security group
#----------------------------------------------------
resource "aws_security_group" "alb_access_sg" {
  name        = "${var.app_name}-alb-access-sg"
  description = "Allow http traffic from ALB"
  vpc_id      = aws_vpc.vpc.id

  # http
  ingress {
    description     = "allow traffic from alb sg"
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.web_access_sg.id]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = [var.all_traffic]
    ipv6_cidr_blocks = ["::/0"]
  }

  lifecycle {
    create_before_destroy = true
  }

  timeouts {
    delete = var.alb_access_sg["timeout_delete"]
  }

  tags = {
    Name = "${var.app_name}-alb-access-sg"
  }
}

Plan & apply

You know the deal…

Create the Application Load Balancer (ALB)

Before we create our ASG, let’s create the ALB.

In times of high traffic volume, the ASG will scale our web servers based on demand, but we need a way to efficiently distribute incoming traffic across the proper targets (in our case, the web server EC2 instances).

What we’ll need

  1. An ALB to distribute the traffic among the web servers.
  2. A target group that targets our web server EC2 instances.
  3. A listener to listen for HTTP traffic on port 80 so the ALB can direct traffic to the target group.

ALB

First, let’s create the ALB resource

#--variables.tf--

# ALB vars
#----------------------------------------
variable "load_balancer_name" {
  type        = string
  description = "load balancer name"
  default     = "web-alb"
}

variable "load_balancer_internal" {
  type        = bool
  description = "is the load balancer internal-facing?"
  default     = false
}

variable "load_balancer_type" {
  type        = string
  description = "load balancer type"
  default     = "application"
}

#--load_balancer.tf--

# ALB
#----------------------------------------
resource "aws_lb" "web_alb" {
  name               = "${var.app_name}-${var.load_balancer_name}"
  internal           = var.load_balancer_internal
  load_balancer_type = var.load_balancer_type
  security_groups    = [aws_security_group.web_access_sg.id]
  subnets            = [for subnet in aws_subnet.public_subnets : subnet.id]

  tags = {
    Name = "${var.app_name}-${var.load_balancer_name}"
  }
}

We want the ALB to be internet-facing, so we’ll set the internal argument to false. We’ll also attach the web_access_sg security group and iterate over the aws_subnet resource to place the ALB in the public subnets.

Target group

We need a target group to point traffic at the ASG web server instances. The aws_lb_target_group resource will receive HTTP traffic on port 80 within our VPC.

#--variables.tf--

# ALB target group vars
#----------------------------------------
variable "web_alb_tg_http" {
  type        = map(any)
  description = "http target group vars"
  default = {
    "name"     = "web-server-tg-http"
    "port"     = 80
    "protocol" = "HTTP"
  }
}

#--load_balancer.tf--

# ALB target group
#----------------------------------------
# HTTP
resource "aws_lb_target_group" "web_alb_tg_http" {
  depends_on = [aws_lb.web_alb]
  name       = "${var.app_name}-${var.web_alb_tg_http["name"]}"
  port       = var.web_alb_tg_http["port"]
  protocol   = var.web_alb_tg_http["protocol"]
  vpc_id     = aws_vpc.vpc.id
}
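Since our ASG will later use ELB health checks, it can be worth giving the target group an explicit health_check block rather than relying on the defaults. This is a sketch only; the path and thresholds are assumptions, not part of the original config:

```hcl
# Hypothetical addition inside the aws_lb_target_group "web_alb_tg_http" block
health_check {
  path                = "/"    # Apache serves our index.html at the root
  protocol            = "HTTP"
  healthy_threshold   = 3
  unhealthy_threshold = 3
  interval            = 30
}
```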

ALB listener

Lastly, we need a listener to detect HTTP traffic on port 80. We’ll attach the listener to the ALB and forward matching traffic to the target group.

#--variables.tf--

# ALB listener vars
#----------------------------------------
variable "web_alb_listener_http" {
  type        = map(any)
  description = "http listener vars"
  default = {
    "port"        = 80
    "protocol"    = "HTTP"
    "action_type" = "forward"
  }
}

#--load_balancer.tf--

# ALB listeners
#----------------------------------------
# HTTP
resource "aws_lb_listener" "web_alb_listener_http" {
  load_balancer_arn = aws_lb.web_alb.arn
  port              = var.web_alb_listener_http["port"]
  protocol          = var.web_alb_listener_http["protocol"]

  default_action {
    type             = var.web_alb_listener_http["action_type"]
    target_group_arn = aws_lb_target_group.web_alb_tg_http.arn
  }
}

Note: If you have HTTPS certificates and want to allow HTTPS traffic, you’ll need an additional target group and listener that listens for HTTPS on port 443.
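As a sketch of what that HTTPS listener could look like (the certificate ARN placeholder, SSL policy, and the web_alb_tg_https target group it forwards to are all assumptions — none exist in this project):

```hcl
# Hypothetical HTTPS listener; you'd supply a real ACM certificate ARN
resource "aws_lb_listener" "web_alb_listener_https" {
  load_balancer_arn = aws_lb.web_alb.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-2016-08"
  certificate_arn   = "arn:aws:acm:us-east-1:ACCOUNT_ID:certificate/CERT_ID"

  default_action {
    type             = "forward"
    # the additional HTTPS target group mentioned in the note above
    target_group_arn = aws_lb_target_group.web_alb_tg_https.arn
  }
}
```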

Plan & apply

Plan and apply the resources. We’ll test the ALB when we create our ASG.

Target group and listener
ALB

Create the EC2 launch template

Alright! Our network base, security groups, and ALB are configured and deployed, so let’s set up our web servers. Before we create our ASG, we need a template that will define the type of EC2 instances to launch.

We’ll be using Amazon Linux 2 t2.micro instances bootstrapped with Apache.

Apache script

Let’s create a separate install_httpd.sh file that contains our script to install an Apache web server on each instance (the index.html code is optional. The designer in me can’t resist being extra fancy).

#!/bin/bash
sudo yum update -y
sudo yum install -y httpd
sudo systemctl enable httpd
sudo systemctl start httpd

sudo echo '<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>A Basic HTML5 Template</title>
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<link
href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;700;800&display=swap"
rel="stylesheet"
/>
<link rel="stylesheet" href="css/styles.css?v=1.0" />
</head>
<body>
<div class="wrapper">
<div class="container">
<h1>Welcome! An Apache web server has been started successfully.</h1>
<p>Replace this with your own index.html file in /var/www/html.</p>
</div>
</div>
</body>
</html>
<style>
body {
background-color: #34333d;
display: flex;
align-items: center;
justify-content: center;
font-family: Inter;
padding-top: 128px;
}
.container {
box-sizing: border-box;
width: 741px;
height: 449px;
display: flex;
flex-direction: column;
justify-content: center;
align-items: flex-start;
padding: 48px 48px 48px 48px;
box-shadow: 0px 1px 32px 11px rgba(38, 37, 44, 0.49);
background-color: #5d5b6b;
overflow: hidden;
align-content: flex-start;
flex-wrap: nowrap;
gap: 24;
border-radius: 24px;
}
.container h1 {
flex-shrink: 0;
width: 100%;
height: auto; /* 144px */
position: relative;
color: #ffffff;
line-height: 1.2;
font-size: 40px;
}
.container p {
position: relative;
color: #ffffff;
line-height: 1.2;
font-size: 18px;
}
</style>
' > /var/www/html/index.html

Launch template

The aws_launch_template resource takes in most of the same arguments as a regular aws_instance. We’ll create vars for each of the necessary arguments and insert them into the resource. Don’t forget to add both the ssh_sg and alb_access_sg security group IDs.

#--variables.tf--

# ec2 vars
#----------------------------------------
variable "web_server_name" {
  type    = string
  default = "web-server"
}

variable "web_server_ami" {
  type        = string
  description = "Instance AMI: Amazon Linux 2"
  default     = "ami-04581fbf744a7d11f"
}

variable "web_server_type" {
  type    = string
  default = "t2.micro"
}

variable "key_pair" {
  type        = string
  description = "ec2 key pair"
  default     = "YOUR_KEY_PAIR"
}

variable "user_data_file" {
  type        = string
  description = "user data file path"
  default     = "install_httpd.sh"
}

#--ec2_launch_template.tf--

# EC2 web launch template
#----------------------------------------
resource "aws_launch_template" "web_server" {
  name                   = "${var.app_name}-${var.web_server_name}-template"
  image_id               = var.web_server_ami
  instance_type          = var.web_server_type
  user_data              = filebase64("${path.module}/${var.user_data_file}")
  key_name               = var.key_pair
  vpc_security_group_ids = [aws_security_group.ssh_sg.id, aws_security_group.alb_access_sg.id]

  tags = {
    "Name" = "${var.app_name}-${var.web_server_name}-template"
  }
}

Plan & apply

Create Auto Scaling Group (ASG)

Now that our launch template is complete, we can create our ASG to launch our web server instances and have it scale based on demand.

ASG

Our ASG will have a min_size of 2 and a max_size of 5.

#--variables.tf--

# ASG vars
#----------------------------------------
variable "web_asg_capacity" {
  type        = map(any)
  description = "min, max, and desired instance capacity"
  default = {
    "min" = 2
    "max" = 5
  }
}

variable "web_asg_health_check_type" {
  type        = string
  description = "health check type"
  default     = "ELB"
}

#--auto_scaling_group.tf--

# Auto scaling group
#----------------------------------------
resource "aws_autoscaling_group" "web_server_asg" {
  name                = "${var.app_name}-web-server-asg"
  min_size            = var.web_asg_capacity["min"]
  max_size            = var.web_asg_capacity["max"]
  vpc_zone_identifier = [for subnet in aws_subnet.private_subnets : subnet.id]
  target_group_arns   = [aws_lb_target_group.web_alb_tg_http.arn]
  health_check_type   = var.web_asg_health_check_type

  launch_template {
    id      = aws_launch_template.web_server.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "${var.app_name}-${var.web_server_name}-asg"
    propagate_at_launch = true
  }
}

The vpc_zone_identifier argument takes a list of subnet IDs for the ASG to deploy instances into. By iterating over aws_subnet.private_subnets, the ASG spans multiple AZs and places instances in those subnets.

Don’t forget to add the target_group_arns from the ALB target group.

We’ll also assign our web server launch template to the ASG.

Scaling policy

Lastly, let’s add a scaling policy to determine how to scale the web servers. We want to scale up instances when the average CPU utilization goes above 75%.

  • policy_type = “TargetTrackingScaling”
  • metric_type = “ASGAverageCPUUtilization”
  • target_value = 75.0
#--variables.tf--

variable "web_asg_scaling_policy" {
  type        = map(any)
  description = "scaling policy"
  default = {
    "policy_type"  = "TargetTrackingScaling"
    "metric_type"  = "ASGAverageCPUUtilization"
    "target_value" = 75.0
  }
}

#--auto_scaling_group.tf--

# Scaling policy
resource "aws_autoscaling_policy" "web_server_asg" {
  depends_on             = [aws_autoscaling_group.web_server_asg]
  name                   = "${var.app_name}-asg-scaling-policy"
  policy_type            = var.web_asg_scaling_policy["policy_type"]
  autoscaling_group_name = aws_autoscaling_group.web_server_asg.name

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = var.web_asg_scaling_policy["metric_type"]
    }

    target_value = var.web_asg_scaling_policy["target_value"]
  }
}

Plan & apply

Let’s deploy our ASG with terraform plan and terraform apply.

Testing!

We’ve got all of our resources deployed, so let’s test if it works!

To test if the ALB is properly distributing traffic to the ASG instances, we need the ALB dns_name. We can either get this from the Console… OR… we can directly output this information to the CLI.

Outputs

An output block allows us to output useful resource information such as IDs, names, ARNs, etc… when deployed. This is super useful for creating modules or just getting common info.

In a new outputs.tf file, we’ll output the public dns_name of the ALB. We can even use string interpolation to create a clickable link!

#--outputs.tf--

output "alb_dns" {
  description = "ALB public DNS"
  value       = "http://${aws_lb.web_alb.dns_name}"
}

Let’s also output our VPC and subnet IDs.

#--outputs.tf--

output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.vpc.id
}

output "public_subnets" {
  description = "Public subnets"
  value       = [for subnet in aws_subnet.public_subnets : subnet.id]
}

output "private_subnets" {
  description = "Private subnets"
  value       = [for subnet in aws_subnet.private_subnets : subnet.id]
}

If we run terraform apply, we should see all of our outputs in the CLI.

Click the link or paste the address into a browser.

Success!

Our Shoppr application now has the proper infrastructure to scale up to handle high traffic volume AND to scale back down to save costs in times of low volume.

Because our infrastructure is configured in Terraform, maintaining and updating the application is now a breeze!

Here is the GitHub repo again.

Thank you

Thank you for following me on my cloud engineering journey. I hope this article was helpful and informative. Please give me a like & follow as I continue my journey, and I will share more articles like this!

Feel free to connect with me on LinkedIn!
