How to Build an ECS + EC2 Auto-Scaling Infrastructure on AWS

Ayush Yadav
13 min read · Jun 14, 2024

Introduction:

Deploying backend applications at scale is crucial for ensuring consistent performance and availability.

There are multiple ways to deploy Docker containers on AWS, such as using EC2 machines, AWS Lambda, ECS with Fargate, and ECS with EC2. This blog will guide you through setting up an auto-scaling infrastructure using ECS with EC2. We’ll delve into why this choice is optimal for our needs and build the infrastructure using the AWS UI; in the next blog, we’ll manage it with Terraform. For our example, we will be using a Rick Roll Docker image as our backend application. 😎

Overview diagram of what we are going to build.

Blog overview:

  1. Various ways to deploy containers
  2. Why ECS + EC2?
  3. Working of Infra with Diagram
  4. Flow explanation
  5. Building Infrastructure with the AWS Management Console

Prerequisites:

  • AWS Account: An active AWS account.
  • Basic Knowledge of AWS Services: Understanding of EC2, IAM roles, VPCs, subnets, security groups, and Load Balancers.
  • Familiarity with Docker: Experience with Docker & concept of containerization.
  • Infrastructure Experience: Previous experience in building infrastructure on AWS or any other cloud provider is beneficial.

Various ways to deploy containers:

Deploying Docker containers can be achieved through various AWS services, each with its pros and cons:

  1. Running containers on an EC2 machine with CI/CD automation: Offers full control but requires more setup and maintenance, and it can’t scale automatically; a container orchestration service is a better fit.

2. AWS App Runner:

  • Pros: Simplified deployment with minimal configuration, automatic scaling.
  • Cons:
    — Max 4 vCPU, 12 GB RAM.
    — Scaling based on request count, not CPU or RAM, leading to potential overpayment or performance issues if the load is misestimated.
    — Inability to modify the service after creation; you must delete and recreate it to change configurations.
  • Why Not: Limited resources and scaling mechanisms make it less ideal for evolving projects with consistent traffic.

3. AWS Lambda: Great for short-lived tasks but has cold start issues and limitations on memory and execution time.

4. ECS: Offers robust container orchestration, automated management, and scaling, providing more flexibility and control compared to EC2 with CI/CD, AWS App Runner, or AWS Lambda.

For those reasons, we are choosing ECS.
ECS can be set up either with EC2 instances or Fargate for serverless deployment.

Why ECS + EC2?

We chose ECS + EC2 for several reasons:

  1. Container Orchestration: ECS provides robust container orchestration capabilities.
  2. Scalability: The ability to auto-scale under high load.
  3. Consistent Traffic: Suitable for B2B applications with steady traffic, aligning with AWS recommendations.
  4. Avoiding Cold Starts: Unlike serverless options such as AWS Lambda or Fargate, ECS with EC2 does not suffer from cold start delays.
  5. Future-Proofing: ECS + EC2 has the potential to handle GPU workloads and high RAM usage, which services like Lambda, App Runner & even ECS + Fargate cannot support. This makes it a future-proof solution that can adapt to evolving application requirements without major redesigns.
  6. Control and Flexibility: Full control over instance types, scaling policies, and other configurations, ensuring our infrastructure can evolve without major redesigns.
  7. Long-Term Cost Savings: Fargate abstracts the management of EC2 instances, but for consistent traffic, using EC2 instances directly can be more cost-effective in the long term.

Detailed diagram of our infra, with all resources explicitly mentioned.

Don’t get scared. Breathe in… breathe out. Yeah, that’s right. You’re still here? Cool. To understand this, look at the four major components: the Route 53 part, the ECS part, the Load Balancer part, and the Auto Scaling part. It’s just a detailed version of what we already saw earlier.

Everything in green is a resource that we have to define in our infrastructure. Everything in red is a byproduct of it. So, if you are doing infrastructure as code, then you have to define everything in green as a Terraform resource.

Working of Infra:

Let’s first start with some basic definitions of the resources:

A few images have been added along with the definitions for a better understanding of individual sections.

  • Route 53 Hosted Zone: Manages DNS records for a specific domain.
  • Route 53 Record: A DNS record to route traffic to your application.
  • Load Balancer: Distributes incoming application traffic across multiple targets (EC2 instances).
  • Listener: Defines the protocol and port for connections from clients to the load balancer.
  • Target Group: A logical grouping of targets (EC2 instances) to route traffic to.
  • ECS Cluster: A logical grouping of tasks and services.
  • ECS Service: Manages the deployment and scaling of tasks within the ECS cluster.
  • Task Definition: Specifies how Docker containers should be launched.
  • ECS Capacity Provider: Manages the infrastructure to run tasks within the ECS cluster.
  • Auto Scaling Group: Manages a group of EC2 instances and scales them automatically based on demand.
  • Launch Template: Provides configuration details for launching EC2 instances.
  • App Auto Scaling Policy: Defines scaling policies based on metrics like CPU utilization.
  • App Auto Scaling Target: Specifies which ECS service should be scaled.

Flow Explanation:

  1. DNS Configuration with Route 53:
    You have a Route 53 hosted zone for your domain, where you create a DNS record like lb-prod.domain.com. This record directs user traffic to your load balancer (a scripted sketch follows this list).
  2. Traffic Management with Load Balancer:
  • The load balancer receives traffic from users. It can’t send traffic directly to the EC2 instances running your tasks, so it forwards traffic to a target group.
  • The listener on the load balancer defines the protocol and port to use for forwarding traffic.
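
If you eventually script steps 1 and 2, the Route 53 piece looks roughly like this with boto3 (the AWS SDK for Python). This is a minimal sketch, not part of the console walkthrough: the hosted zone ID, domain, ALB DNS name, and the ALB’s canonical hosted zone ID are placeholders you’d read off your own resources.

```python
import boto3

route53 = boto3.client("route53")

# Placeholders: your hosted zone, plus the ALB's DNS name and its
# canonical hosted zone ID (both shown on the load balancer's console page).
HOSTED_ZONE_ID = "Z0000000EXAMPLE"
ALB_DNS_NAME = "rick-roll-alb-123456789.us-east-1.elb.amazonaws.com"
ALB_ZONE_ID = "Z35SXDOTRQ7X7K"  # ALBs in each region share a fixed zone ID

# Create (or update) an alias A record: lb-prod.domain.com -> ALB,
# so traffic hitting the subdomain is routed to the load balancer.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "lb-prod.domain.com",
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": ALB_ZONE_ID,
                    "DNSName": ALB_DNS_NAME,
                    "EvaluateTargetHealth": True,
                },
            },
        }]
    },
)
```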

3. Target Group and ECS Service Integration:

  • The target group is connected to the ECS service, which targets the tasks running on EC2 instances (a target-group sketch follows this list).
  • The ECS service manages the tasks in your ECS cluster, ensuring they are running as expected.
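
To make the wiring concrete, here is a hedged boto3 sketch of creating such a target group. The VPC ID is a placeholder; `TargetType="instance"` matches the default bridge network mode we use with the EC2 launch type later in this post.

```python
import boto3

elbv2 = boto3.client("elbv2")

# A target group the listener forwards to. With the EC2 launch type and
# bridge networking, ECS registers EC2 instances (not IPs) as targets.
tg = elbv2.create_target_group(
    Name="rick-roll-tg",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",  # placeholder
    TargetType="instance",
    HealthCheckPath="/",
)
target_group_arn = tg["TargetGroups"][0]["TargetGroupArn"]
print(target_group_arn)
```

The ECS service then references this ARN in its load balancer configuration, which is how new tasks get registered with the target group automatically.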

4. Task Definition and ECS Cluster:

  • The task definition specifies the details of your Docker containers. The ECS service uses this definition to create tasks.
  • These tasks run within the ECS cluster, which is a logical grouping of tasks and services.

5. Running Tasks on EC2 Instances:

  • The tasks need infrastructure to run on, which is provided by EC2 instances.
  • The auto-scaling group uses a launch template to automatically launch and manage EC2 instances.

6. ECS Capacity Provider:

  • The ECS capacity provider manages the infrastructure needed to run tasks. It ensures that the EC2 instances are available for task execution.

7. Auto Scaling Policies:

  • The app auto-scaling policy is defined with a CPU utilization threshold of 50%. When the average CPU utilization exceeds this threshold, the ECS service with the ECS capacity provider scales up, increasing the number of running tasks.
  • The auto-scaling group scales the number of EC2 instances to match the demand.

8. App Auto Scaling Target:

  • The auto-scaling target ensures that the scaling policy is attached to the correct ECS service.

9. ECS Cluster Capacity Provider:

  • This component keeps track of multiple ECS capacity providers and connects them with the ECS cluster. If no specific capacity provider is connected to a service, it applies a default strategy.

In summary, this is the flow:

  • User traffic is directed to the load balancer via a Route 53 subdomain record.
  • The load balancer forwards traffic to the target group, which routes it to tasks managed by the ECS service.
  • Tasks run on EC2 instances, managed by the auto-scaling group and ECS capacity provider.
  • Auto-scaling policies ensure that the infrastructure scales based on demand, maintaining optimal performance.

Building Infra with the AWS UI:

We are going to skip the Route 53 part for now, as it is optional and mainly comes into play if you want the load balancer to keep the same DNS name when the infra is recreated; we’ll do that when managing it via Terraform.

Overview of what we’re going to build:

  1. We first build the ECS cluster
    - We will choose EC2 instances
    - Create a new auto-scaling group for them
    - Define the launch template:
      — Define the desired capacity
      — Type of instance
      — AMI
      — Storage
      — Key pair to use with it
      — Subnets
      — Security groups
  2. We build the task definition
    - Launch type as EC2
    - OS
    - CPU, RAM requirements
    - Task execution role
    - We define at least one container with the image to use
    - Add volume
    - Set the container port
  3. Now we create the ECS service
    - Capacity provider strategy
    - Task family
    - Desired task count
    - Load balancing (creation of the load balancer):
      — Application load balancer
      — Target group
      — Service auto-scaling policy

Steps:

First, log in to your AWS account, search for ECS, and open it.

Let’s start with building the ECS cluster:

  1. Click on “Create Cluster.” Give it a name; I will call it ‘rick-roll-prod-cluster.’ For the naming convention of resources, we will use kebab-case.

2. Scroll down to the Infrastructure section. We can either choose EC2 or Fargate; as discussed earlier, we will be selecting the EC2 launch type.

3. We are going to create an auto-scaling group that will manage the creation and termination of our EC2 instances.

  • For creating new EC2 instances, we must also specify the launch template that the ASG will use (a scripted equivalent is sketched after this list):
  • AMI: Let’s keep the default.
  • Instance type: Let’s go with t2.micro.
  • EC2 instance role: We will create a new role.
  • Desired capacity: Set the minimum to 0 and the maximum to 4. These are the limits for the number of instances in our ECS cluster maintained by the ASG.
  • SSH key: Optionally, create an SSH key to log in to the instances, or skip it.
  • Volume: Keep the default value of 30 GiB.
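
If you prefer scripting, the console is doing roughly the following under the hood. A hedged boto3 sketch: the AMI ID, key pair, and subnet IDs are placeholders (use the current ECS-optimized AMI for your region), and the user data is what makes each instance register itself with our cluster.

```python
import base64
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# On boot, the ECS agent reads this file to know which cluster to join.
user_data = "#!/bin/bash\necho ECS_CLUSTER=rick-roll-prod-cluster >> /etc/ecs/ecs.config\n"

ec2.create_launch_template(
    LaunchTemplateName="rick-roll-launch-template",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",  # placeholder: ECS-optimized AMI
        "InstanceType": "t2.micro",
        "KeyName": "my-ssh-key",  # optional; placeholder
        "IamInstanceProfile": {"Name": "ecsInstanceRole"},
        "UserData": base64.b64encode(user_data.encode()).decode(),
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 30}},
        ],
    },
)

# The ASG that actually creates and terminates instances, 0 to 4 of them.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="rick-roll-asg",
    MinSize=0,
    MaxSize=4,
    LaunchTemplate={
        "LaunchTemplateName": "rick-roll-launch-template",
        "Version": "$Latest",
    },
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",  # placeholder subnets
)
```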

4. Now for VPC and Subnets, leave them as default.

5. Create a new security group and add an inbound rule to allow HTTP access from anywhere.

6. You can enable monitoring and add tags if you want. Now, create the cluster.

  • It may take some time, but now we have successfully created our ECS Cluster with the ASG, EC2 launch template, and security group.
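
For reference, that single “Create” click bundles several API calls on the ECS side. A rough boto3 equivalent, assuming the ASG from the previous sketch; the ASG ARN is a placeholder.

```python
import boto3

ecs = boto3.client("ecs")

ecs.create_cluster(clusterName="rick-roll-prod-cluster")

# The capacity provider wraps the ASG so ECS can scale instances for tasks.
ecs.create_capacity_provider(
    name="rick-roll-capacity-provider",
    autoScalingGroupProvider={
        # Placeholder ARN; copy the real one from the ASG console page.
        "autoScalingGroupArn": (
            "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:"
            "example-uuid:autoScalingGroupName/rick-roll-asg"
        ),
        "managedScaling": {"status": "ENABLED", "targetCapacity": 100},
    },
)

# Attach the capacity provider to the cluster and make it the default
# strategy for services that don't specify one.
ecs.put_cluster_capacity_providers(
    cluster="rick-roll-prod-cluster",
    capacityProviders=["rick-roll-capacity-provider"],
    defaultCapacityProviderStrategy=[
        {"capacityProvider": "rick-roll-capacity-provider", "weight": 1},
    ],
)
```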

Creation of task definition:

  1. Tasks are like wrappers around containers. We create a task definition, and based on it, our container will run. Click on “Task Definition” on the left-hand side, and then click on “Create New Task Definition.”

2. We give it a name and specify that we want to run it on EC2.

3. Now comes the main decision: task size. This determines how much CPU and memory are required by your task (container).

  • Set CPU as 1 vCPU and Memory as 0.5 GB, both of which are within the limits of our EC2 hardware (t2.micro).
  • For OS and Network mode, let’s keep them as default.
  • Set the Task execution role to `ecsTaskExecutionRole`.

4. A task must specify at least one container to run.

  • Give the container a name.
  • For the image, we will use the Rick Roll image available on Docker Hub: `kale5/rickroll:vclatest`.
  • Set the port to 80, which is the port on which Rick Roll will be running.

5. Health Check: To monitor the health status of our containers, we need to set up a health check, which ECS runs repeatedly to ensure the server is responding. In our case, we can define a health check that curls the root endpoint (see the sketch below).
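
Steps 3 to 5 map onto a single RegisterTaskDefinition call. A minimal boto3 sketch; the account ID in the execution role ARN is a placeholder, and the health check assumes the image ships with curl (worth verifying for your image).

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="rick-roll-task",
    requiresCompatibilities=["EC2"],
    # Placeholder account ID; the role lets ECS pull images and ship logs.
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    cpu="1024",    # 1 vCPU
    memory="512",  # 0.5 GB
    containerDefinitions=[{
        "name": "rick-roll-vc-image",
        "image": "kale5/rickroll:vclatest",
        "essential": True,
        "portMappings": [
            {"containerPort": 80, "hostPort": 80, "protocol": "tcp"},
        ],
        # Curl the root endpoint; a non-zero exit marks the container
        # unhealthy and ECS replaces it.
        "healthCheck": {
            "command": ["CMD-SHELL", "curl -f http://localhost/ || exit 1"],
            "interval": 30,
            "timeout": 5,
            "retries": 3,
        },
    }],
)
```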

6. You can add tags for the task definition if desired. Now, just click on “Create,” and our task will be created.

  • Now that we have our ECS Cluster and Task Definition, we can start a task on our cluster. However, for automatic management and scaling of our tasks, we will create an ECS service.

Creating ECS Service:

  1. Now, open your ECS cluster, click on the “Service” section, and then click “Create.”

2. For the compute option, we are going with the default settings. A cluster capacity provider and an ECS capacity provider have already been created for us.

3. For the application type, leave it as the default, which is Service.

4. For the Task definition, choose the task family we defined earlier. Its latest revision will be fetched automatically.

5. Give the service a name. In the Desired tasks section, define how many tasks you want the ECS service to maintain in a running state on the cluster. By setting it to 2, we ensure that 2 tasks are running at any given time.

6. Now we create our load balancer and connect it to the ECS service to distribute load across our tasks. Start by selecting “Load balancing — optional.”

  • In the details, select the load balancer type as ALB. For the container, select “rick-roll-vc-image 80:80.” Create a new Load Balancer and give it a name.

7. Scroll down. Now we create a new listener and target group. The listener sends traffic from the load balancer to the target group, which targets the ECS tasks running on our EC2 instances (a scripted sketch follows).
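
Scripted, steps 6 and 7 would look roughly like this with boto3. Subnet and security group IDs are placeholders, and the target group ARN is the one produced by the sketch earlier in the post.

```python
import boto3

elbv2 = boto3.client("elbv2")

# Placeholder; produced by the create_target_group sketch shown earlier.
target_group_arn = (
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
    "targetgroup/rick-roll-tg/0123456789abcdef"
)

# Internet-facing ALB spanning (at least two) public subnets.
lb = elbv2.create_load_balancer(
    Name="rick-roll-alb",
    Type="application",
    Scheme="internet-facing",
    Subnets=["subnet-aaa111", "subnet-bbb222"],  # placeholders
    SecurityGroups=["sg-0123456789abcdef0"],     # placeholder
)

# The listener accepts HTTP on port 80 and forwards to the target group.
elbv2.create_listener(
    LoadBalancerArn=lb["LoadBalancers"][0]["LoadBalancerArn"],
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": target_group_arn}],
)
```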

8. Now all load balancer-related configurations are complete. Scroll down further. To scale up and down based on load, we need to define an auto-scaling policy for our service.

  • Open the “Service Auto Scaling” drop-down menu.
  • Tick the checkbox “Use Service Auto Scaling.”
  • Define the minimum and maximum number of tasks you want your service to run.
  • Define the policy name for the auto-scaling policy. In the service metric, we will use Average CPU Utilization as the threshold. You can also choose any other metric upon which the service could scale tasks up or down.
  • Set the Target value to 50%, which means that whenever the average CPU utilization rises above this number we scale up, and when it falls below, we scale down (see the sketch below).
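
That checkbox translates to an Application Auto Scaling target plus a target-tracking policy. A hedged boto3 sketch; the service name and the min/max task counts are illustrative placeholders.

```python
import boto3

appscaling = boto3.client("application-autoscaling")

SERVICE_ID = "service/rick-roll-prod-cluster/rick-roll-service"  # placeholder

# Register the ECS service's desired count as the scalable dimension.
appscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=SERVICE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,  # illustrative bounds
)

# Target tracking: hold average CPU near 50%. ECS adds tasks when the
# metric rises above the target and removes tasks when it falls below.
appscaling.put_scaling_policy(
    PolicyName="rick-roll-cpu-scaling",
    ServiceNamespace="ecs",
    ResourceId=SERVICE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```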

9. Leave everything else the same, attach a unique tag, and then click “Create” to create our service.

  • It will take a few minutes for the creation process to complete, so please be patient.
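
For completeness, the service itself condenses into one CreateService call. A hedged sketch; the service name and target group ARN are placeholders consistent with the earlier sketches.

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder; from the load balancer sketch above.
target_group_arn = (
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
    "targetgroup/rick-roll-tg/0123456789abcdef"
)

ecs.create_service(
    cluster="rick-roll-prod-cluster",
    serviceName="rick-roll-service",  # placeholder name
    taskDefinition="rick-roll-task",  # latest revision of the family
    desiredCount=2,
    capacityProviderStrategy=[
        {"capacityProvider": "rick-roll-capacity-provider", "weight": 1},
    ],
    # Wiring to the target group is what auto-registers new tasks
    # with the load balancer.
    loadBalancers=[{
        "targetGroupArn": target_group_arn,
        "containerName": "rick-roll-vc-image",
        "containerPort": 80,
    }],
)
```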

Testing our Infra:

  1. In the AWS Management Console, search for “Load Balancer” and click on it.

2. Now, find the new Application Load Balancer (ALB) you created and open it.

  • You can copy the DNS name and open it in a new tab. This is the URL of our app (the DNS name of our Application Load Balancer).
  • You may encounter a 503 error, which indicates that our Load Balancer is working but traffic is not yet reaching the ECS tasks. This could mean that the service is still being created or the tasks have not started running yet (you can poll for readiness with the sketch below).
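
While you wait, you can poll the ALB from a small script instead of refreshing the tab. A sketch using only the standard library; the DNS name is a placeholder for your ALB’s.

```python
import time
import urllib.request
from urllib.error import HTTPError, URLError

URL = "http://rick-roll-alb-123456789.us-east-1.elb.amazonaws.com/"  # placeholder

# Keep polling until the 503s stop and a task answers with HTTP 200.
while True:
    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            print("status:", resp.status)
            if resp.status == 200:
                break
    except (HTTPError, URLError) as exc:
        print("not ready yet:", exc)
    time.sleep(10)
```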

3. Check your listener, service, target groups, and tasks. It is possible that some of the resources are not fully created yet.

  • After some time, all resources should be fully created, and you should see healthy tasks running.
  • Now, refresh the tab where you have your load balancer DNS opened.
  • Hell yeah, our infrastructure is working! 🎉

Conclusion and Next Steps

You can see that our infrastructure is finally operational. If you send a high volume of traffic, it will automatically scale the tasks running on EC2 instances to meet the demand.

Congratulations on making it this far!

In the next blog, we’ll show you how to convert this infrastructure into code using Terraform. This will allow for easy customization, modifications, and multiple versions of our infrastructure. For now, this is a solid starting point for your journey. If you’re interested in the next part covering Terraform, please leave a comment. Stay tuned, and happy coding!

Wanna connect? Here’s my LinkedIn.

Acknowledgments

Yogendra Manawat (SDE Intern @ AiCaller.io): Helped in reviewing and perfecting the flow.

Chirag Panjwani (SDE @ Genesis Technologies): Provided valuable feedback.

Siddharth Singh Patel: Grateful for suggestions on improvements and ideas, such as the addition of prerequisites.
