HumanGov: Proof of Concept (POC) on AWS Elastic Container Service (ECS) Fronted by an Application Load Balancer (ALB) and Storing Docker Images on Elastic Container Registry (ECR)

Francisco Güemes
14 min read · Feb 1, 2024


During this hands-on project, I delved into the intricate world of cloud architecture and deployment. My task revolved around the HumanGov application, which was to be modernized, containerized, and deployed onto the AWS Cloud environment. I started by creating necessary IAM roles, followed by initializing the AWS Cloud 9 environment and cleaning up outdated directories. This set the stage for our application transition, as I then containerized the HumanGov application using Docker. Next, I built and pushed our Docker images to AWS ECR for both our Flask app and the NGINX application. The meticulous step-by-step process ensured each layer was built properly with the right configurations.

Using Terraform, I provisioned an AWS S3 Bucket and a DynamoDB table. Terraform’s Infrastructure as Code capability streamlined this process, allowing me to instantiate, modify, and version infrastructure safely and efficiently. With AWS S3 and DynamoDB ready, it was then onto the Elastic Container Service (ECS). After creating an ECS cluster, I formulated a Task Definition for our ECS deployment.

This definition detailed the containers, their roles, and necessary configurations, from environment variables to port settings.

Finally, it was time for deployment. I set up a service on ECS, using Fargate as the launch type. The service seamlessly integrated our Docker containers, AWS configurations, and the necessary resources to ensure our application’s optimal performance. The result was a fully functional, scalable, and resilient cloud-based HumanGov application.

Overall, this project gave me an in-depth understanding of cloud deployment processes, from IAM roles, Docker containerization, AWS ECR, Terraform provisioning, to AWS ECS service deployment. Each step was a lesson in best practices, troubleshooting, and the immense potential of cloud technologies.

Introduction

HumanGov is a Software as a Service (SaaS) Cloud Company that develops the Human Resources Management Cloud SaaS application for the Department of Education across all 50 states in the US.

Until now, there were some restrictions on the development of the project:

  • No pooled tenants allowed at this time by the States’ IT Divisions.
  • The use of containers was not allowed because they had to be homologated first by the States’ IT Divisions.

The second restriction has just changed: containers have now been homologated, so the previous version of the architecture needed to be modernized. The previous architecture was based on EC2 instances where the application ran directly. This architecture presented some problems such as:

  • Lack of High Availability: There was only one EC2 instance running for each customer (US state participating in the project), so if there was a problem with the instance, the application was down for that specific customer.
  • EC2 Management Overhead: Using EC2 instances implies that the management of the instances (updating, patching, etc.) is a responsibility of the HumanGov staff.
  • Slow App Delivery Process: The deployment of the application is not automated, so it is done manually.
  • Compatibility issues with developers’ environments: Developers use a different OS, drivers, configuration, etc. than the EC2 instances that run the application in the production environment.

In order to solve all these problems from the old architecture I proposed the following POC architecture based on AWS ECS:

The POC architecture based on AWS ECS has a load balancer that routes the traffic to either of the ECS Tasks, re-routing it to the other one in case of any problem, so high availability is achieved. On top of that, each ECS Task runs in a different AZ (Availability Zone), so AWS guarantees that the application keeps running even if one of the data centers has a problem.

Since the POC architecture is based on AWS ECS, the management of the infrastructure that runs the application is no longer the responsibility of the HumanGov staff, so the team can focus on delivering new functionalities and improvements to the customers.

Last but not least, with this new architecture based on Docker containers, developers can run the application on their laptops using the exact same containers as in production. This not only speeds up the development process, since developers have the certainty that they can catch problems at an early stage; it also allows them to troubleshoot production problems on their laptops by spinning up the exact same containers.

Create IAM Role

I needed to create an IAM Role to give the application permission to access resources such as S3 and DynamoDB from within the ECS Tasks. The role is also needed when creating the ECS Service and Tasks for the application.

I created a role named HumanGovECSExecutionRole and assigned it the following permissions (see the CLI sketch after the list):

AmazonS3FullAccess 
AmazonDynamoDBFullAccess
AmazonECSTaskExecutionRolePolicy
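
If you prefer the terminal over the console, the role can also be created with the AWS CLI. Below is a minimal sketch, not taken from the original setup: the trust policy simply allows ECS tasks to assume the role, and the loop attaches the three managed policies listed above.

# Trust policy that lets ECS tasks assume the role
cat > ecs-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role --role-name HumanGovECSExecutionRole \
  --assume-role-policy-document file://ecs-trust-policy.json

# Attach the three managed policies listed above
for policy in AmazonS3FullAccess AmazonDynamoDBFullAccess \
              service-role/AmazonECSTaskExecutionRolePolicy; do
  aws iam attach-role-policy --role-name HumanGovECSExecutionRole \
    --policy-arn "arn:aws:iam::aws:policy/$policy"
done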

Containerize the application

To containerize the HumanGov application I used Docker. The application is split into two parts that run independently:

  • The HumanGov application: the application that runs the business logic, developed in Python.
  • The NGINX application: a reverse proxy based on NGINX.

Since the application is split into two parts, I needed to containerize each part separately. That means that for each part I created a Dockerfile, then created a repository in AWS ECR (Elastic Container Registry), and finally pushed the Docker image to that repository.

Containerize the HumanGov application

I created the following Dockerfile :

# Use Python as a base image
FROM python:3.8-slim-buster

# Set working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Copy the Flask application
COPY . /app

# Start Gunicorn
CMD ["gunicorn", "--workers", "1", "--bind", "0.0.0.0:8000", "humangov:app"]

After creating the Dockerfile, I created a new public ECR repository named humangov-app using the AWS UI:

When I finished creating the repository, I entered the repository and clicked on the button View push commands. This shows the commands to execute on your machine in order to push images into the repository.

Following the instructions from the AWS ECR repository that I created I pushed the image into the repository.
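
For reference, the push commands shown by the console look roughly like the sketch below; the repository could also be created from the CLI instead of the UI. The <registry-alias> placeholder is account-specific, so replace it with your own public registry alias.

# Create the public repository (CLI alternative to the console step)
aws ecr-public create-repository --repository-name humangov-app --region us-east-1

# Authenticate Docker against the public registry (public ECR lives in us-east-1)
aws ecr-public get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin public.ecr.aws

# Build, tag and push the image
docker build -t humangov-app .
docker tag humangov-app:latest public.ecr.aws/<registry-alias>/humangov-app:latest
docker push public.ecr.aws/<registry-alias>/humangov-app:latest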

Containerize the NGINX application

For containerizing the NGINX application I followed a similar process to the one for the HumanGov application, but first I needed to create a couple of configuration files, nginx.conf and proxy_params, in order to tweak the default NGINX settings.
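
The contents of those two files are not reproduced here, but a minimal version would look roughly like the sketch below. These are assumed contents, not the exact files from the project, based on the Gunicorn container listening on port 8000; in a Fargate task using the awsvpc network mode, both containers share localhost, so NGINX can reach the app on 127.0.0.1.

# nginx.conf - forward all traffic to the Flask/Gunicorn container on port 8000
cat > nginx.conf <<'EOF'
server {
    listen 80;

    location / {
        include /etc/nginx/proxy_params;
        proxy_pass http://127.0.0.1:8000;
    }
}
EOF

# proxy_params - standard forwarding headers for the upstream application
cat > proxy_params <<'EOF'
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
EOF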

After that initial step, the process is the same as for containerizing the HumanGov application: I created a Dockerfile, then I created a repository in AWS ECR and finally I pushed the Docker image to AWS ECR (Elastic Container Registry).

I created the following Dockerfile :

# Use NGINX alpine as a base image
FROM nginx:alpine

# Remove the default NGINX configuration file
RUN rm /etc/nginx/conf.d/default.conf

# Copy custom configuration file
COPY nginx.conf /etc/nginx/conf.d

# Copy proxy parameters
COPY proxy_params /etc/nginx/proxy_params

# Expose port 80
EXPOSE 80

# Start NGINX
CMD ["nginx", "-g", "daemon off;"]

After creating the Dockerfile, I created a new public ECR repository named humangov-nginx using the AWS UI:

When I finished creating the repository, I entered the repository and clicked on the button View push commands. This shows the commands to execute on your machine in order to push images into the repository.

Following the instructions from the AWS ECR repository that I created I pushed the image into the repository.

At the end of the containerization process you should have two repositories, each one containing an image.

Provision the infrastructure

The project uses infrastructure such as S3 buckets and DynamoDB tables (among other elements) that needed to be provisioned on AWS. For provisioning these cloud resources I used Terraform. To speed up the process I made the following trade-offs:

  • I took as the IaC base the Terraform resource definitions from the previous project and commented out the part related to the provisioning of the EC2 instances, since I do not need that infrastructure for this project.
  • Since this is a POC project and the purpose is to verify the validity of the architecture, I only need to provision the resources of one US state, so I chose California as the example.

Prerequisites

In order to use Terraform within Cloud9, I needed to use credentials required by Terraform that were not managed by Cloud9. This generates a conflict between the two sets of credentials, so the best solution I found for this problem was to disable the AWS managed credentials (Settings -> AWS Settings) on Cloud9.

The next step is to use the credentials assigned to the Cloud9 user. In my previous project, in the section Environment Setup, I explained how to create the user, create the credentials and download the credentials file.

In my case I am going to use the credentials from my previous project, since the Cloud9 user ( cloud9-user ) is the same and I still have the credentials file cloud9-user_accessKeys.csv on my machine.

I opened a terminal in Cloud9 and executed the following commands. If you are following this guide, please do not forget to substitute your own values:

export AWS_ACCESS_KEY_ID="your_credentials_id"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
echo $AWS_ACCESS_KEY_ID
echo $AWS_SECRET_ACCESS_KEY

Adapt the Terraform definition

As I said previously, I needed to comment out the resource blocks related to the EC2 instances, since they are no longer needed. I modified the following files:

modules/aws_humangov_infrastructure/main.tf
modules/aws_humangov_infrastructure/output.tf
terraform/outputs.tf

modules/aws_humangov_infrastructure/main.tf

/*
resource "aws_security_group" "state_ec2_sg" {
  name        = "humangov-${var.state_name}-ec2-sg"
  description = "Allow traffic on ports 22 and 80"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 5000
    to_port     = 5000
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port       = 0
    to_port         = 0
    protocol        = "-1"
    security_groups = ["sg-016ddfeacf9322bc8"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "humangov-${var.state_name}"
  }
}

resource "aws_instance" "state_ec2" {
  ami                    = "ami-007855ac798b5175e"
  instance_type          = "t2.micro"
  key_name               = "humangov-ec2-key"
  vpc_security_group_ids = [aws_security_group.state_ec2_sg.id]
  iam_instance_profile   = aws_iam_instance_profile.s3_dynamodb_full_access_instance_profile.name

  provisioner "local-exec" {
    command = "sleep 30; ssh-keyscan ${self.private_ip} >> ~/.ssh/known_hosts"
  }

  provisioner "local-exec" {
    command = "echo ${var.state_name} id=${self.id} ansible_host=${self.private_ip} ansible_user=ubuntu us_state=${var.state_name} aws_region=${var.region} aws_s3_bucket=${aws_s3_bucket.state_s3.bucket} aws_dynamodb_table=${aws_dynamodb_table.state_dynamodb.name} >> /etc/ansible/hosts"
  }

  provisioner "local-exec" {
    command = "sed -i '/${self.id}/d' /etc/ansible/hosts"
    when    = destroy
  }

  tags = {
    Name = "humangov-${var.state_name}"
  }
}
*/

resource "aws_dynamodb_table" "state_dynamodb" {
name = "humangov-${var.state_name}-dynamodb"
billing_mode = "PAY_PER_REQUEST"
hash_key = "id"

attribute {
name = "id"
type = "S"
}

tags = {
Name = "humangov-${var.state_name}"
}
}

resource "random_string" "bucket_suffix" {
length = 4
special = false
upper = false
}


resource "aws_s3_bucket" "state_s3" {
bucket = "humangov-${var.state_name}-s3-${random_string.bucket_suffix.result}"
# acl = "private"

tags = {
Name = "humangov-${var.state_name}"
}
}

resource "aws_s3_bucket_ownership_controls" "state_s3" {
bucket = aws_s3_bucket.state_s3.id
rule {
object_ownership = "BucketOwnerPreferred"
}
}


resource "aws_s3_bucket_acl" "state_s3" {
depends_on = [aws_s3_bucket_ownership_controls.state_s3]

bucket = aws_s3_bucket.state_s3.id
acl = "private"
}
/*
resource "aws_iam_role" "s3_dynamodb_full_access_role" {
  name = "humangov-${var.state_name}-s3_dynamodb_full_access_role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF

  tags = {
    Name = "humangov-${var.state_name}"
  }
}

resource "aws_iam_role_policy_attachment" "s3_full_access_role_policy_attachment" {
  role       = aws_iam_role.s3_dynamodb_full_access_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3FullAccess"
}

resource "aws_iam_role_policy_attachment" "dynamodb_full_access_role_policy_attachment" {
  role       = aws_iam_role.s3_dynamodb_full_access_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess"
}

resource "aws_iam_instance_profile" "s3_dynamodb_full_access_instance_profile" {
  name = "humangov-${var.state_name}-s3_dynamodb_full_access_instance_profile"
  role = aws_iam_role.s3_dynamodb_full_access_role.name

  tags = {
    Name = "humangov-${var.state_name}"
  }
}
*/

modules/aws_humangov_infrastructure/output.tf

/*
output "state_ec2_public_dns" {
  value = aws_instance.state_ec2.public_dns
}
*/

output "state_dynamodb_table" {
  value = aws_dynamodb_table.state_dynamodb.name
}

output "state_s3_bucket" {
  value = aws_s3_bucket.state_s3.bucket
}

terraform/outputs.tf

output "state_infrastructure_outputs" {
value = {
for state, infrastructure in module.aws_humangov_infrastructure :
state => {
#ec2_public_dns = infrastructure.state_ec2_public_dns
dynamodb_table = infrastructure.state_dynamodb_table
s3_bucket = infrastructure.state_s3_bucket
}
}
}

Provision the infrastructure using Terraform

The next step was to provision the AWS DynamoDB table and S3 bucket using the Terraform configuration that I modified in the previous section.

I only wanted to provision the resources for the state of California, so I decided to use the -var option to override, from the terminal, the Terraform variable that contains the list of states. I executed the following commands:

cd /home/ec2-user/environment/human-gov-infrastructure/terraform

terraform show
terraform validate
terraform plan -var 'states=["california"]'
terraform apply -var 'states=["california"]'

Finally, I copied the output from Terraform, because I needed it later on to configure the containers. You can see below the information that I kept about the infrastructure. Note that I stored it directly in OS environment variable format, because that makes it easier to use in later steps.

US_STATE=california
AWS_REGION=us-east-1
AWS_DYNAMODB_TABLE=humangov-california-dynamodb
AWS_BUCKET=humangov-california-s3-zknx

Create ECS cluster

The next step is to create a new ECS cluster named humangov-cluster using the UI Console (see the image below). The key part is to choose AWS Fargate (serverless) as the infrastructure.
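
For those who prefer the terminal, the equivalent cluster creation can be done with the AWS CLI; a minimal sketch, assuming Fargate as the only capacity provider:

aws ecs create-cluster \
  --cluster-name humangov-cluster \
  --capacity-providers FARGATE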

Once the cluster is created, it is necessary to create a task definition with two different containers and a service that will run it all.

Create Task definition

Remember that the task definition must have two different containers, one per application.

I created the task definition with the following parameters (a CLI sketch of the full definition follows the two container lists):

- Task definition family: humangov-fullstack
- Launch type: AWS Fargate
- OS: Linux/x86_64
- Task size: 1 vCPU / 3 GB
- Task role: HumanGovECSExecutionRole
- Task execution role: HumanGovECSExecutionRole

I added the first container (for the HumanGov app) with these parameters:

- Container 1:
- Name: app
- Image URI: <copy from ECR humangov-app repository>
- Essential container: yes
- Container port: 8000
- Protocol: TCP
- Port name: Keep auto-generated
- App protocol: HTTP
- Environment variables (replace the variables values with your own values):
- AWS_BUCKET = humangov-california-s3-zknx
- AWS_DYNAMODB_TABLE = humangov-california-dynamodb
- AWS_REGION = us-east-1
- US_STATE = california

I added the second container (for nginx) with the following parameters:

- Container 2:
- Name: nginx
- Image URI: <copy from ECR humangov-nginx repository>
- Essential container: no
- Add port mapping:
- Container port: 80
- Protocol: TCP
- Port name: Keep auto-generated
- App protocol: HTTP
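
Put together, the configuration above roughly corresponds to the task definition JSON below, registered with the AWS CLI. This is only a sketch: the <account-id> and <registry-alias> values are placeholders, and the image URIs and resource names must match your own environment.

cat > humangov-fullstack.json <<'EOF'
{
  "family": "humangov-fullstack",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "1024",
  "memory": "3072",
  "taskRoleArn": "arn:aws:iam::<account-id>:role/HumanGovECSExecutionRole",
  "executionRoleArn": "arn:aws:iam::<account-id>:role/HumanGovECSExecutionRole",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "public.ecr.aws/<registry-alias>/humangov-app:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 8000, "protocol": "tcp" }],
      "environment": [
        { "name": "AWS_BUCKET", "value": "humangov-california-s3-zknx" },
        { "name": "AWS_DYNAMODB_TABLE", "value": "humangov-california-dynamodb" },
        { "name": "AWS_REGION", "value": "us-east-1" },
        { "name": "US_STATE", "value": "california" }
      ]
    },
    {
      "name": "nginx",
      "image": "public.ecr.aws/<registry-alias>/humangov-nginx:latest",
      "essential": false,
      "portMappings": [{ "containerPort": 80, "protocol": "tcp" }]
    }
  ]
}
EOF

aws ecs register-task-definition --cli-input-json file://humangov-fullstack.json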

Create Service

Once the new task definition was created, I created a service for it, in order to actually run the task.

I created the service with the following parameters:

- Existing cluster: humangov-cluster
- Compute options: Launch type
- Launch type: Fargate
- Platform version: latest
- Application type: Service
- Service name: humangov
- Desired tasks: 2

In the networking section I added the following configuration:

- Networking:
- VPC: Default
- Security Group: Default (make sure it allows traffic on port 80)

I opened my default security group ( sg-092744305bba52c3 ) in a different tab and ensured that it had an inbound rule like the one below, in order to allow incoming traffic on port 80.
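
The same inbound rule can be added from the terminal; a sketch, using my default security group ID:

aws ec2 authorize-security-group-ingress \
  --group-id sg-092744305bba52c3 \
  --protocol tcp --port 80 \
  --cidr 0.0.0.0/0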

Once I made sure the rule was in place, I went back to the service creation and added a load balancer to the service with the following configuration (a CLI sketch of the whole service creation follows the list):

- Load Balancing:
- Load balancer type: Application Load Balancer
- Create a new load balancer
- Load balancer name: humangov-lb
- Choose container to load balance: nginx 80:80
- Listener: create new listener
- Port: 80 / Protocol: HTTP
- Target group:
- Target group name: humangov-tg
- Health check path: /
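
For completeness, the service creation roughly corresponds to the CLI call below. Keep in mind that the console creates the load balancer, listener and target group for you, whereas with the CLI they must already exist; the subnet IDs, target group ARN and <account-id> are placeholders.

aws ecs create-service \
  --cluster humangov-cluster \
  --service-name humangov \
  --task-definition humangov-fullstack \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-aaa,subnet-bbb],securityGroups=[sg-092744305bba52c3],assignPublicIp=ENABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:<account-id>:targetgroup/humangov-tg/1234567890abcdef,containerName=nginx,containerPort=80"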

The creation of the service took a few minutes, but when it finished I could verify that I had two task instances running in my ECS service.

I could also verify that each task instance was running in a different availability zone.

Test the application

In order to test the application, I entered the newly created service and opened the load balancer URL in a new tab.

After seeing the application up and running, I proceeded to verify basic functionality such as creating a new employee, uploading a document, and checking that the employee data was stored in the DynamoDB table and the document was present in the S3 bucket.
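
Those checks can also be done from the terminal, using the resource names from the Terraform output:

# Confirm the new employee record landed in DynamoDB
aws dynamodb scan --table-name humangov-california-dynamodb --max-items 5

# Confirm the uploaded document landed in the S3 bucket
aws s3 ls s3://humangov-california-s3-zknx/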

Summary

This article showcases how I implemented a POC architecture based on Docker containers deployed on a fully managed AWS infrastructure using AWS ECR and AWS ECS.

The POC served to validate the improvement proposal over the previous architecture based on EC2 instances. The new architecture brings several improvements to the project, such as high availability and higher fault tolerance (each task instance runs in a different availability zone).

Finally, the article showcases the solutions to some common problems that arise when Cloud9 is used as the development environment in combination with other tools such as Terraform.


Francisco Güemes

Java Back End Developer with focus on Cloud & DevOps | AWS | Microsoft Azure | Google Cloud | Oracle Cloud