Blue/Green Deployment on AWS ECS With Terraform

Wambui Gitau
6 min readJul 18, 2023

--

Blue/Green deployment strategy means having two identical but separate environments. The blue environment is running the current version of the application and the green is running the new version of the application. Why is Blue/Green deployment important? It minimises downtime of applications during deployment. Applications on ECS tasks take sometime to deploy because of the draining process of ECS services.

How does it work?

The main concept of Blue/Green deployment is the shift of traffic from the current running application to the new version of the application. For this to be accomplished on AWS, there has to be 2 target groups and a load balancer. During deployment, the following steps take place:

  • A new environment similar to the current running environment is created. The new environment is in target group 2
  • The new version of the application is deployed to the new environment
  • The new version of the application is tested using traffic testing. This is to make sure that the application is running well
  • The traffic is shifted from the current version to the new version
  • The old environment is deleted.
Blue/Green Concept
After deployment

ECS Architecture

For an application deployed on ECS Cluster, we use AWS CodeDeploy for Blue/Green Deployment. An example of an architecture of a service and task on ECS Cluster looks like the diagram below:

AWS ECS Architecture Example

NB: Assumption is that you have a VPC already setup with public subnets

The resources used are:

  • Application Load balancer for directing traffic to the different ports.
variable "elb_sg_ingress_ports" {
type = list(number)
default = [80, 443, 8080]
}

resource "aws_security_group" "application_elb_sg" {
vpc_id = var.vpc_id
name = "application_elb_sg"
}

resource "aws_security_group_rule" "application_elb_sg_ingress" {
count = length(var.elb_sg_ingress_ports)
type = "ingress"
from_port = var.elb_sg_ingress_ports[count.index]
to_port = var.elb_sg_ingress_ports[count.index]
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
security_group_id = aws_security_group.application_elb_sg.id
}
resource "aws_lb" "app_lb" {
name = "application_load_balancer"
load_balancer_type = "application"
subnets = var.public_subnets[*].id
idle_timeout = 60
security_groups = [aws_security_group.application_elb_sg.id]
}
  • Two target groups, blue and green
variable "lb_target_group_name" {
type = string
default = "tg"
}

locals {
target_groups = [
"green",
"blue",
]
}

resource "aws_lb_target_group" "tg" {
count = length(local.target_groups)

name = "${var.lb_target_group_443_name}-${element(local.target_groups, count.index)}"
port = 443
protocol = "HTTP"
target_type = "instance"
vpc_id = var.vpc_id
health_check {
matcher = "200,301,302,404"
path = "/"
}

}
  • AWS Listeners: We will have 3 listeners — 443 which will be the one users communicate with, 80 as the main port and 8080 as the alternative port
resource "aws_alb_listener" "l_80" {
load_balancer_arn = aws_lb.app_lb.arn
port = "80"
protocol = "HTTP"
default_action {
type = "redirect"
redirect {
port = "443"
protocol = "HTTPS"
status_code = "HTTP_301"
}
}
}

resource "aws_alb_listener" "l_8080" {
load_balancer_arn = aws_lb.app_lb.id
port = 8080
protocol = "HTTP"

default_action {
type = "forward"
target_group_arn = aws_lb_target_group.tg[1].arn
}
}

resource "aws_alb_listener" "l_443" {
load_balancer_arn = aws_lb.app_lb.arn
port = "443"
protocol = "HTTPS"
certificate_arn = XXXX
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.tg[0].arn
}
depends_on = [aws_lb_target_group.tg]

lifecycle {
ignore_changes = [default_action]
}
}

Port 80 redirects traffic to port 443. This is the HTTPS link that the users will be accessing. Traffic from 443 is forwarded to port 80. When the new environment is created, the port used will be 8080, the traffic will be forward to port 80.

  • ECS Cluster, ECS Service and ECS Task Definition. Any environment variables that will be used by the Task will be defined in the Task Definition. For this example, the container will be on EC2, an alternative would have been AWS Fargate.
resource "aws_ecr_repository" "app_ecr_repo" {
name = "app-ecr-repository"
force_delete = true
}

resource "aws_ecs_cluster" "app_cluster" {
name = "application_cluster"
}

resource "aws_ecs_service" "frontend" {
name = "frontend"
cluster = aws_ecs_cluster.app_cluster.id
task_definition = aws_ecs_task_definition.frontend_task.arn
deployment_minimum_healthy_percent = 50
deployment_maximum_percent = 200
health_check_grace_period_seconds = 300
launch_type = "EC2"
scheduling_strategy = "REPLICA"
desired_count = 1


force_new_deployment = true
load_balancer {
target_group_arn = aws_lb_target_group.tg[0].arn
container_name = "app"
container_port = "80" # Application Port
}
deployment_controller {
type = "CODE_DEPLOY"
}


# workaround for https://github.com/hashicorp/terraform/issues/12634
depends_on = [aws_lb.app_cluster]
# we ignore task_definition changes as the revision changes on deploy
# of a new version of the application
# desired_count is ignored as it can change due to autoscaling policy
lifecycle {
ignore_changes = [task_definition, desired_count, load_balancer]
}
}


resource "aws_ecs_task_definition" "frontend_task" {
family = "frontend-task"
container_definitions = jsonencode([{


name = "app",
image = "${var.aws_account_id}.dkr.ecr.${var.aws_account_region}.amazonaws.com/app-ecr-repository:<revision_number>",
essential = true,
portMappings = [
{
"containerPort" : 80 # Application Port
}
],




logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-group = aws_cloudwatch_log_group.main.name
awslogs-stream-prefix = "ecs"
awslogs-region = var.region
}
}
}])
requires_compatibilities = ["EC2"] # Stating that we are using ECS Fargate # Using awsvpc as our network mode as this is required for Fargate
memory = 1800 # Specifying the memory our container requires
cpu = 512 # Specifying the CPU our container requires
execution_role_arn = aws_iam_role.app_task_role.arn

}
  • AWS IAM Role
resource "aws_iam_role" "app_task_role" {
name = "app-task-role"

assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow",
Principal = {
Service = "ecs-tasks.amazonaws.com"
},
Action = "sts:AssumeRole"
}
]
})
}

resource "aws_iam_role_policy_attachment" "ECS_task_execution" {
role = aws_iam_role.app_task_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

Deployment

To deploy the application, AWS CodeDeploy will be used. CodeDeploy enables deployment using Blue/Green deployment. CodeDeploy handles the process of creation of the new task and assign the target group, test the traffic of the new application, shift the traffic and delete the old task.

ECS Architecture after Deployment
AWS CodeDeploy Deployment

CodeDeploy will need permissions to perform the above tasks. On the code section, the permissions needed are stated.

To shift the traffic, there are multiple ways of doing this and can be configured on CodeDeploy. One of the way is ECSAllAtOnce, this means that the traffic will be transferred all at once. Other options can be found here.

resource "aws_codedeploy_app" "frontend" {
compute_platform = "ECS"
name = "frontend-deploy"
}
resource "aws_codedeploy_deployment_group" "frontend" {
app_name = aws_codedeploy_app.frontend.name
deployment_group_name = "frontend-deploy-group"
deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"
service_role_arn = aws_iam_role.codedeploy.arn

blue_green_deployment_config {
deployment_ready_option {
action_on_timeout = "CONTINUE_DEPLOYMENT"
}

terminate_blue_instances_on_deployment_success {
action = "TERMINATE"
termination_wait_time_in_minutes = 1
}
}

ecs_service {
cluster_name = aws_ecs_cluster.app_cluster.name
service_name = aws_ecs_service.frontend.name
}

deployment_style {
deployment_option = "WITH_TRAFFIC_CONTROL"
deployment_type = "BLUE_GREEN"
}
auto_rollback_configuration {
enabled = true
events = ["DEPLOYMENT_FAILURE"]
}

load_balancer_info {
target_group_pair_info {
prod_traffic_route {
listener_arns = [aws_alb_listener.l_443.arn]
}

target_group {
name = aws_lb_target_group.tg[0].name
}

target_group {
name = aws_lb_target_group.tg[1].name
}


}
}

}

data "aws_iam_policy_document" "assume_by_codedeploy" {
statement {
sid = ""
effect = "Allow"
actions = ["sts:AssumeRole"]

principals {
type = "Service"
identifiers = ["codedeploy.amazonaws.com"]
}
}
}

resource "aws_iam_role" "codedeploy" {
name = "codedeploy"
assume_role_policy = data.aws_iam_policy_document.assume_by_codedeploy.json
}


data "aws_iam_policy_document" "codedeploy" {
statement {
sid = "AllowLoadBalancingAndECSModifications"
effect = "Allow"

actions = [
"ecs:CreateTaskSet",
"ecs:DeleteTaskSet",
"ecs:DescribeServices",
"ecs:UpdateServicePrimaryTaskSet",
"elasticloadbalancing:DescribeListeners",
"elasticloadbalancing:DescribeRules",
"elasticloadbalancing:DescribeTargetGroups",
"elasticloadbalancing:ModifyListener",
"elasticloadbalancing:ModifyRule",
"s3:GetObject"
]

resources = ["*"]
}
statement {
sid = "AllowPassRole"
effect = "Allow"

actions = ["iam:PassRole"]

resources = [
aws_iam_role.app_task_role.arn
]
}

statement {
sid = "DeployService"
effect = "Allow"

actions = ["ecs:DescribeServices",
"codedeploy:GetDeploymentGroup",
"codedeploy:CreateDeployment",
"codedeploy:GetDeployment",
"codedeploy:GetDeploymentConfig",
"codedeploy:RegisterApplicationRevision"]

resources = [
aws_ecs_service.frontend.id,
aws_codedeploy_deployment_group.frontend.arn,
"arn:aws:codedeploy:${var.region}:${var.aws_account_id}:deploymentconfig:*}",
aws_codedeploy_app.frontend.arn
]
}


}
resource "aws_iam_role_policy" "codedeploy" {
role = aws_iam_role.codedeploy.name
policy = data.aws_iam_policy_document.codedeploy.json
}

CodeDeploy Deployment Process

To deploy, the following steps are to be taken:

  • Build the docker image
  • Push the docker image to AWS ECR
  • Get the task definition using AWS CLI
aws ecs describe-task-definition — task-definition “$ecs_task_def_name” — query taskDefinition > task-definition.json
  • Update the code deploy appspec.yaml file

sample appspec file:

applicationName: 'code-deploy-app'
deploymentGroupName: 'code-deploy-deployment-group'
revision:
revisionType: AppSpecContent
appSpecContent:
content: |
version: 0.0
Resources:
- TargetService:
Type: AWS::ECS::Service
Properties:
TaskDefinition: "[YOUR_TASK_DEFINITION_ARN]"
LoadBalancerInfo:
ContainerName: "ecs-service-container"
ContainerPort: 80
  • Deploy code using code deploy
aws deploy create-deployment --cli-input-yaml file://appspec.yaml

Cons of Blue/Green Deployment

One of the main cons of using Blue/Green Deployment is that it is expensive. Since you need a replica of the running environment, it means that you will require double the resources. For an expensive environment, this deployment strategy may not be the best.

--

--

Wambui Gitau

Data scientist transitioning to Data engineer. Will mostly be talking about Cloud Data Engineering concepts