Blue/Green deployment on AWS EC2 instances with Systems Manager Automation runbook

Dec 28, 2023

Introduction:

The Blue/Green deployment strategy in this post is based on this AWS post about fine-tuning Blue/Green deployments on Application Load Balancer. But why use this solution instead of CodeDeploy?

In our case we need an immediate traffic swap between the “Blue” and “Green” environments. CodeDeploy currently does not offer this for the EC2/On-premises compute platform: even with the “Reroute traffic immediately” option, both the “Green” and “Blue” EC2 instances respond at the same time for a while during the deployment process.

Another issue is that CodeDeploy deployments on the EC2/On-premises compute platform are based on instance tags, so after each deployment we would need a way to manage the tags on the “Blue” instance to make it ready for the next deployment.

There is also an issue with creating the necessary CodeDeploy deployment group configuration with CloudFormation, which we use to create the infrastructure in AWS.

A Systems Manager Automation runbook gives us more flexibility to configure the deployment according to these requirements.

This post is the first part of my series about Blue/Green deployment on AWS EC2 instances with the Systems Manager Automation runbook. The second part is here and the third part is here.

About the project:

All infrastructure in AWS for this project (except the S3 bucket, Route 53 domain, and SSL/TLS certificate) is created with CloudFormation and can be found in this repository. The S3 bucket is needed to store the nested stack templates and application files. The Route 53 domain and SSL/TLS certificate are needed for secure web access to the EC2 instances through the Application Load Balancer. An Amazon SNS topic is used to send notifications about the status of the deployment.

The main steps of the Systems Manager Automation runbook:

a) check whether an execution of the runbook is already in progress; if so, skip the deployment;

b) create a “Green” EC2 instance with all necessary configuration and wait a few minutes to make sure everything is configured correctly;

c) reboot the instance and check its status after the reboot;

d) deploy the “Green” EC2 instance: register it with the appropriate Target Group, swap the Target Group weights in the listener rule configuration, and terminate the “Blue” EC2 instance;

e) send a notification about the status of the deployment.
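
Step a) can be sketched roughly as follows. This is not the runbook's actual code, only a minimal illustration of the idea, assuming a boto3 SSM client is passed in. The function name has_other_running_execution is hypothetical; inside a runbook, the current execution ID could be supplied through the {{ automation:EXECUTION_ID }} system variable. Pagination is ignored for brevity.

```python
def has_other_running_execution(ssm_client, document_name, current_execution_id):
    """Return True if another execution of the runbook is still in progress.

    ssm_client is expected to behave like boto3.client("ssm").
    Only the first page of results is inspected (assumption: few executions).
    """
    response = ssm_client.describe_automation_executions(
        Filters=[
            {"Key": "DocumentNamePrefix", "Values": [document_name]},
            {"Key": "ExecutionStatus", "Values": ["InProgress"]},
        ]
    )
    running = [
        item["AutomationExecutionId"]
        for item in response.get("AutomationExecutionMetadataList", [])
        # The execution performing this check is itself "InProgress", so skip it.
        if item["AutomationExecutionId"] != current_execution_id
    ]
    return bool(running)
```

If the function returns True, the runbook would skip the remaining steps and go straight to the notification.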

Blue/Green deployment step configuration from the Systems Manager runbook:

- name: BlueGreenDeployment
  action: aws:executeScript
  maxAttempts: 2
  timeoutSeconds: 300
  isCritical: true
  onFailure: step:SendNotification
  inputs:
    Runtime: python3.8
    Handler: BlueGreenDeployment
    InputPayload:
      Region: "{{ global:REGION }}"
      ListenerArn: "{{ ListenerArn }}"
      InstanceIds: "{{ LaunchInstance.CreatedInstanceId }}"
    Script: |
      import boto3
      import time
      from botocore.exceptions import ClientError

      def BlueGreenDeployment(event, context):
          try:
              elbv2 = boto3.client('elbv2', region_name=event['Region'])
              ec2_client = boto3.client('ec2', region_name=event['Region'])

              # Register instance with the target group
              chosen_target_group, target_group_arn_1, target_group_arn_2, instances_tg1, instances_tg2, weight_tg1, weight_tg2 = get_listener_details(elbv2, event['ListenerArn'])
              
              if chosen_target_group is None:
                  raise Exception("[ERROR] We don't have necessary target group.")
              
              elbv2.register_targets(
                  TargetGroupArn=chosen_target_group,
                  Targets=[
                      {'Id': instance_id}
                      for instance_id in event['InstanceIds']
                  ],
              )
              print(f"[INFO] Instance {event['InstanceIds']} registered with the target group {chosen_target_group}")
              
              # Wait for instances to be healthy
              wait_for_instances_to_be_healthy(elbv2, chosen_target_group, event['InstanceIds'], max_wait_time=300)

              if instances_tg1 or instances_tg2:
                  swap_weights(elbv2, event['ListenerArn'], target_group_arn_1, target_group_arn_2, weight_tg1, weight_tg2)

                  time.sleep(15)

                  # Terminate the "Blue" instance(s). weight_tg1 and weight_tg2 hold the
                  # weights read before the swap, so a weight of 100 marks the old "Blue" group.
                  if instances_tg1:
                      for instance_id in instances_tg1:
                          if instance_id not in instances_tg2 and weight_tg1 == 100:
                              ec2_client.terminate_instances(InstanceIds=[instance_id])
                              print(f"[INFO] Instance {instance_id} terminated from Target Group 1")
                  if instances_tg2:
                      for instance_id in instances_tg2:
                          if instance_id not in instances_tg1 and weight_tg2 == 100:
                              ec2_client.terminate_instances(InstanceIds=[instance_id])
                              print(f"[INFO] Instance {instance_id} terminated from Target Group 2")
                  print("[INFO] Traffic swap and EC2 instance termination completed successfully.")
              else:
                  print("[INFO] Traffic swap and EC2 instance termination were skipped")
          except ClientError as e:
              # Keep the original AWS error as the cause so it shows up in the step output
              raise Exception(f"[ERROR] {e}") from e

      # Get the ARNs of the target groups and their weights in the listener rule
      def get_listener_details(elb_client, listener_arn):
          # Get the current state of the listener and its rules
          listener_description = elb_client.describe_rules(ListenerArn=listener_arn)

          # Get the Target Group ARNs and weights from the listener rule
          target_group_arn_1 = listener_description['Rules'][0]['Actions'][0]['ForwardConfig']['TargetGroups'][0]['TargetGroupArn']
          target_group_arn_2 = listener_description['Rules'][0]['Actions'][0]['ForwardConfig']['TargetGroups'][1]['TargetGroupArn']
          weight_tg1 = listener_description['Rules'][0]['Actions'][0]['ForwardConfig']['TargetGroups'][0]['Weight']
          weight_tg2 = listener_description['Rules'][0]['Actions'][0]['ForwardConfig']['TargetGroups'][1]['Weight']
        
          # Get instance IDs from target groups
          instances_tg1 = get_instance_ids(elb_client, target_group_arn_1)
          instances_tg2 = get_instance_ids(elb_client, target_group_arn_2)
          
          # Check if instances are registered to the target groups
          if not instances_tg1 and not instances_tg2:
              chosen_target_group = target_group_arn_2 if weight_tg2 == 100 else target_group_arn_1
          elif not instances_tg1:
              chosen_target_group = target_group_arn_1
          elif not instances_tg2:
              chosen_target_group = target_group_arn_2
          else:
              chosen_target_group = None
          return chosen_target_group, target_group_arn_1, target_group_arn_2, instances_tg1, instances_tg2, weight_tg1, weight_tg2

      # Get the IDs of the healthy instances registered with a target group
      def get_instance_ids(elb_client, target_group_arn):
          response = elb_client.describe_target_health(TargetGroupArn=target_group_arn)
          instance_ids = [target['Target']['Id'] for target in response['TargetHealthDescriptions'] if target['TargetHealth']['State'] == 'healthy']
          return instance_ids

      def wait_for_instances_to_be_healthy(elbv2_client, target_group_arn, instance_ids, max_wait_time=300, polling_interval=10):
          start_time = time.time()

          while time.time() - start_time < max_wait_time:
              # Describe the health of the instances in the target group
              health_response = elbv2_client.describe_target_health(TargetGroupArn=target_group_arn)

              # Check if all specified instances are healthy (other healthy targets in the group are allowed)
              healthy_instance_ids = {health['Target']['Id'] for health in health_response['TargetHealthDescriptions'] if health['TargetHealth']['State'] == 'healthy'}
              if set(instance_ids) <= healthy_instance_ids:
                  print(f"[INFO] All instances {instance_ids} are healthy in {target_group_arn}")
                  return

              # Wait before the next check
              time.sleep(polling_interval)

          # If the loop exits, raise an exception indicating that instances are not healthy
          raise Exception(f"[ERROR] Instances {instance_ids} in target group {target_group_arn} did not become healthy within the specified time.")

      def swap_weights(elb_client, listener_arn, target_group_arn_1, target_group_arn_2, weight_tg1, weight_tg2):
          # Swap the weights for the listener rule
          elb_client.modify_listener(
              ListenerArn=listener_arn,
              DefaultActions=[
                  {
                      "Type": "forward",
                      "ForwardConfig": {
                          "TargetGroups": [
                              {
                                  "TargetGroupArn": target_group_arn_1,
                                  "Weight": weight_tg2
                              },
                              {
                                  "TargetGroupArn": target_group_arn_2,
                                  "Weight": weight_tg1
                              }
                          ]
                      }
                  }
              ]
          )
          print("Weight swap completed successfully")
  nextStep: SendNotification
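
The decision logic in the step above is easier to follow when it is separated from the AWS calls. The sketch below restates it as two pure functions. The names choose_target_group and swapped_weights are hypothetical and do not appear in the runbook, and the target group identifiers are passed in as plain strings.

```python
def choose_target_group(tg1, tg2, healthy_tg1, healthy_tg2, weight_tg2):
    """Pick the target group that should receive the new "Green" instance.

    healthy_tg1 / healthy_tg2 are the lists of healthy instance IDs in each group.
    """
    if not healthy_tg1 and not healthy_tg2:
        # First deployment: nothing to swap, so register into the group
        # that already receives the traffic.
        return tg2 if weight_tg2 == 100 else tg1
    if not healthy_tg1:
        return tg1
    if not healthy_tg2:
        return tg2
    # Both groups hold healthy instances, so there is no free group.
    return None


def swapped_weights(weight_tg1, weight_tg2):
    """Return the (tg1, tg2) weights to apply after the swap."""
    return weight_tg2, weight_tg1
```

For example, with a healthy “Blue” instance in the first group and weights 100/0, choose_target_group returns the second group and swapped_weights(100, 0) gives (0, 100), so all traffic moves to the “Green” instance in a single listener update.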

Deployment and infrastructure schemas:

Deployment schema of the Systems Manager runbook

Prerequisites:

Before you start, make sure the following requirements are met:
- An AWS account with permissions to create resources.
- AWS CLI installed on your local machine.

Deployment:

1. Clone the repository.

git clone https://gitlab.com/Andr1500/ssm_runbook_bluegreen.git

2. Create an S3 bucket with a unique name for the nested stack templates and application files. The name must follow the S3 bucket naming rules.

on Linux:
date=$(date +%Y%m%d%H%M%S)

on Windows PowerShell:
$date = Get-Date -Format "yyyyMMddHHmmss"

aws s3api create-bucket --bucket cloudformation-app-files-${date} --region YOUR_REGION \
 --create-bucket-configuration LocationConstraint=YOUR_REGION
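
One detail worth knowing about the command above: in us-east-1 the bucket must be created without a LocationConstraint, while every other region requires it. A small Python sketch of the same step that handles both cases is shown here; the helper names timestamped_bucket_name and create_bucket_kwargs are hypothetical.

```python
from datetime import datetime


def timestamped_bucket_name(prefix="cloudformation-app-files", now=None):
    """Build a bucket name with the same timestamp suffix as the shell commands."""
    now = now or datetime.now()
    return f"{prefix}-{now.strftime('%Y%m%d%H%M%S')}"


def create_bucket_kwargs(bucket_name, region):
    """Build the arguments for an S3 create_bucket call in the given region."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # us-east-1 is the default location and rejects an explicit constraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

With boto3 installed, the result can be passed on as boto3.client("s3", region_name=region).create_bucket(**create_bucket_kwargs(name, region)).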

3. Add a policy to the S3 bucket that allows access from the EC2 instance.

aws s3api put-bucket-policy --bucket cloudformation-app-files-${date} \
--policy '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"s3:GetObject","Resource":"arn:aws:s3:::'"cloudformation-app-files-${date}"'/*"}]}'
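
Inline JSON inside shell quotes is easy to get wrong, and the policy Version must be written with plain ASCII hyphens. As an alternative sketch, the same policy document can be generated in Python. The function name bucket_read_policy is hypothetical; the statement mirrors the one in the command above.

```python
import json


def bucket_read_policy(bucket_name):
    """Return the bucket policy from this step as a JSON string."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy)
```

With boto3, the string can be applied with s3_client.put_bucket_policy(Bucket=name, Policy=bucket_read_policy(name)).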

4. Fill in all necessary Parameters in the infrastructure/root.yaml file and send all nested stack files to the S3 bucket.

aws s3 cp infrastructure s3://cloudformation-app-files-${date}/infrastructure  --recursive

5. Go to the infrastructure directory and create the CloudFormation stack.

aws cloudformation create-stack \
    --stack-name ec2-bluegreen-deployment \
    --template-body file://root.yaml \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameters ParameterKey=UserData,ParameterValue="$(base64 -i user_data.txt)" \
    --disable-rollback

6. Open your mailbox and confirm your subscription to the SNS topic. You can access the deployed EC2 instance through Systems Manager: in the AWS console go to AWS Systems Manager -> Fleet Manager, choose the created EC2 instance, then Node actions -> Connect -> Start terminal session. There you can check whether everything was created and configured correctly during the deployment process.

7. For manual deployment, send all application files to the S3 bucket.

aws s3 cp application s3://cloudformation-app-files-${date}/application  --recursive

Start the Systems Manager Automation runbook execution. If everything is OK, you receive an email with the information “Deployment Status: Success” and can reach the deployed EC2 instance from a web browser. In case of any failure, you receive an email with the information “Deployment Status: Failed” and details about the failed step. To deploy a new version of the application, make changes in infrastructure/index.html, send the changes to the S3 bucket, and start the Systems Manager Automation runbook again.

aws ssm start-automation-execution --document-name "Ec2BlueGreenDeployment"
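
If you prefer to wait for the result in a script instead of watching the mailbox, the execution can be started and polled from Python. The following is a minimal sketch, assuming a boto3 SSM client is passed in. The function name run_and_wait and the polling defaults are hypothetical, and TERMINAL_STATUSES lists only the most common final states rather than every status Systems Manager can report.

```python
import time

# Common final states of an Automation execution (not an exhaustive list).
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def run_and_wait(ssm_client, document_name, poll_seconds=15, max_polls=80, sleep=time.sleep):
    """Start the runbook and poll until it reaches a final state.

    ssm_client is expected to behave like boto3.client("ssm").
    Returns a tuple (execution_id, final_status).
    """
    started = ssm_client.start_automation_execution(DocumentName=document_name)
    execution_id = started["AutomationExecutionId"]

    for _ in range(max_polls):
        response = ssm_client.get_automation_execution(AutomationExecutionId=execution_id)
        status = response["AutomationExecution"]["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return execution_id, status
        sleep(poll_seconds)

    raise TimeoutError(f"Execution {execution_id} did not finish after {max_polls} polls")
```

A call such as run_and_wait(boto3.client("ssm"), "Ec2BlueGreenDeployment") would then block until the runbook finishes and return its final status.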

8. Deployment test. tests/test_deployment.sh contains a simple script that tests the availability of, and the responses from, the “Green” and “Blue” EC2 instances.
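
The idea behind such a test can also be sketched in Python. This is not the script from the repository, only an illustration: it requests the same URL repeatedly while a deployment runs and counts the distinct responses. The function names fetch_body and probe are hypothetical, and it assumes the “Blue” and “Green” versions of the page return different bodies.

```python
from collections import Counter
import urllib.error
import urllib.request


def fetch_body(url, timeout=5):
    """Return the response body, or an "ERROR: ..." string if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace").strip()
    except (urllib.error.URLError, OSError) as error:
        return f"ERROR: {error}"


def probe(url, attempts, fetch=fetch_body):
    """Request the URL several times and count how often each body was seen."""
    return Counter(fetch(url) for _ in range(attempts))
```

Running probe against your own domain during a deployment should, in the ideal case, show only the old and the new page body and no entries starting with ERROR.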

9. Delete all files from the S3 bucket, then delete the bucket itself.

aws s3 rm s3://cloudformation-app-files-${date} --recursive
aws s3 rb s3://cloudformation-app-files-${date} --force

Conclusion:

In this post, we showed how to perform a Blue/Green deployment using the Application Load Balancer’s weighted target group feature, implemented with a Systems Manager Automation runbook.



Written by Andrii Shykhov