Immutable Deployment using AWS Application Load Balancing

Khoa Tran · Qbits · Oct 16, 2018

About two years ago, Quorum engineers implemented immutable deployment as our primary software deployment strategy. We shared what we learned as well as the sample code we used in Immutable Deployment @ Quorum. At the time, using a combination of AWS Elastic Load Balancing (ELB) and AWS Auto Scaling Groups (ASG) was a natural choice for what we needed. Since then, a few things have changed:

  • We dramatically scaled our product suite and launched Quorum Local, Quorum Stakeholder, and Quorum EU. On the engineering side, this means more AWS EC2 instances, a more advanced permission system, and more complex routing rules.
  • AWS announced the Application Load Balancer (ALB) in 2016, but support for many features we wanted (one of them being host-based routing, which we’ll explore later in this article) arrived only recently. As a cherry on top, the ALB is the only AWS load balancer type that natively supports HTTP/2.
  • Ansible 2.4 was released in September 2017. This release added support for Application Load Balancing via an Ansible module called elb_application_lb.

All of the above made migrating from the classic ELB to the ALB a no-brainer. Infrastructure migration aside, we want to retain our immutable deployment strategy. This article serves as an upgrade to our earlier article and aims to explain our migration process and what we learned along the way.


Deployment with Elastic Load Balancing: A Review

To better understand how Immutable Deployment works with the ALB, let’s first recap how it works with the classic ELB:

  1. We have a staging environment with all the latest changes that we want to deploy to production.
  2. We’ll make an Amazon Machine Image (AMI) from the staging instance. This AMI is the source image that all production instances will be based on.
  3. A Launch Config (LC), which is basically a blueprint for all production instances, is created. The LC specifies the AMI ID from Step 2, the security groups our production instances will be in, what the instance type will be, among other settings.
  4. A new green Auto Scaling Group (ASG) is created from the LC and registered to the ELB (a minimal Ansible sketch of this step follows below). This ASG coexists with the current blue ASG that’s actually serving production traffic.
  5. Once instances in the green ASG are deemed healthy by the ELB (using the ELB health check type rather than EC2), we deregister the blue ASG from the ELB. Traffic to the ELB is now routed to the green ASG, and this deployment strategy allows us to deploy anytime, often multiple times a day, with no interruption to any of our users.
  6. A few clean-up tasks happen here. This includes tearing down the blue ASG (and all associated instances), cycling/renaming the AMIs, and making sure staging is reverted back to the settings right before deployment.
(Diagram source: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html)
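For reference, here is what steps 4 and 5 look like as a minimal Ansible sketch using the ec2_asg module. The names, sizes, and subnet variable are illustrative assumptions, not our production configuration:

```yaml
# Minimal sketch: create the green ASG and register it to the classic ELB.
# quorum-green-asg, quorum-elb, and the sizing values are illustrative.
- name: Create the green ASG behind the classic ELB
  ec2_asg:
    name: quorum-green-asg
    launch_config_name: "{{ lc_name }}"
    load_balancers:
      - quorum-elb
    health_check_type: ELB        # wait on the ELB health check, not just EC2 status checks
    health_check_period: 300
    min_size: 2
    max_size: 4
    desired_capacity: 2
    vpc_zone_identifier: "{{ subnet_ids }}"
    wait_for_instances: yes       # block until instances pass the health check
    state: present
```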

Deployment with Application Load Balancing

The AWS ALB introduces the ability to route different requests to different targets based on a set of listener rules. Our Auto Scaling Group is no longer registered directly to the load balancer, but rather to a Target Group (of which there can be many per ALB) that is registered to the ALB. Attempting to compare and contrast the classic ELB and the ALB in this article would not do them justice; rather, we encourage interested readers to check out the AWS documentation or this excellent article by SumoLogic.

Given this change, there are now a few ways to carry out Immutable Deployment with the ALB. (This list is not meant to be exhaustive; rather, it covers the options we explored prior to the final migration and this article.)

  • DNS Swap: We spin up a new service behind a different ALB, and deployment is carried out by swapping the Route53 alias records between the two Application Load Balancers. This was technically possible with the classic ELB as well, but just like back then, we decided not to go with this approach due to DNS complexities.
  • Target Group Swap: We spin up a new target group and attach it to the existing ALB via a new listener rule. We then launch a new ASG and attach it to the new target group. Once the ASG is deemed healthy by AWS, we swap the listener rules so that all production traffic will go to the new target group.
  • Auto Scaling Group Swap: We spin up a new ASG and attach it to the same target group that production traffic is going to. Once the new ASG is deemed healthy, we simply deregister the old ASG from the target group. This is undoubtedly the laziest method because most of the infrastructure (the ALB and the target group) is unchanged, but it can cause confusion if the old ASG is not deregistered fast enough, because production traffic would be directed to both the old and new code simultaneously.

Notice how less new infrastructure is spun up (and correspondingly more existing infrastructure is reused) as we go down the list of deployment methods. It ultimately comes down to how much reusability and granularity we want, and the second option (Target Group Swap) is a clear winner in our eyes. It not only avoids DNS and Route 53 complexities (DNS on its own is already a black box), but also gives us enough control over the ALB listener rules to spin up a new set of services on a different subdomain, perform automated and manual QA, and swap the listener rules when we’re ready to release the new code to our users, all while keeping production intact.

Let’s now go into more detail on each step of the Target Group Swap method.

1. (Same as the classic ELB) We have a staging environment with all the latest changes that we want to deploy to production.
Existing configuration with the blue target group. PC: Kevin King

2. (Same as the classic ELB) We’ll make an AMI from the staging instance. This AMI is the source image that all production instances will be based on.

3. We’ll create a new target group, which we’ll call the green target group.

4. We’ll register this green target group to the existing ALB via a new listener rule that utilizes host-based routing. Specifically, we’ll configure it so that traffic to new.quorum.us will be routed to the green target group, all the while traffic to production is unaffected.

Creating the green target group and attaching a listener to it. PC: Kevin King

5. (Same as the classic ELB) A Launch Config, which is basically a blueprint for all production instances, is created.

6. A new Auto Scaling Group is created from the LC. Instead of being registered directly to the classic ELB, this ASG is registered to the green target group. All other configuration settings for this ASG stay the same, including the ELB health check type. Once this ASG is deemed healthy by AWS, new.quorum.us should be able to receive traffic.

7. Optionally, we can now visit new.quorum.us and perform automated and manual QA. While this testing should have been done well before we even started the deployment process (ideally on a staging environment that mimics production as closely as possible), it never hurts to re-test. Additionally, certain testing methods, like load testing, are best performed on a parallel environment and infrastructure that exactly mimic production, which new.quorum.us provides.

8. Once the QA process has been completed, we simply swap the listener rules so that all production traffic goes to the green target group.

Swapping the listener rules. PC: Kevin King

9. Finally, similar to the last step in classic ELB Immutable Deployment, we retire the blue ASG and the blue target group, which are both no longer in use.

Retiring the old target group. PC: Kevin King

The Ansible Playbook

Now that we have listed out all the steps, let’s translate them into working Ansible code. As previously mentioned, we first want to create a green target group with no registered instances. We’ll eventually register our ASG to this target group.
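A minimal sketch of that task with the elb_target_group module (the VPC ID, health check path, and response codes are illustrative assumptions):

```yaml
# Sketch: create an empty green target group; no instances are registered yet.
- name: Create the green target group
  elb_target_group:
    name: "{{ target_group_name }}"
    protocol: https
    port: 443
    vpc_id: "{{ vpc_id }}"                # assumed to be passed in as a variable
    health_check_protocol: https
    health_check_path: /healthcheck/      # illustrative health check endpoint
    successful_response_codes: "200"
    state: present
  register: green_target_group            # exposes target_group_arn for later tasks
```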

The above task creates a new target group with a desired name '{{ target_group_name }}' that we can pass in from the playbook or from the command line via -e or --extra-vars. We’ll then edit the listener rules on the ALB to route traffic to this new target group without changing the current set of rules.
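A sketch of that listener edit using elb_application_lb (the ALB name, subnets, security groups, certificate ARN, and SSL policy are all placeholders):

```yaml
# Sketch: keep the default action pointing at the blue (production) target
# group, and add a host-based rule routing new.quorum.us to the green one.
- name: Route new.quorum.us to the green target group
  elb_application_lb:
    name: "{{ alb_name }}"
    subnets: "{{ alb_subnets }}"
    security_groups: "{{ alb_security_groups }}"
    listeners:
      - Protocol: HTTPS
        Port: 443
        SslPolicy: ELBSecurityPolicy-2016-08
        Certificates:
          - CertificateArn: "{{ certificate_arn }}"
        DefaultActions:
          - Type: forward
            TargetGroupName: "{{ blue_target_group_name }}"
        Rules:
          - Conditions:
              - Field: host-header
                Values:
                  - new.quorum.us
            Priority: '1'
            Actions:
              - Type: forward
                TargetGroupName: "{{ green_target_group_name }}"
    state: present
```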

This particular task will be reused later to swap the listeners and direct all traffic to the new target group, so it’d make sense to abstract it out as an Ansible role. We’d then invoke this role in our playbook as follows.
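A sketch of that invocation; the role name update_alb_listeners and its variables are hypothetical:

```yaml
# Sketch: invoke a hypothetical update_alb_listeners role, parameterized so
# the same role can later swap the listeners for the production cutover.
- hosts: localhost
  connection: local
  roles:
    - role: update_alb_listeners
      vars:
        alb_name: "{{ alb_name }}"
        default_target_group: "{{ blue_target_group_name }}"
        host_rules:
          - host: new.quorum.us
            target_group: "{{ green_target_group_name }}"
```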

This role basically attaches an additional condition to the current set of rules. If the host header is new.quorum.us, we forward the traffic to the green target group so that we can test that it’s functioning and AWS can confirm that it passes the ELB health check (the health check type is set to ELB rather than EC2 when we launch the ASG). Otherwise, every request to quorum.us is still routed to the blue (current) target group, which means that from a user’s perspective, nothing has changed.

We’re now ready to create a Launch Config and an Auto Scaling Group, the latter of which will be attached to the green target group rather than directly to the load balancer as in classic ELB deployment. The code looks exactly the same, except that we swap the line that attaches the ASG to the ELB via load_balancers for one that attaches it to the target group via target_group_arns.
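A sketch of both tasks (the instance type, sizes, and variable names are illustrative):

```yaml
# Sketch: bake the LC from the staging AMI, then create the green ASG attached
# to the green target group via target_group_arns (instead of load_balancers).
- name: Create the launch configuration
  ec2_lc:
    name: "{{ lc_name }}"
    image_id: "{{ ami_id }}"          # the AMI created from staging
    instance_type: m5.large           # illustrative instance type
    security_groups: "{{ instance_security_groups }}"
    key_name: "{{ key_name }}"
    state: present

- name: Create the green ASG behind the green target group
  ec2_asg:
    name: "{{ green_asg_name }}"
    launch_config_name: "{{ lc_name }}"
    target_group_arns:
      - "{{ green_target_group.target_group_arn }}"   # registered earlier
    health_check_type: ELB
    health_check_period: 300
    min_size: 2
    max_size: 4
    desired_capacity: "{{ desired_capacity }}"
    vpc_zone_identifier: "{{ subnet_ids }}"
    wait_for_instances: yes
    wait_timeout: 900
    state: present
```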

Once the ASG is created, a number of instances matching the desired_capacity will be spun up. They will need to pass the ELB health check, which we have already set up by creating a listener rule for the green target group that the ASG points to. Depending on the instance size and how many instances we ask AWS for, this could take anywhere from 5 to 15 minutes. Once the ASG (and subsequently the green target group) is deemed healthy by AWS, we want to swap the listeners so that all traffic is directed to the new set of instances.
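A sketch of that swap, plus the AWS CLI redirect task (ARNs and names are placeholders; the redirect flags follow the aws elbv2 create-listener syntax):

```yaml
# Sketch: purge the old rules and make the green target group the default,
# then attach an HTTP->HTTPS redirect listener via the AWS CLI, since the
# redirect action isn't supported by elb_application_lb as of Ansible 2.6.3.
- name: Swap listeners so all production traffic hits the green target group
  elb_application_lb:
    name: "{{ alb_name }}"
    subnets: "{{ alb_subnets }}"
    security_groups: "{{ alb_security_groups }}"
    purge_listeners: yes
    listeners:
      - Protocol: HTTPS
        Port: 443
        SslPolicy: ELBSecurityPolicy-2016-08
        Certificates:
          - CertificateArn: "{{ certificate_arn }}"
        DefaultActions:
          - Type: forward
            TargetGroupName: "{{ green_target_group_name }}"
    state: present

- name: Redirect HTTP to HTTPS via the AWS CLI
  shell: >
    aws elbv2 create-listener
    --load-balancer-arn {{ alb_arn }}
    --protocol HTTP --port 80
    --default-actions 'Type=redirect,RedirectConfig={Protocol=HTTPS,Port=443,StatusCode=HTTP_301}'
```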

The above code snippet removes the old listener rules and replaces them with a rule that forwards all traffic to the green target group by default. Additionally, we use the AWS CLI to attach another listener to the ALB, which redirects all HTTP traffic to HTTPS. While it would have been amazing to do so via Ansible along with all the forward actions, at the time of this writing (using Ansible 2.6.3), redirect is not yet supported. Once it is, we’d simply swap out the lone task using the shell module in favor of elb_application_lb.


Final Steps

Assuming everything above went well (we’ll discuss shortly what to do when it doesn’t), the remaining steps are simple: we want to notify the team that new code just went live 🎉, and we want to tear down the old target group and ASG to save 💸. That is easily done via the same modules we used to create these resources: ec2_asg, ec2_lc, and elb_target_group, only with the absent state rather than present.
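A sketch of that teardown (the blue resource names are placeholders):

```yaml
# Sketch: retire the blue resources with the same modules, using state: absent.
- name: Retire the blue ASG (and all associated instances)
  ec2_asg:
    name: "{{ blue_asg_name }}"
    state: absent

- name: Retire the blue launch configuration
  # depending on the Ansible version, ec2_lc may also want image_id/instance_type here
  ec2_lc:
    name: "{{ blue_lc_name }}"
    state: absent

- name: Retire the blue target group
  elb_target_group:
    name: "{{ blue_target_group_name }}"
    state: absent
```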

Finally, what if, right after the new listeners are created and despite our best QA efforts, we discover undesirable behavior on production and want to roll back to the previous state ASAP? Fortunately, we can manipulate the listeners just as we did throughout the deployment steps earlier: we’d create a listener over HTTPS for the old target group (assuming it’s still around) and make an HTTP-to-HTTPS redirect rule. We’d also swap the new code (the green target group) to be behind the subdomain new.quorum.us, so that our team can still inspect the changes and figure out what went wrong while our users’ workflows are uninterrupted.


Conclusion

When we started migrating from the classic ELB to the ALB in the summer of 2018, there were not a lot of online resources, at least in the Ansible world, on Immutable Deployment using the ALB. That gap is the main reason we set out to write this article, in the hope that others can place greater trust in Ansible and its support for the ALB and other AWS resources, and start using Immutable Deployment in their own projects.

Quorum has been using this new system, and Immutable Deployment generally, in our day-to-day work with great success. We feel empowered to roll out new changes to our clients at any time of day, often multiple times a day, without interrupting anyone’s workflow. The ALB migration opens up many new ways to deploy our software, enables us to explore cutting-edge technology in the DevOps/SRE space, and allows us to continue offering our clients the best integrated public affairs software platform as Quorum’s user base accelerates and expands beyond Washington.

Interested in working at Quorum? We’re hiring!
