AMI rolling update using Ansible

Rachid Belaid
Published in Open House · Apr 4, 2016

This post covers how to achieve zero-downtime AMI updates for an AWS Auto Scaling Group using Ansible. At Opendoor, we use Convox, ECS and Docker for most of our backend services, but that setup isn’t a perfect fit for all our use cases.

Once we decided that we wanted to deploy via AMI, we couldn’t find a well-documented method for doing a rolling update of an AMI with Ansible that fitted our requirements:

  • Zero downtime update
  • Detecting an existing Launch Configuration
  • Detecting if the AMI needs to be updated
  • Deleting the old Launch Configuration
  • Creating or updating the Auto Scaling Group with the new launch configuration
  • Preserving the same number of instances running during the update

Why an AMI

Opendoor is built on top of AWS. Without going into too much detail, we have a microservice which is powered by ~5GB of geodata. It’s no secret that data-intensive services are the trickiest thing to deploy in modern systems. Shipping the data inside a Docker container was not a maintainable solution, and neither was downloading it at runtime. The workload on this service is highly uneven, and it needs to scale quickly when large batch jobs are using it. Our problem was a perfect fit for an AWS Auto Scaling Group and a pre-built AMI.

Why Ansible

We were already using Ansible to provision our AWS infrastructure and configure hosts. Ansible has pretty good support for provisioning AWS primitives through the AWS APIs. We considered Terraform, but unless you already describe all your resources in Terraform, it cannot discover existing ones.

Building an AMI

Luckily, tools like Packer make it easy to build an AMI using different kinds of provisioners, such as a shell script or Ansible. This was perfect for us because we were already using Ansible for other provisioning tasks. To build an AMI using Packer and Ansible, all you need to do is create a JSON configuration file for Packer.
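The following is an illustrative sketch rather than our exact template. It builds a Packer template as a Python dict and prints it as JSON, using Packer's `amazon-ebs` builder and `ansible` provisioner. The region, base AMI ID, instance type, SSH user, AMI name prefix and playbook path are all placeholder assumptions.

```python
import json


def packer_template(source_ami, playbook_file, region="us-east-1",
                    instance_type="t2.medium", ssh_username="ubuntu",
                    name_prefix="geodata-service"):
    """Return a Packer template (as a dict) that bakes an AMI with Ansible.

    Every default here is a hypothetical placeholder, not a real setting.
    """
    return {
        "builders": [{
            "type": "amazon-ebs",            # build an EBS-backed AMI
            "region": region,
            "source_ami": source_ami,        # base image to start from
            "instance_type": instance_type,
            "ssh_username": ssh_username,
            # {{timestamp}} is expanded by Packer, so each build gets a unique name
            "ami_name": name_prefix + "-{{timestamp}}",
        }],
        "provisioners": [{
            "type": "ansible",               # run the playbook against the build instance
            "playbook_file": playbook_file,
        }],
    }


if __name__ == "__main__":
    # "ami-xxxxxxxx" and "playbook.yml" are placeholders
    print(json.dumps(packer_template("ami-xxxxxxxx", "playbook.yml"), indent=2))
```

Saving the printed JSON as, say, `template.json` and running `packer build template.json` would then produce the AMI, assuming AWS credentials are available to Packer.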

Zero Downtime Update

When you want zero downtime, you need to ensure that traffic isn’t disturbed and that server capacity is preserved. There are different strategies for zero-downtime deployment, and in some more complex scenarios you’ll need to set up a duplicate stack before you can switch the traffic over in your load balancer. This strategy is usually referred to as Blue Green Deployment, a name popularized by Martin Fowler’s article.

In our case, our application didn’t depend on a database or any data migration so we could adopt an incremental rolling update. AWS doesn’t handle the termination of existing instances when you decide to update the Launch Configuration of the Auto Scaling Group so you’ll need to do it yourself. You can see in the animated gif how we expect to update our currently running instances.
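To make the idea concrete, here is a minimal sketch of one way to order the steps of an incremental rolling update so the fleet never drops below its starting size. It is an assumption about the ordering, not a transcript of our playbook: a replacement is launched and confirmed healthy before the old instance is terminated. The three callables are hypothetical hooks standing in for whatever actually talks to AWS.

```python
def rolling_replace(old_instances, launch_replacement, wait_until_healthy, terminate):
    """Replace instances one at a time without dropping below the starting count.

    launch_replacement() -> new instance id   (hypothetical hook)
    wait_until_healthy(instance_id)           (hypothetical hook)
    terminate(instance_id)                    (hypothetical hook)
    """
    replaced = []
    for old_id in old_instances:
        new_id = launch_replacement()   # fleet is briefly at N + 1
        wait_until_healthy(new_id)      # e.g. in service behind the load balancer
        terminate(old_id)               # back to N, with one more instance on the new AMI
        replaced.append((old_id, new_id))
    return replaced


if __name__ == "__main__":
    # In-memory stand-in for a real fleet, for demonstration only.
    fleet = {"i-old1", "i-old2", "i-old3"}
    ids = iter(range(1, 100))
    sizes = []

    def launch():
        new_id = "i-new%d" % next(ids)
        fleet.add(new_id)
        sizes.append(len(fleet))
        return new_id

    def healthy(instance_id):
        pass  # a real hook would poll the load balancer here

    def terminate(instance_id):
        fleet.remove(instance_id)
        sizes.append(len(fleet))

    rolling_replace(sorted(fleet), launch, healthy, terminate)
    print(sorted(fleet), "smallest fleet size seen:", min(sizes))
```

The trade-off of this ordering is that the group temporarily runs one extra instance, in exchange for never serving traffic with fewer instances than before the update.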

Ansible provides options for doing rolling updates of an AMI, but it lacked features for inspecting the currently running Launch Configuration or Auto Scaling Group. We needed to delete the old Launch Configuration and keep the current number of instances running. We opted to call the AWS CLI from within Ansible instead, and wrote a playbook that does a rolling update of instances; it should be generic enough to help you get started.
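As a rough illustration of the inspection step (a sketch, not our playbook), the snippet below shells out to the AWS CLI to read the group's current Launch Configuration, its AMI and the desired capacity, then decides whether an update is needed. It assumes the group uses a Launch Configuration; the helper names and the `run` parameter (which lets the CLI call be swapped out for testing) are hypothetical.

```python
import json
import subprocess


def aws_json(args, run=subprocess.check_output):
    """Run an AWS CLI command and parse its JSON output."""
    return json.loads(run(["aws"] + args + ["--output", "json"]))


def current_state(asg_name, run=subprocess.check_output):
    """Return (launch_config_name, image_id, desired_capacity),
    or None if the Auto Scaling Group does not exist yet."""
    groups = aws_json(
        ["autoscaling", "describe-auto-scaling-groups",
         "--auto-scaling-group-names", asg_name], run)["AutoScalingGroups"]
    if not groups:
        return None
    group = groups[0]
    lc_name = group["LaunchConfigurationName"]  # assumes a Launch Configuration is in use
    configs = aws_json(
        ["autoscaling", "describe-launch-configurations",
         "--launch-configuration-names", lc_name], run)["LaunchConfigurations"]
    return lc_name, configs[0]["ImageId"], group["DesiredCapacity"]


def needs_update(state, new_ami):
    """True when there is no group yet or it is running a different AMI."""
    return state is None or state[1] != new_ami
```

With this state in hand, a caller could skip the update when `needs_update` is false, reuse the returned desired capacity when updating the group, and delete the old Launch Configuration by name (`aws autoscaling delete-launch-configuration`) once no instance uses it.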

In conclusion, if you have a use case which doesn’t depend on any data migration, then this method works pretty well. The downside is that, depending on the number of instances, the update can take some time, because every instance has to be replaced one by one and you have to wait for each replacement to appear healthy in the ELB.

If you’re interested in working with our team on any of this, we’re currently hiring!

If you have any feedback or you find any issues in this post, let us know.

Originally published at labs.opendoor.com on April 4, 2016.
