Upgrades Without Tears Part 1 — Introduction to Blue/Green Deployment on AWS

If you’ve ever been involved in a production system upgrade that went sideways, then you know that there are better things to be doing on a Saturday night, like going out and having a good time or staying in and writing a great new feature for your startup.

Blue/green deployment is a release technique designed to take the difficulty out of production system upgrades. AWS provides a great environment to manage such deployments in a cost-effective and low-risk way.

This blog post is part 1 of a two-part series that describes how to use blue/green deployment strategies on AWS to minimize deployment risk and cost, and to speed up your deployment cycle.

Upgrading Live Systems is a Challenge

In a traditional environment, the servers running v1 of your software are often the same servers that will shortly be running v2 of your software. You’ll have to upgrade them in place, and if the upgrade fails, then you must revert the servers back to v1 of your software. This leads to substantial risk during an upgrade of your production environment. You have no real opportunity to validate v2 in a realistic, live deployment while continuing to run v1.

As a startup, fighting upgrade fires takes precious time, a resource you may not be able to spare. It’s a big distraction that steals time from the activities that create value for your startup, like building new features. In some situations, recovering from a failed upgrade may itself fail, in which case you have a real emergency on your hands that impacts users and tarnishes the public image you’re working hard to build.

Many of the limitations of traditional environments are absent in the cloud. You have the ability to quickly and affordably provision new servers, switch off (and stop paying for) existing servers, and recreate entire networking infrastructure through code and automation. These abilities are game-changers for startups when it comes to deploying software.

What is Blue/Green Deployment?

Blue/green deployment on the AWS platform provides a safer, less stressful way to upgrade production software. Rather than try to define it, I’ll give you an example of how it works in practice. Let’s label our live v1 production environment “blue.” Now, we’ll stand up a second environment labeled “green” that is running v2 of our software. Once we validate the green environment, we can quickly switch traffic from the blue to the green environment. At this point, we can choose to retain or dispose of the blue environment.

The following illustration shows the basic anatomy of a blue/green deployment architecture. We have the two environments, with the blue one currently live in production. Our goal is to transition users over to the green one in a gradual, controlled fashion.

Once you bring the green environment up, you can validate the new software before going live. Then, you start shifting traffic away from the blue environment and send it to the green one. Normally, you do that using weighted DNS resolution because it gives you an easy way to push more traffic to the green environment or revert traffic back to the blue environment in case of issues.
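To make the weighted DNS idea concrete, here is a minimal sketch that builds a pair of weighted records, one per environment. The function name, the hostnames, and the 0 to 100 percentage convention are assumptions made for illustration; only the record fields follow the shape Amazon Route 53 uses for weighted record sets.

```python
def weighted_change_batch(record_name, blue_target, green_target,
                          green_percent, ttl=60):
    """Build a change batch that splits traffic between blue and green.

    green_percent is the share of DNS answers that should point at green.
    Route 53 routes in proportion to weight / sum of weights, so weights of
    (100 - p) and p give green a share of p percent.
    """
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")

    weights = {"blue": 100 - green_percent, "green": green_percent}
    targets = {"blue": blue_target, "green": green_target}

    changes = []
    for env in ("blue", "green"):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": env,   # distinguishes the two weighted records
                "Weight": weights[env],
                "TTL": ttl,
                "ResourceRecords": [{"Value": targets[env]}],
            },
        })
    return {"Comment": f"green at {green_percent}%", "Changes": changes}


# Hypothetical hostnames: send 10% of lookups to green.
batch = weighted_change_batch(
    "www.example.com.", "blue.example.com", "green.example.com", 10)
```

With boto3, a dictionary of this shape is what `change_resource_record_sets` accepts as its `ChangeBatch` argument. Reverting to blue is the same call with `green_percent` set to 0.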

Benefits of Blue/Green Deployment

If the validation tests of the green environment fail before you start sending real traffic, then you can dispose of the environment without ever having affected the live blue production environment. Because AWS resources are on demand, you simply release the failed green environment's resources and stop paying for them.

If you start switching a small proportion of traffic from blue to green and encounter serious issues, then you can quickly switch back to blue and restore service within minutes. This process is sometimes called canary analysis: you test that the new deployment works in a real-world scenario with a small set of your users.
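The canary process described above can be sketched as a small loop. This is an illustration under assumptions, not a prescribed procedure: `set_green_percent` and `is_healthy` are hypothetical callables that you would supply (for example, one that updates DNS weights and one that checks your own error metrics), and the step sizes are arbitrary.

```python
def canary_rollout(set_green_percent, is_healthy, steps=(5, 25, 50, 100)):
    """Shift traffic to green in steps; revert to blue on the first failure.

    set_green_percent: hypothetical callable that applies a traffic split.
    is_healthy: hypothetical callable returning True if green looks good.
    Returns the percentage of traffic left on green when the rollout ends.
    """
    current = 0
    for percent in steps:
        set_green_percent(percent)
        current = percent
        if not is_healthy():
            # Serious issue found: send all traffic back to blue.
            set_green_percent(0)
            return 0
    return current
```

In a real rollout you would also wait between steps long enough for cached DNS answers to expire, so that each health check reflects the new split.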

The speed at which these weighting changes take effect depends on the time to live (TTL) of your Amazon Route 53 DNS records. The TTL determines how long resolvers and clients can cache DNS values before asking Amazon Route 53 for new ones, so actual traffic shifts gradually as old cached values expire and are replaced with fresh ones. Amazon Route 53 itself is designed to propagate DNS updates to its authoritative servers within 60 seconds, so in practice the TTL you choose is the main factor you control.
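As a rough worked example of how TTL affects the transition, the sketch below estimates the worst-case delay and the share of resolvers still holding old answers. It is a simplified model: it assumes resolvers honor the TTL and that their cache refresh times are spread evenly, which real resolvers do not always do. The function names are made up for this illustration.

```python
def worst_case_shift_seconds(ttl_seconds, propagation_seconds=60):
    """Upper bound on the time before every well-behaved resolver sees the
    new weights: propagation to the DNS servers plus one full TTL."""
    return propagation_seconds + ttl_seconds


def stale_fraction(seconds_since_change, ttl_seconds):
    """Estimated share of resolvers still serving the old cached answer,
    assuming cache expiry times are spread uniformly over one TTL."""
    if ttl_seconds <= 0:
        return 0.0
    return max(0.0, 1 - seconds_since_change / ttl_seconds)


# With a 300-second TTL, allow up to 360 seconds for the change to settle;
# halfway through the TTL, about half of the caches are still stale.
print(worst_case_shift_seconds(300))   # 360
print(stale_fraction(150, 300))        # 0.5
```

A shorter TTL makes both shifts and rollbacks faster, at the cost of more DNS queries.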

You can easily take advantage of newer, more powerful, or cheaper servers by simply launching the newer server types in your green environment, validating them, and then cutting over.

You can also take advantage of Auto Scaling to optimize costs. You can allow the size of the fleet of servers in your green environment to grow gradually as you shift more traffic to it, while the size of the blue fleet will shrink as it handles less traffic.
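The arithmetic behind that cost benefit can be sketched as follows. This is only an illustration: in practice Auto Scaling adjusts fleet size from load metrics rather than from a formula like this, and `peak_instances`, `minimum`, and the function name are assumptions introduced for the example.

```python
def fleet_sizes(peak_instances, green_percent, minimum=1):
    """Estimate how many servers each environment needs for its traffic share.

    peak_instances: servers needed to carry 100% of traffic (assumed known).
    green_percent: share of traffic currently routed to green (0 to 100).
    minimum: floor kept in each fleet so an environment is never empty.
    """
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")

    def share(percent):
        # Integer ceiling of peak_instances * percent / 100.
        return max(minimum, (peak_instances * percent + 99) // 100)

    return {"blue": share(100 - green_percent), "green": share(green_percent)}


# With 10 servers needed at full load and 25% of traffic on green:
print(fleet_sizes(10, 25))   # {'blue': 8, 'green': 3}
```

The combined fleet briefly exceeds the single-environment size because each side rounds up, but it stays far below the cost of running two full-size environments for the whole deployment.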

Digging a Little Deeper into Blue/Green Deployment on AWS

Most modern application environments broadly follow an architecture pattern with three tiers:

  • Presentation layer (web-based, mobile app, desktop app, etc.)
  • API/business logic layer (RESTful APIs, SOAP, internal interfaces)
  • Data-persistence layer (SQL database, NoSQL datastore, files, etc.)

AWS has tools and services that make it easy to stand up and manage these multi-tier environments. This helps startup developers focus on what’s truly important, such as developing a great product rather than managing infrastructure.

Start with AWS Elastic Beanstalk, a platform as a service (PaaS) that supports most popular development platforms and languages. If your application doesn’t require a highly customized platform environment to operate, this is a great, easy-to-use service. It also offers zero-downtime deployments. Use our easy step-by-step guide to get started.

If your application has a more complex multi-tier architecture, AWS OpsWorks allows you to model complex decoupled applications. It trades some of the ease of use of Elastic Beanstalk for increased flexibility. A great feature that makes blue/green deployment easy is the ability to clone a stack, which gives you a copy of your blue stack to use as a baseline for your green one. Learn how to get started using OpsWorks.

Regardless of whether application layers are clearly separated and running on different servers, or just logically delineated inside your code, your new deployment can contain changes in any of them. Your data persistence layer, however, may need some extra consideration:

  • If the new version of your software doesn’t contain any data layer changes, then you stand up a new green environment with your stateless presentation and API layers, and simply reference the same data layer for both.
  • If you make modifications, such as schema changes, to your data layer, you’ll need a process in place to keep the data in sync between the blue and green environments, because your users will make changes to data in both environments during the deployment. The complexity of such a process depends on your data needs: how often the data changes, the consistency model, and how much the old and new data models differ.

Customers Using Blue/Green Deployment on AWS

To learn how customers have actively deployed complex systems on AWS using various techniques to perform blue/green deployment, see the following videos:

In part two of this series, we’ll walk through the blue/green deployment process step by step, and we’ll discuss data synchronization strategies in greater detail.