Using terraform to move towards a blue-green deployment strategy

Alok Singh
Wobe Engineering Blog
4 min read · Jan 23, 2017
Expedia case study for blue-green deployment. Image credit: Amazon

When we first started work on terraform, the initial implementation was to encode the whole setup in terraform. To be clear, terraform works best if you use it for infrastructure setup and not for configuration management. What would formerly have been a beast of a boto script that brings up infrastructure in a form your configuration management system can work with becomes a declarative DSL with tooling support.

What we would like is to use terraform to instantiate a parallel environment that can be tied to any existing service in any arbitrary way. This will be the building block for future automation. Note that your application should be able to support this mode. Application state and how it is persisted usually determine the mode of operation. Application state also determines the scaling strategies you can adopt, and for this reason alone it is something you want to think about very carefully, bringing in an “operations” perspective early in the specification cycle.

We have different types of services, each with their own users, and while they are all related, it does not make sense to instantiate whole new environments just to push out a change in a couple of services. To illustrate: why bring up a new metabase setup when there is no need for it?

The central insight was that we could model our services by treating a set of tightly coupled components serving one particular use as a block with a well-defined entry point. The reasoning is similar to that behind Kubernetes’s services.

Thus, in our case, containers running our user backend became the user block. Containers running the metabase app became the analytics block. Sentry and drone became our infra block. The database tier similarly became the DB block. With this setup, it would be possible for us to link any block to any other using VPC peering. Pushing a change for users with no DB impact? Spin up a new user block, test it, and when you are ready to switch over, change the ALB to point to the new block. Rollback is just changing the ALB back to the old block.
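ALB management is not part of the PoC described below (it is on the list of things for the final implementation), but to make the cutover concrete, here is a rough sketch of what pointing an ALB at a new block could look like with the AWS CLI. The listener and target group ARNs are placeholders, and the assumption that each run gets its own target group is ours, not something baked into the PoC.

    # Hypothetical ARNs: the listener fronting the user block and the
    # target group created for the new (red) run.
    LISTENER_ARN="arn:aws:elasticloadbalancing:region:account-id:listener/app/users/..."
    RED_TG_ARN="arn:aws:elasticloadbalancing:region:account-id:targetgroup/users-red/..."

    # Cut over: forward all traffic on this listener to the red target group.
    aws elbv2 modify-listener \
      --listener-arn "$LISTENER_ARN" \
      --default-actions Type=forward,TargetGroupArn="$RED_TG_ARN"

    # Rollback is the same call with the old (green) target group ARN.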

With this idea in mind, a small PoC was put together to flesh out the idea and provide enough information to draw up a spec. The proof-of-concept involves just two blocks (blk1 and blk2) and a Makefile to drive terraform. The implementation hinges on:

  • keeping a separate terraform state file for each run. A run is a block that is running (see the sketch after this list).
  • keeping VPC peering outside terraform. We look up the required information from the state file using terraform’s output variables, so terraform remains the single source of truth.
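As a rough sketch (not the actual Makefile from the PoC), the create and destroy targets boil down to running terraform against the block’s directory with a per-run state file:

    # Hypothetical expansion of `make create NAME=red BLOCK=blk1`:
    # run terraform on the block directory against a per-run state file.
    terraform apply -state=red-blk1.tfstate blk1

    # ...and of the matching `make destroy RUN=red BLOCK=blk1`.
    terraform destroy -state=red-blk1.tfstate blk1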

Before we play out the scenario, assume that the starting state is a run called green for both blocks.

Before we start, colours correspond to runs

We want to implement a change in blk1. You would:

  1. make create NAME=red BLOCK=blk1
  2. ./peering.zsh create red blk1 green blk2

Here NAME is what you would like to name this run. The terraform state file would be named <NAME>-<BLOCK>.tfstate. This allows you to keep multiple parallel runs of any block, each one named differently. Now you have a whole new instance of blk1, which can communicate with blk2 via VPC peering.
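peering.zsh is where the cross-block wiring happens. Here is a condensed sketch of what its create path might look like, assuming each block exports a vpc_id output; the output name and the omission of route table updates are our simplifications, not details lifted from the PoC:

    #!/usr/bin/env zsh
    # Sketch of `./peering.zsh create red blk1 green blk2`.
    action=$1; run1=$2; block1=$3; run2=$4; block2=$5
    [[ $action == create ]] || { echo "only the create path is sketched"; exit 1; }

    # terraform stays the single source of truth: read VPC IDs from its outputs.
    vpc1=$(terraform output -state="${run1}-${block1}.tfstate" vpc_id)
    vpc2=$(terraform output -state="${run2}-${block2}.tfstate" vpc_id)

    # Request and accept the peering connection (same account, same region).
    pcx=$(aws ec2 create-vpc-peering-connection \
            --vpc-id "$vpc1" --peer-vpc-id "$vpc2" \
            --query 'VpcPeeringConnection.VpcPeeringConnectionId' --output text)
    aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id "$pcx"

    # Route table entries for each VPC would be added here; omitted for brevity.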

Scenario at this point, colours correspond to runs

Assuming that you have tied together your dependency injection with terraform (or by using terraform provisioners), you should be good to go. The PoC wisely sidesteps this quagmire. When you are done testing and are ready to move to using the new red instance, you can delete the old green instance as follows:

  1. ./peering.zsh delete green blk1 green blk2
  2. make destroy RUN=green BLOCK=blk1

After deleting the old run, colours correspond to runs

You can check out the self-contained PoC code at https://github.com/alephnull/tf-blocks. You will probably want to edit peering.zsh to put in your AWS CLI profile. The terraform bits assume that you have the required profile set up in your environment.

Things that will be a part of the final implementation are:

  • state file management. Even a simple push to an S3 bucket might serve our immediate purposes
  • ALB management
  • re-implement peering.zsh in a more sane language
  • first steps towards configuration management and dependency injection.
