Multi-Region Deployments with Terraform

Chris Pisano
Oct 23, 2018 · 5 min read

Overview

In my previous post, Deploying Multiple Environments with Terraform, I described how I used Terraform to deploy to multiple environments within a single project. Since then, my project picked up a new requirement: my team needed to implement multi-region deployments. Being multi-region is the same concept as having data centers in multiple parts of the country or world; you're safeguarding yourself against disaster.

This didn't come as a big surprise to the team, since anyone who wants to build a resilient application and implement disaster recovery should always build in more than one region. Being multi-region also enables global load balancing: if you are running in us-east-1 and us-west-2, users will hit the instance of the application that is geographically closest to them.
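The post doesn't show the routing setup, but as a rough illustration of global load balancing, Route 53 latency-based routing in Terraform 0.11-era syntax might look something like this (the zone, hostname, and ELB variables are hypothetical, not from our code base):

```hcl
# Hypothetical sketch: latency-based routing so users reach the nearest region.
resource "aws_route53_record" "app_east" {
  zone_id        = "${var.zone_id}"
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["${var.east_elb_dns_name}"]
  set_identifier = "us-east-1"

  latency_routing_policy {
    region = "us-east-1"
  }
}

resource "aws_route53_record" "app_west" {
  zone_id        = "${var.zone_id}"
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["${var.west_elb_dns_name}"]
  set_identifier = "us-west-2"

  latency_routing_policy {
    region = "us-west-2"
  }
}
```

With two records sharing a name but carrying different set_identifier values, Route 53 answers each query with the record whose region has the lowest measured latency to the requester.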

With some design work we realized multi-region deployment could easily be implemented since we’d made the code extensible and kept the configuration separate. It turned out that we could even implement multi-region without introducing a breaking change!

Extending the Design

The first addition was a new exposed input called region. Previously this value was hardcoded in the configuration and not exposed to the end user. The input specifies which AWS region the Kubernetes cluster will be deployed to. Instead of abstracting away the actual region names and accepting simple values such as east or west, we opted to use the literal region names, us-east-1 or us-west-2, as the values. This design decision took a fair amount of debate, but ultimately it came down to the users knowing exactly where the cluster was being built.
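In Terraform 0.11 syntax, the exposed input might be sketched like this (the description text is mine; the default is the backwards-compatible one described later in this post):

```hcl
variable "region" {
  description = "AWS region to deploy the Kubernetes cluster into, e.g. us-east-1 or us-west-2"
  default     = "us-east-1"
}
```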

The larger change was to the variable configuration. Originally all of the variables were maps where each key was the environment name: dev, qa, etc. This worked well for single-region deployments, but for multi-region, many of the values differed from region to region. The easy fix would have been to extend the list of known workspaces, but that was not a scalable option. Instead, we decided to convert every map whose values varied by region into a nested map: the region name became the top-level key, and under it sat the environments with their respective values.

With the change to the variable structure, changes also had to be made to the inputs to the variables module, as well as to any variable lookups in the code that referenced a nested map. The former was simple: add a region input to the module and pass it the region variable. The latter was more involved. Not only did it take some time to find all of the lookups that needed to be modified, but figuring out how to do each lookup properly took some trial and error.

Keeping it Backwards Compatible

Previous versions of the code did not support multi-region deployments; they only knew about the default east region and had lookups built for the now-outdated variable design. Everything here points to this being a breaking change in the code base. Luckily, since the configuration is completely separate from the code, and the region variable had always existed (just not exposed), we were able to keep this change backwards compatible.

The implementation here defaults the region variable to us-east-1, keeping all previous versions of the code working. Once the region variable was exposed to the user, that value could be overridden, allowing the newly available region-specific parameters to be found and used during provisioning.
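As a sketch of how a user might override the default, a hypothetical per-region variable file could contain nothing more than:

```hcl
# us-west-2.tfvars (illustrative file name)
region = "us-west-2"
```

applied with terraform apply -var-file=us-west-2.tfvars, or inline with -var 'region=us-west-2'. Omitting either leaves the us-east-1 default in place, which is what keeps older callers working unchanged.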

Changes in the Code

Even though this wasn’t so much a code change as a configuration change, I think it is still valuable to show some of the before and after examples. Below are some code snippets showing what some of the refactors looked like:

Before:

variable "worker_elb_sg_map" {
  description = "A map from environment to a comma-delimited list of build worker ELB security groups"
  type        = "map"

  default = {
    dev      = "sg-9f59278yreuhifbf,sg-be2t43erfce,sg-434fedf2b"
    qa       = "sg-e945ygrthdrg,sg-e55tgr54hd,sg-7d34trfwe7"
    staging  = "sg-255yg45hedr5,sg-6234tth6,sg-9834tfery4e5t"
    training = "sg-255yerd6h,sg-625t5rqrgy5,sg-98gr54w5g"
    prod     = "sg-4c5y65re5,sg-3b35tg4wg,sg-3e3tgrtw4y6"
  }
}

output "worker_elb_security_groups" {
  value = ["${split(",", var.worker_elb_sg_map[var.environment])}"]
}
After:

variable "worker_elb_sg_map" {
  description = "A map from region to environment to a comma-delimited list of build worker ELB security groups"
  type        = "map"

  default = {
    us-east-1 = {
      dev      = "sg-9fhjsdf76ef,sg-bksdajfhiece,sg-4487heff0b"
      qa       = "sg-e29834hfisb99,sg-e398hfu95,sg-7d398hdsaf7"
      staging  = "sg-239uhibwf942,sg-983huh939,sg-99834hh94f"
      training = "sg-250ba552,sg-62b39719,sg-9983h9384hf"
      prod     = "sg-4c98hf93uc,sg-3938fh3hb,sg-3e083eiuf"
    }

    us-west-2 = {
      qa   = "sg-6f390f15,sg-1b645761,sg-93484j80e9"
      prod = "sg-0c9384hf973hf,sg-ad2983hudh37,sg-4283h93498j"
    }
  }
}

output "worker_elb_security_groups" {
  value = ["${split(",", lookup(var.worker_elb_sg_map[var.region], var.environment))}"]
}
Before:

module "variables" {
  source      = "git::https://<github url>/<org name>/<repo name>//variables"
  environment = "${local.environment}"
  size        = "${local.size}"
}

After:

module "variables" {
  source      = "git::https://<github url>/<org name>/<repo name>//variables"
  environment = "${local.environment}"
  size        = "${local.size}"
  region      = "${var.region}"
}

The Outcome

To manage expectations, our first implementation of multi-region deployments was meant to be active/passive, meaning only one region receives traffic while the other is strictly a failover in case of disaster.
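An active/passive setup is commonly expressed as DNS failover. A hypothetical sketch with Route 53 (the zone, hostname, health-check path, and ELB variables are illustrative, not from our code base) might look like:

```hcl
# Hypothetical sketch of active/passive DNS failover. The primary record
# serves all traffic; the secondary only answers if the health check fails.
resource "aws_route53_health_check" "primary" {
  fqdn              = "${var.east_elb_dns_name}"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/healthz"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id         = "${var.zone_id}"
  name            = "app.example.com"
  type            = "CNAME"
  ttl             = 60
  records         = ["${var.east_elb_dns_name}"]
  set_identifier  = "primary"
  health_check_id = "${aws_route53_health_check.primary.id}"

  failover_routing_policy {
    type = "PRIMARY"
  }
}

resource "aws_route53_record" "secondary" {
  zone_id        = "${var.zone_id}"
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  records        = ["${var.west_elb_dns_name}"]
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```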

After implementing these changes and making some significant additions to the way our build pipeline functions and verifies our code base, we have been able to reliably deploy to both us-east-1 and us-west-2 for almost four months. Since the difference between deploying to different regions and different environments is nothing more than a few input variables, maintaining everything has taken little to no additional overhead.

I’m very happy with the way things have turned out for my team so far but we are not even close to finished. The next iterations will focus on automated failover and eventually going to an active/active implementation. I’m excited to see where we go next in our development journey.


DISCLOSURE STATEMENT: These opinions are those of the author. Unless noted otherwise in this post, Capital One is not affiliated with, nor is it endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are the ownership of their respective owners. This article is © 2018 Capital One.

Capital One Tech

The low down on our high tech from the engineering experts…

Chris Pisano

Written by

Highly opinionated systems engineer dabbling in networking and security writing code to make my life easier.
