Blue/Green and Canary Infrastructure with Terraform
Blue/Green and Canary deployments are popular strategies nowadays. Terraform enables you to create infrastructure with code, and code can be version controlled. In this article, instead of talking about the advantages of Terraform or how to install it (you can check https://www.terraform.io/ for that), we will build a real-world example on AWS.
Prerequisites
I will be using Amazon Web Services (AWS) for this tutorial, but the code and implementation won't vary much with another provider like Google Cloud Platform, Azure, etc.
You need to have some basic knowledge of Terraform. Also, you will need to have:
- A working AWS account. You can sign up and use the free tier offered by AWS
- AWS CLI installed on your local machine
- Terraform installed on your local machine
- An AWS IAM user with proper permissions and a configured AWS profile
What are Blue/Green and Canary deployments?
Blue/Green deployment is a DevOps practice that aims to reduce downtime on updates by creating a new copy of the desired component while keeping the current one running. You end up with two versions of the system: one running the current version (blue) and another running the new one (green). When the new version is up and running, you can seamlessly switch traffic to it. This is useful not only to reduce downtime but also to speed up rollback when something goes wrong.
Canary deployment releases an application or service incrementally to a subset of users. Because of this fine-grained control, it is the least risky of the common deployment strategies.
Blue/Green Infrastructure
While Blue/Green deployment is a technique more commonly used with application deployment, the reduced costs of the cloud, in addition to the tools we have right now, make it possible to have two copies of an entire cloud infrastructure with little to no pain.
It's also important to note that a Blue/Green deployment of the entire cloud infrastructure is not a silver bullet, and it is certainly overkill for small changes, for example adding a new EC2 instance or changing an instance type in your stack. If you plan to make major or breaking changes, however, this deployment type is worth it.
After finishing this, you will be able to create an infrastructure containing:
- A Virtual Private Cloud
- Three Subnets, each one in a different Availability Zone
- An Internet Gateway
- A custom Route Table
- Route table associations between the custom route table and the subnets
- A Security Group
- EC2 instances serving an Apache HTTPd server on port 80
- A Load Balancer pointing to those instances
- Optional: Attach A DNS Record to the Load Balancer
Most of these components will be the same for both the Blue/Green environments with minor differences for each environment.
The full example can be seen here.
Step 1: Setup Provider, Backend with Terraform
We start by creating a folder and opening it in your favourite editor. I will be using VS Code.
The first thing we must create is the provider configuration. Terraform relies on plugins called "providers" to interact with cloud providers, SaaS providers, and other APIs. Terraform configurations must declare which providers they require so that Terraform can install and use them.
Terraform stores the state of the infrastructure in a JSON file. It’s recommended to store that file on an external backend like Amazon S3. As we are using AWS for this tutorial, we will stick to S3, but Terraform supports the equivalent in each provider.
First, you need to create the S3 bucket in which the state will reside. You can do this either via the AWS S3 console or by doing:
aws s3api create-bucket --create-bucket-configuration LocationConstraint=eu-west-1 --bucket blue-green-for-learning-here --region eu-west-1 | jq
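The original embedded snippet is not shown here, but a minimal providers.tf wiring up the AWS provider and the S3 backend might look like this (the key path and provider version are assumptions; the bucket name matches the command above — note that backend blocks cannot reference variables, so their values are hardcoded):

```hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }

  # Remote state lives in the bucket created above
  backend "s3" {
    bucket = "blue-green-for-learning-here"
    key    = "blue-green/terraform.tfstate"
    region = "eu-west-1"
  }
}

provider "aws" {
  region  = var.aws_region
  profile = var.aws_profile
}
```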
We have referenced some variables, e.g. var.aws_profile, so we need to define them. We create a file variables.tf:
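Since the original gist is missing, here is a sketch of variables.tf. The names aws_profile and infrastructure_version appear later in the article; aws_region and the defaults are assumptions:

```hcl
variable "aws_profile" {
  description = "Local AWS CLI profile to use"
  type        = string
  default     = "personal-deployment"
}

variable "aws_region" {
  description = "AWS region to deploy into"
  type        = string
  default     = "eu-west-1"
}

variable "infrastructure_version" {
  description = "Deployment version, interpolated into names and CIDR blocks"
  type        = number
  default     = 1
}
```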
Note: configure your AWS profile. Here the profile is personal-deployment; you can change this to suit yours.
Step 2: Create VPC
Create a file named vpc.tf with this content:
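The original snippet is not shown, but a minimal sketch could be (the resource name "main" and the tags are assumptions):

```hcl
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "blue-green-vpc"
  }
}
```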
cidr_block: 10.0.0.0/16 allows us to use IP addresses that start with 10.0.X.X, giving us 65,536 IP addresses ready to use.
You can check for the remaining arguments here.
Step 3: Create Public Subnets
To do anything useful, we first need subnets. We will create three of them, each in a different availability zone. Create a file named subnets.tf with this content:
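A sketch matching the description below, assuming the VPC resource is called aws_vpc.main and folding the infrastructure_version variable into the third octet of the CIDR:

```hcl
# Look up the AZs available in the configured region
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "public" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  availability_zone = element(data.aws_availability_zones.available.names, count.index)

  # Version 1 yields 10.0.10.0/24, 10.0.11.0/24, 10.0.12.0/24;
  # version 2 would not overlap with them
  cidr_block              = "10.0.${var.infrastructure_version}${count.index}.0/24"
  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet-${count.index}"
  }
}
```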
Here we create three subnets, specifying:
- count: The number of subnets we want to create
- availability_zone: We use the element() function, which takes a list and an index and returns the element at that index, wrapping around when the index exceeds the list length. This lets us assign a different availability zone to each subnet.
- vpc_id: The ID of the VPC we just created.
- cidr_block: We interpolate the previously defined infrastructure_version variable into the CIDR block. This will help later when we create the second version (green).
You can check for the remaining arguments here.
Step 4: Create an Internet Gateway
To enable our VPC to connect to the internet, we need an internet gateway.
Create a file named internet_gw.tf with this content:
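As a minimal sketch (resource name assumed):

```hcl
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "blue-green-igw"
  }
}
```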
You can check for the arguments here.
Step 5: Create a custom route table
Create a custom route table for the public subnets; traffic from the public subnets reaches the internet through it.
Create a file named route_table.tf with this content:
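A sketch, routing all outbound traffic through the internet gateway (resource names assumed):

```hcl
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  # Default route: everything not destined for the VPC goes to the internet
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "public-route-table"
  }
}
```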
Step 6: Associate custom route table and subnet
In the same file route_table.tf, add:
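A sketch of the association, one per subnet:

```hcl
resource "aws_route_table_association" "public" {
  count          = 3
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
```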
Step 7: Create a Security Group
We add inbound and outbound security group rules for our EC2 instances, opening only the ports we need.
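The article does not show the file, so as a sketch (file could be named security_group.tf; the resource name "web" is an assumption) opening HTTP in and all traffic out:

```hcl
resource "aws_security_group" "web" {
  name        = "blue-green-web-sg"
  description = "Allow HTTP in, all traffic out"
  vpc_id      = aws_vpc.main.id

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```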
You can check for the remaining arguments here.
Step 8: AMI data source
We will be creating our blue environment soon, adding EC2 instances to it. We will use the Amazon Linux 2 AMI. This step is not strictly necessary for this demo, but I want to show you how to get the latest AMI ID for Amazon Linux 2:
Create a file named ami_datasource.tf with this content:
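The article later references data.aws_ami.amzlinux, so a plausible sketch is:

```hcl
# Latest Amazon Linux 2 AMI published by Amazon
data "aws_ami" "amzlinux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
```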
Step 9: Create Blue Environment
First, let’s add the local values we will need for our environment.
Create a file named locals.tf with this content:
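The article later uses a local called subnets, so a minimal sketch is:

```hcl
locals {
  # IDs of the three public subnets, consumed by the instances and the load balancer
  subnets = aws_subnet.public[*].id
}
```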
Here we define local values for the subnets. For now, only focus on the subnets entry.
Let's add a user data script. Create a file named apache-script.sh with this content:
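The original script is not shown. A sketch that installs Apache and serves a page identifying the environment could look like this, assuming the file is rendered with Terraform's templatefile() so that ${environment} is substituted before the shell runs:

```shell
#!/bin/bash
# Bootstrap script for Amazon Linux 2. ${environment} is a Terraform
# template variable, not a shell variable.
yum update -y
yum install -y httpd
systemctl enable httpd
systemctl start httpd
echo "<h1>Hello from the ${environment} environment</h1>" > /var/www/html/index.html
```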
Now, create a file named blue.tf with this content:
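Reconstructing from the description below (the target group name blue-green-deployment-blue and the variables are named later in the article; the rest is an assumed sketch):

```hcl
resource "aws_instance" "blue" {
  count                  = var.enable_blue_env ? var.blue_instance_count : 0
  ami                    = data.aws_ami.amzlinux.id
  instance_type          = var.instance_type
  subnet_id              = element(local.subnets, count.index)
  vpc_security_group_ids = [aws_security_group.web.id]

  # Render the bootstrap script with the environment name baked in
  user_data = templatefile("${path.module}/apache-script.sh", {
    environment = "blue"
  })

  tags = {
    Name = "blue-${count.index}"
  }
}

# Target group and attachments used by the load balancer in the next step
resource "aws_lb_target_group" "blue" {
  name     = "blue-green-deployment-blue"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    path     = "/"
    port     = 80
    protocol = "HTTP"
    matcher  = "200"
  }
}

resource "aws_lb_target_group_attachment" "blue" {
  count            = length(aws_instance.blue)
  target_group_arn = aws_lb_target_group.blue.arn
  target_id        = aws_instance.blue[count.index].id
  port             = 80
}
```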
Let’s explain a little bit about this file.
We have created a resource of type aws_instance with these parameters:
- count: The number of resources of this type. In this case, it's based on the enable_blue_env and blue_instance_count variables, which default to true and 2 respectively.
- ami: The Amazon Machine Image for the instance. As mentioned in step 8, we use data.aws_ami.amzlinux.id to get the latest Amazon Linux image ID.
- instance_type: The type of the instance, with a default of t2.micro.
- user_data: This allows us to assign an initialization script to the instance. In our case, we are running an Apache httpd server with a custom webpage. There are better ways to define user data scripts, but we'll keep it simple for now.
Don't worry about the target group and target group attachment resources yet; we will come to understand them in the next step.
You can find the remaining arguments here.
Let us update our variables.tf to this:
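In addition to the variables already defined, the blue environment needs (names and defaults taken from the description in step 9; instance_type is an assumed companion):

```hcl
variable "enable_blue_env" {
  description = "Enable the blue environment"
  type        = bool
  default     = true
}

variable "blue_instance_count" {
  description = "Number of instances in the blue environment"
  type        = number
  default     = 2
}

variable "instance_type" {
  description = "EC2 instance type for the web servers"
  type        = string
  default     = "t2.micro"
}
```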
Step 10: Create a Load Balancer
Create a file named load_balancer.tf with this content:
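A sketch matching the description below (resource names and the LB name are assumptions):

```hcl
resource "aws_lb" "main" {
  name               = "blue-green-lb"
  internal           = false
  load_balancer_type = "application"
  subnets            = local.subnets
  security_groups    = [aws_security_group.web.id]
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  # Forward everything to the blue target group created in step 9
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue.arn
  }
}
```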
In this file we have created a Load Balancer with:
- name: Self-explanatory
- subnets: The subnets the load balancer is available in
- security_groups: We attach the previously created security group so the load balancer is reachable
- load_balancer_type: Here we are using the application load balancer type
- listener: We have added an aws_lb_listener resource that listens on port 80 of the load balancer. Its default action forwards to the aws_lb_target_group we created in step 9. We also defined a health_check there, a simple HTTP health check that targets port 80 of the instances.
We have created all the components we need for our blue environment. Let’s add one more file for the outputs.
Create a file named outputs.tf with this content:
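The lb_dns_name output referenced later in the article would look like:

```hcl
output "lb_dns_name" {
  description = "Public DNS name of the load balancer"
  value       = aws_lb.main.dns_name
}
```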
Here, we have added an output that displays the Load Balancer Public DNS. All Good now 😁. Let’s now start playing with terraform commands.
The first command we need to run is terraform fmt, to format our configuration files.
Next, run terraform init to initialize the working directory, load the remote state, and download the providers and modules we need.
Then we run terraform validate to validate our configuration files. Validate checks whether a configuration is syntactically valid and internally consistent.
The next command is terraform plan, which creates an execution plan with a preview of the changes to our infrastructure.
As the plan output shows, this run will add 19 resources.
Finally, run terraform apply to execute the actions proposed in the plan. This is where we accept the changes and apply them against real infrastructure.
Apply complete! Resources: 19 added, 0 changed, 0 destroyed.
And you can access the webpage using the lb_dns_name output 😁.
We can do a curl for loop for this lb_dns_name:
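Something like the following, reading the DNS name from the Terraform output:

```shell
# Hit the load balancer ten times to see which environment answers
LB_DNS=$(terraform output -raw lb_dns_name)
for i in $(seq 1 10); do
  curl -s "http://$LB_DNS"
done
```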
Hurray!!! We are done with the Blue Environment configuration.
Let’s now create a Green Environment.
Step 11: Create Green Environment
Our green environment will be almost the same as the blue environment, with one difference: it will run a new version of the deployment.
Create a file named green.tf with this content:
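Mirroring the blue.tf sketch above, with the names swapped (the target group name blue-green-deployment-green is confirmed by the article; the rest is assumed):

```hcl
resource "aws_instance" "green" {
  count                  = var.enable_green_env ? var.green_instance_count : 0
  ami                    = data.aws_ami.amzlinux.id
  instance_type          = var.instance_type
  subnet_id              = element(local.subnets, count.index)
  vpc_security_group_ids = [aws_security_group.web.id]

  user_data = templatefile("${path.module}/apache-script.sh", {
    environment = "green"
  })

  tags = {
    Name = "green-${count.index}"
  }
}

resource "aws_lb_target_group" "green" {
  name     = "blue-green-deployment-green"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    path     = "/"
    port     = 80
    protocol = "HTTP"
    matcher  = "200"
  }
}

resource "aws_lb_target_group_attachment" "green" {
  count            = length(aws_instance.green)
  target_group_arn = aws_lb_target_group.green.arn
  target_id        = aws_instance.green[count.index].id
  port             = 80
}
```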
Let's update the load balancer file load_balancer.tf with this new content:
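A sketch of the updated listener using an ALB weighted forward action, where the weights are looked up from a traffic distribution map in locals (map and variable names match those used below):

```hcl
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = lookup(local.traffic_dist_map[var.traffic_distribution], "blue", 100)
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = lookup(local.traffic_dist_map[var.traffic_distribution], "green", 0)
      }

      stickiness {
        enabled  = false
        duration = 1
      }
    }
  }
}
```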
The update here is to include both target groups, blue-green-deployment-blue and blue-green-deployment-green, plus stickiness and a traffic distribution for the two environments, defaulting to 100 for the Blue environment and 0 for the Green environment.
Let's also create a traffic distribution map in the locals. Update the locals.tf file with this content:
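Extending the earlier locals.tf sketch, with one entry per distribution key used in the apply commands below (the "blue" and "green-90" entries are assumed companions):

```hcl
locals {
  subnets = aws_subnet.public[*].id

  # Listener weights keyed by the traffic_distribution variable
  traffic_dist_map = {
    "blue" = {
      blue  = 100
      green = 0
    }
    "blue-90" = {
      blue  = 90
      green = 10
    }
    "split" = {
      blue  = 50
      green = 50
    }
    "green-90" = {
      blue  = 10
      green = 90
    }
    "green" = {
      blue  = 0
      green = 100
    }
  }
}
```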
You can read more about AWS ALB traffic distribution here.
Let's also update the variables.tf to this content:
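The new variables, inferred from the apply commands below (defaults are assumptions):

```hcl
variable "enable_green_env" {
  description = "Enable the green environment"
  type        = bool
  default     = false
}

variable "green_instance_count" {
  description = "Number of instances in the green environment"
  type        = number
  default     = 2
}

variable "traffic_distribution" {
  description = "Key into local.traffic_dist_map selecting the blue/green weights"
  type        = string
  default     = "blue"
}
```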
Let's run terraform apply to create the green environment and send 90% of the traffic to the blue environment:
terraform apply -var traffic_distribution=blue-90 -var enable_green_env=true -auto-approve
Now if we do a curl for loop against lb_dns_name, you can see that most of the traffic (90%) is going to the blue deployment, with the remaining 10% going to green.
Let's run terraform apply to split traffic 50/50 between the two deployments:
terraform apply -var traffic_distribution=split -var enable_green_env=true -auto-approve
You can see the traffic is evenly distributed between the two deployments.
Finally, let's switch traffic completely to the green deployment and disable the blue deployment.
terraform apply -var traffic_distribution=green -var enable_green_env=true -var enable_blue_env=false -auto-approve
Now if we do a curl for loop against lb_dns_name, we can see we are only getting responses from the new green deployment.
This is awesome 🤩. Congratulations 👏.
Optional: Attach A DNS Record to the Load Balancer
I'm not going to cover much of this case, but what I've ended up doing in production is creating a DNS record that points to the load balancer. The Terraform for this could look like:
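A sketch using a Route53 alias record (the domain_name variable is confirmed by the commands below; the record name "app" and resource names are assumptions):

```hcl
variable "domain_name" {
  description = "Root domain hosted in Route53"
  type        = string
  default     = ""
}

data "aws_route53_zone" "main" {
  count = var.domain_name != "" ? 1 : 0
  name  = var.domain_name
}

# Alias record pointing app.<domain> at the load balancer
resource "aws_route53_record" "app" {
  count   = var.domain_name != "" ? 1 : 0
  zone_id = data.aws_route53_zone.main[0].zone_id
  name    = "app.${var.domain_name}"
  type    = "A"

  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }
}
```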
Then do terraform apply:
terraform apply -var traffic_distribution=green -var enable_green_env=true -var enable_blue_env=false -var domain_name=yourdomain.com -auto-approve
Please change the yourdomain.com var to your own domain.
Please note this DNS configuration only works when your domain is hosted in AWS Route53.
If you are doing this just for fun, please run terraform destroy to tear the deployment down:
terraform destroy -var traffic_distribution=green -var enable_green_env=true -var enable_blue_env=false -auto-approve
If you added domain configuration then run:
terraform destroy -var traffic_distribution=green -var enable_green_env=true -var enable_blue_env=false -var domain_name=yourdomain.com -auto-approve
Destroy complete! Resources: 21 destroyed 😁.
That’s all for now. Thanks for reading. Let’s connect on Twitter and LinkedIn 😁.