Using Feature Toggles with Terraform

This article describes how to use a single feature flag to toggle the creation of Terraform resources.

For context, I hadn’t realised I needed one until I blew away my dev Jenkins EC2 instance. Luckily, I had automated snapshots created by our nightly Lambda function. So the problem was: how could I easily restore the server to its former glory from those snapshots, in an automated and repeatable way, using Terraform? I also wanted to build a more resilient Terraform script, so that any future restore would need nothing more than typing terraform apply.

After reading a great article, Building Feature Toggles into Terraform by Chris Pisano, I had the inspiration to use a feature flag that determines whether an EC2 instance is restored or a new instance is created. Here are the steps I used to do this.

Restoring from snapshot

Before I could create a feature toggle I needed to code the restore, which I broke down into these steps:

  1. Find the EBS snapshots to restore.
  2. Create an AMI (a bit odd, but you will see why).
  3. Then restore the instance using the new AMI.
  4. Associate the new instance with the existing Elastic IP.

Find the EBS snapshot

Using the aws_ebs_snapshot data source, you can find the snapshot you need to restore from based on filters such as name and volume size.

data "aws_ebs_snapshot" "root" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "volume-size"
    values = ["64"]
  }

  filter {
    name   = "tag:Name"
    values = ["${var.snapshot_name}"]
  }
}

In the example above I am filtering on the size of my (root) volume and on the name of my snapshot, which I have set as a variable, var.snapshot_name, so as not to hardcode it. The same would need to be done for every snapshot that has to be restored.
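To illustrate repeating the lookup for another snapshot, here is a minimal sketch. It assumes a hypothetical second volume of 100 GiB and a hypothetical variable named data_snapshot_name; neither appears in the configuration above, so adjust both to match your own volumes.

```hcl
# Hypothetical variable holding the Name tag of the data volume snapshot.
variable "data_snapshot_name" {
  description = "snapshot name of the data volume (illustrative only)"
}

# Same lookup as the root snapshot, pointed at a second volume.
data "aws_ebs_snapshot" "data" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "volume-size"
    values = ["100"] # assumed size in GiB, not taken from the article
  }

  filter {
    name   = "tag:Name"
    values = ["${var.data_snapshot_name}"]
  }
}
```

Each additional snapshot gets its own data source, so it can be referenced independently later on.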

Create an AMI

Next, I needed to create an AMI that I would use to launch the restored server. It might seem an unnecessary step. However, if I just attached the snapshots under the aws_instance they would only be presented as additional volumes, and the root snapshot would not be used to restore all of the previous config and state. Creating an AMI from the snapshot overcomes this issue.

resource "aws_ami" "restore" {
  count               = "${var.restore}"
  name                = "from-${data.aws_ebs_snapshot.root.snapshot_id}"
  virtualization_type = "hvm"
  root_device_name    = "/dev/xvda"

  ebs_block_device {
    device_name = "/dev/xvda"
    snapshot_id = "${data.aws_ebs_snapshot.root.id}"
    volume_size = "${data.aws_ebs_snapshot.root.volume_size}"
  }
}

The above creates the AMI with the snapshot defined as root. You will notice the count parameter. I’ll get into why we are using that very soon.

The restored instance

Now to define the instance to be restored, using the new restore AMI I just created. Again you will notice the count parameter. I’ll get to that soon.

resource "aws_instance" "restore" {
  count         = "${var.restore}"
  ami           = "${aws_ami.restore.id}"
  instance_type = "t2.medium"
  # remaining instance arguments omitted
}

Associate the Elastic IP

So that we keep the existing Elastic IP rather than replacing it, we use the aws_eip_association resource to attach it to the restored instance.

resource "aws_eip_association" "restore" {
  count         = "${var.restore}"
  instance_id   = "${aws_instance.restore.id}"
  allocation_id = "${aws_eip.this.id}"
}

The original instance needs an association defined in the same fashion, but with the count inverted. For example:

resource "aws_eip_association" "this" {
  count         = "${1 - var.restore}"
  instance_id   = "${aws_instance.this.id}"
  allocation_id = "${aws_eip.this.id}"
}

The Terraform count parameter

As you will have spotted throughout the examples, each of the restore resources has a count parameter set to var.restore. The count parameter tells Terraform how many copies of a resource it should create, and zero is an acceptable value: with a count of zero, no resources are created. Because Terraform also treats the booleans true and false as 1 and 0, we can use count as a feature toggle, in the spirit of:

# This is just pseudocode. It will not work in Terraform.
if ${var.restore} {
  resource "aws_ami" "restore",
  resource "aws_instance" "restore",
  resource "aws_eip_association" "restore"
}
else {
  resource "aws_instance" "this",
  resource "aws_eip_association" "this"
}

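This is great, but when restoring I need to create the restored instance and not a brand new Jenkins server as well. To do this I took advantage of the fact that Terraform also does simple math in interpolations. Setting the count of the original resources to 1 - var.restore inverts the flag: when restore is true the count is 1 - 1 = 0 (false), and when restore is false it is 1 - 0 = 1 (true). For example: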

variable "restore" {
  description = "Used to restore the jenkins server"
  default     = true
}

variable "snapshot_name" {
  description = "snapshot name"
}

resource "aws_instance" "restore" {
  count = "${var.restore}"
  ......
}

resource "aws_instance" "this" {
  count = "${1 - var.restore}"
  ....
}

With restore set to true, the above creates the restored instance but not a brand new Jenkins instance.
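One consequence of the toggle is that only one of the two instances exists at any time, so anything that needs "the Jenkins instance ID" has to cope with either case. A minimal sketch of one way to handle that, using the splat syntax together with the built-in concat and element functions, is shown below. The output name jenkins_instance_id is hypothetical and not part of the original configuration.

```hcl
# Hypothetical output: returns the ID of whichever instance was created.
# A resource with count = 0 yields an empty list from the splat (.*.id),
# so concat() produces a one-element list in either mode and element()
# picks that single ID.
output "jenkins_instance_id" {
  value = "${element(concat(aws_instance.restore.*.id, aws_instance.this.*.id), 0)}"
}
```

This relies on exactly one of the two resources having a count of 1, which is what the var.restore and 1 - var.restore pairing is meant to ensure.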

Creating a Terraform toggle

With my var.restore variable now controlling the creation of either my restored or brand new Jenkins server, I needed to be able to toggle this feature on or off when needed. To do this I used a *.auto.tfvars file (I named mine restore.auto.tfvars, but you can name it whatever you want). If a *.auto.tfvars file is present in the current directory, Terraform picks it up automatically and reads it at runtime, supplying the values of the variables it defines.

# restore.auto.tfvars
# restore controls the restore toggle.
# snapshot_name controls which snapshot to restore.
restore       = false
snapshot_name = "<insert snapshot name here>"

My variables.tf now only declares the variables. Their values come from restore.auto.tfvars, which controls the restore toggle.

variable "restore" {
  description = "Used to restore the jenkins server"
}

variable "snapshot_name" {
  description = "snapshot name"
}
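The examples in this article use the older interpolation style, where a boolean can be handed straight to count. Newer Terraform releases (0.12 onwards) expect count to be a number, so the same toggle is usually written with a conditional expression instead. The following is a sketch under that assumption, not a tested migration of the configuration above. The variable base_ami_id is hypothetical, and aws_ami.restore refers to the AMI resource defined earlier.

```hcl
variable "restore" {
  description = "Used to restore the jenkins server"
  type        = bool
}

# Hypothetical variable: the AMI used when building a brand new server.
variable "base_ami_id" {
  description = "AMI for a fresh Jenkins instance (illustrative only)"
  type        = string
}

resource "aws_instance" "restore" {
  # Created only when restore is true.
  count         = var.restore ? 1 : 0
  ami           = aws_ami.restore[0].id
  instance_type = "t2.medium"
}

resource "aws_instance" "this" {
  # Created only when restore is false; replaces the "1 - var.restore" trick.
  count         = var.restore ? 0 : 1
  ami           = var.base_ami_id
  instance_type = "t2.medium"
}
```

The restore.auto.tfvars file works the same way in both styles, so flipping restore between true and false remains the only change needed to switch modes.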

Thanks to Yevgeniy Brikman from Gruntwork for writing Terraform tips & tricks: loops, if-statements, and gotchas, and thanks to Chris Pisano for writing Building Feature Toggles into Terraform.