Things I wish I knew about Terraform before jumping into it

A few weeks ago I wrote about my journey into Terraform. You can read more about this quest here:

Now I’ll try to summarize some of the things I learned along the way.


Foreword

It’s ironic that the Portuguese translation of Earth is “Terra”.

Before I really started exploring this infrastructure-as-code world, I had heard a lot of blabbering about Terraform.

- Terraform is really easy, you just write some code and *bam!*, there is your infrastructure…
- Terraform uses declarative syntax, it can’t be that hard…
- Terraform is cloud-agnostic, write your infrastructure code once and run it on AWS, Google Cloud, Azure, Bluemix or wherever…
- Terraform can prevent children from starving in sub-Saharan Africa…

Of course that’s all bullshit!

Don’t get me wrong, I still think Terraform is a fantastic tool once you get to know it in more detail, but the learning curve can be very steep, especially if you don’t have a good understanding of how the underlying provider works.

Here are some things I wish I knew before diving into this quest.

Terraform is not mature yet

Yo, easy there… You ain’t grown up enough for this!

Terraform is still under very active development. You can see how serious I am about that by visiting its releases and issues pages on GitHub. By the time I started writing this post, the latest “stable” version available was 0.9.11 and there were 742 open issues in the main repo alone (there are also individual repos for each major cloud provider). I myself must have opened a handful of them over the last few weeks.

Because its major version is still 0, you can expect significant breaking changes until v1 is launched. In fact, the most frustrating part of getting started is that when you find that cool article on the internet about how to do X with Terraform, chances are high that it is already outdated, even if it is just a few months old.

Beware of this before you start putting all your infrastructure in it.
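
One way to soften the blow, as far as I can tell, is to pin the Terraform version a configuration was written for, so that a newer release with breaking changes refuses to run instead of surprising you. A minimal sketch, assuming the code was tested against the 0.9.x series (the constraint itself is my assumption, adjust it to whatever you actually use):

```hcl
# Assumption: this configuration was written and tested with Terraform 0.9.x.
terraform {
  # "~> 0.9.0" accepts any 0.9.x release and rejects 0.10.0 and later.
  required_version = "~> 0.9.0"
}
```

This does not protect you from outdated articles, but at least a teammate with a different Terraform version gets a clear error up front.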

Terraform is not cloud-agnostic

In the end, we are still all a bunch of cloud-believers…

I don’t remember where exactly I heard or read this, but this is probably the most widespread fake news about Terraform.

While it does support multiple providers, from full-featured ones such as AWS to more specific services such as GitHub, it’s not like you can describe your infrastructure with generic code that will be translated into provider-specific resources.

In fact, Terraform resources map more or less 1-to-1 to the underlying provider’s resources, often keeping the provider’s own jargon as well. For example, an AWS Classic Load Balancer is named aws_elb in Terraform, while the closest equivalent on Microsoft Azure is called azurerm_lb. As you might expect, the configuration parameters for each resource differ too, so they are not interchangeable at all.
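
To make the difference concrete, here is a rough side-by-side sketch of the two resources. Every name and value is a hypothetical placeholder, and the variables stand in for Azure resources that would be defined elsewhere:

```hcl
# Hypothetical inputs: an existing Azure resource group and public IP.
variable "resource_group_name" {}
variable "public_ip_id" {}

# AWS: a Classic Load Balancer is described by availability zones and listeners.
resource "aws_elb" "web" {
  name               = "web-elb"
  availability_zones = ["us-east-1a", "us-east-1b"]

  listener {
    instance_port     = 80
    instance_protocol = "http"
    lb_port           = 80
    lb_protocol       = "http"
  }
}

# Azure: a load balancer lives in a resource group and is described by
# frontend IP configurations instead.
resource "azurerm_lb" "web" {
  name                = "web-lb"
  location            = "West US"
  resource_group_name = "${var.resource_group_name}"

  frontend_ip_configuration {
    name                 = "public"
    public_ip_address_id = "${var.public_ip_id}"
  }
}
```

Apart from name, the two blocks share no arguments, so moving from one provider to the other means rewriting the resource, not renaming it.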

Therefore, cloud platform migration will continue to suck for the time being, but Terraform can make this task a little (yeah, just a little) less brittle.

Terraform won’t hide the complexity of underlying providers

If you don’t understand how AWS works, Terraform will not make your life easier. Indeed, it might make it worse, because you’ll have to deal with both AWS and Terraform quirks.

AWS has regional and global services. For instance, EC2 is regional, which means that an auto-scaling group in N. Virginia has nothing to do with one in São Paulo; they can even have the same name (names are what identify ASGs). IAM, in turn, is global, which means that once you define a role, it can be used anywhere. Then there is S3. S3 is a hybrid: while buckets have regional scope, the namespace is global, which means you can’t have two buckets with the same name, even in different regions.

Terraform ain’t gonna help you with that sort of thing. When you run terraform plan, it will tell you it’s all ok. Then you will happily proceed to terraform apply and, instead of running smoothly as you’d expect, it’s going to blow up in your face and tell you how stupid you are.
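
A sketch of the S3 case, to show what I mean. The provider aliases and the bucket name are made up for illustration:

```hcl
# Hypothetical example: aliases and bucket name are placeholders.
provider "aws" {
  alias  = "virginia"
  region = "us-east-1"
}

provider "aws" {
  alias  = "sao_paulo"
  region = "sa-east-1"
}

resource "aws_s3_bucket" "assets_virginia" {
  provider = "aws.virginia"
  bucket   = "my-company-assets"
}

# Same bucket name in another region. Nothing in the configuration itself is
# invalid, so the plan looks fine, but S3 bucket names are global, so AWS is
# expected to reject this bucket when it is actually created.
resource "aws_s3_bucket" "assets_sao_paulo" {
  provider = "aws.sao_paulo"
  bucket   = "my-company-assets"
}
```

Terraform only learns about the conflict when the AWS API answers, which is why this class of mistake shows up on apply and not on plan.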

“Avoid code duplication” is so old-fashioned

No, I’m not an advocate of the RCP design pattern (Reuse by Copy & Paste), but, as far as I know, it’s very hard to keep your code generic with Terraform.

For example, if you are nesting modules, to make a parameter of the innermost one externally configurable, you must lift it up to every module in the middle:

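# A/main.tf
variable "b" {}
variable "c" {}
# A only forwards b and c to the nested module B
module "B" {
  source = "/path/to/module/B"
  b      = "${var.b}"
  c      = "${var.c}"
}
# ...
# B/main.tf
variable "b" {}
variable "c" {}
# B uses b itself (not shown) and forwards c to the innermost module C
module "C" {
  source = "/path/to/module/C"
  c      = "${var.c}"
}
# ...
# C/main.tf
variable "c" {}
# placeholder resource type: this is finally where c gets used
resource "some_provider_some_resource" "C" {
  some_attribute = "${var.c}"
}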

You have to declare the variable c 3 times and b 2 times to make it work.

Sometimes you are better off shamelessly copying a resource or module and changing it a little bit than trying to extensively parametrize it in order to make it more generic. You’ll thank me later 😉.

If you know a better way of doing this, for the sake of God, Buddha, Krishna, Goku, etc., please let me know.

Terraform is full of dirty hacks

HCL (HashiCorp Configuration Language) is the name of the description language used by Terraform (HashiCorp is the company behind it). It has some, let’s say, peculiar ways of solving certain problems.

Warming a pizza with Terraform feels like…

One example: suppose that you want to conditionally create a resource, like this:

variable "custom_sg" {
  description = "Custom security groups for the instance"
  default     = ""
}
resource "aws_security_group" "default_sg" {
  count = "${var.custom_sg == "" ? 1 : 0}" # this is a bit odd, but ok
  # More params below... they are not relevant
}
resource "aws_instance" "example" {
  security_groups = ["${var.custom_sg != "" ? var.custom_sg : aws_security_group.default_sg.id}"]
  # More params below... they are not relevant
}

The code above should work, right? WRONG!

Whenever you use the count parameter in a resource, Terraform will treat it as a list of resources, even if the only possible values are 0 and 1. So the code above will crash right in front of your eyes (the good news is that it fails at the plan stage).

Well, since your aws_security_group.default_sg is a list, you cannot access its params directly. Instead, you have to point to individual items on that list. Terraform lets you do that using a syntax like resource.name.<n>.param, where n is a number.

So I can just declare it like this:
aws_security_group.default_sg.0.id

Nice try, but you can’t! Because of the way interpolation is implemented, both branches of the conditional are evaluated, even the one whose value is not used. So if you do set custom_sg, there is no item in aws_security_group.default_sg to be referenced by 0, and Terraform will once again beat you to the ground and then laugh in your crying-baby face.

The “official” workaround is this:

resource "aws_instance" "example" {
  security_groups = ["${var.custom_sg != "" ? var.custom_sg : join("", aws_security_group.default_sg.*.id)}"] # WTF dude?
  # More params below... they are not relevant
}

Seriously? Where the heck did this come from? That * means all elements in the list. When none is present, it returns an empty list. That join is just a regular List -> String function. When the list is empty, it returns an empty string. When it has one element, it returns the parameter I want.
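
The same trick works anywhere you need to reference a resource that may or may not exist. A small sketch, reusing aws_security_group.default_sg from the example above (the output name is hypothetical):

```hcl
# Hypothetical output exposing the ID of the optional security group.
output "default_sg_id" {
  # count = 0: the splat yields an empty list and join() returns "".
  # count = 1: the splat yields a one-item list and join() returns that ID.
  value = "${join("", aws_security_group.default_sg.*.id)}"
}
```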

There are lots of dirty little hacks like this. The place to find them is the issues page.

Don’t use nested/in-line resources

No, seriously… Don’t do that… this gave me some pain in the arse.

If you have no idea what I’m talking about: Terraform allows you to define some resources either nested within their “parent” or as standalone resources that reference it.

The reason I recommend avoiding in-line resources is that, at least for us, a common use case is extending security groups, route tables and other resources that support them from outside a module. The only way to do that is with standalone resources pointing to the resource defined within the module.
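
To illustrate the two styles with security groups, here is a minimal sketch. The names, ports and the vpc_id variable are placeholders of mine:

```hcl
variable "vpc_id" {}

# Style 1: the rule is nested (in-line) inside its parent security group.
resource "aws_security_group" "inline" {
  name   = "web-inline"
  vpc_id = "${var.vpc_id}"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Style 2: the group has no in-line rules; the rule is a standalone resource
# that points at the group, so it could also be declared outside a module.
resource "aws_security_group" "standalone" {
  name   = "web-standalone"
  vpc_id = "${var.vpc_id}"
}

resource "aws_security_group_rule" "http_in" {
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = "${aws_security_group.standalone.id}"
}
```

Both styles end up creating the same rule; the difference is only in who is allowed to add more rules later.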

I have defined a VPC module, which contains the basic definition for a VPC. This makes the creation of VPCs in multiple regions a piece of cake. However, our main infrastructure is located in N. Virginia, so its configuration should be a little bit different from the other ones (not different enough to justify code duplication, though).

Initially I had something like this in our VPC module definition:

// ...
resource "aws_route_table" "public" {
  vpc_id = "${aws_vpc.cluster_vpc.id}"

  // in-line routes take cidr_block (destination_cidr_block belongs to aws_route)
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.default_ig.id}"
  }
}
// ...

Then I would create our VPCs by reusing the module:

module "us-east-1" {
  source         = "../../modules/vpc"
  aws_region     = "us-east-1"
  vpc_cidr_block = "10.0.0.0/16"
}
module "us-east-2" {
  source         = "../../modules/vpc"
  aws_region     = "us-east-2"
  vpc_cidr_block = "10.1.0.0/16"
}
// ...

However, for us-east-1, I needed to create a VPC peering connection with another VPC we have in this region. To make this work, beyond creating the peering connection, I need to modify the route table, adding an entry that will properly route the traffic.

// ...
resource "aws_route" "peer_vpc" {
  route_table_id            = "${module.us-east-1.public_route_table_id}"
  destination_cidr_block    = "172.16.0.0/16" // must be a CIDR block; the /16 prefix length is assumed
  vpc_peering_connection_id = "pcx-xxxxxx"
}
// ...

This should work, right?

Please, don’t politicize this post!

The behavior I observed when mixing both styles was that if the standalone resources didn’t exist, they would be created. However, once created, if I ran terraform apply again, they would be deleted. If I tried one more time, they would be created again, and so on…

The gotcha is that you can’t mix both. I had to dig through some Terraform issues on GitHub to learn that.

Fortunately, this is now clear in the documentation. For example, the page for the aws_route_table resource states:

NOTE on Route Tables and Routes: Terraform currently provides both a standalone Route resource and a Route Table resource with routes defined in-line. At this time you cannot use a Route Table with in-line routes in conjunction with any Route resources. Doing so will cause a conflict of rule settings and will overwrite rules.

The solution was to extract the in-line route within the module to its own resource:

// ...
resource "aws_route" "internet" {
  route_table_id         = "${aws_route_table.public.id}"
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = "${aws_internet_gateway.default_ig.id}"
}
// ...

This way I can amend the route table outside the module without pulling my hair out.
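
For a resource outside the module to reach the route table, the module also has to export its ID. A minimal sketch, assuming the output name public_route_table_id that the peering snippet refers to (the file name is hypothetical):

```hcl
# modules/vpc/outputs.tf (hypothetical file name)
# Exposes the route table so callers can attach standalone aws_route resources.
output "public_route_table_id" {
  value = "${aws_route_table.public.id}"
}
```

With that in place, module.us-east-1.public_route_table_id resolves from the calling configuration.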


Conclusion

It has been great to work with Terraform. Sometimes I think I’d rather be burned alive, but still great.

From the time I started until now, I noticed improvements in the documentation, which might help to make the learning curve less steep.

Since I’m still learning, I probably got some things wrong. So there might be room to improve this article in the future as I discover new/better patterns.

Stay tuned for more…