Why does POP use Terraform?

Robert Rees
Published in POP Developers
Apr 12, 2018

Recently I gave a brief internal talk about Terraform. Some of the questions from the team mirrored those that former colleagues had asked, so I thought it might be useful to write up some of the reasons why we are currently using Terraform.

Why use any kind of infrastructure definition?

Originally POP created its infrastructure by hand. When something was needed, someone went into the AWS Console, followed a tutorial and then shared the details of how to connect to the thing they had created.

Predictably, that approach had problems. When we needed to launch a version of our Casting product in the US we couldn’t re-create our UK infrastructure effectively. Each deployment environment in the UK was also subtly different, with no clear indication of what was right, except that the Production environment was being used by all our customers so presumably it was functional.

These are among the classic problems that infrastructure configuration resolves. Having a written configuration gives you repeatable infrastructure: you can apply and re-apply the same definition in different environments.
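
As a minimal sketch of what that looks like in Terraform (the variable names, cluster name and file names here are illustrative, not taken from our real configuration, and the syntax is that of Terraform 0.12 and later), one definition can be driven by per-environment values:

```hcl
# Hypothetical inputs: each environment supplies its own values.
variable "environment" {
  description = "Deployment environment name, e.g. staging or production"
  type        = string
}

variable "region" {
  description = "AWS region to create the resources in"
  type        = string
}

provider "aws" {
  region = var.region
}

# The same resource definition is reused for every environment;
# only the variable values differ.
resource "aws_ecs_cluster" "casting" {
  name = "casting-${var.environment}"
}
```

Running something like `terraform apply -var-file=uk-staging.tfvars` and then the same command with a US variables file would create matching clusters in each place. Each environment also needs its own state, which this sketch leaves out.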

As a bonus you also get a recoverable infrastructure. If someone maliciously or accidentally destroys a bit of infrastructure you can simply recreate it by applying the configuration again.

The final problem that infrastructure configuration resolves for you is discovering what you actually have. The process of bringing our infrastructure under configuration control revealed a lot of inconsistency. Comparing different resources in a GUI is difficult, particularly if certain configuration options are buried in sub-pages. When all the information is in a text file, and you can start to add tooling support to it, your infrastructure suddenly becomes readable. When someone has a question about the infrastructure it is now possible to answer most queries just by reading the configuration in our Git repository.

It also turned out that the Production environment was incorrect compared to some of our Staging environments. As the original environment it was the oldest and also the most feared, so changes had not been applied consistently to it.

For these reasons, primarily, it is good that we use some kind of tool to manage our infrastructure.

Why not CloudFormation?

This isn’t an either-or decision: we actually do use a bit of CloudFormation. Primarily it is used to bootstrap an Account, creating the base AWS Roles so that we can then use Terraform to provide the rest of the definition of the Account.

We also use vendor-provided CloudFormation where it makes sense.

We don’t currently use any Cloud providers other than AWS, so we could potentially do a lot of what we do with Terraform in the YAML version of CloudFormation.

There are two main reasons not to. First, I agree with Tommy Hall that every configuration DSL ends up needing conditional and loop statements. Terraform isn’t perfect on this but it is better than the current state of CloudFormation.
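
To give a flavour of what conditionals and loops look like, here is a rough sketch using Terraform's `count` argument. The variables, bucket name and queue names are made up for illustration, and the syntax is that of Terraform 0.12 and later:

```hcl
# Hypothetical switch for an optional resource.
variable "enable_logs_bucket" {
  type    = bool
  default = false
}

# Hypothetical list to loop over.
variable "queue_names" {
  type    = list(string)
  default = ["ingest", "notify"]
}

# Conditional: a count of 0 or 1 switches the resource off or on.
resource "aws_s3_bucket" "logs" {
  count  = var.enable_logs_bucket ? 1 : 0
  bucket = "example-logs-bucket"
}

# Loop: one queue is created per entry in the list.
resource "aws_sqs_queue" "worker" {
  count = length(var.queue_names)
  name  = var.queue_names[count.index]
}
```

Using a resource count as an on/off switch is part of why I say Terraform isn’t perfect here: it works, but it is a workaround rather than a true `if` statement.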

The second is that Terraform offers greater opportunities for configuration abstraction and re-use through its module system than CloudFormation currently does.
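
A minimal sketch of that kind of re-use, assuming a hypothetical local module at `./modules/service` (the paths, names and inputs are illustrative only, written in the syntax of Terraform 0.12 and later):

```hcl
# --- modules/service/main.tf (hypothetical module) ---

variable "name" {
  type = string
}

variable "environment" {
  type = string
}

# Everything a "service" needs would be defined once, here.
resource "aws_ecs_cluster" "this" {
  name = "${var.name}-${var.environment}"
}

output "cluster_arn" {
  value = aws_ecs_cluster.this.arn
}

# --- main.tf (root configuration) ---

# The module is instantiated once per environment with different inputs.
module "casting_staging" {
  source      = "./modules/service"
  name        = "casting"
  environment = "staging"
}

module "casting_production" {
  source      = "./modules/service"
  name        = "casting"
  environment = "production"
}
```

A change made inside the module then reaches every environment that uses it, which is the kind of consistency we were missing when things were built by hand.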

Finally, and anecdotally, Terraform seems more widely known and used, perhaps because it isn’t linked to a particular provider’s platform.

Is Terraform perfect?

No tool can solve every aspect of a non-trivial problem. Using Terraform with AWS means learning two things: the AWS API and configuration rules that underpin the operations Terraform carries out, and the tool itself. So far the higher level of abstraction seems to repay the investment in having to learn the tool.

AWS support requires a feature not only to be available in the general API but also to have been implemented in the provider library that translates the Terraform configuration into the commands that need to be carried out to implement it. This means that the features available can lag behind both the AWS Console and CloudFormation. For example, for a while it was not possible to have Terraform create the Postgres flavour of Aurora, despite the API accepting it, because the parameter was not accepted as valid at the provider layer.

Finally, the planning stage is not infallible: various problems can occur when actually executing the configuration that are not raised at the planning stage. In terms of code review this creates a dilemma: do you review the validated and planned configuration changes, or the ones that have actually been run successfully?

In general we do reviews after running the configuration, as it is reasonably easy to request changes and implement them as part of the review. For more difficult changes you might ask for a review of the planned configuration and then re-review after it has been run, or maybe run the entire configuration in another environment (for example using a different region) to test the execution.

Did we consider anything else?

There aren’t a lot of tools in the declarative infrastructure space, but we did look at a few of the configuration management tools, in particular Ansible, which we are still using for some deployment actions.

We didn’t want to use an agent system in a cloud environment, so Ansible passes that check. However, Ansible seemed to have a bigger focus on EC2 provisioning and less on services such as ECS, which is what we are deploying to. By comparison Terraform and (naturally) CloudFormation felt a lot more mature.

However, maybe we missed a good alternative. If so, feel free to comment and let us know.
