Blueprinting Terraform

Osvaldo Toja
4 min read · Mar 22, 2023


There’s a difference between writing Terraform code and using Terraform code. The distinction is clear in everybody’s mind, but in practice, not so much.

When we’re building infrastructure, chances are we have different environments. As good practitioners of Infrastructure as Code, we create a git repository to contain all the code.

When searching online for Terraform repository best practices, we’ll find plenty of results, with recommended layouts for medium-size and large infrastructures alike.

What they all have in common is the way the code is organized:

  • modules directory
  • environments directories

The modules directory contains shared code used by all environments. This is Terraform code.

Environment directories declare which modules to include, alongside configuration values for those modules. This is also Terraform code.
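A typical layout from those guides looks something like this (directory names are illustrative):

```
├── modules/
│   ├── network/
│   └── database/
└── environments/
    ├── dev/
    │   └── main.tf
    ├── staging/
    │   └── main.tf
    └── prod/
        └── main.tf
```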

What’s the problem with this approach? Let’s analyze it from a different perspective.

We are familiar with the idea of decoupling code from configuration.

For example, when deploying a Python app, say a Django project, we package the application in a container. We start with a base image, install the required Python packages, copy the code, and end up with a container image ready to be deployed (broadly speaking).

We use exactly the same image (app code) in all environments. For each environment, however, we provide specific environment variables and/or files. No Python code is provided per environment, only configuration data.
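For instance, a minimal sketch (the image name and file paths are made up):

```sh
# Same image in every environment; only the configuration data changes.
docker run --env-file config/staging.env registry.example.com/django-app:2.1.0
docker run --env-file config/prod.env registry.example.com/django-app:2.1.0
```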

What’s the current approach for deploying infrastructure? We use containers as well. We start with a base image and install tools and binaries like terraform, terragrunt, etc. And we have a repository containing the infrastructure code, organized as shown earlier: folders for the environments, and sometimes a folder for the modules. We use our custom image to run terraform commands in the pipeline, and we have a full IaC pipeline.

Notice a difference with the Django app? The Terraform project keeps code and configuration together. That forces us to mix Terraform code development with infrastructure deployments in the same repository. Sure, we can do infra development pointing to a dev account, but we’re working in the same repository where the staging and prod accounts live. There’s a risk.

For infrastructure, we’re mixing deployments with development. What if we wrote and deployed infrastructure the same way we write and deploy a Django application?

Package all the Terraform code we need inside a container image. Then use that same image to deploy infrastructure to all environments.
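Something along these lines; a sketch where the base image, Terraform version, and paths are all assumptions:

```dockerfile
# Tooling image with the blueprint code baked in (versions illustrative).
FROM alpine:3.17
RUN apk add --no-cache curl unzip git

ARG TF_VERSION=1.4.0
RUN curl -fsSL "https://releases.hashicorp.com/terraform/${TF_VERSION}/terraform_${TF_VERSION}_linux_amd64.zip" \
      -o /tmp/terraform.zip \
 && unzip /tmp/terraform.zip -d /usr/local/bin \
 && rm /tmp/terraform.zip

# The blueprint: shared modules plus a root module with every supported option.
COPY modules/ /blueprint/modules/
COPY *.tf /blueprint/

WORKDIR /blueprint
ENTRYPOINT ["terraform"]
```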

The solution is to decouple infrastructure into two repositories. One repository holds the Terraform code for everything we may need to deploy in an environment: the shared modules and a root module exposing every option our infrastructure supports. The other repository holds the environment directories, each with configuration describing what to deploy and the specifications for it.

The first repository is a blueprint of an environment. The second one leverages Terraform’s count argument to decide what to deploy, supplying variables as needed.
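As a sketch of how the blueprint’s root module might toggle pieces on and off (all names here are made up):

```hcl
# Root module in the blueprint repository.
variable "deploy_database" {
  type    = bool
  default = false
}

# Deployed only when the environment's configuration asks for it.
module "database" {
  source = "./modules/database"
  count  = var.deploy_database ? 1 : 0
}
```

Each environment directory in the second repository then carries only configuration, e.g. a `.tfvars` file with `deploy_database = true`.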

What are the advantages of this approach?

Separation of concerns

The team building infrastructure can focus on developing infrastructure code while clients and/or engineering teams can own environments, aka the configuration data.

Improved developer productivity

We want tests to make sure we’re not introducing any bugs with our code. Having a separate repository allows us to isolate infrastructure development into its own SDLC.

Infra engineers can do development against personal testing accounts. We can test the code in the pipeline using a shared team test account.

We can run time-consuming tests in the pipeline: anything we need to make sure the code is ready to be used. This brings another benefit: it reduces the time required to roll out changes to environments. Terraform code is tested during development, not during deployments, which is where that time belongs.

Once we have good working code, we can build container images ready to be used: images we can point at any environment with a given set of configuration data, confident of what the result will be.
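A sketch of that flow, with a made-up registry, version, and paths, and with backend setup and `terraform init` omitted for brevity:

```sh
# Build and publish the tested blueprint image once.
docker build -t registry.example.com/infra-blueprint:1.4.0 .
docker push registry.example.com/infra-blueprint:1.4.0

# Deploy to an environment by mounting only its configuration data.
docker run --rm \
  -v "$PWD/environments/staging:/config" \
  registry.example.com/infra-blueprint:1.4.0 \
  apply -var-file=/config/staging.tfvars
```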

Environment levels

Remember the first paragraph mentioning the chances of having more than one environment? Suppose we have only one environment where we run our applications: production. Developers have one environment, true, but infra engineers still need an environment in which to develop the infrastructure code behind that one environment.

Now suppose we have three environments: dev, staging, and prod. Developers have three environments, but from the infrastructure point of view, all of them are one environment: production. If any of them fails, it directly impacts developer productivity. So while production is the most critical because end users and/or clients are affected, dev and staging are equally important to the infra team because failures there impact internal users.

In a dev environment, developers can break things; that’s OK. Infra devs need such an environment as well.

While dev, staging, QA, and prod can be considered environment names, we can define a new variable, `environment_level`, that is relevant to infrastructure engineers.

Any environment whose failure has an impact on the company should be considered production level. An environment used for developing infrastructure gets dev as its environment level.
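In Terraform terms, this could be as simple as one extra input variable in the blueprint (a sketch; the allowed values are illustrative):

```hcl
variable "environment_name" {
  type        = string
  description = "The developer-facing name: dev, staging, qa, prod, ..."
}

variable "environment_level" {
  type        = string
  description = "How the infra team treats this environment if it fails."

  validation {
    condition     = contains(["dev", "production"], var.environment_level)
    error_message = "environment_level must be either \"dev\" or \"production\"."
  }
}
```

A staging environment would then have `environment_name = "staging"` but `environment_level = "production"`.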

Better DevEx

Environment users (not infra devs) care about which services to deploy and their configuration. Decoupling this data from the Terraform codebase opens the door to more commonly used formats like YAML. Infra developers can easily leverage Terragrunt to read the YAML file as an input to the Terraform code. It is also possible to write a JSON Schema file with a double purpose: to self-document the available options when editing the YAML file, and to validate the file to ensure proper syntax and values. Ansible users can use the template module to create `.tfvars` files as required.
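A minimal Terragrunt sketch, where the YAML file name and its keys are invented for the example:

```hcl
# terragrunt.hcl in an environment directory.
# Assume an environment.yaml next to it that looks like:
#   name: staging
#   level: production
#   services:
#     database:
#       enabled: true

terraform {
  # The blueprint repository (URL is illustrative).
  source = "git::https://example.com/infra/blueprint.git"
}

locals {
  env = yamldecode(file("${get_terragrunt_dir()}/environment.yaml"))
}

inputs = {
  environment_name  = local.env.name
  environment_level = local.env.level
  deploy_database   = local.env.services.database.enabled
}
```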

Conclusion

This is not an attempt to reinvent Terraform but to provide an alternative view. There’s no one-size-fits-all. Sometimes the problem is not with the tool itself but with how we use the tool. And not everything is a nail :)
