The secret to Terraformʼs efficiency

Aug 21, 2024

*This article is for heavy Terraform users who manage complex infrastructure.

“Let me tell you why you’re here. You’re here because you know something. What you know, you can’t explain, but you feel it. You’ve felt it your entire life, that there’s something wrong with the world. You don’t know what it is, but it’s there, like a splinter in your mind, driving you mad.” Morpheus, The Matrix

So what’s wrong with it?
Let's say you follow these “Terraform best practices” and have a medium-sized infrastructure:
3 environments (prod, stg, dev) with 3 regions in prod = 5 Terraform directories duplicating each other! Do you feel that something is wrong here? You are not alone. What if I told you this could be a single Terraform directory?

Terraform users consume 🔵 every day

Do you know this taste?

terraform-directory per state OR root-module per state

Where does it come from?

Bad advice in third-party “best practices”

The majority can be wrong. This happens because it is easier to form an opinion from 3 tutorials than to read the documentation and try to make the best of it.

Here is a list of resources that teach you to create a root module per environment (this is appalling):

🔴

Surprising sooth — 1 Terraform root directory to rule them all.

Here is the taste of the red pill: Terraform is not rocket science. The simpler you structure it, the better. And there is a hidden mechanism for doing it.

The “one root module” Terraform structure described below is the one implied by Terraform's creators, because it emulates the experience you would have in Terraform Cloud while still using free remote state backends.
Reading the complete documentation bears fruit:

Backend Configuration - Configuration Language | Terraform | HashiCorp Developer


Root module

Prod and non-prod environments must be identical: the only differences are variables related to cost optimization (instance sizes etc.) and the environment name itself. That is the purpose of non-prod environments: to mimic production.

To truly achieve identical environments, you should have only one root module for all environments/regions. How is that possible when you are not using Terraform Cloud? Here is the secret:

$ tree --dirsfirst
├── assets/
├── envs/
│   ├── dev/
│   │   └── us-east-1/
│   │       ├── backend.tfvars
│   │       └── variables.tfvars
│   └── prod/
│       ├── eu-west-2/
│       │   ├── backend.tfvars
│       │   └── variables.tfvars
│       └── us-east-1/
│           ├── backend.tfvars
│           └── variables.tfvars
├── modules/
│   └── module-x/
├── aws-eks.tf
├── aws-rds.tf
├── aws-vpc.tf
├── backend.tf
├── datadog.tf
├── github.tf
├── outputs.tf
├── providers.tf
├── .terraform.lock.hcl
├── terraform.tfvars # default values
└── variables.tf # variable declarations

# backend.tf

terraform {
  backend "s3" {
    # This is empty on purpose. Values come from backend.tfvars
  }
}

# envs/prod/eu-west-2/backend.tfvars

bucket         = "terraform-state-<account-id>"
key            = "<env>-<region>"
region         = "<region>"
encrypt        = true
dynamodb_table = "terraform_locks"

This is how to work with one particular env/region:

terraform init -backend-config=./envs/prod/eu-west-2/backend.tfvars
terraform apply -var-file ./envs/prod/eu-west-2/variables.tfvars

This is how to work effectively with many env/regions and switch between them quickly:

export TF_DATA_DIR=.terraform_<env>_<region> instructs Terraform to keep its working data in that directory instead of the default .terraform. Run init once for each combination of env/region.
Switch between environments/regions easily by changing TF_DATA_DIR env var.
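The switching routine above could be wrapped in a small shell helper. This is only a sketch: the names tf_use, tf_apply and TF_ENV_DIR are hypothetical, the envs/<env>/<region>/ layout is the one from the tree shown earlier, and "data directory already exists" is used as a rough stand-in for "init has already been run".

```shell
# tf_use <env> <region>: select an env/region for the following commands.
# Hypothetical helper, assuming the envs/<env>/<region>/ layout shown above.
tf_use() {
  if [ "$#" -ne 2 ]; then
    echo "usage: tf_use <env> <region>" >&2
    return 1
  fi
  TF_ENV_DIR="./envs/${1}/${2}"
  # Terraform keeps backend settings, providers and modules in TF_DATA_DIR
  # (default: .terraform), so every env/region gets its own copy.
  export TF_DATA_DIR=".terraform_${1}_${2}"
  # Assumption: a missing data directory means init has not been run yet.
  if [ ! -d "$TF_DATA_DIR" ]; then
    terraform init -backend-config="$TF_ENV_DIR/backend.tfvars" || return 1
  fi
}

# tf_apply [extra flags]: apply the env/region chosen by the last tf_use.
tf_apply() {
  if [ -z "${TF_ENV_DIR:-}" ]; then
    echo "run tf_use <env> <region> first" >&2
    return 1
  fi
  terraform apply -var-file="$TF_ENV_DIR/variables.tfvars" "$@"
}
```

After sourcing this in your shell, tf_use prod eu-west-2 followed by tf_apply targets production in eu-west-2, and tf_use dev us-east-1 switches back to dev without re-running init.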

This approach is production tested.
Again, read about the -backend-config flag in the Backend Configuration documentation linked above.

Benefits:

DRY principle: do not repeat yourself. One entry point (directory) for working with all environments and regions.
Versioning: now you can use git to compare changes between environments, instead of comparing directories (which is hilarious). More about versioning below.
Easy debugging: you keep the Terraform code flat (use no or few modules), so you can inspect any resource’s value with terraform console without explicitly declaring an output, as you would have to with child modules.
Flexibility: you can still customize env/regions with the count = var.foo_enabled ? 1 : 0 meta-argument.
Maintainability and reliability, i.e. less burnout and fewer bugs.

With Terraform Cloud, everything I mentioned above works out of the box.

The Terraform way

The official Terraform doc actually covers the structuring question.

Standard Module Structure | Terraform | HashiCorp Developer


But it does not explicitly say how to structure a root module for multiple environments. This is because in Terraform Cloud, switching between environments is as simple as terraform workspace select <workspace>, and it is not in HashiCorp's business interests to document an easy workflow for other backends.

Terraform Cloud

In the Terraform Cloud workflow you manage only 1 root module for all environments, and you switch between them with the terraform workspace select <workspace> CLI command. The hassle with values for variables and secrets is taken away: they are stored in Terraform Cloud per workspace and populated automatically.

In addition to that you get

History of plan/apply logs
State management and locking (no need to create S3 and DynamoDB)
Nice GUI
Policy as Code with Sentinel
500 resources are free.

Nevertheless, most companies do not use Terraform Cloud (for whatever reason).

I choose Terraform Cloud because, frugal as I am, I realize that a great tool costs just a fraction of one developer's salary and is therefore worth it.

Neo fighting an exponentially growing system’s technical debt

Extra content

Number of tiers to have EVERYTHING-as-a-CODE

Tiers are layers of your infrastructure, like slices of an inverted pyramid.

Why is the pyramid inverted? Because Tier 0 and Tier 1 are the foundation of the whole thing: if they are misconfigured, everything is crooked.
Tier 0 and Tier 1 also have fewer resources than Tier 2, which is why the pyramid is inverted.

I recommend 3 tiers:

Tier 0: org-level resources. Those that cannot be replicated to non-prod environments and are usually organization-wide. Examples: AWS Organization, Control Tower, IPAM, GitHub org etc.
1 root module.
Tier 1: internal platform resources, i.e. centralized/shared resources for internal customers (delivery teams). These resources can be replicated to non-prod environments for testing purposes. Examples: EKS cluster with ArgoCD, Route53 zones, RAM shares, monitoring stuff.
1 root module.
Tier 2: delivery teams’ resources. Every team owns a Terraform repo and does self-service, reducing the number of hand-offs.
Quantity of root modules = quantity of delivery teams.

Each root module must be in a dedicated git repo: this is good for access control and simpler CI/CD config.

As tiers grow complex, Terraform may be unable to create a plan until certain values are known. To work around this, use the -target flag when you apply a root module for the first time. Example:

terraform apply -no-color -auto-approve -target aws_vpc_ipam_pool_cidr_allocation.vpc
terraform apply -no-color -auto-approve -target module.vpc -target module.eks 
terraform apply -no-color -auto-approve

^ This is a better alternative to having many root modules.

Passing variables between Tiers

*this is a bit off the topic

Avoid hardcoding variables; read them with Terraform data sources.
Here are ways to do that:

Passing variables from upstream to downstream: use SSM parameters with RAM shares. You can configure an org-level RAM share.
Passing parameters in any direction: enable read/write access to SSM parameters in a shared place (Tier 1).
For Terraform Cloud users it is simpler: pass variables from one workspace to another using the Terraform TFE provider.
Yes, you can manage Terraform with Terraform.

Versioning

Use an environment branching strategy.
The prod env is the prod branch, the dev env is the dev branch, etc.
This is how you should version: by branches, not by folders (as many mistakenly do with a root module for each environment). Now you can truly compare environments’ configuration.
This way you can also be certain which version of the code is deployed to each environment.
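Comparing two environments then becomes a plain git operation. A minimal sketch, assuming branches named after environments (dev, prod); the helper name env_diff is hypothetical:

```shell
# env_diff <from-branch> <to-branch>: show how the Terraform code of two
# environments differs. Hypothetical helper; branch names follow the
# branch-per-environment convention described above.
env_diff() {
  if [ "$#" -ne 2 ]; then
    echo "usage: env_diff <from-branch> <to-branch>" >&2
    return 1
  fi
  # The two-dot form compares the tips of both branches; the pathspecs
  # limit the output to Terraform code and variable files.
  git diff "${1}..${2}" -- '*.tf' '*.tfvars'
}
```

For example, env_diff prod dev shows what dev has that prod does not yet have, and git log --oneline prod..dev lists the commits still waiting to be promoted.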

Remember I said

Switch between environments/regions easily by changing TF_DATA_DIR env var.

To be exact, you also need to switch branches.

Now you can use git to compare changes between environments, instead of comparing directories (which is hilarious).
I wrote more about the env branching strategy here.

CI/CD

Let your CD agent do production deployments. Do not apply manually from your console. To enforce IaC, restrict admin permissions to production for everyone except your CD agent.

The deployment script can dynamically choose which environment to apply based on the branch name. It can also run dynamically for each directory (region) in ./envs/<branch name>/
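Such a deployment script might look roughly like this. It is a sketch under assumptions: the function name deploy_branch is hypothetical, the branch name is expected to match a directory under envs/ (as in the tree shown earlier), and how the branch name is obtained depends on your CI system.

```shell
# deploy_branch <branch>: apply every region directory of the environment
# named after the branch. Hypothetical helper; assumes the layout
# ./envs/<branch name>/<region>/ with backend.tfvars and variables.tfvars.
deploy_branch() {
  env_name="$1"
  if [ -z "$env_name" ] || [ ! -d "./envs/$env_name" ]; then
    echo "no environment directory for branch '$env_name'" >&2
    return 1
  fi
  for region_dir in "./envs/$env_name"/*/; do
    [ -d "$region_dir" ] || continue   # the glob matched nothing
    region_dir="${region_dir%/}"       # drop the trailing slash
    region="${region_dir##*/}"         # last path component, e.g. eu-west-2
    # A separate data directory per region keeps backend settings apart.
    export TF_DATA_DIR=".terraform_${env_name}_${region}"
    terraform init -input=false \
      -backend-config="$region_dir/backend.tfvars" || return 1
    terraform apply -input=false -auto-approve \
      -var-file="$region_dir/variables.tfvars" || return 1
  done
}
```

The CD job would call deploy_branch with the branch name provided by the CI system. The function stops at the first region that fails, so a broken region does not get silently skipped.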

Tests

Application code requires unit tests (passing all possible inputs) because it may be used in unexpected ways.
Normally this is not true of Terraform: we only care about the inputs we pass in the current version, and we test them just by applying the dev environment.
But to ensure quality, you have to actually run a full cycle of terraform apply/destroy on a special environment (let's name it tests).

That is, in Terraform you do not need to write tests for every possible scenario; you are interested only in your own scenario.
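The apply/destroy cycle on the tests environment could be scripted along these lines. This is a sketch, not a prescribed procedure: the function name tf_test_cycle is hypothetical, and envs/tests/<region>/ is an assumed directory that follows the same layout as the other environments.

```shell
# tf_test_cycle <region>: create and then tear down the "tests" environment.
# Hypothetical helper; envs/tests/<region>/ is an assumed directory.
tf_test_cycle() {
  if [ "$#" -ne 1 ]; then
    echo "usage: tf_test_cycle <region>" >&2
    return 1
  fi
  test_dir="./envs/tests/${1}"
  export TF_DATA_DIR=".terraform_tests_${1}"
  terraform init -input=false \
    -backend-config="$test_dir/backend.tfvars" || return 1
  status=0
  terraform apply -input=false -auto-approve \
    -var-file="$test_dir/variables.tfvars" || status=1
  # Always attempt destroy, even after a failed apply, so that partially
  # created resources do not linger.
  terraform destroy -input=false -auto-approve \
    -var-file="$test_dir/variables.tfvars" || status=1
  return "$status"
}
```

A non-zero return means either apply or destroy failed, which is the signal to look at before promoting the change any further.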

If you are writing a public Terraform module intended for the entire community, that is another situation. There you do have to think through every possible input and use case. This is where the terraform test framework enters the arena.

Normally you do not need terraform test framework.

Child modules — no, no

Do not create child Terraform modules to be cool and sophisticated. Our core value is maintainability. The simpler, the better.

Remember, passing a value from one module to another takes a lot of hassle:

define an output in module A
define an input in module B
in the root module, pass module A’s output to module B’s input.

Now imagine how unbearably slow this process would be if the modules were in separate git repos.

Except

Create a child module when you find yourself repeating the same set of related resources more than 3 times.

If the child module is intended only for the current root module, keep it in the same git repo and use a local reference.

module "api_gateway_x" {
  source = "./modules/api-gateway"
}

Because you use the branching strategy above, modules are versioned along with environments this way.

It is a good idea to use well known public child modules like VPC, EKS by Anton Babenko.

With the knowledge above you have no need to use Terraform frameworks like Terragrunt or Terraspace.

Now that you have read this to the end, welcome to the 0.1% club.
Thank me for this secret knowledge by sharing this article.


What makes me authoritative enough to make such claims? I’ve read the complete HCL documentation of Terraform (and Packer), Terragrunt and Terraspace. I have used Terraform with different providers and supported various IaC production environments. I passed the Terraform exams in 2021 and 2024.