Terraform Module Patterns

Aaron Kalair
Oct 10, 2021


Terraform gives you a myriad of ways to organise your code, supporting almost any pattern you’d like to adopt.

Whilst flexibility can be nice, it can be really hard when you first start out to know if you’re organising your code sensibly or setting yourself up for difficulties in the future.

Here are some patterns for creating Terraform Modules that I’ve seen work well.

Configuring AWS Resources with Company Specific Defaults

The default configuration that Amazon believes to be acceptable might not be the best for you.

Because of this, we create company-specific modules that wrap AWS resources and set defaults that make sense for us.

For example, we always want RDS storage_encrypted set to true and S3 Public Access blocked.

So we create modules like company_name_s3_bucket that configure an S3 bucket with the company’s mandatory arguments and then allow the other arguments to be configured via input variables.

Then anyone who wants an S3 bucket uses the company_name_s3_bucket module instead of the aws_s3_bucket resource from the AWS Provider, and we never have to worry about people forgetting to set sensible defaults.
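As a rough sketch of what such a wrapper looks like (the module layout and variable names here are illustrative, not our exact code), the module hardcodes the settings we mandate and exposes the rest as input variables:

# modules/company_name_s3_bucket/main.tf -- illustrative sketch
variable "bucket_name" {
  type = string
}

variable "tags" {
  type    = map(string)
  default = {}
}

resource "aws_s3_bucket" "this" {
  bucket = var.bucket_name
  tags   = var.tags
}

# Company mandated default: block every form of public access
resource "aws_s3_bucket_public_access_block" "this" {
  bucket                  = aws_s3_bucket.this.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}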

Automatically Creating Lower Level AWS Resources

Some AWS resources depend on others to be functional.

For example you can’t create an RDS instance without an aws_db_subnet_group, so we have our company_name_mysql module create the aws_db_subnet_group, which is then passed to the aws_db_instance.

Now the user of the module doesn’t need to worry about creating these supporting resources and can just pass in the appropriate subnets they want the database created in.
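A minimal sketch of that shape (the resource layout and variable names are illustrative): the module builds the aws_db_subnet_group itself from the subnet IDs the caller passes in, then wires it into the database instance:

# Inside the company_name_mysql module -- illustrative sketch
variable "identifier" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "master_password" {
  type      = string
  sensitive = true
}

# Created automatically, so callers only need to supply subnet IDs
resource "aws_db_subnet_group" "this" {
  name       = "${var.identifier}-subnets"
  subnet_ids = var.subnet_ids
}

resource "aws_db_instance" "this" {
  identifier           = var.identifier
  engine               = "mysql"
  instance_class       = "db.t3.medium"
  allocated_storage    = 100
  username             = "admin"
  password             = var.master_password
  storage_encrypted    = true # company default
  db_subnet_group_name = aws_db_subnet_group.this.name
}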

Grouping AWS Resources Needed to Make Functional Services

We have Terraform modules that combine multiple AWS resources and other Terraform modules to create functional services our applications need. One example of this is a MongoDB cluster.

Our company_name_mongodb_cluster module is organised as follows:

company_name_mongodb_cluster
-> mongodb_node_a -> company_name_mongo_node_module
-> mongodb_node_b -> company_name_mongo_node_module
-> mongodb_node_c -> company_name_mongo_node_module
-> cluster_security_group -> aws_security_group_resource
-> cluster_backup_schedule -> aws_dlm_lifecycle_policy_resource
-> cluster_iam_policy -> aws_iam_policy_resource
-> cluster_iam_role -> aws_iam_role

With company_name_mongo_node_module creating:

company_name_mongo_node_module
-> cloudinit_data -> template_cloudinit_config_resource
-> ebs_volume -> aws_ebs_volume_resource
-> database_instance -> aws_instance_resource
-> dns_entry -> aws_route53_record_resource

Here we nest two modules we’ve written ourselves.

The company_name_mongo_node_module allows us to consistently create EC2 instances that run mongod without lots of copying and pasting.

The company_name_mongodb_cluster module creates the resources that are shared across the nodes, like the security group and IAM role, which are then passed into the child module that creates the actual MongoDB servers.

The user of company_name_mongodb_cluster simply passes in a few required parameters and gets a MongoDB cluster configured following our requirements with regards to availability (three nodes), security (an appropriately scoped IAM role and a Security Group isolating the cluster) and backups (via the Data Lifecycle Manager).

The module can also output the cluster endpoint that can be passed into other modules or given to an application.
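From the caller’s side it looks roughly like this (the module source, variable names and outputs are illustrative):

module "orders_mongodb" {
  source = "git::https://github.com/company_name/terraform-mongodb-cluster.git?ref=v2.4.0"

  cluster_name  = "orders"
  environment   = "prod"
  subnet_ids    = module.vpc.private_subnet_ids
  instance_type = "m5.large"
}

# The cluster endpoint output can be wired into other modules or application config
output "orders_mongodb_endpoint" {
  value = module.orders_mongodb.cluster_endpoint
}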

How deep to nest modules

The above example uses two custom modules we’ve written and brings up another question.

How small and specific should your modules be and should you create long chains of modules with reusable components?

For example imagine:

networking_module ->
vpc_module ->
subnet_module ->
route_table_module

This at first seems quite nice.

Every environment you create will need a VPC, some Subnets and Route Tables, but in practice we’ve found working with this structure to be painful.

If you want to update the route_table_module you then need to version bump every parent module (3 of them) to release that change.

If you want to pass route_table specific parameters down to the route_table_module you have to pass them through multiple other modules, even if those modules don’t care about them.

Similarly if you need to pass data back up as an output from the route_table_module you have to pass it through multiple layers.

Further this assumes a perfect world where every environment is identical and needs all of the resources.

In practice we find this not to be true, and so the module ends up taking variables like enable_subnet_b that are passed into a count statement like this on each resource we only want to create in certain environments:

count = var.enable_subnet_b ? 1 : 0

This leads to the number of input variables the module takes growing quickly, as it will need to take inputs for the large number of resources it creates, and inputs to decide if it should even create the resource in the first place.
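The same flag also has to be declared and forwarded at every level of the chain before it reaches the resource that actually uses it. A sketch (variable declarations omitted for brevity):

# networking_module/main.tf -- the flag is only forwarded here...
module "vpc" {
  source          = "./vpc_module"
  enable_subnet_b = var.enable_subnet_b
}

# vpc_module/main.tf -- ...and forwarded again here...
module "subnets" {
  source          = "./subnet_module"
  enable_subnet_b = var.enable_subnet_b
}

# subnet_module/main.tf -- ...before it is finally used
resource "aws_subnet" "subnet_b" {
  count      = var.enable_subnet_b ? 1 : 0
  vpc_id     = var.vpc_id
  cidr_block = var.subnet_b_cidr
}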

We’ve found this style of module quite hard to work with and no longer design them this way.

Where to store the modules

Once you’ve got some modules you’ll need a place to store them so you can reference them in your Terraform runs.

Initially we had a monorepo that stored all our modules under the modules/ directory, and our actual instantiations of the modules were under environment-specific directories in the same repo, e.g.

environments/
  dev/
    vpc.tf
  prod/
modules/
  vpc/
  mongo/

And referenced the modules with …

source = "../../modules/vpc"

This works well when you first start out.

You can iterate on modules and make changes super easily, but it quickly causes issues.

As all of the modules are unversioned and live in the same Git repo, it’s impossible for modules in prod/ to use a different version of a module to dev/.

Of course you can try to work around this, or just tell people not to run apply in prod/ while you’re testing changes, but this is an accident waiting to happen.

So you’ll want to move your modules out into their own Git repos.

Once they’ve been split out you have two ways to release them.

Git Refs

You can use Git refs in the source field to reference a module e.g.

source = "github.com/hashicorp/example?ref=<BRANCH OR TAG>"

If you tag each merge to master with a version number you can then easily control what version of the module is deployed in each environment.
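So, for example, dev and prod can reference different tags of the same (hypothetical) module repository:

# environments/dev/vpc.tf -- trying out the new release
module "vpc" {
  source = "git::https://github.com/company_name/terraform-vpc.git?ref=v1.4.0"
}

# environments/prod/vpc.tf -- still pinned to the previous tag
module "vpc" {
  source = "git::https://github.com/company_name/terraform-vpc.git?ref=v1.3.2"
}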

This even works if your modules are in a private Git repo; the system running Terraform just needs access to the repository, either via an SSH key or a username and password.

This is by far the best way I’ve seen to store and use Terraform modules, mostly due to the downsides of the Terraform Module Registry discussed next.

Terraform Module Registry

The Terraform module registry can import modules from Github and allows you to reference them like …

source = "app.terraform.io/<YOUR COMPANY>/<MODULE>/<PROVIDER>"
version = "~> 1.1.0"

It’s free for 5 users and they have paid plans for additional users.

The biggest advantage of the Terraform Module Registry is fuzzy version locking, e.g. ~> 1.1.0 means the latest patch release from the 1.1.x series.

So without having to change any code you can run terraform init -upgrade and pull any bug fix releases from the module registry.

We initially switched from Git references to the module registry for this fuzzy version locking, but ultimately it turned out not to be used that often and we preferred to manually control any version bumps.
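In practice that meant going back to pinning exact versions, which from the registry looks something like this (the module name and version are illustrative):

module "mongodb_cluster" {
  source  = "app.terraform.io/<YOUR COMPANY>/mongodb-cluster/aws"
  version = "1.1.4" # exact pin, bumped deliberately via a code change
}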

Our initial need for the fuzzy version locking was for our long chain of nested Terraform modules described above. In reality that pattern was a bad idea, and as we migrated away from it, our need for loose versioning lessened.

It’s important to remember that with the version loosely locked like that, everyone needs to run terraform init -upgrade at the same time to ensure you all use the same versions of the modules and that someone else running terraform apply doesn’t undo your changes.

We use the free plan for the Terraform Module Registry, and our daily job, which builds a reference version of our Terraform modules, frequently fails because either the registry is down or we hit API limits.

Also, as you don’t want to put untested module versions into the registry, you’ll find yourself still using Git refs to test changes to a module on a branch pushed to GitHub, to verify it works before merging.

Finally, it’s worth noting that if you plan to use Terragrunt it doesn’t support the Terraform module registry.

Bash scripts as modules

Terraform supports data sources, which can represent a variety of different things such as AMIs or EC2 Instance UserData.

You can do neat things with this like creating modules for UserData that you need to run as part of the Cloudinit process when an EC2 instance comes up.

For example, if you need to kick off a chef-client run when your EC2 Instance comes up, you can create a module that defines a templated bash script like …

… <assorted setup work>
chef-client -E ${environment} -N $node_name …

And output it from a module like ..

output "file" {
  value = templatefile(
    "${path.module}/chef.sh.tmpl",
    {
      environment = var.environment
    }
  )
}

Then every EC2 instance that needs to run it can use it to build up its template_cloudinit_config:

module "chef_userdata" {
  source      = "<MODULE_SOURCE>"
  environment = var.environment
}

data "template_cloudinit_config" "service" {
  part {
    filename     = "chef.cfg"
    content_type = "text/cloud-config"
    content      = module.chef_userdata.file
  }
}

Now you have a fully versioned set of scripts you can use to bootstrap your instances, and you don’t need to copy paste them around everywhere!

Assorted Ad-hoc resources for applications

Aside from the larger resources, like databases and caches, your applications may also need an assortment of “smaller” things like IAM Roles, SSL certificates or random strings for secrets.

You could just define these as individual resources in app.tf files inside your Terraform environment and terraform apply them e.g.

resource "aws_acm_certificate" "app" {
  ...
}

resource "random_password" "password" {
  length = 64
}

resource "aws_secretsmanager_secret" "secret" {
  name = "password"
}

resource "aws_secretsmanager_secret_version" "secret" {
  secret_id     = aws_secretsmanager_secret.secret.id
  secret_string = random_password.password.result
}

resource "aws_iam_policy" "app" {
  ...
}

But now when you want to create those same resources for your application in, say, prod you have to copy and paste them all over again.

If you change any of them in dev you need to remember to reproduce the changes exactly in prod.

Finally, if you use Terragrunt you can’t just define random resources at the top level like this.

So we decided it was better to adopt a pattern of app-resource modules.

If required, each app gets a module called terraform-<BUSINESS_NAME>-app-resources-<APP_NAME>. In here you place all of these assorted resources and then you instantiate the module in your terraform environment instead of defining all the separate resources.
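For a hypothetical app-a, the environment then contains a single module block instead of the pile of individual resources:

module "app_a_resources" {
  source = "git::https://github.com/company_name/terraform-company_name-app-resources-app-a.git?ref=v1.0.0"

  environment = "dev"
  domain_name = "app-a.dev.example.com"
}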

Linking Terraform created resources with your applications

Once you’ve created your various AWS resources in Terraform, it’s likely your applications will need to know something about them, like S3 Bucket Names, Database DNS addresses etc.

You could manually copy and paste these into your application settings file, or take advantage of Terraform’s ability to create AWS Parameter Store and Secrets Manager entries to make this a less manual process.

E.g. Our Database module creates Parameter Store entries for the DNS address of the database…

resource "aws_ssm_parameter" "dns_address" {
  name  = "/dev/app-a/database_address"
  type  = "String"
  value = module.database.dns_address
}

And then you can either use something like ExternalSecrets to fetch that data and inject it into a K8s pod, or have your app talk to the AWS APIs itself to retrieve the values.
