Terraform without wrappers in multi-region, multi-account environments

Anton
Nov 7, 2022


Over the years of using terraform on different projects we went through major code refactoring at least 5 times. We started with terraform version 0.7 as an alternative to CloudFormation, which back then supported only JSON.

First, we had a single project with minimal functional separation: different .tf files within the same folder, without any structure. On the very first iteration we even kept the state file as part of the git repo. Very quickly we realised that we needed a way to reuse code blocks and share the state file with all team members. We started to use modules, remote state files and symlinks to map global variables between modules and project folders.

The next iteration for us was a Makefile wrapper which mirrored all major terraform commands as parameters to the make command. I know some folks are still using a very similar approach with a shell-script wrapper.

The Unix make command requires a Makefile in the current working directory. Conveniently, we could use the Makefile as a flag that the terraform command is allowed to run there. Our hope was to reduce the potential harm if the wrapper were triggered from a non-project folder. The wrapper approach quickly gave us the ability to use pretty complex logic in the background: we could dynamically inject different variable files (.tfvars), create new environments on the fly with git worktree, back up remote state files, etc. The main limitations of the Makefile wrapper were its uncommon syntax and the limited number of supported terraform commands.
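A minimal sketch of such a wrapper; the target names, the ENV variable and the file layout here are illustrative, not our exact setup:

# Illustrative Makefile wrapper around terraform
ENV ?= dev
VAR_FILES = -var-file=../variables/global.tfvars -var-file=../variables/$(ENV).tfvars

init:
	terraform init -backend-config=../backend/$(ENV).conf

plan: init
	terraform plan $(VAR_FILES)

apply: init
	terraform apply $(VAR_FILES)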

The next iteration was terragrunt. The idea of reusable environments looked pretty promising, but the way it was realised just didn’t fit our understanding of how a terraform repo should look.

There was a noticeable lack of flexibility, and a tricky way of using the find_in_parent_folders function with an unclear self-reference of locals blocks defined at the global level. On top of that, terragrunt had the following issues for us:

  • drift between the most recent terraform version and the version currently supported/tested by terragrunt
  • the maintainers released a document discussing 3 different ways to reuse nested global variables, and by their own admission each of them had a drawback
  • only a single terraform block with a source was allowed within an environment (see the sketch below)
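For illustration, each terragrunt environment folder is driven by a terragrunt.hcl roughly like the following (the module source is made up); a second terraform block pointing at another source cannot live in the same folder:

# terragrunt.hcl — one per environment folder
include {
  path = find_in_parent_folders() # climbs up to the root configuration
}

terraform {
  # only one terraform block with a source is allowed per folder
  source = "git::https://example.com/infrastructure-modules.git//vpc"
}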

We didn’t like the idea of a hard dependency on an additional open-source tool that is not required by the main tool we want to use. So we moved on.

We realised that one of the main limitations terraform had for us was the inability to use variables in backend configuration. This limitation seriously breaks the DRY principle we want to follow, especially when you plan to use terraform to configure multi-region and multi-account environments. The only way to bypass it and keep the code reusable was to use the TF_CLI_ARGS_* environment variables. The usage is pretty similar to the Makefile wrapper, but we wanted to keep the ability to run the plain terraform binary, switch between versions with tfswitch, and avoid passing additional command-line arguments without a real need.
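Terraform picks up TF_CLI_ARGS_<command> environment variables and injects their contents as extra arguments to the matching subcommand, so backend values can stay out of the code. The bucket and region below are illustrative:

# a bare `terraform init` now behaves as if these flags were passed explicitly
export TF_CLI_ARGS_init="-backend-config=bucket=example-tf-state -backend-config=region=eu-west-1"
terraform init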

After a short research and test we found that direnv works perfectly. We have a main .envrc file which is about 50 lines of code. It sets the TF_CLI_ARGS_plan, TF_CLI_ARGS_apply, TF_CLI_ARGS_import, TF_CLI_ARGS_destroy, TF_CLI_ARGS_console, TF_CLI_ARGS_refresh and TF_CLI_ARGS_init variables according to $CWD. This .envrc file is a complete replacement of terragrunt for us.
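A condensed sketch of the idea; the real file is longer, and the <account>/<region>/<component> directory layout here is an assumption:

# .envrc sketch: derive the context from the current path
# assumed layout: <repo>/<account>/<region>/<component>
region=$(basename "$(dirname "$PWD")")
account=$(basename "$(dirname "$(dirname "$PWD")")")

export TF_CLI_ARGS_init="-backend-config=region=$region"
export TF_CLI_ARGS_plan="-var-file=$PWD/../../../variables/$account.tfvars"
export TF_CLI_ARGS_apply="$TF_CLI_ARGS_plan"
export TF_CLI_ARGS_destroy="$TF_CLI_ARGS_plan"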

More info about how exactly we are using direnv with terraform can be found here.

The overall repo structure is pretty straightforward. We use the concept of splitting variable definitions from functional code. The variables folder has a bunch of .tfvars files with nested variable maps like:


eks_clusters = {
  production-cluster-v1 = {
    ...
  }
  service-cluster-v2 = {
    ...
  }
  ...
}

This allows us to keep our infrastructure definition consistent and the number of variables small. Within the root module we use locals with nested loops to construct structures which can be passed to our own and community modules:

locals {
  groups_to_permission_sets = flatten([
    for group, config in var.sso_groups : [
      for aws_account, v in config : [
        for permission_set in v.sso_permission_sets : {
          name                        = "${aws_account}.GROUP.${group}.${permission_set}"
          principal_type              = "GROUP"
          permission_set_arn          = aws_ssoadmin_permission_set.this[permission_set].arn
          permission_set_instance_arn = aws_ssoadmin_permission_set.this[permission_set].instance_arn
          principal_id                = aws-sso-scim_group.this[group].id
          account_id                  = var.aws_accounts[aws_account]["id"]
        }
      ]
    ]
  ])
}
resource "aws_ssoadmin_account_assignment" "this" {
  for_each = { for assignment, config in concat(local.groups_to_permission_sets, local.users_to_permission_sets) : config.name => config }
  ...
}

If we decide to change a community module, the top-level definition in .tfvars stays the same. Such a structure allows us to provision similar components almost in no time. Most of our developers are pretty familiar with the syntax and can safely use an existing configuration as a snippet for something new.

Each infrastructure component uses its definition from the tfvars as the name of the corresponding workspace. We parse the workspace name to be able to pass additional variables when needed:

split("@", terraform.workspace)
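For example (the exact naming scheme with an "@" suffix is an assumption here), the parts can be unpacked in a locals block:

locals {
  # hypothetical workspace name: "<component>@<suffix>"
  workspace_parts = split("@", terraform.workspace)
  component       = local.workspace_parts[0]
  # try() falls back gracefully when the workspace has no "@" part
  suffix          = try(local.workspace_parts[1], null)
}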

Instead of default values for variables we use the try() and can() functions a lot. With 1.3 we hope we’ll be able to use a more precise definition for our variable maps thanks to optional object type attributes.
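As a sketch of the pattern (the eks_clusters lookup and the attribute names are illustrative):

locals {
  cluster = var.eks_clusters[local.component]

  # try() returns the first argument that evaluates without an error,
  # replacing a per-variable default
  node_group_size = try(local.cluster.node_group_size, 1)

  # can() is true only if the expression evaluates successfully
  has_fargate = can(local.cluster.fargate_profiles)
}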

Regions and accounts are switched with aws-vault. After the switch, the environment variables contain temporary credentials and the current region name, which is used in workspace_key_prefix.
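A sketch of that flow, assuming an S3 backend and an illustrative profile name; the single quotes make $AWS_REGION expand inside the environment prepared by aws-vault:

# aws-vault issues temporary credentials and exports the profile's region
aws-vault exec prod-us-east-1 -- sh -c \
  'terraform init -backend-config="workspace_key_prefix=$AWS_REGION"'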

The proposed structure has been working pretty well for us for more than a year. At the time of writing we’re managing 25 AWS accounts in 3 regions with it.

Thanks for reading!
