Terraform does an amazing job of helping you define your cloud infrastructure. When executed, it parses all the terraform files (*.tf, *.tf.json, *.tfvars, *.tfvars.json) in the current working directory (without recursion) to build its internal graph of what it thinks your infrastructure resources should look like. It then determines the location of your current state, retrieves it, compares it with the desired state, and prompts you with the diffs to be applied to reconcile the two.
This makes terraform very easy to use when you are new to it or doing simple things, but in most real world usage you are going to need some better way of organizing all your inputs to terraform. Ideally you'd do this in a way that makes it easy to share as much or as little as necessary between the different organizational partitions, and that is easy to change over time. Sharing everything or nothing is easy, but sharing only what you need, on a sliding scale over time, becomes a lot trickier. You can use terraform conditionals to account for some of the simpler differences, but limitations in the terraform language make it very difficult to do so for complex variation. Terraform workspaces (previously called environments) can also be an option if your backend supports them, but the "activation" style of usage is not for everyone. Modules can help to at least abstract away some of the complexities of conditionals but again, language limitations make true reusability difficult; hopefully terraform v0.12 will make that better.
The first level of organization you may need is large scale partitions where almost everything is the same. These are typically called Environments in other frameworks, and are usually used to separate development from staging and production. Many people use terraform workspaces for this.
Another level of organization revolves around breaking up a monolith into Subgroups. This should be avoided for smaller deployments, as it is much easier to work with a single state and thus one big picture. This is especially true in the early days of defining your infrastructure, when you tend to do rapid iteration with large crosscutting changes. However, as your system and organization mature, a monolithic state starts to become unwieldy:
- It takes forever to refresh and run against all resources when you only care about changing a smaller set of things (leaf nodes in the graph).
- Multiple developers contend for the shared state file lock.
- It is hard to work in isolation on a feature branch due to contention when applying diffs.
When these issues start to become burdensome, one can break up an environment into smaller partitions, each with its own state. For example, one for core infrastructure like your vpc, another for shared databases/components, others for each of your applications or services. The terraform_remote_state data source can be used to reference output variables in the state of other groups, for example to get something like a route53 zone_id from the "core" partition when registering a dns name for a load balancer in your "services" partition. It is a good idea to do this in a tree-like fashion and avoid cycles between your groups.
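As a sketch of that cross-partition reference, assuming an S3 backend (the bucket, key, zone_id output, and ELB name here are all hypothetical):

```hcl
# In the "services" partition: read outputs from the "core" partition's state
data "terraform_remote_state" "core" {
  backend = "s3"

  config {
    bucket = "my-terraform-state"      # hypothetical bucket name
    key    = "core/terraform.tfstate"  # hypothetical state key
    region = "us-east-1"
  }
}

# Register a dns name for a load balancer using the zone from "core"
resource "aws_route53_record" "lb" {
  zone_id = "${data.terraform_remote_state.core.zone_id}"
  name    = "app.example.com"
  type    = "CNAME"
  ttl     = "300"
  records = ["${aws_elb.app.dns_name}"]  # assumes an ELB defined elsewhere
}
```

The "core" partition just has to declare `zone_id` as an output in its own source for this to resolve.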
Finally, as your organization grows, it may also become beneficial to push some of your infrastructure definition out to the edge by making each component in your system also responsible for defining its infrastructure. You could use Subgroups for this if you prefer to keep all your infrastructure centralized, but could also push the terraform source for the component out to the component’s repository, and manage it independently for that component.
Given these levels of organization, how does one actually lay out their terraform repo to account for them? However you want to organize things, it all has to boil down to running terraform in a directory that contains the source files for the specific partition you are currently running it for. Ideally you would also organize things so that you can start simple, and only add the extra organization/complexity as you need it.
The simplest implementation is to organize your directory tree with a directory per partition. Then have a wrapper script which you can parameterize for your partitions so that it can place you in the correct directory when running for that partition. The drawbacks are:
- You end up with a lot of directories that can be hard to keep track of — N environments * M groups can be a large number!
- It is difficult to share terraform source across partitions. You’ll need to push as much as possible into modules, which have their own reusability limitations.
- Your wrapper script can get quite complex, and then you have to figure out how to share that tooling across multiple team members and repos.
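The core of such a wrapper can start out as simple as the sketch below, assuming a hypothetical layout of one directory per environment/group pair (e.g. environments/dev/core, environments/prod/services):

```shell
#!/bin/sh
# Minimal wrapper sketch: pick the partition directory, then run
# terraform from within it. Layout is hypothetical.
run_tf() {
  env="$1"
  group="$2"
  shift 2

  dir="environments/$env/$group"
  if [ ! -d "$dir" ]; then
    echo "no such partition: $dir" >&2
    return 1
  fi

  # Run terraform in a subshell so the caller's cwd is untouched
  ( cd "$dir" && terraform "$@" )
}
```

Invoked as `run_tf dev core plan`. Everything beyond this (variable lookup, state config, locking conventions) is where the complexity creeps in.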
The biggest issue here is the difficulty in sharing source across partitions. Regardless of how awesome your modules are, you are still going to need some source to instantiate the use of those modules based on the variables active for that partition, and possibly glue those modules together. You can write modules to glue modules as deep as you’d like, but you’ll always need something at the top. There is some value in keeping that tree shallow — simpler, more visible, easier maintenance, especially when you need to start manually moving resources around in the state so you don’t have to tear them down for a refactor.
Thus, in order to help with sharing, one is going to need some top-level files to be the glue. For lack of a better term, let us call these Recipes. A Recipe is simply a terraform source file that exists independent of the partition in which it is used. Thus to share it across partitions (directories), you can follow any of a number of strategies:
- Copy each recipe into a partition as needed, and check it in there. This allows independent modification, but reconciling updates is tough as it is essentially a fork without tracking.
- Symlink each recipe into the checked in tree that represents your partitions. Always up to date, but forking requires a copy with all the problems it represents. You can avoid forking by getting creative with your use of variables/conditionals/count like you would with a module, but you’ll eventually hit a wall due to language limitations. Depending on your VCS, it can also be a little tricky to track changes over time, and symlinks themselves can sometimes be problematic.
- Maintain a mapping file of recipes to the partitions they are active in. Easiest to track changes over time (file diff), at the expense of the wrapper logic complexity used to realize the terraform working dir for the partition you want to run for.
Once shared, the main problem with recipes is that they don’t have the explicit nature of modules for output variables, so cross referencing between them is a little loose — in effect you reference the outputs of modules instantiated by them and possibly locals as well. However, since they are mainly acting as a glue layer, this isn’t that big a deal in practice as terraform quickly tells you when a reference is broken.
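For example, assuming two hypothetical recipe files in the same partition, one recipe can loosely reference a module instantiated by the other:

```hcl
# core.tf (hypothetical recipe): instantiates a vpc module
module "vpc" {
  source = "./modules/vpc"
}

# services.tf (hypothetical recipe in the same partition): references an
# output of the module that core.tf instantiated. Nothing declares this
# coupling explicitly, but terraform errors out if the reference breaks.
resource "aws_instance" "app" {
  ami           = "ami-123456"  # hypothetical
  instance_type = "t2.micro"
  subnet_id     = "${module.vpc.subnet_id}"
}
```

This assumes `./modules/vpc` declares a `subnet_id` output.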
Handling partition variation
Along with partitioning your terraform comes the need to customize what happens for each partition. By using a file based glue layer (Recipes), you maintain the ability to create a file specific to a partition, giving you the ultimate escape hatch. However, for more common variation you should strive to handle that variation with terraform variables so that your recipes can be truly reusable across your different partitions. To this end, you should structure your variables so that they have sensible defaults, but are easy to override per partition.
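A minimal sketch of that pattern (the variable name and values are hypothetical):

```hcl
# In a recipe: declare the variable with a sensible default
variable "instance_type" {
  default = "t2.micro"
}

resource "aws_instance" "app" {
  ami           = "ami-123456"  # hypothetical
  instance_type = "${var.instance_type}"
}
```

A partition that needs something bigger then overrides just that variable, e.g. with a per-partition tfvars file containing `instance_type = "m4.large"`, while every other partition keeps the default.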
More than a wrapper script
Wrapper scripts start out simple, then quickly become a substantial engineering effort as they address the above organizational goals. If you’d rather not go down that path, you should check out atmos:
- Gives you the ability to have Environment and Subgroup partitions without hijacking Terraform Workspaces
- Maps Recipes to each partition based on an easily edited yml data structure, thereby allowing your partitions to be as similar or different as needed.
- Makes it easy to extend or override variables for Environments from yml. By default it passes all top level yml keys into terraform as variables, though you have to declare the variable in a terraform file to make use of it. It may be extended in the future to auto declare the terraform side if not present.
Here is an example yml file showing roughly how one maps recipes across multiple environments (the recipe lists are abbreviated):

    # The recipes used for all environments
    recipes:
      # The bootstrap subgroup for setting up an account/environment
      bootstrap:
      - grant-ops-access
      # The default subgroup containing the recipes to run your system
      default:
      - ...
      # This can be broken up into as many other groups with whatever
      # names you desire

    # Overriding recipes for specific environments. You can have as many
    # environments with whatever names you desire
    environments:
      ops:
        recipes:
          # Replace the default subgroup for the "ops" environment
          # so that it doesn't provision the usual recipes, just
          # those needed to manage operations
          default:
          - ...
      prod:
        recipes:
          # Add to the default subgroup in the prod environment
          default:
          - ...
      dev:
        recipes:
          # Add to the default subgroup in the dev environment
          default:
          - ...
Selecting an environment or group for a terraform apply simply becomes a matter of passing in command line arguments to atmos, e.g.:
atmos -e dev apply --group tools
If a group is not supplied, atmos uses the default group, so you can easily ignore grouping until it is needed.
Give it a try and let us know what you think.