Modularizing Common Infrastructure Patterns in Terraform

How to use Terraform modules to minimize code duplication in your IaC projects.

Eric Renteria
Ancestry Product & Technology
Dec 3, 2020 · 7 min read

Background

When developing large, complex software systems, it is up to the developer to identify pieces of code that can be grouped together into classes or modules so that they may be extended and re-used throughout the project. The same idea applies when developing infrastructure with Terraform. Terraform modules give developers a mechanism for grouping multiple resources together so that they can be used throughout an IaC project as a single component, thereby reducing code duplication and maintenance.

Example

At Ancestry, genomic data is processed through various algorithms to provide customers with insights about their DNA. Many of these algorithms benefit from being run on multiple DNA samples at once. While developing the infrastructure for Ancestry’s genomic algorithms pipeline in AWS with Terraform, we identified the need for a re-usable component that would allow us to batch these DNA samples together as they entered the pipeline to later be sent to another process once a threshold was reached. We would need to group the following components together:

  • An input SQS queue for receiving individual samples (simply as DNA sample identifiers or sample_ids)
  • A lambda function for implementing the queueing/dequeuing/batching logic (will be triggered by the input SQS queue)
  • An IAM role for executing the lambda function
  • An output SQS queue for sending batched samples (as a single list of sample_ids per message)

The final piece needed for this component would be a temporary data-store for queuing samples, checking how many samples are in the data-store, and dequeueing them for a batch once the count passes our configured threshold. At first glance, the original input SQS queue seems like it would suffice. However, SQS only returns an approximation of the number of messages in a queue at any given time, which is not a reliable source of information for this component. Given that we already use Redis as a multi-purpose data-store, we decided to leverage it for our component.

The architecture for our module looks like this: individual samples land on the input SQS queue, which triggers the lambda; the lambda accumulates sample_ids in Redis and, once the configured threshold is reached, drains them and posts the batch to the output SQS queue.

Implementation

To build a Terraform module in your project, the first thing you need to do is create a sub-directory in your project’s root-level directory. Any sub-directory containing .tf files can act as a child module, but it will not be loaded or initialized until you reference it from one of the .tf files in your root module. We will create a directory and call it sample-id-batcher.

Variables

Terraform modules provide a mechanism, called variables, for configuring input parameters for your component. Think of them as initialization parameters for an instance of a class. Before developing our component, we need to decide which configuration pieces to expose as variables so that the component stays flexible without becoming complicated. For our example, we need two input parameters: the batch size and the URL of our Redis instance. We will also add a third parameter to help name our components in a meaningful way. In our module directory, let’s add the following variables.tf:
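A minimal sketch of such a variables.tf (the names process_name, batch_size, and redis_url are illustrative, not necessarily the exact ones used in production):

```hcl
variable "process_name" {
  description = "Name of the downstream process; used to prefix resource names."
  type        = string
}

variable "batch_size" {
  description = "Number of sample_ids to accumulate before emitting a batch."
  type        = number
}

variable "redis_url" {
  description = "Connection URL of the Redis instance used as the temporary data-store."
  type        = string
}
```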

Locals

Locals are used in Terraform to define named constants for your module. We can use them to store the results of string interpolations for component names, timeouts, and other information. In our module’s directory, let’s add the following locals.tf file:
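A sketch of what locals.tf might contain (the key names and timeout value are illustrative; the other sketches below reuse them):

```hcl
locals {
  # Base name for every resource created by this module,
  # e.g. "ethnicity-sample-id-batcher".
  base_name = "${var.process_name}-sample-id-batcher"

  # Redis key under which pending sample_ids are accumulated.
  redis_key = "${local.base_name}-pending"

  # Lambda timeout in seconds.
  lambda_timeout = 60
}
```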

Outputs

Outputs are used in Terraform to expose information about a module that can be used by other components for integration. In our case, we would need the URL of the output SQS queue to be exposed so that we can use it to kick-off another process via an SQS subscription. For flexibility, let’s also expose the ARN of the output SQS queue. We will add the following outputs.tf file to our module:
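A sketch of an outputs.tf along those lines (it assumes the output queue resource is named aws_sqs_queue.output, as in the sqs.tf sketch below):

```hcl
output "output_queue_url" {
  description = "URL of the output SQS queue that receives batched sample_ids."
  value       = aws_sqs_queue.output.id # the id of an aws_sqs_queue is its URL
}

output "output_queue_arn" {
  description = "ARN of the output SQS queue, useful for event source mappings."
  value       = aws_sqs_queue.output.arn
}
```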

The SQS Queues

For our use-case, we will need two SQS queues: the input SQS queue for receiving samples and triggering our lambda batcher function, and the output SQS queue for placing batches of samples when the threshold has been met. We will add the following sqs.tf file to our module’s directory:
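A sketch of sqs.tf, using the -in and -out naming convention described later in this post; the visibility timeout value is an illustrative choice (it just needs to exceed the lambda timeout):

```hcl
# Input queue: receives messages containing a single sample_id each and
# triggers the batcher lambda.
resource "aws_sqs_queue" "input" {
  name                       = "${local.base_name}-in"
  visibility_timeout_seconds = local.lambda_timeout * 6
}

# Output queue: receives one message per batch of sample_ids once the
# configured threshold has been reached.
resource "aws_sqs_queue" "output" {
  name = "${local.base_name}-out"
}
```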

The Execution Role

All AWS lambda functions need to be configured with an execution role. The role should have access to all the resources used throughout the lambda’s execution. For our use-case, our execution role would need the standard permissions that all lambda execution roles need as well as permissions that allow it to pull from the input SQS queue and post to the output SQS queue. Let’s add the following role.tf file to our module’s directory:
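A sketch of what role.tf could look like; aside from lambda_execution_role_sqs_policy, which the next paragraph refers to by name, the resource names are illustrative:

```hcl
# Trust policy allowing the Lambda service to assume the role.
data "aws_iam_policy_document" "lambda_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "lambda_execution_role" {
  name               = "${local.base_name}-role"
  assume_role_policy = data.aws_iam_policy_document.lambda_assume_role.json
}

# Standard permissions (CloudWatch Logs) that every lambda execution role needs.
resource "aws_iam_role_policy_attachment" "basic_execution" {
  role       = aws_iam_role.lambda_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# Permissions to pull from the input queue and post to the output queue.
data "aws_iam_policy_document" "sqs_access" {
  statement {
    actions   = ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"]
    resources = [aws_sqs_queue.input.arn]
  }
  statement {
    actions   = ["sqs:SendMessage"]
    resources = [aws_sqs_queue.output.arn]
  }
}

resource "aws_iam_role_policy" "lambda_execution_role_sqs_policy" {
  name   = "${local.base_name}-sqs-policy"
  role   = aws_iam_role.lambda_execution_role.id
  policy = data.aws_iam_policy_document.sqs_access.json
}
```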

As you can see, most of the permissions are standard, boilerplate policies that AWS requires you to attach in order for your lambda functions to work. The policy called lambda_execution_role_sqs_policy is where we define the statements allowing the lambda to pull from the input SQS queue and post to the output SQS queue.

The lambda function, trigger, and data archive

The final infrastructure piece that we need to create is the AWS Lambda function. We will pass along certain variables and constants in the environment block of our lambda function resource to make them available to our lambda code. In addition, we will need to create an event source mapping which will trigger the lambda any time a new message is posted to our input SQS queue. Finally, we will need to declare an archive file where the Python code and all of its dependencies will live. Let’s add the following lambda.tf file to our project:
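A sketch of lambda.tf; the handler name, runtime version, and environment variable names are illustrative assumptions that the Python sketch further down also uses:

```hcl
# Package the lambda handler plus its dependencies (such as redis)
# from the module's package/ directory.
data "archive_file" "lambda_package" {
  type        = "zip"
  source_dir  = "${path.module}/package"
  output_path = "${path.module}/package.zip"
}

resource "aws_lambda_function" "batcher" {
  function_name    = local.base_name
  role             = aws_iam_role.lambda_execution_role.arn
  handler          = "batcher.handler" # hypothetical module/function names
  runtime          = "python3.8"
  timeout          = local.lambda_timeout
  filename         = data.archive_file.lambda_package.output_path
  source_code_hash = data.archive_file.lambda_package.output_base64sha256

  # Concurrency of 1 guarantees only one invocation touches the Redis
  # data-store at a time, avoiding race conditions while queueing/dequeueing.
  reserved_concurrent_executions = 1

  environment {
    variables = {
      BATCH_SIZE       = var.batch_size
      REDIS_URL        = var.redis_url
      REDIS_KEY        = local.redis_key
      OUTPUT_QUEUE_URL = aws_sqs_queue.output.id
    }
  }
}

# Trigger the lambda whenever a message arrives on the input queue.
resource "aws_lambda_event_source_mapping" "input_trigger" {
  event_source_arn = aws_sqs_queue.input.arn
  function_name    = aws_lambda_function.batcher.arn
  batch_size       = 1
}
```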

As you can see from the data block that defines the archive, we will need to place all of our Python resources in a folder called package in our batcher module’s source directory. This package will include the main Python module that we write to drive our lambda function as well as all of the dependencies our lambda function has. In this case, Redis is the only dependency we will need to include. More information about how to build a Python deployment package for AWS Lambda can be found in the AWS documentation. If your lambda function does not require any external dependencies other than what is available in the Python execution environment provided by AWS, you do not need to worry about building a deployment package with dependencies. This process will differ if your lambda is using any other execution environment such as NodeJS, Java, etc. Please consult the AWS documentation for the language you are using.

One other important detail to note is that our lambda is set with a concurrency of 1. This will ensure that our lambda can only process one sample from our input queue at a time, guaranteeing that we do not run into any race conditions during the queueing/dequeueing process.

The Python code for the lambda

Now that we have all of our AWS infrastructure resources declared, we need to create the Python module that will drive the functionality of our Terraform module. When executed, our lambda function will do the following:

  1. Parse individual sample_ids from Records in the incoming SQS message.
  2. Push the individual sample_ids from the messages into Redis.
  3. Check if the number of sample_ids in Redis exceeds the configured batching threshold and, if so, remove all of the sample_ids from Redis and push them to the output queue as one batch.

The following Python module should do just that:
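A sketch of that module, assuming the environment variable names from the lambda.tf sketch above and that each incoming SQS message body is a single bare sample_id:

```python
import json
import os

import boto3
import redis

BATCH_SIZE = int(os.environ["BATCH_SIZE"])
REDIS_KEY = os.environ["REDIS_KEY"]
OUTPUT_QUEUE_URL = os.environ["OUTPUT_QUEUE_URL"]

# Clients are created outside the handler so they are reused across invocations.
redis_client = redis.Redis.from_url(os.environ["REDIS_URL"], decode_responses=True)
sqs = boto3.client("sqs")


def handler(event, context):
    # 1. Parse individual sample_ids from the Records in the incoming SQS event.
    sample_ids = [record["body"] for record in event["Records"]]

    # 2. Push the sample_ids onto the Redis list acting as our temporary data-store.
    redis_client.rpush(REDIS_KEY, *sample_ids)

    # 3. Once the threshold is reached, drain the list atomically (MULTI/EXEC via
    #    a pipeline) and post the whole batch to the output queue as one message.
    if redis_client.llen(REDIS_KEY) >= BATCH_SIZE:
        pipe = redis_client.pipeline()
        pipe.lrange(REDIS_KEY, 0, -1)
        pipe.delete(REDIS_KEY)
        batch, _ = pipe.execute()

        sqs.send_message(QueueUrl=OUTPUT_QUEUE_URL, MessageBody=json.dumps(batch))
```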

After creating all of the resources mentioned above, our project directory should look something like this:
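A rough sketch of that layout (batcher.py is a hypothetical name for the handler module):

```
.
├── main.tf
└── sample-id-batcher/
    ├── variables.tf
    ├── locals.tf
    ├── outputs.tf
    ├── sqs.tf
    ├── role.tf
    ├── lambda.tf
    └── package/
        ├── batcher.py   # lambda handler
        └── redis/       # vendored dependency
```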

NOTE: main.tf is in our root module and will be where we use our new batcher child module.

Using our new module

Now that we have defined all the building blocks for our module, we can use it anywhere in the root level of our project by calling it with a module block and supplying our desired input variables. For example, let’s say we wanted to create a batcher that queues up samples until they reach a threshold of 100 and then sends them to a process called “ethnicity”. As noted above, we will be using our new module in main.tf. We initialize our batcher as follows:
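For illustration, the module block might look like this (the module label and Redis URL are placeholders):

```hcl
module "ethnicity_sample_id_batcher" {
  source = "./sample-id-batcher"

  process_name = "ethnicity"
  batch_size   = 100
  redis_url    = "redis://my-redis-host:6379/0" # placeholder; see note below
}
```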

NOTE: Storing connection strings in source control is inherently unsafe, but we are doing it this way for the purposes of this tutorial. Do NOT do this in practice.

With our sample-id-batcher module created, we can then use the module’s output variables to integrate with other processes that will take the batch of sample_ids as input. For instance, we could create an SQS subscription to some hypothetical lambda function called calculate-ethnicity.
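A sketch of such a subscription, assuming the calculate-ethnicity function is defined elsewhere in the root module and using the output_queue_arn output from the sketch above:

```hcl
resource "aws_lambda_event_source_mapping" "calculate_ethnicity_trigger" {
  event_source_arn = module.ethnicity_sample_id_batcher.output_queue_arn
  function_name    = aws_lambda_function.calculate_ethnicity.arn # hypothetical lambda
  batch_size       = 1
}
```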

When this Terraform gets applied, it will result in the creation of multiple components that will work together as follows:

  • An SQS queue named ethnicity-sample-id-batcher-in will receive messages containing single sample_ids.
  • A lambda function named ethnicity-sample-id-batcher will be triggered via a subscription to the SQS queue mentioned above, executing all of the queueing/dequeueing logic we’ve coded in our Python module.
  • An SQS queue named ethnicity-sample-id-batcher-out will receive messages containing 100 sample_ids when the lambda mentioned above posts them to it (after the threshold has been reached and the samples have been removed from Redis).

Following that, our theoretical calculate-ethnicity lambda will be triggered with an SQS message containing the batch of 100 sample_ids.

Conclusion

Terraform modules provide a mechanism for grouping together low-level AWS resources into high-level components that can be re-used throughout your project. You can make them as flexible or as rigid as you like, but the key is to design them to meet your needs while reducing code duplication and complexity.
