Using Terraform at Scale

How to find your security group in a haystack

Terraform is a tool developed by HashiCorp that enables you to declare your infrastructure as code:

“Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently.”

We use Terraform at Hootsuite to spin up or terminate any AWS (Amazon Web Services) resource, such as EC2 instances, S3 buckets, or IAM roles. Terraform has support for many providers, including other public cloud offerings such as GCE (Google Compute Engine) or Azure. The full list of providers can be found in the Terraform documentation.

When a Terraform plan is applied, Terraform creates or updates a Terraform State file (.tfstate), which stores “state about your managed infrastructure and configuration”.
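For illustration, here is roughly what an abbreviated State file looks like (the exact layout varies by Terraform version; this sketch follows the pre-0.12 format, and the resource values are invented for the example):

```json
{
  "version": 3,
  "terraform_version": "0.9.3",
  "modules": [
    {
      "path": ["root"],
      "resources": {
        "aws_s3_bucket.sample": {
          "type": "aws_s3_bucket",
          "primary": {
            "id": "sample-bucket",
            "attributes": {
              "bucket": "sample-bucket",
              "region": "us-east-1"
            }
          }
        }
      }
    }
  ]
}
```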

At Hootsuite, we are in the process of migrating all our AWS resources to be managed by Terraform. Why do we want each resource to be “Terraformed”?

  • Auditable — have a record of who created or deleted resources and the changes they made
  • Maintainable — long term maintenance of configuration possible with versioning
  • Repeatable — common infrastructure can be turned into Terraform modules and be used in multiple Terraform stacks
  • Scalable — configuration modification can be done for multiple instances in one location
  • Documented as Code — the code describes what your infrastructure looks like
  • Peer Reviewable — changes can be reviewed before being applied

The Problem

Each service or project at Hootsuite has its own Terraform stack that defines all the resources needed for that service. Many organizations use a mono-repo approach, with all their Terraform modules defined in one repository. Since we have broken up our Terraform resources per service, we have hundreds of Terraform State files rather than one very large one. We have divided our Terraform modules this way because:

  • Separation by service — Each Terraform repository only defines the resources needed for that service
  • Teams are able to work simultaneously — A single repository would need to be “locked” when someone is making changes so that resources aren’t accidentally deleted or overwritten
  • Encourages service ownership — The Terraform repository belongs to the developers maintaining the service

Many of our old services had their resources spun up directly in the AWS console or by other tooling, and those resources are not managed by Terraform. The only way to know whether an AWS resource had been Terraformed was to find the resource definition in the Terraform code or to find the resource in a Terraform State file. We are at the point of migration where we have hundreds of Terraform stacks (each with its own Terraform State file), so manually looking through Terraform code or State files was not a practical way to check whether an AWS resource had been Terraformed.

The Project

We wanted a way to quickly find a resource definition, so I implemented a tool in Go that can search for resources in Terraform State files.

The tool, called Terraform State Search, is a REST application that has the ability to:

  • Read Terraform State files from an S3 bucket
  • Search through all Terraform State files in the S3 bucket given user input
  • Maintain the same state as the S3 bucket by queuing bucket notifications with SQS

Part 1 — Read from S3

You can configure where Terraform sends your State files by specifying the backend; “a ‘backend’ in Terraform determines how state is loaded and how an operation such as apply is executed”. At Hootsuite, each Terraform stack specifies S3 as its backend, so all projects send their Terraform State files to the same S3 bucket.

Here is how you would define the backend in Terraform:

terraform {
  backend "s3" {
    bucket   = "sample-bucket"
    key      = "path/terraform.tfstate"
    region   = "us-east-1"
    role_arn = "arn:aws:iam::XXXXXXXX:role/sample-role"
  }
}

We needed a way to fetch State files from S3 given a path. The implementation is fairly simple: the AWS API (Application Programming Interface) call ListObjects returns all the files in the specified S3 bucket.

# s3client.go
s3response, err := svc.ListObjects(&s3.ListObjectsInput{Bucket: aws.String("bucket-name")})

The files returned are served to the user. The user specifies the file they would like to view by path, and the AWS API call GetObject returns the file content (in this case, the contents of the Terraform State file).

# s3client.go
result, err := svc.GetObject(&s3.GetObjectInput{Bucket: aws.String("bucket-name"), Key: aws.String("path")})
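In practice the bucket may contain objects other than State files, so it is worth filtering the listed keys before fetching them. A minimal sketch (the helper name filterStateKeys is mine, not from the actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// filterStateKeys keeps only the S3 object keys that look like
// Terraform State files. (Hypothetical helper, for illustration.)
func filterStateKeys(keys []string) []string {
	var out []string
	for _, k := range keys {
		if strings.HasSuffix(k, ".tfstate") {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{
		"service-a/terraform.tfstate",
		"service-b/terraform.tfstate",
		"logs/access.log",
	}
	fmt.Println(filterStateKeys(keys))
	// → [service-a/terraform.tfstate service-b/terraform.tfstate]
}
```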

Part 2 — Search

Once we had the Terraform State files, we needed a way to index them that would allow users to search through them.

The library I used for searching is called Bleve. Bleve offers Lucene-like search and provides a way to:

  • Create Search Indexes
  • Search for content in the Search Indexes created

A Search Index is a representation of searchable data that allows you to retrieve items from persistent storage. Bleve optimizes search time by using indexes instead of scanning the full text.

Bleve also provides other tooling to help refine a search or improve search results, such as analyzers and scoring. Analyzers control how text is processed before it is indexed; for example, an English analyzer applies English-specific tokenization, stemming, and stop words. Scoring is a way to weight different query results. Bleve also provides a CLI tool that is very helpful for debugging.

A note on Terraform Resource vs. Data Source:

“Resources are a component of your infrastructure.” Resources are where services or infrastructure are defined. “Data sources allow data to be fetched or computed for use elsewhere in Terraform configuration.” Data sources are used when you want to use a resource that was defined in another project. For our use case, we were more concerned with where resources were defined and created than with where they were used, so it was better to index only resources and not data sources, so that a search result returns only the file in which the resource was originally created.
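In the pre-0.12 State file format, resources and data sources live in the same per-module map, with data sources keyed under a "data." prefix, so skipping them comes down to a prefix check. A minimal sketch (the struct and helper name are mine, not the actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tfState models only the slice of the (pre-0.12) State file layout we
// need: each module has a map of resources keyed by "type.name", and
// data sources are keyed as "data.type.name".
type tfState struct {
	Modules []struct {
		Resources map[string]json.RawMessage `json:"resources"`
	} `json:"modules"`
}

// resourceKeys returns only the managed resources, skipping data
// sources, so a search hit points at where a resource was created
// rather than where it was merely read. (Hypothetical helper name.)
func resourceKeys(raw []byte) ([]string, error) {
	var st tfState
	if err := json.Unmarshal(raw, &st); err != nil {
		return nil, err
	}
	var keys []string
	for _, m := range st.Modules {
		for k := range m.Resources {
			if !strings.HasPrefix(k, "data.") {
				keys = append(keys, k)
			}
		}
	}
	return keys, nil
}

func main() {
	state := []byte(`{"modules":[{"resources":{
		"aws_s3_bucket.sample": {},
		"data.aws_ami.base": {}
	}}]}`)
	keys, _ := resourceKeys(state)
	fmt.Println(keys) // → [aws_s3_bucket.sample]
}
```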

When the app starts, it does a batch download of all the files in the bucket containing the Terraform State files and indexes each file into a predefined Bleve index. It parses each file's JSON and stores the information that needs to be indexable in a struct (tfstruct), which is then stored in the Search Index with bleve.Index():

# search.go
tfstruct := Tfstate{
    Name:    name,
    ID:      id,
    Content: data,
    Path:    path,
}
bleveIdx.Index(uuid, tfstruct)

Once the Terraform State files have been indexed, querying is possible on the Bleve Index:

# search.go
query := bleve.NewQueryStringQuery(searchTerm)
search := bleve.NewSearchRequestOptions(query, limit, 0, false)
searchResults, err := bleveIdx.Search(search)

Part 3 — Listen for changes

To keep the state of the service in sync with the state in S3, bucket notifications were added for any files updated or deleted in the bucket. The bucket notifications are put in an SQS queue, and the program long-polls the queue in a loop. If there is a bucket notification in the queue, the Search Index updates accordingly.

Setting bucket notifications in Terraform:

resource "aws_s3_bucket_notification" "bucket_notification" {
  bucket = "${var.bucketName}"

  queue {
    queue_arn = "${aws_sqs_queue.stash.arn}"
    events    = ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
  }
}

Long polling in a loop:

# sqs.go
for {
    err = longPoll(svc, resultURL)
    if err != nil {
        log.Errorf("Error processing queue message: %v, trying again", err)
    }
}
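Each message pulled from the queue carries an S3 event notification in JSON: a Records array whose entries have an eventName (e.g. "ObjectCreated:Put", "ObjectRemoved:Delete") and the affected object key. A hedged sketch of dispatching on the event type (the struct and function names are mine, not the actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// s3Event mirrors only the parts of an S3 event notification we need.
type s3Event struct {
	Records []struct {
		EventName string `json:"eventName"`
		S3        struct {
			Object struct {
				Key string `json:"key"`
			} `json:"object"`
		} `json:"s3"`
	} `json:"Records"`
}

// dispatch decides, per record, whether the Search Index should
// re-index or remove the file. (Hypothetical helper, for illustration;
// the callbacks stand in for the real index-update code.)
func dispatch(body []byte, reindex, remove func(key string)) error {
	var ev s3Event
	if err := json.Unmarshal(body, &ev); err != nil {
		return err
	}
	for _, r := range ev.Records {
		switch {
		case strings.HasPrefix(r.EventName, "ObjectCreated:"):
			reindex(r.S3.Object.Key)
		case strings.HasPrefix(r.EventName, "ObjectRemoved:"):
			remove(r.S3.Object.Key)
		}
	}
	return nil
}

func main() {
	msg := []byte(`{"Records":[{"eventName":"ObjectRemoved:Delete",
		"s3":{"object":{"key":"service-a/terraform.tfstate"}}}]}`)
	dispatch(msg,
		func(k string) { fmt.Println("reindex", k) },
		func(k string) { fmt.Println("remove", k) })
	// → remove service-a/terraform.tfstate
}
```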


We are in the process of migrating all our infrastructure to Terraform. At scale, it is difficult to know which services have been Terraformed and which haven't. With Terraform State Search, developers can quickly check whether a resource has been Terraformed, and therefore see which services have been migrated to Terraform and which have not.