Simplifying Terraform Plan Reviews with Generative AI

Deniz Parmaksız
Insider Engineering
Jun 30, 2024

DevOps engineers love the Infrastructure as Code (IaC) methodology because it allows them to version their infrastructure, review change requests, and automate resource provisioning. However, software engineers do not always share the same enthusiasm for writing and reviewing infrastructure code, as it can become cumbersome for them.

Stepping into Infrastructure as Code.

Terragrunt for Infrastructure as Code

At Insider, we have been using Terragrunt as our Infrastructure as Code tool for years. Our DevOps team integrated Terragrunt with Atlantis, allowing software engineers to request AWS resources by creating pull requests on GitHub. This integration has been widely adopted, resulting in thousands of pull requests over the years. Read more about that integration here if you are interested.

Over time, our software engineers became proficient in writing Terragrunt code for resource requests. However, reviewing these changes remained time-consuming, especially for lengthy pull requests involving multiple files. Although reading the plan output beforehand can help, these outputs can themselves be long and verbose, which makes them difficult for humans to interpret.

Generative AI for Plan Summary

At Insider, machine learning has been at the core of our work for years. We have been using ML/AI technologies for predictive segmentation, recommender systems, and product search, and lately for improving user efficiency and productivity through generative AI.

While designing and developing our generative AI solutions, we realized that we could also boost developer productivity by improving internal processes, like reviewing Terragrunt outputs. We developed a prompt to analyze and generate a human-friendly summary of the Terragrunt plan. This allows users to quickly grasp the changes before diving into the detailed plan and code.

AI-generated summary for an AWS CodeBuild project and its dependencies.

As with any prompt engineering process, it took us several iterations to achieve the desired output. We utilized Amazon Bedrock as our foundation model service to keep our Terragrunt plan outputs within the AWS network for privacy and security reasons. After experimenting with several models, we chose Claude 3 Haiku due to its cost-effectiveness, speed, and quality of summaries.

The summary can also help engineers when they encounter an error in the plan, since the model explains the error and its likely cause, although there is still room for improvement here.

The Integration

We collaborated closely with our DevOps team during the prompt engineering and integration processes. Our team was already providing an internal LLM gateway to govern model access for our products, so integrating the summary into the Atlantis deployment was as easy as adding a single HTTP request carrying the plan output. Integrating directly with Amazon Bedrock via the AWS SDK would have been just as easy, but we wanted to keep track of usage, so we routed the requests through our existing infrastructure.

Simplified cloud architecture for the integration.
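To make the flow concrete, here is a minimal sketch of what such an integration could look like. The gateway URL, payload fields, authentication scheme, and response shape below are placeholders invented for this example, not our actual internal API; the point is simply that a single HTTP request carrying the plan output is all that is needed.

import os
import requests

# Hypothetical internal LLM gateway endpoint; replace with your own service.
GATEWAY_URL = os.environ.get("LLM_GATEWAY_URL", "https://llm-gateway.example.internal/v1/summarize")


def summarize_plan(plan_output: str) -> str:
    """Send the raw Terragrunt plan output to the gateway and return the AI-generated summary."""
    response = requests.post(
        GATEWAY_URL,
        json={"task": "terragrunt-plan-summary", "plan": plan_output},  # assumed payload shape
        headers={"Authorization": f"Bearer {os.environ['LLM_GATEWAY_TOKEN']}"},  # assumed auth scheme
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["summary"]  # assumed response field

In the Atlantis workflow, a custom run step can pipe the saved plan output into a helper like this and post the returned summary as a pull request comment, so reviewers see it right next to the full plan.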

The Prompt

LLMs are quite good at summarization tasks and often do not need a complex prompt to produce a promising output. You can start with a basic prompt and adjust it to your preferences. Below is an early version of our prompt, which you can refine to suit your needs. Read up on prompting techniques before you start, and experiment in the Amazon Bedrock Chat Playground to perfect your prompt before implementation.

You are an expert DevOps engineer working on Terragrunt. You are given the output of a Terragrunt build plan and should explain it to a developer who is not an expert on Terragrunt. Follow the procedure below.

1. Examine the plan output.

2. If there is an error, explain the error and its reason. In case of an error, do not continue.

3. If there is no error, simply explain what will be executed: which resources are affected, what the exact actions are, and so on.

The output should be concise. Provide a summary of affected resources.
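If you would like to experiment beyond the playground, a sketch along these lines sends the prompt above together with a plan output to Claude 3 Haiku on Amazon Bedrock using the AWS SDK for Python. The region and token limit are placeholder choices for illustration, not our production configuration, and the system prompt is a condensed version of the one shown above.

import json
import boto3

# Bedrock runtime client; pick the region where the model is enabled for your account.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Condensed version of the prompt from the section above.
SYSTEM_PROMPT = """You are an expert DevOps engineer working on Terragrunt.
You are given the output of a Terragrunt plan and should explain it to a developer
who is not an expert on Terragrunt. If there is an error, explain the error and its
reason and stop. Otherwise, explain which resources are affected and what the exact
actions are. The output should be concise."""


def summarize_plan_with_bedrock(plan_output: str) -> str:
    """Ask Claude 3 Haiku on Bedrock for a human-friendly summary of the plan output."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,  # placeholder limit
        "system": SYSTEM_PROMPT,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": plan_output}]}
        ],
    }
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        body=json.dumps(body),
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]

Either path yields the same kind of summary; we simply preferred the gateway route so that usage is tracked centrally.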

Conclusion

Infrastructure as Code is a powerful DevOps pattern that enables organizations to manage cloud infrastructure effectively. However, involving software engineers in IaC practices can be challenging and time-consuming, depending on the skill set of the teams. Leveraging LLMs to generate human-friendly summaries of changes can increase developer productivity by helping them quickly grasp the overall changes before reading the details. The main drawback, as with any LLM-generated text, is that the summary should be treated as a guide rather than fact, since AI-generated text can be syntactically correct yet not entirely factually accurate.
