Navigating the Testing Maze: Unravelling the Challenges of Infrastructure as Code (IaC) Testing with Terraform

Lukasz Pawlega
Palo Alto Networks Developers
13 min read · Sep 7, 2023


Quality assurance is a critical aspect of the software development lifecycle, guaranteeing the delivery of a reliable and functional product. It ensures that the code is of high quality, performs as expected, and meets the desired standards. It also improves security through identifying, assessing, and mitigating risks…

This is all true, and it seems obvious when you think about code written in one of the general-purpose languages. A developer can almost naturally map each of the requirements mentioned above to a type of test: code standards — static code analysis, reliability — unit/integration testing, etc. Yet, it is not so obvious when you switch to declarative languages and tools such as HCL and Terraform, or in general, when talking about testing Infrastructure as Code. The picture gets blurry, and you suddenly end up in a situation where testing one line of code means deploying a whole costly infrastructure.

Why? Let’s use Palo Alto’s Next-Generation Firewall (NGFW) Terraform module repositories as an example.

Testing Infrastructure as Code

First, let’s answer the question of whether traditional test types map onto Infrastructure as Code (IaC).

The code in the mentioned Terraform repositories consists of the following:

  • Modules (not deployable directly, reusable code),
  • Examples (deployable, built of modules, describing a whole infrastructure).

The testing levels typically used with general-purpose languages are static code analysis, unit tests, integration tests, system tests, and end-to-end tests. When you start to assign Terraform code components to these levels (except static code analysis), you will immediately start seeing problems.

So how do we test Terraform code or, in general, IaC? — We deploy it.

If we take unit testing into account: in our case, the smallest entity is a module. To test it, we would have to deploy it. But a module is not deployable on its own. Furthermore, it often relies on outputs from other modules. And if we combine more than one module to do unit testing, it is no longer a unit test but an integration test. So we just lost unit tests.

As to integration tests, in our case, we already have code that combines several modules; these are examples. So instead of writing Terraform code to do integration testing, we can accomplish it with examples. Then, it becomes system testing.

And since an example is deploying the whole infrastructure, how is this different from end-to-end testing?

We can immediately see that when testing infrastructure, the boundaries get blurred. Furthermore, it seems that testing is deploying, and unit tests are actually end-to-end tests. In other words, we’re down to deploying examples.
We are talking about all of the examples because we want to make sure that all code is tested. Speaking of deploying, our code typically supports 3 to 5 Terraform versions. To make sure that the code is running correctly in all supported versions, we should deploy the examples using every one of them. As you see, this becomes a nightmare.

We can overcome this challenge in two ways: first, by imposing boundaries on our own terms, and second, by automating the tests.

Effective Testing Strategies

When discussing testing strategy, it is essential to first address the concept of boundaries, as the strategy’s effectiveness heavily relies on how we define and distinguish between different types of tests.

Static Code Analysis

These types of tests are the easiest ones in the Terraform world. We treat HCL like any other language. There is a set of methods and tools you can use to run static code analysis (SCA). Let’s just focus on the ones we use:

  • TFLint — a Terraform linter. It helps catch common mistakes, deprecated syntax, security vulnerabilities, and other potential issues early in development.
  • Terraform FMT — a built-in command that automatically reformats Terraform code files into a canonical format, adhering to a consistent and standardised style throughout the codebase.
  • Checkov — an SCA security tool that detects security and misconfiguration problems.

To run them as a single test, we use pre-commit. It can serve as both a Git pre-commit hook and a command-line tool. Pre-commit also provides a configuration file in which we can define and fine-tune each test. Storing this file next to the code ensures that all SCA tests are always run the same way.
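As a minimal illustration, this is how pre-commit is typically invoked locally; the actual hooks come from the configuration file mentioned above:

    # Install the git hook once per clone, so the SCA checks run on every commit
    pre-commit install
    # Or run all configured hooks (TFLint, terraform fmt, Checkov, ...) on demand
    pre-commit run --all-files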

Unit/Integration testing

As mentioned, a unit in Terraform is a module. But since a module has a broader meaning in Terraform, for unit tests we treat both examples (the so-called root modules) and the actual modules as units that should be tested. We do not deploy them, however. Unit testing, in our case, is limited to terraform validate — a built-in command typically considered an SCA tool. Validating a reusable module is essentially SCA, but validating an example, which wires several modules together, also gives us a basic form of integration testing.
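In practice, such a unit test boils down to two commands run from a module or example directory, with no cloud credentials required:

    # Download providers and modules without configuring a state backend
    terraform init -backend=false
    # Check syntax, types and internal references without touching any infrastructure
    terraform validate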

System testing

These types of tests are only run on examples. For system tests, we still do not deploy any infrastructure. System tests are done by running the terraform plan command. You could think of it as a dry run of the whole infrastructure deployment. This means that the code as a whole is checked. To perform this type of test, you already need access to the cloud of your choice.
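In its simplest form (using Azure as an example; the variable file name is illustrative), a system test looks like this:

    # Plan needs valid cloud credentials, even though nothing gets deployed
    az login
    terraform init
    # Dry run of the whole example; fails if the planned infrastructure is inconsistent
    terraform plan -var-file="example.tfvars"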

End-to-end testing

Finally, we do a deployment. But end-to-end, in our case, means more than deploying infrastructure. Since we test IaC, it is also about testing the idempotence of the code and the ability to destroy the components when needed. Hence this level consists of three steps:

  • terraform apply — to deploy the actual infrastructure, followed by
  • terraform plan — to check idempotence: once the components are deployed, a follow-up plan should report no further changes, and
  • terraform destroy — to destroy the infrastructure. This test is quite important, as it can reveal problems with module dependencies. Quite often the creation of resources that depend on each other is asynchronous (you can create resources at the same time and bind them together later), but deletion is not. Destroying infrastructure can expose code where we did not treat such a dependency with special care.
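Put together, the sequence boils down to three commands; the -detailed-exitcode flag is one way to make the idempotence check scriptable (exit code 0 means the plan found no further changes):

    # 1. Deploy the actual infrastructure
    terraform apply -auto-approve
    # 2. Idempotence check: a follow-up plan should report no changes (exit code 0)
    terraform plan -detailed-exitcode
    # 3. Tear everything down; this is where hidden dependency problems surface
    terraform destroy -auto-approve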

Embrace Automation for Testing

We’ve figured out what and how to test. Now, let’s talk about automation. We’ve divided our automation approach into two levels to make things easier.

  1. Semi-manual — as you can see, there are a lot of tools and a lot of tests. The semi-manual level is about providing a wrapper for the tests. This way, you call a test without constructing the command or configuring a tool to run it. To achieve that, we introduced pre-commit for SCA and Makefiles for the rest of the tests. This way, testing the code during development gets simple.
  2. CI workflows — since we have a lot of code to test and against a lot of different versions of Terraform, it makes sense to automate them. Our code is hosted on GitHub, so the obvious choice for automation was GitHub Actions.

Makefiles

The Makefiles are almost identical for each type of module (examples share the same code, modules share their own). For modules, as we run only validation as a unit test, they contain only one target: validate. Examples, however, are more complicated. Besides validation, there should also be a possibility to plan, apply, and destroy, and, before the latter, to test idempotence. Each of these steps gets its own target that runs the required tool or command.
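The target names below are illustrative rather than the exact ones from our repositories, but a typical local session against an example looks roughly like this:

    # PR-style checks: no infrastructure gets created
    make validate
    make plan
    # Release-style end-to-end run: deploy, verify idempotence, destroy
    make apply
    make idempotence
    make destroy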

The benefit of adding Makefiles as a wrapper for the tests is that we can call them locally (during development) and in a CI pipeline. And we are sure we always run tests in the same way. This also makes the developer responsible for hardening the tests, figuring out all the corner cases, etc.

One single code change results in dozens of tests to be performed

With Makefiles, the testing gets simpler, but it’s still time-consuming. Imagine a change in a module that is used in all examples. Every example supports 4 different Terraform versions, and let’s assume you have 5 examples. This means that you would have to run:

  • 1 SCA test — code gets changed only in the module.
  • 24 unit tests — (1 module + 5 examples) times 4 Terraform versions.
  • 20 system and 20 end-to-end tests — 5 examples times 4 Terraform versions for each test.

That is 65 (!) tests to make sure everything works correctly. Assuming each SCA, unit, and system test takes 1 minute, this already gives 45 minutes. If we add the 20 end-to-end tests, which can easily run around 10 minutes each, we end up with 245 minutes. That is roughly 4 hours for just one change in one module.

And how to overcome that? Automation is the key.

Branching Strategy vs Testing vs Costs

Before we talk about automating tests, we need to plan what to test and when. When working with code repositories, this involves choosing a branching strategy. In our case, the trunk-based strategy was the best choice, as big or breaking changes happen quite rarely. This means we work on branches created from and merged directly to the default branch. Merges are done through Pull Requests. So the obvious choice for running automation would be a Pull Request — a perfect place to test changes introduced to the default branch.

We also release our code regularly. Code releasing is automated with a Continuous Integration workflow — a perfect candidate for running tests.

You are probably wondering why we test the code during a release when it was already tested during a PR. Or, since we can run tests during a release, can we (or should we) split the tests between a release and a PR? The answer to the latter question is ‘yes’. We can, we should, and the most important reason to do so is the number of tests to run and the time required to run them.

The last factor we need to think about is costs. Running automation on GitHub public repositories is usually free (please verify with your GitHub plan). Yet, deploying infrastructure to a cloud is not. Deploying unnecessary code will have an impact on our monthly bill. Following the example above, even a small change might trigger a lot of deployments.

Let’s do some calculations taking Azure as a reference cloud. All costs are, of course, estimates and may vary depending on the type of resources you deploy and on your contract. They may change over time:

  • The smallest VM size that corresponds to a VM-300 Firewall is Standard_DS3_v2 — the cost is 0.293 USD/hour.
  • The typical VM size for a Panorama is Standard_D5_v2, which costs 1.17 USD/hour.
  • Let’s assume we would like to deploy every example; this roughly means 11 firewalls and 1 Panorama (common architecture: 2 VMs, dedicated architecture: 4 VMs, dedicated autoscaling: 4 VMs, Panorama: 1 VM, standalone Firewall: 1 VM).

If we sum this up (11 × 0.293 USD + 1.17 USD ≈ 4.39 USD per hour, and a create-and-delete run takes roughly an hour), you will see that a single deployment (just create and delete, no additional tests) costs around 4.4 USD (rounding up). Multiplying it by all supported Terraform versions (assuming 4) already gives 17.6 USD.

Does this amount seem like a lot? That’s not an easy question. The answer probably depends on your monthly costs. But we should remember that this almost $18 is just for one full test. How many tests will you run during a month? How much infrastructure will you deploy? How often will a developer run deployments manually during the development life cycle? You should take all these factors into account to estimate the real costs of running IaC tests and decide, based on them, what, when, and how often to deploy.

Unveiling Workflows in Palo Alto’s Terraform repositories

To address these challenges, we have devised the following solution:

  • Run basic tests locally — during this phase developers have the flexibility to select the specific tests needed based on the current state of development. By running these tests locally, developers can quickly validate their code, ensuring its correctness and functionality before proceeding further.
  • Utilise Makefiles to test your code — they provide a structured approach to defining and executing tests, ensuring thorough coverage. If a specific test is not currently included in the test suite, it is worth investing additional time to add it. You will benefit from it in the future.
  • Do not deploy anything during Pull Requests — in IaC, infrastructure deployment is a natural step during the development process; we do not need to redeploy it during a PR. We do, however, run unit and system tests on all changed modules and on the examples that depend on the changed code. We run these tests using each supported Terraform version. Moreover, we run them in parallel (using a feature in GitHub Actions called the matrix strategy). This is a great time-saver! (A rough local equivalent of the version matrix is sketched right after this list.)
  • Always run SCA tests — for two reasons: firstly, in case someone did not run them locally, and secondly, in case someone does not have the latest SCA tools installed. This is especially important for security tests, where new checks are constantly added. We have a separate workflow that makes sure all SCA tools always run in their latest versions (by updating the pre-commit configuration file).
  • Deploy only during releases, using a single (the latest) Terraform version. The main focus of the deployment is the cloud API and the Terraform provider code rather than the Terraform code itself. If any option used in the code were incompatible with one of the supported Terraform versions, we would catch it during the system tests run for a PR. On the other hand, we do need to test the actual deployability of the code before it gets released.
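As a rough local equivalent of that CI version matrix, you can loop over the supported releases with a version manager such as tfenv (the versions and target names below are purely illustrative):

    # Run the PR-level checks against every supported Terraform version
    for v in 1.3.9 1.4.6 1.5.7; do
      tfenv install "$v" && tfenv use "$v"
      make validate && make plan
    done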

Taking all these factors into consideration, we have come up with two workflows:

  1. Pull Requests CI — run when a PR is created or updated. It runs only when changes are in the Terraform code (.tf and .tfvars files). This means that any PR that updates, for instance, documentation, does not trigger the tests. And we run the tests only on updated modules and all examples that depend on these modules. Tests are run using all supported Terraform versions.
    For PRs, we run the following: SCA tests, Unit tests, System tests.
  2. Release CI — executed every week, it serves the primary objective of publishing new releases. However, before proceeding with the release, an extensive battery of tests is conducted on each module and example. These tests are specifically performed using the most recent version of Terraform.
    For releases, we run the following: SCA tests, Unit tests, End-to-end tests.

As a safety measure, we rerun SCA and Unit tests during a release. Changes made to the repository (introduced via PRs) are not always related to Terraform code. We update CI workflows, test configurations, tool versions (including the SCA tools), etc. For these types of changes (as mentioned above), the PR CI is not run. Although we test them before merging into the default branch, these are not automated tests. Also, a trunk-based branching strategy means that the updates reaching the default branch are small, so the PR tests are usually narrow in scope. A release is a good place to test the whole code we host.

How did we benefit from this approach?

Still using Azure as a reference cloud.

A complex PR that tests 4 examples takes 10 minutes from start to finish:

  • running SCA tests (Checkov, linter, terraform fmt) — 3 tests,
  • additionally making sure that the documentation is up to date with the code — 1 test,
  • running unit tests (validation against 4 Terraform versions) — 16 tests,
  • running system tests (terraform plan also against 4 versions) — 16 tests.

This is 36 tests! If we still assume 1 minute for each test, this would give us 36 minutes when run manually.

A release, where we run all tests against the latest Terraform version, takes around 27 minutes:

  • SCA — like for a PR, 3 tests,
  • documentation — 1 test,
  • unit — 16 tests: 11 modules + 5 examples
  • end-to-end — 5 tests.

This is 25 tests! Still assuming 1 minute for SCA and unit tests and 10 minutes for end-to-end, this would give us around 1 hour 10 minutes when run manually.

Elevating IaC Testing: Room for Improvement

Indeed, there is more to explore when it comes to testing Infrastructure as Code (IaC). In addition to the aspects discussed earlier, there are several other essential considerations in the realm of IaC testing. The most important one would be Terratest.

Currently, our testing approach primarily revolves around leveraging Terraform itself. However, there are dedicated tools available that are specifically designed for testing Terraform code. One such tool that stands out is Terratest. While working with Terratest does require some basic knowledge of Golang, it offers enhanced flexibility and enables us to conduct more detailed and comprehensive tests. By utilising Terratest, we can further strengthen the quality assurance of our Infrastructure as Code deployments and gain deeper insights into the behaviour and performance of our infrastructure. By using Terratest, we can, for example:

  • Test a module’s contract — inputs and outputs. This can be considered a form of integration testing, where we ensure that the module’s dependencies and interactions are functioning as expected. By thoroughly testing the inputs and outputs of the module, we can verify that it behaves correctly and consistently within the broader system.
  • Test a module’s behaviour when introducing changes to the code — this type of testing falls somewhere between a unit test and a system test, allowing us to perform isolated deployments of the module itself. By specifically focusing on the module and its interactions within the infrastructure, we can ensure that any code modifications or updates have the intended impact without affecting the broader system.
  • Run real end-to-end tests — these tests involve the actual deployment of the NGFW infrastructure and the execution of real traffic to validate the proper configuration of all related network resources. By simulating real-world scenarios and verifying the behaviour of the deployed infrastructure, we can confidently assess the effectiveness and accuracy of our NGFW modules.
