Box Tech Blog

Box CMF: Shift Left Testing, Infrastructure as Code

Co-Authors: Xaviea Bell, Matt Bowes, Raul Flores, Jared Newell, Quynh Tillman

Illustrated by Madeline Horwath / Art Directed by Erin Ruvalcaba Grogan

So, you have completed the definition of the foundational elements of your Infrastructure as Code (IaC) framework. Now what? Well, it is critical that you also develop a clear framework for ensuring IaC developers approach testing the same way any software developer would. Preventing infrastructure deployment bugs and misconfigurations that can easily lead to security vulnerabilities is just as essential as ensuring the quality of the application software that runs on that infrastructure. This area is often overlooked, but it is a key focus of how Box is approaching our overall migration into the Public Cloud.

This blog will explore Shift Left Testing methodologies applied to IaC. Specifically, we’ll focus on the following:

  • Static analysis testing via linting and policy enforcement
  • Component Testing of custom terraform modules
  • End to End testing

Why Shift Left?

Similar to the traditional Software Development LifeCycle (SDLC) process, we wanted to ensure a secure infrastructure environment and to provide feedback as early as possible in our Infrastructure Development LifeCycle process. We have continuous monitoring tools that monitor our live environments (i.e. post infrastructure deployment) for security vulnerabilities, performance issues, and drift. We also implement required peer reviews as part of normal infrastructure deployment processes. However, there was no standard framework to enable infrastructure developers to define tests that can be run prior to the deployment of infrastructure. This limited our ability to catch issues early in the infrastructure lifecycle process.

We defined a number of security checks, enforced via GCP organization policies. They govern what can and cannot be set in our environment. The problem comes when a team submits IaC code that passes terraform plan successfully and then fails to deploy because an organization policy does not allow it. This is not the best experience for development teams, as they should have been notified when they submitted the code that it did not meet our infrastructure deployment requirements. In addition, organization policies are limited to the constraints that GCP provides, so we also need a way to prevent issues that organization policies do not cover. In essence, we want to move that feedback earlier in the development process and thereby “Shift Left” the infrastructure validation process.

IaC Terminology and definition of Static, Component, and End 2 End Tests

The following diagram illustrates how we think about the various test categories at Box:

Everything starts with a pull request, and depending on what kind of change is being made, the corresponding test pipeline is executed. In the IaC repos, the static analysis stage is covered by our linter and our Conftest policies. This helps us ensure the code meets our defined standards and that the changes don’t violate any policies. If the change is in the modules repo, the tests are different: component testing exercises each changed module as an individual component and confirms that it integrates well. In either case, the pull request (PR) is not allowed to be merged if any of these tests fail. Once the tests are passing and the PR has been approved by an authorized peer, it can be merged, which triggers the next sequence of tests.

In the Post Merge phase, only IaC changes require validation in the lower environments, which allows us to validate functionality before rolling the change out to production. Changes in the Terraform modules don’t go through this cycle, as they are already expected to function in all environments based on the component testing results. Deployed changes aren’t forgotten in our model: we keep a vigilant eye on the deployed infrastructure with ongoing drift detection and security scanning.

Tool Evaluation and Selection

Before we dive into the details of each of these areas, it’s important to provide some perspective on how we approached this effort at Box. One of the most important aspects of deciding how to create a framework to support Shift Left Testing is tool evaluation and selection. We categorized the tools into the three test areas our framework needed to cover. Given the number of tools to choose from, we also prioritized these tools into high, medium, and low buckets according to whether we would evaluate and potentially implement a tool in a given timeframe. We spent most of our evaluation effort on the high priority tools.

Static Analysis via Policy enforcement tools

The static analysis tools are split into 2 buckets: linting and policy enforcement tools. The following tools were discussed and prioritized:

High Priority

Medium Priority

Low Priority

Ultimately, we converged on terraform validate and OPA as our primary linting and policy enforcement tools. The key reason for selecting these tools was the simplicity of integrating them into our existing IaC technology stack. It’s important to note that OPA was not our first choice for policy enforcement. We initially selected Terraform Compliance as our key policy enforcement tool. One of the key factors was its support for Behavior Driven Development (BDD) style testing (using radish), which is currently a big focus of our SDLC process. However, this did require us to add custom Atlantis workflows to support this capability.

Part of our ongoing effort with our IaC Framework is to continuously evaluate new frameworks and tools. Since other parts of our organization were using OPA, we eventually decided to go back and take a deeper look at this tool. As they say in life, “timing is everything!” It turns out that by the time we went back to evaluate OPA, Atlantis had integrated native support for a utility called Conftest, which enables writing policy tests against configuration data (using OPA). So, we were able to simplify the support for policy enforcement by removing those custom Atlantis workflows (required for Terraform Compliance) and leveraging Conftest.

Sentinel is also a very popular policy enforcement tool that we could have chosen to integrate into our environment. We chose not to pursue the use of Sentinel for a few key reasons. First, we had already made our decision on a secure execution environment tool called Atlantis, which we will discuss later in the blog. Since Sentinel is part of Terraform Enterprise, which also offers a secure execution environment as one of its primary benefits, there was obvious overlap here. The other key reason is that we had other use cases within Box that had already started using OPA with very good results. As with any technology stack evaluation, there will be some tough decisions to be made about which tools should be included and which ones should be left out. There are no right or wrong decisions here, but make sure you do your due diligence and evaluate the tools based on well-defined requirements and, most importantly, what works best in your environment.

Open Policy Agent (OPA) allows us to test IaC when it is submitted to our Atlantis pipeline. When a user creates a PR with IaC code, Atlantis runs terraform plan on that code and checks the plan against a set of OPA policies; if there is a violation, it blocks that code from being applied. Teams now get feedback earlier in the process that their code does not meet pre-defined infrastructure requirements. Since OPA policies are created by us and can be customized, we can create policies for issues that we were not able to block with organization policies. Since we do not allow users to create resources outside of the IaC pipeline (i.e. via the Cloud Console), we can now be sure that the resources that are created meet our standards.

Component and End 2 End tools

The component test tools are focused on verifying the behavior of Box’s custom Terraform root module, submodules, and example modules used to deploy and manage our infrastructure. The end 2 end tests are focused on validating infrastructure after it has been deployed into our Public Cloud environment.

We evaluated the following tools:

We converged on Kitchen-Terraform and Jenkins as our primary component and end 2 end testing tools. The complexity of developing infrastructure tests with pytest and the limitations of Terratest were a few of the key reasons for selecting these tools as our component testing technologies. The reality here is that we also did not preclude ourselves from using pytest or Terratest (or any other tools for that matter) in the future. Since Jenkins is really just an execution environment for our tests, we can technically use any tool we want to support how we develop both component and end 2 end test cases.

As we will discuss in more detail later in this blog, the end 2 end tests were more about providing a secure and automated execution environment, using Jenkins. Although GitHub Actions is a viable choice, Box has long standardized around Jenkins, so this was actually an easy choice. As for specific tools to develop tests, we agreed that we would not require specific tools to develop end 2 end test cases. This gives our developers the flexibility to implement tests with whatever tools they believe work best to validate their infrastructure deployments. Our initial plan was to restrict the tools to a pre-defined set, but there were no reasonable options at the time of our investigation, so rather than limit developers’ choices, we opted to allow more flexibility in this area. We may eventually pare the list down, but for now, choice is more important.

With choice and flexibility as the guiding force behind end 2 end testing we set out to develop a standardized mechanism for bringing your tests into the framework. We decided to implement a generic configuration file (JSON) that would allow developers to abstract away all the details of their testing and provide a simple set of instructions that the end 2 end testing pipeline would consume and execute on. With only 6 required parameters we were careful not to make the configuration too complex or burdensome so that the majority of the time could be spent writing the actual tests.

The results of the end 2 end tests are required to be formatted as Cucumber JSON, JUnit, or TestNG, since we expect the results to be consumed into our Jira XRAY system. Here is where the real power of end 2 end testing is realized, as we have the visibility to determine how things are progressing.

Framework Definitions

Now that we have selected the initial set of test tools, we will describe how we integrated these tools into our overall IAC Pipeline model.

Static Analysis Testing

Conftest Policies and Atlantis

Early in the design phase of the IaC Pipeline we determined that we would need a way to validate and test code as it was being put up for review and before it made it out into the wild. At that point in time Atlantis did not offer a native way of supporting this, and we spent a nontrivial amount of time exploring in-house solutions.

Linting was the first area we tackled. We implemented a custom GitHub check in Jenkins that would clone each IaC pull request and run the following lint tests:

  • Terraform fmt check
  • Terragrunt hclfmt check
  • Terraform validate check
  • Terraform doc check

Each test produces a pass/fail result. This gave us the ability to govern the quality of IaC code being merged early on, but we knew that we needed a more sophisticated approach to enforce standards at a more granular level.
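The lint stage described above boils down to running each check against the cloned pull request and failing the PR check if any of them fails. The following is a minimal Python sketch of that idea, not our actual Jenkins job. It covers only `terraform fmt -check` and `terraform validate`; the Terragrunt and doc checks are left out because their exact invocation depends on the tool version in use. The function names and the injectable `runner` argument are hypothetical, and `terraform validate` assumes the working directory has already been initialized.

```python
import subprocess
from typing import Callable, Dict, List

# Two of the checks listed above. `terraform fmt -check` exits non-zero when
# files need reformatting; `terraform validate` exits non-zero on invalid config.
LINT_COMMANDS: Dict[str, List[str]] = {
    "terraform fmt": ["terraform", "fmt", "-check", "-recursive"],
    "terraform validate": ["terraform", "validate"],
}


def run_command(cmd: List[str], cwd: str) -> int:
    """Run one command inside the cloned PR checkout and return its exit code."""
    return subprocess.run(cmd, cwd=cwd).returncode


def lint(cwd: str, runner: Callable[[List[str], str], int] = run_command) -> Dict[str, bool]:
    """Run every lint command (no early exit) and map check name -> passed."""
    return {name: runner(cmd, cwd) == 0 for name, cmd in LINT_COMMANDS.items()}


def overall_status(results: Dict[str, bool]) -> str:
    """Collapse per-check results into the single pass/fail the PR check reports."""
    return "pass" if all(results.values()) else "fail"
```

Running every check instead of stopping at the first failure means a developer sees all lint problems in one pass, rather than fixing them one push at a time.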

It turns out that the community of Atlantis users had grown since our initial deployment, and they too were running into very similar needs around static analysis and testing. It seemed like almost overnight Atlantis announced support for server-side Conftest policy checking. Conftest exists as an independent utility but relies on the Rego language from OPA, which meant that we had our work cut out for us learning a new framework and language. After some trial and error, and spending entirely too much time in the Rego Playground, we were able to come up with an initial set of policies. In our initial deployment we wanted to focus on enforcing the following requirements:

  • Ensure project names adhered to our internal naming structure.
  • Ensure network firewalls did not get deployed with wide open directives.
  • Ensure projects were labeled appropriately and in accordance with our internal policies.
  • Ensure new projects did not enable auto_create_networks by default.
  • Ensure least privileges in IAM bindings.
  • Ensure all GCS buckets use Uniform Access levels.
  • Perform CIS checks on new resources.

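To make one of these requirements concrete, here is a minimal sketch of the "no wide open firewalls" check. Our real policies are written in Rego and evaluated by Conftest; this Python version only illustrates the logic such a policy expresses against the plan JSON. It assumes the standard `resource_changes` layout of `terraform show -json` output and the `source_ranges` attribute of `google_compute_firewall`; the function name and message text are hypothetical.

```python
from typing import Any, Dict, List

OPEN_RANGE = "0.0.0.0/0"  # "anyone on the internet"


def open_firewall_violations(plan: Dict[str, Any]) -> List[str]:
    """Return one message per planned firewall rule that allows any source."""
    messages = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "google_compute_firewall":
            continue
        # `after` is null when the resource is being deleted.
        after = rc.get("change", {}).get("after") or {}
        if OPEN_RANGE in (after.get("source_ranges") or []):
            messages.append(f"{rc['address']}: source_ranges must not include {OPEN_RANGE}")
    return messages
```

An empty list means the plan passes this check; any message would surface on the pull request and block the apply.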
Since then we have empowered our users to develop their own policies as needed and have begun to see rapid adoption as new policies are written and introduced in the pipeline ensuring tighter quality control over our IAC.

Testing OPA Policies

Rego Playground was a good way to experiment with OPA at the beginning, but we needed a better way to develop and test policies. VSCode has an Open Policy Agent extension that lets us run OPA tests directly in our IDE once the extension is installed.

The following basic process was defined to test IaC code in Terraform:

  • Create a Terraform plan and convert it to JSON. An easy way to do this is to run the following command:
    terraform plan -out tfplan.binary && terraform show -json tfplan.binary > input.json

You can inspect this JSON file to understand the output that Terraform presents, so that you know which fields to check for in OPA. Different Terraform actions produce different results. You need to account for more than just creating new resources, because fields and their locations in the plan change when resources are updated.

  • Create an OPA policy to enforce your standards.
  • In VSCode, select “OPA: Evaluate Package” from the command palette. This calls the OPA extension and tests the new policy against the input.json file.
  • Refine the OPA policy as needed.
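The note above about different Terraform actions is worth a small illustration. In the plan JSON, each entry in `resource_changes` carries a `change.actions` list such as `["create"]`, `["update"]`, `["delete"]`, or `["delete", "create"]` for a replacement, and `change.after` is null for a deletion. The sketch below is a Python stand-in for the filtering a Rego policy would typically do first; the helper name and the sample plan are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Action lists that leave nothing to validate: the resource is unchanged,
# only read, or deleted (in which case `after` is null in the plan JSON).
SKIPPED_ACTIONS = (["no-op"], ["read"], ["delete"])


def planned_states(plan: Dict[str, Any]) -> Iterator[Tuple[str, List[str], Dict[str, Any]]]:
    """Yield (address, actions, after) for resources the plan creates, updates, or replaces."""
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        actions = change["actions"]
        if actions in SKIPPED_ACTIONS:
            continue
        yield rc["address"], actions, change.get("after") or {}


# Hypothetical, heavily trimmed plan used only to exercise the helper.
sample_plan = {
    "resource_changes": [
        {"address": "google_storage_bucket.logs",
         "change": {"actions": ["create"], "before": None, "after": {"name": "logs"}}},
        {"address": "google_storage_bucket.old",
         "change": {"actions": ["delete"], "before": {"name": "old"}, "after": None}},
        {"address": "google_compute_firewall.web",
         "change": {"actions": ["delete", "create"], "before": {"name": "web"}, "after": {"name": "web"}}},
    ]
}

for address, actions, after in planned_states(sample_plan):
    print(address, actions, after)
```

A policy that only looks at `["create"]` would miss the replaced firewall in this sample, which is the kind of gap inspecting input.json helps you catch.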

When creating a new policy we initially deploy it in `warn` mode for two weeks. We monitor that the policy is behaving correctly. After the initial warn test we change the policy to `deny`. Once in `deny` mode the IaC code must pass the policy check to be deployed.
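The difference between the two modes can be sketched in a few lines. Conftest treats findings from `warn` rules as non-blocking by default and findings from `deny` rules as failures; the Python below only mimics that gate so the rollout described above is easy to reason about. The `Finding` type, the function names, and the sample policy names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    policy: str   # name of the policy that produced the finding
    mode: str     # "warn" during the two-week trial, "deny" afterwards
    message: str


def may_apply(findings: List[Finding]) -> bool:
    """A change may be applied unless at least one finding comes from a deny-mode policy."""
    return not any(f.mode == "deny" for f in findings)


def report(findings: List[Finding]) -> List[str]:
    """Format findings for the PR comment; warnings are shown but do not block."""
    return [f"{f.mode.upper()} {f.policy}: {f.message}" for f in findings]
```

During the trial period a new policy shows up in the report without affecting `may_apply`, which lets us watch for false positives before the policy starts blocking deployments.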

Component Testing of custom terraform modules

The Component Tests have a similar setup to the tests in terraform-google-vm/test. The Component tests are used to verify the behavior of the root module and submodules. Additions, changes, and fixes are accompanied by tests to ensure they adhere to the security requirements. Component tests are run using kitchen-terraform and InSpec. These tools are packaged within a Docker image for convenience. kitchen-terraform provides a set of Kitchen plugins which enable the use of Kitchen to process a Terraform configuration and verify the resulting infrastructure with InSpec controls. The tests are hosted inside the test directory, which is structured as follows:

  1. .kitchen.yml: Defines the InSpec tests that will be run. It uses Kitchen-Terraform, which drives Terraform through Kitchen and verifies the result with InSpec. This is where the suites of tests to run are defined. Each suite needs to specify the drivers (gems that are installed on containers).
  2. fixtures: Contains a number of directories with the Terraform code that is applied to initiate the tests.
    The property “suites -> driver -> root_module_directory” indicates where the Terraform code is located.
  3. integration: Contains a number of directories with the Kitchen tests to be executed. The property “suites -> name” indicates where the Kitchen tests are located. The directory convention for these tests is as follows:
    controls: This directory contains the Ruby (.rb) files with the tests to be executed after the fixtures have been applied.
    inspec.yml: This file contains the attributes to be passed into the tests. These attributes usually match the output values defined in the Terraform files applied for the fixtures.
  4. setup: This directory contains the Terraform code that will configure a GCP project, which will be used as the target project for the Terraform in the fixtures.

There are a number of properties that can be configured in the .kitchen.yml file. These properties are documented in the Kitchen Ruby Docs. There are three steps to terraform component tests:

  1. Environment Setup
    – Setup includes initializing credentials, initializing Terraform, and running terraform apply to create the ephemeral GCP project.
  2. Running Component Tests
    – This step deploys the underlying resources defined for the component test and runs the Ruby tests against the deployed resources.
  3. Environment Teardown
    – This step tears down the environment once testing is complete.
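The three steps above can be sketched as a small driver whose main job is to guarantee that teardown runs even when a test fails. This is an illustration, not our actual pipeline code: the `test/setup` path, the decision to destroy the setup project at the end, and the injectable `runner` are assumptions. The `kitchen converge`, `kitchen verify`, and `kitchen destroy` subcommands and Terraform's `-chdir` option are standard.

```python
import subprocess
from typing import Callable, List

SETUP_DIR = "-chdir=test/setup"  # assumed location of the setup Terraform


def run(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


def component_test(runner: Callable[[List[str]], None] = run) -> None:
    # 1. Environment setup: create the ephemeral GCP project.
    runner(["terraform", SETUP_DIR, "init"])
    try:
        runner(["terraform", SETUP_DIR, "apply", "-auto-approve"])
        try:
            # 2. Deploy the fixtures and run the InSpec controls against them.
            runner(["kitchen", "converge"])
            runner(["kitchen", "verify"])
        finally:
            # 3a. Always remove the fixture resources, even after a failure.
            runner(["kitchen", "destroy"])
    finally:
        # 3b. Always remove the ephemeral project created during setup.
        runner(["terraform", SETUP_DIR, "destroy", "-auto-approve"])
```

Nesting the `try`/`finally` blocks mirrors the order in which resources were created, so a failing control still propagates as an error after both layers have been cleaned up.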

Here is an example of a specific issue we encountered and fixed while developing blue-green and canary support with GCE. We initially tried to use Google’s example, terraform-google-modules/test. However, we needed to test our internal modules, such as blue-green and canary, which involve multi-stage testing. We used a helper function to update Terraform variable files between consecutive kitchen runs.
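A helper of the kind mentioned above could look roughly like the following. This is a sketch under assumptions, not the helper we actually use: it assumes the fixture reads its variables from a `*.tfvars.json` file (a format Terraform supports), and the function name and the variable names in the usage note are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def update_tfvars(path: Path, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge `overrides` into a *.tfvars.json file and return the merged values."""
    current = json.loads(path.read_text()) if path.exists() else {}
    current.update(overrides)
    # Sorted keys keep the file stable between runs, which makes diffs readable.
    path.write_text(json.dumps(current, indent=2, sort_keys=True) + "\n")
    return current
```

For a canary test, a first kitchen run might use `update_tfvars(vars_file, {"canary_percent": 0})`, followed by a second run after `update_tfvars(vars_file, {"canary_percent": 10})`, with the InSpec controls verifying each stage in turn.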

End to End Testing

The End to End (E2E) Test Framework for IaC is intended to provide a generic mechanism for “doing something” after infrastructure has been deployed by the IaC pipeline (Atlantis). All previous test methodologies focused on testing before merge to either a module library or a Terraform repo, which makes our end to end test framework unique in that it tests the infrastructure after deployment. “Doing something” is extremely broad, and the only strict requirements are that the “something” be Dockerized and follow our interface guidelines so that the result of the test can be understood and reported on. E2E tests are invoked by a webhook from GitHub when a PR has been closed via a merge in the service owner’s IaC repo.

At a high level the steps that service owners need to complete to add end to end testing to their terraform repository are:

  • Define a test suite/methodology for their service.
  • Create the test scripts/binaries in accordance with their methodology.
  • Dockerize the test suite and push the image to Google Container Registry (GCR).
  • Add the tests to their IAC repo via the e2e.json config (See below).
  • Configure the output of their tests to adhere to our interface standards so that the test framework can understand if the test was successful or not.

The following image is an example of how a service owner declares to the pipeline which End to End tests to run (their e2e.json config).

There are some key arguments that we want to point out, specifically the inputs and outputs. Service Owners can specify which “args” and “command” they would like to execute on instance start (just like Docker), which ultimately determines what this temporary container will execute / test. The results file specifies where the pipeline needs to look for the results of the tests which this container executed. Together, these are the elements needed to link the IAC Testing Framework (owned by a central devops team) to the service-owned docker test containers (owned by the users of the pipeline). The `testFrameworkType` param lets the pipeline understand the output format when reading the results file.

In this example, once tests have executed the pipeline takes the results and uploads them to Jira XRAY which tracks these tests historically for service owners. The Jira related arguments above are used for telling the framework where to send the results in Jira (jiraProjectKey, jiraLabels, etc) as well as the output of the results file (test.xml above).
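To show how a `testFrameworkType` value can drive result handling, here is a minimal sketch of reading a JUnit-style results file and deciding whether the run passed. It is not the pipeline's implementation. It assumes the common JUnit XML layout in which each `testsuite` element carries `failures` and `errors` counts, and the `"junit"` value, the `READERS` table, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def junit_passed(xml_text: str) -> bool:
    """Return True when no <testsuite> in the report records failures or errors."""
    root = ET.fromstring(xml_text)
    # iter() also matches the root itself, so this handles both a bare
    # <testsuite> document and a <testsuites> wrapper.
    for suite in root.iter("testsuite"):
        if int(suite.get("failures", 0)) or int(suite.get("errors", 0)):
            return False
    return True


# Map a testFrameworkType value to the function that understands that format.
READERS: Dict[str, Callable[[str], bool]] = {"junit": junit_passed}


def evaluate(config: Dict[str, str], results_text: str) -> bool:
    """Pick a reader based on the config and apply it to the results file content."""
    reader = READERS.get(config["testFrameworkType"])
    if reader is None:
        raise ValueError(f"unsupported testFrameworkType: {config['testFrameworkType']}")
    return reader(results_text)
```

Keeping the format-specific parsing behind a lookup table means supporting another result format is a matter of adding one reader, without touching the rest of the pipeline.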


As you should realize from reading this blog, IaC testing involves a number of key areas that your team needs to decide on before being able to realize the full value of the Shift Left methodology. Keep in mind that there is no one-size-fits-all approach to the tools that you can use in your IaC toolbox. The decisions on which tools to deploy in your IaC testing framework should be based on clear requirements for how your team wants to approach testing of IaC.

We wanted to share our journey thus far in this space, so that it can provide some insights to those who may be on similar journeys. Let us know your thoughts in the comments section of this blog. We are always looking for feedback and ways to improve our overall IAC Framework.

The next blog post in this series is Box CMF: DevOps Reporting, Infrastructure as Code. This blog will focus on how we continuously measure our IAC Framework to ensure we have the necessary, data-driven, insights to understand what is going well and more importantly what needs to improve.

Interested in learning more about Box? We are hiring. Check out our careers page!


