Infrastructure Policy in Your Application Pipeline

Kevin Cochran
HashiCorp Solutions Engineering Blog
May 20, 2020

Automating the software delivery process is arguably one of the most important things an IT organization can do for its customers. Nowadays, there are many tools we can use in the build pipeline not only to ensure our code is high quality (performant, fully tested, safe, debugged, well-written, etc.), but also to give us a complete, end-to-end process.

With the advent of cloud computing and infrastructure as code, software architecture has changed to include the infrastructure as part of the application or service. This brings a whole new level of convenience, high availability, and resiliency when rolling out updates.

However, when we bring infrastructure provisioning into our pipeline, we potentially introduce new challenges, and not just technical ones.

The Way Things Were

Complex build processes aren’t a new thing. We’ve been packaging software for decades with various compile, test, and bundle steps. Of course, what makes today’s build pipelines different is that we have the ability to push code changes/additions out to the end-user in as little as a few minutes. But just because we have the ability doesn’t mean we’re allowed to, or even should (gasp!).

Let’s back up just a little. There’s a lot of interesting history as to how we got where we are today, but let’s just touch on more recent history — back when we had racks full of hardware.

We’d order some nice rack-mounted servers, wire them in, set them into our VM environment, configure them, spin up a VM, install an operating system and all the agents needed on the box, and so on. Sounds pretty simple, but some of you who work in large enterprises know that by the time your burn-in phase is complete, that brand-new server is already 6 months old.

Because of that, we’d cram as many applications onto the same VM as possible so as to avoid needing new hardware in the near future. Of course, there’d be the occasional app which would spin out and take down the entire machine with it — co-located apps and all. But we felt okay about it because there was a balance to be had. Oh, how our customers felt differently, but at the time, we didn’t think we had much of a choice.

The Way Things Changed

Customers demanded stability and frequent enhancements. Our monolithic applications weren’t conducive to speedy delivery, so we decided to break up our application(s) into small, single-purpose services, each of which could be built, tested, and deployed on its own cadence. And with cloud providers gaining popularity, there was now a possibility of provisioning infrastructure along with the microservice.

Then along came the age of orchestration, with tools like Nomad, Kubernetes, Mesos, etc. These tools helped us make more sense of our application architecture, but there were also infrastructure challenges. This application design just didn’t fit into our traditional data centers.

Because many companies were able to link increased revenue directly to faster software delivery, there was no going back to the old way. Instead we needed to figure out a new way. Hence, the birth of the modern CI/CD build pipeline.

A Quick Reflection

Okay, I should acknowledge that everything mentioned so far may not actually be history for some readers. Technology can get stuck in time at organizations for a number of reasons, but security concerns are almost always at the top of that list. There’s no magic bullet since every organization is different, but with security as the main focus, this article aims to give you additional tools and concepts for securely modernizing your software delivery process.

The Way Things Are

Security teams are front and center these days with everything IT. While it may seem to slow down certain processes, I really wouldn’t want it any other way. Many of us have been victims of a data breach at some point. And that’s why our security teams have a vested interest in making sure applications are safe prior to delivery — a function previously owned entirely by the developers.

CI/CD pipelines have solved a number of challenges for us, and with a heavy focus on reducing risk, security teams today often own at least a portion of the CI/CD pipeline — ensuring vulnerability scans occur for both the code and its runtime.

With all these great advancements in automation, why is it that we still can’t fully automate end-to-end?

Provisioning: it seems to be a common roadblock. It’s not that we don’t have the ability. It’s that we need to guarantee infrastructure is provisioned in such a way that it doesn’t introduce risk.

End-to-End Automation

Enter Terraform Cloud with Sentinel. Hopefully most readers are familiar with Terraform. If not, or if you’re still using proprietary provisioning tools provided by cloud vendors, then you may want to watch this short video about Terraform before continuing.

Terraform Cloud builds on top of the open source version by adding robust features like workspace management, a private module registry, and Sentinel (policy as code), among others. With Terraform, you can write out your infrastructure in HCL (HashiCorp Configuration Language, an easy-to-use provisioning language) and provision to more than 200 providers. But the Terraform code itself does not check whether you’re following policies set by your organization.
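If you haven’t seen HCL before, here’s a minimal sketch of what a configuration looks like. It declares a single AWS instance; the AMI ID and other values are placeholders:

    # A minimal Terraform configuration (all values are placeholders)
    provider "aws" {
      region = "us-west-2"
    }

    resource "aws_instance" "web" {
      ami           = "ami-0abc1234def567890"  # hypothetical AMI ID
      instance_type = "t2.micro"
    }

Terraform compares this declared state against what actually exists and provisions whatever is missing.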

Sentinel allows security and infrastructure teams to write their policies as code and enforce them prior to provisioning. Trying to get out in front of cloud spend? Sentinel can let you set monetary limits for workspaces. Want to limit which resources teams are allowed to provision? With Sentinel, you can provide an allowed list of resources and prohibit the creation of anything else. Or explicitly prohibit the creation of any network resource, for instance. When you combine Sentinel with the private module registry, you can standardize your infrastructure with modules and use Sentinel to enforce their use.

If our software engineering teams comply with security requirements, and Sentinel can provide guaranteed enforcement, then we can limit security reviews for our Terraform configurations and completely — and safely — automate our software delivery end-to-end.

Sentinel provides three levels of enforcement:

  • Advisory — The run does not fail; the policy result serves as a notification about best practices
  • Soft Mandatory — The run fails, but can be overridden
  • Hard Mandatory — The run fails and cannot be overridden

Sentinel has its own simplified programming language designed specifically for policy enforcement, exposing every detail of a Terraform run and allowing you to act on it.
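To give you a feel for the language, here’s a minimal sketch of a policy in the spirit of the machine-size rule we’ll use later. It follows the shape of the standard tfplan-import examples; the allowed values are illustrative:

    # Sentinel policy sketch: only allow approved AWS instance types
    import "tfplan"

    allowed_types = ["t2.micro", "t2.small", "t3.micro"]

    main = rule {
        all tfplan.resources.aws_instance as _, instances {
            all instances as _, r {
                r.applied.instance_type in allowed_types
            }
        }
    }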

For automation, since Terraform Cloud isn’t just a binary file you can run, you’ll need some way to interact with it. The good news is that Terraform Cloud has a robust set of RESTful APIs which allow you to manage every action in Terraform — from policies to workspaces to runs and everything in between.
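Every call follows the same pattern: a bearer token for authentication and the JSON:API content type. Here’s a minimal sketch, assuming your API token is in the TFE_TOKEN environment variable and my-org is a placeholder for your organization name:

    # List the workspaces in an organization
    curl -s \
        --header "Authorization: Bearer $TFE_TOKEN" \
        --header "Content-Type: application/vnd.api+json" \
        https://app.terraform.io/api/v2/organizations/my-org/workspaces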

Terraform and Sentinel in Action

Now we’re going to see how this works. I invite you to follow along using the instructions here:

https://github.com/kevincloud/terraform-jenkins-pipeline

In this article, I’m using Jenkins, but you can use any CI/CD tool which supports HTTP requests. And the nice thing about running everything through an automation pipeline is that your developers don’t necessarily need to work out of multiple UIs. Simply drive all approvals through your CI/CD tool. This is what we’ll look at today.

In our scenario, we have two policies:

  1. Ensure the specified machine size is in the approved list (hard mandatory)
  2. Keep the workspace cost under a certain amount (soft mandatory)
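In the policy set attached to our workspace, those enforcement levels would be declared in a sentinel.hcl file along these lines (the policy names are hypothetical; each corresponds to a .sentinel file of the same name in the policy set repository):

    # sentinel.hcl: enforcement levels for our two scenario policies
    policy "restrict-machine-size" {
        enforcement_level = "hard-mandatory"
    }

    policy "limit-workspace-cost" {
        enforcement_level = "soft-mandatory"
    }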

During the run, we’re going to see how we can use the API to manage policy overrides as well as confirm applies.

First, let’s kick off a build in Jenkins, then inspect each step in the Jenkinsfile and the respective UIs.

[Image: starting the pipeline build in Jenkins]

In the real world, a Jenkins build would be kicked off by an external process. For instance, a developer would push their code into a GitHub repository, which would be configured to trigger a build in Jenkins upon a successful merge.

The first three steps of my build are centered on building the app and uploading the artifact, so let’s skip right to the Terraform step. In order to interact with our workspace in TFE, we need to provide our organization name and the workspace ID. The workspace ID can be obtained from the UI, but it’s much nicer if we can simply use the name of our workspace instead of the ID. So let’s get the ID using the workspace name in order to make both us and the API happy.
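A sketch of that lookup, assuming our API token is in TFE_TOKEN and jq is available (my-org and my-workspace are placeholders):

    # Resolve the workspace ID from the organization and workspace names
    wsid=$(curl -s \
        --header "Authorization: Bearer $TFE_TOKEN" \
        --header "Content-Type: application/vnd.api+json" \
        https://app.terraform.io/api/v2/organizations/my-org/workspaces/my-workspace \
        | jq -r '.data.id')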

Now that I have my workspace ID loaded into a variable, I can kick off a run in TFE. First, I need to construct a payload to send to the API endpoint. Pay close attention to the ${wsid} variable. A HEREDOC is the easiest way to pass this in.
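Something along these lines (the message text is arbitrary):

    # Build the run payload; an unquoted HEREDOC lets ${wsid} expand inline
    payload=$(cat <<EOF
    {
        "data": {
            "type": "runs",
            "attributes": {
                "message": "Triggered from Jenkins"
            },
            "relationships": {
                "workspace": {
                    "data": {
                        "type": "workspaces",
                        "id": "${wsid}"
                    }
                }
            }
        }
    }
    EOF
    )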

Now we can call the API and pass in the payload in the body of the request.
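A sketch of the call, capturing the response so we can pull the run ID out of it:

    # Create the run and keep its ID for the steps that follow
    runid=$(curl -s \
        --header "Authorization: Bearer $TFE_TOKEN" \
        --header "Content-Type: application/vnd.api+json" \
        --request POST \
        --data "${payload}" \
        https://app.terraform.io/api/v2/runs \
        | jq -r '.data.id')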

Hang on to that return value. It’s the run ID, and we’ll need it going forward.

We’ll need to monitor the state of the run in order to determine next steps. If no policies have been violated and the workspace is set to auto apply, then the run will continue to completion and Jenkins will report a successful run.

But we need to make sure no policy has been violated, so we’re going to watch for two statuses in particular: policy_override and policy_checked. A full list of statuses can be found on the Admin Runs API page (not to be confused with the API you’d actually use, which is the Runs API — the status list is not on the Runs API page). Notice the runid is passed in and used to continually monitor the run.
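A polling sketch (the interval and the set of terminal states are simplified for illustration):

    # Poll the run until it reaches a state we need to act on
    while true; do
        status=$(curl -s \
            --header "Authorization: Bearer $TFE_TOKEN" \
            --header "Content-Type: application/vnd.api+json" \
            https://app.terraform.io/api/v2/runs/${runid} \
            | jq -r '.data.attributes.status')
        case "${status}" in
            policy_override) echo "Soft-mandatory policy failed; override required"; break ;;
            policy_checked)  echo "Policies passed; awaiting confirmation"; break ;;
            applied|planned_and_finished) echo "Run complete"; break ;;
            errored|canceled|discarded)   echo "Run ended: ${status}"; break ;;
            *) sleep 5 ;;
        esac
    done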

It’s important to keep in mind that we cannot alter how Sentinel is run during this process. The security team is guaranteed success because Sentinel is not a step in the build pipeline; it’s built into the Terraform run and is enforced out of reach of development teams. We can, however, respond to prompts using the API, which would otherwise be managed through the UI.

The policy will fail, but that’s okay: it’s part of the exercise. We’ll have Jenkins prompt us for what action to take next. If you’re running this in your own environment, you can activate the prompt by hovering your mouse over the Run Terraform build step when the dotted outline appears.

You can also customize the message that Sentinel sends back if needed. But what’s happening in TFE while we see this prompt in Jenkins? It’s actually following the same path our Jenkins pipeline is.

Since we’re driving all user interaction through Jenkins, we’ll just check the Override box, then click Continue. Our API call is shown in the following code snippet. Pay close attention to the policyid we’re passing in. This was obtained from the payload we received from the status request.
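A sketch of that exchange. The policy check ID also appears in the run payload’s relationships; here, for clarity, it’s fetched from the run’s policy-checks endpoint:

    # Find the policy check for this run, then override the soft-mandatory failure
    policyid=$(curl -s \
        --header "Authorization: Bearer $TFE_TOKEN" \
        --header "Content-Type: application/vnd.api+json" \
        https://app.terraform.io/api/v2/runs/${runid}/policy-checks \
        | jq -r '.data[0].id')

    curl -s \
        --header "Authorization: Bearer $TFE_TOKEN" \
        --header "Content-Type: application/vnd.api+json" \
        --request POST \
        https://app.terraform.io/api/v2/policy-checks/${policyid}/actions/override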

This TFE workspace is not configured to automatically apply, so we need to watch for the confirmation status as well and respond accordingly.

Once again, let’s take a look in the TFE UI and see what an administrator might see:

When we’re ready to confirm, we simply call the API again and continue watching for completion. We may have a successful run, or there could be an error. In either case, our pipeline is savvy enough to inform us.
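A sketch of the confirmation call (the comment is optional):

    # Confirm the apply for this run
    curl -s \
        --header "Authorization: Bearer $TFE_TOKEN" \
        --header "Content-Type: application/vnd.api+json" \
        --request POST \
        --data '{"comment": "Approved from Jenkins"}' \
        https://app.terraform.io/api/v2/runs/${runid}/actions/apply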

The pipeline will continue in this fashion. Taking one more look at this process in the TFE UI reveals the following. Notice how we can see the live log feed.

Congratulations! We can see success in both the TFE UI and the Jenkins UI.

Conclusion

I think we sometimes equate automation with risk, but it doesn’t have to be that way. In fact, it’s often the human element that introduces risk in the pipeline, whether from faulty code or poorly designed infrastructure. So rather than checking to see if your infrastructure is safe, let Terraform Cloud and Sentinel help you check to see if it’s designed safely — before it becomes infrastructure. By enforcing compliance, we can safely automate our software delivery and increase the speed at which we provide our customers with new features and enhancements.

Happy coding!
