Auditing Terraform for Compliance Checklist

Auditing Terraform Open Source for Security, Governance, Risk & Compliance

Michael Fonseca
HashiCorp Solutions Engineering Blog

--

With the skyrocketing usage of Terraform open source (OSS) within provisioning and CI/CD pipeline workflows, organizations are potentially exposing themselves to security risks that affect Corporate Security and Governance, Risk, and Compliance (GRC) standards.

Terraform OSS is a very powerful tool that is often quickly adopted by technical professionals to manage infrastructure provisioning both in the cloud and on-premises. These individuals can then streamline infrastructure provisioning, driving reduced time to market for technology solutions and increasing customer satisfaction.

In this effort to automate and deliver value to the business, it is too easy to skimp on detailed security requirements. It is important to note that there are some standard Terraform security recommendations such as storing TF state files in encrypted object storage, but in this article, I will outline additional steps enterprises must take to meet their security standards.

HashiCorp has learned a lot from our customers and 3rd party auditors as Terraform becomes more of a business-critical application that requires auditing and security in line with the rest of the Enterprise Risk & Business Continuity programs. In this context, we want to up-level Security & GRC teams by highlighting specific audit insights and outlining how Terraform should be audited for compliance, ultimately saving organizations from the risk of implementing non-compliant workflows or having to remediate them down the road.

It is important to note that the following checklist only applies to open source Terraform implementations. In both the managed Terraform Cloud and self-hosted Enterprise offerings, HashiCorp incorporated findings from our customers and 3rd party auditors to lay a security-first foundation for our customers, simplifying security and requiring a much smaller checklist than below (since more is covered by those products). The products’ security principles and architecture are outlined in their respective Terraform Cloud & Enterprise docs. In addition, Terraform Cloud & Enterprise products have internal Security & 3rd Party certifications.

Terraform Open Source Audit Checklist:

This reference is the start of your Terraform OSS compliance efforts. This is not an exhaustive list, however, and there are likely organization-specific requirements for you to consider.

Prerequisite: Security Guardrail Risks & Controls

Prior to reviewing the Risks within this section, it is important for auditors to understand the role of Policy as code within the infrastructure provisioning lifecycle. Policy as code is the use of code to define and manage rules and conditions within the infrastructure provisioning process. Policy as code can be written in multiple languages, depending on the Policy engine that is being used. For the purposes of this article, we use HashiCorp Sentinel, which is both a Policy as Code language and an engine, as the reference example.

HashiCorp Sentinel (i.e. policy as code) is executed between the Plan and Apply phases of Terraform. The policy engine evaluates the Terraform code against the established policies and conditions.

Enabling Policy within the provisioning stage reduces the risk of security vulnerabilities in deployed environments. If a vulnerability is deployed, it must be remediated, which takes time and effort. In this model, we shift security left to developers so that they write secure TF code and deploy secure resources within the guardrails provided by the organization. Avoiding post-deployment remediation tasks also improves developer and application deployment velocity.
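To make the idea of a check between Plan and Apply concrete, here is a minimal sketch in Python rather than Sentinel. It assumes the plan has been exported with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`, and that it contains a `resource_changes` list. The approved resource types and the sample plan are hypothetical, and this is an illustration of the technique, not how Sentinel itself is implemented.

```python
# Hypothetical guardrail: resource types that teams may provision.
APPROVED_TYPES = {"aws_s3_bucket", "aws_security_group", "aws_instance"}


def evaluate_plan(plan):
    """Return a list of policy violations found in a plan (JSON form)."""
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        # Only resources being created or updated need checking.
        if not actions & {"create", "update"}:
            continue
        after = rc["change"].get("after") or {}
        if rc["type"] not in APPROVED_TYPES:
            violations.append(f"{rc['address']}: resource type not approved")
        if rc["type"] == "aws_security_group":
            for rule in after.get("ingress") or []:
                if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                    violations.append(f"{rc['address']}: ingress open to 0.0.0.0/0")
    return violations


# Hypothetical sample shaped like `terraform show -json` output.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_security_group.web",
            "type": "aws_security_group",
            "change": {
                "actions": ["create"],
                "after": {"ingress": [{"cidr_blocks": ["0.0.0.0/0"], "from_port": 22}]},
            },
        },
        {
            "address": "aws_db_instance.main",
            "type": "aws_db_instance",
            "change": {"actions": ["create"], "after": {}},
        },
        {
            "address": "aws_instance.app",
            "type": "aws_instance",
            "change": {"actions": ["no-op"], "after": {}},
        },
    ]
}

for violation in evaluate_plan(sample_plan):
    print(violation)
```

In a TF OSS pipeline, a check like this would run after the plan step and fail the job when the list is non-empty, so that the apply step never starts.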

As depicted below, within the Terraform Cloud workflow, multiple checks for Cost Estimation, Sentinel Policies, and Run Tasks are evaluated between the Plan and Apply phase to ensure compliance prior to provisioning.

Terraform Cloud Compliance Workflow

Below are some example policies our users deploy:

For more examples, see the Terraform Sentinel docs and the HashiCorp Sentinel Library.

Security Guardrail Risks & Controls

Risks (R1)

R1.1. What policy-as-code guardrails are in place to conduct static and dynamic TF code analysis before provisioning? Note: For readers not familiar with Policy as Code for Infrastructure Provisioning, the prerequisite section above gives a brief summary.

R1.1.1. What prevents users from accidentally provisioning non-compliant resources?

R1.1.2. What is the risk of provisioning non-compliant resources such as un-encrypted resources or unrestricted network access?

R1.1.3. If non-compliant resources are provisioned, how is that remediated, and how do you ensure that it does not happen again?

R1.1.4. Are costs controlled in the provisioning phase, and if so, how?

R1.1.5. Are certain TF code use cases tested today?

R1.1.6. Are there known security gaps in your TF code testing pipeline?

R1.1.7. Historically, have gaps been easily identified and addressed with existing tooling and/or processes?

R1.1.8. Are there limits on scaling TF to other teams due to any known security limitations?

R1.1.9. Are any provisioning bottlenecks caused by manual security reviews, separation of duties, and/or inefficiencies that can be addressed by the repeatability of policy as code?

R1.1.10. Would your existing security practices allow you to implement self-service provisioning processes?

Controls (C1)

C1.1. All Terraform Code (HCL) & Terraform operational execution is analyzed for security & operational governance controls. For example, all resources are: encrypted, approved for provisioning, ingress & egress rules are compliant, valid module versions are enforced, Terraform versions are enforced, Identity & Access Management (IAM) roles are enforced, etc. There can be hundreds of Security & GRC rules that should be applied to the provisioning process per Corporate guidelines.

C1.2. Policy as Code is implemented, as defined above in reference to HashiCorp Sentinel. Other functions are also available within Terraform Cloud/Enterprise products, such as Run Tasks, which provide Policy as Code review from systems such as Snyk, Bridgecrew, and HCP Packer, along with standalone alternatives such as Open Policy Agent.

C1.3. Use HCL code testing tools for unit and integration testing, such as Terratest, Kitchen-Terraform, or terraform-compliance. Multiple suites are available as Community additions.

C1.4. HCL & Terraform operational analysis: this control refers to implementing an overall practice of assessing operational risks within the provisioning process and implementing controls. For example, when operating TF OSS, it is common to have operational issues/incidents due to manual processes, such as incorrectly adding variables to TF OSS provisioning runs and/or incorrectly pushing changes meant for a non-production environment to a production environment. Operational controls should ensure that the correct changes are aligned to the correct operational environment, that production changes can only happen within certain time windows, and that changes require approval or a segregation of duties for change execution.

C1.5. A well-defined DevSecOps process should be in place to ensure that all security controls are implemented and that people, processes, and technologies are aligned, trained, and enforced. For example, when using TF OSS (and if Policy as Code is not defined/implemented), there should be procedures/gatekeeping events to ensure that DevSecOps (or defined personnel) review provisioning code changes that meet defined risk criteria. A risk criterion can be something like: all changes to production environments are reviewed by a DevSecOps Engineer prior to executing the provisioning. (Note: manual processes create bottlenecks and slow down the delivery process, so your organization should always be looking towards automation such as Policy as Code.) It is important to work with the technology stakeholders to define these risk criteria. If your organization has implemented Policy as Code (e.g. HashiCorp Sentinel), then the DevSecOps team should be involved with implementing and managing these risk criteria within policies.

C1.6. Terraform Cloud/Enterprise implements many of these suggested controls out of the box, providing Sentinel for Policy as Code, Cost Estimation, workspace locking, variable management and encryption, module usage enforcement and management, and Role-Based Access Controls (RBAC).
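The operational controls described in C1.4 (environment alignment, change windows, and segregation of duties) can be sketched as a small pre-apply gate for a TF OSS pipeline. Everything in this example is a hypothetical illustration: the `ChangeRequest` fields, the change window, and the `gate` function are assumptions, not part of Terraform.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class ChangeRequest:
    target_environment: str  # environment the run will apply to
    vars_environment: str    # environment the variable file was written for
    author: str
    approver: str
    requested_at: datetime


# Hypothetical production change window: 22:00 to 23:59 local time.
WINDOW_START = time(22, 0)
WINDOW_END = time(23, 59)


def gate(change):
    """Return the reasons a change must be blocked (empty list means allowed)."""
    reasons = []
    if change.target_environment != change.vars_environment:
        reasons.append("variables do not match target environment")
    if change.author == change.approver:
        reasons.append("author cannot approve their own change")
    if change.target_environment == "production":
        if not (WINDOW_START <= change.requested_at.time() <= WINDOW_END):
            reasons.append("outside production change window")
    return reasons


allowed = ChangeRequest("production", "production", "alice", "bob",
                        datetime(2024, 1, 10, 22, 30))
blocked = ChangeRequest("production", "staging", "alice", "alice",
                        datetime(2024, 1, 10, 14, 0))

print(gate(allowed))
print(gate(blocked))
```

A gate like this only helps if the pipeline, and not the individual engineer, is the sole path to running an apply against production.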

Data Access Risks & Controls

Risks (R2)

R2.1. Are all state files encrypted?

R2.2. What policies are in place to control human and machine access to TF OSS state files?

R2.2.1. Does every state file (across all clouds) have IAM controls to ensure only authorized personnel have access to state?

R2.2.2. Does every state file (across all clouds) have IAM controls to ensure only authorized systems such as other TF state and/or external systems have proper access to state?

R2.3. Are all functions executed by TF OSS auditable?

R2.3.1. Are all actions taken by users and systems tracked in detail and stored in a 3rd-party system for analysis & reporting purposes?

R2.3.2. How long is audit data retained today? Is that sufficient to identify advanced persistent threats?

R2.3.3. Are there automated notifications to users/systems for actions taken by TF OSS?

Controls (C2)

C2.1. Ensure all TF state files are encrypted at rest and in transit

C2.2. Ensure rules of Least Privilege are assessed and implemented. For example, each TF state file has IAM/RBAC controls specific to the security & risk profile of state file usage. Users, Groups, and Machines should have restricted, tightly scoped access to only the state files that they need. This includes ensuring that retrieving state files and querying outputs and data sources across state files are restricted and scoped to authorized accessors.

C2.3. Terraform Cloud/Enterprise provides encrypted state file management, RBAC controls, API access roles, SSO, and platform notifications via email, Slack, webhooks, and more.
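As one way to audit the Least Privilege control in C2.2, the sketch below inspects an access policy attached to a state storage location and flags statements that grant access to any principal. It assumes state is kept in an AWS S3 bucket and that the bucket policy has been exported as an IAM JSON policy document. The sample policy, role ARN, and bucket name are hypothetical. The check ignores `Condition` blocks, so a flagged statement is a prompt for manual review rather than proof of exposure.

```python
def as_list(value):
    """IAM policy fields may hold a single item or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def wildcard_grants(policy):
    """Return the Sid (or position) of each Allow statement open to any principal."""
    findings = []
    for index, stmt in enumerate(as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            open_to_all = True
        elif isinstance(principal, dict):
            open_to_all = any("*" in as_list(v) for v in principal.values())
        else:
            open_to_all = False
        if open_to_all:
            findings.append(stmt.get("Sid", f"statement[{index}]"))
    return findings


# Hypothetical bucket policy for a state bucket.
state_bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PipelineAccess",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/tf-pipeline"},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-tf-state/*",
        },
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-tf-state/*",
        },
    ],
}

print(wildcard_grants(state_bucket_policy))
```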

Secrets & Credential Management Risks & Controls

Risks (R3)

R3.1. How are secrets & provider credentials managed for TF OSS? Note: Secrets & Credentials can be any confidential information required to access and/or support required technology components/services within the Terraform provisioning workflow (example: keys, tokens, passwords, etc.)

R3.1.1. Are secrets & credentials managed consistently or at all?

R3.1.2. By users?

R3.1.3. By a 3rd-party system?

R3.1.4. Are secrets/credentials in the state file? Note: Secrets/Credentials (e.g. Cloud Service Provider credentials such as an AWS Secret Access Key or Azure Service Principal) defined in Terraform code will be written to the TF state file and can be accessed unless otherwise addressed by a recommended control.

R3.1.5. How are sensitive variables secured? Are they encrypted?

R3.1.6. Are secrets encrypted at rest and/or dynamically generated as required? How is that done?

R3.2. How are security remediations enforced (is there a well documented process)? This question is of significant importance due to the model in which Terraform maintains state. If security issues are not identified and fixed prior to provisioning, the issue is provisioned and then requires remediation. Many organizations maintain an event-driven policy remediation strategy where the security issue is immediately fixed or rolled back on the Cloud Service Provider (CSP) platform. This creates several issues: one is managing a feedback mechanism that notifies the Developer of the remediation, and another is fixing the Terraform code and/or Terraform state so that it matches the event-driven policy control on the CSP. These types of fixes can be complex and time consuming if not addressed with something like Sentinel in the pre-provisioning phase.

R3.3. How are Security, Audit, and Compliance groups provided access to the TF OSS control and data planes for inspection, remediation, and auditability? For example:

R3.3.1. Are RBAC controls transparent and easily reportable?

R3.3.2. Is there a full audit trail of all changes and interactions with TF OSS throughout the life cycle of the cloud-managed resources?

Controls (C3)

C3.1. Ensure secrets are not being written to TF state files

C3.2. Ensure that secrets are being stored in secure locations and removed from personal laptops and/or insecure locations or plugins

C3.3. Ensure all secrets are encrypted

C3.4. Use short-lived credentials bound by a time-to-live (TTL) for automatic expiry

C3.5. Leverage Environment Variables for provider credentials. Provider credentials supplied through Environment Variables are not recorded in the Terraform state file, which provides a level of security if Terraform state files are not properly controlled and removes an additional attack vector.

C3.6. Leverage secrets management tools such as HashiCorp Vault within CI/CD workflows

C3.7. Terraform Cloud/Enterprise natively encrypts variables (AES-256-GCM), secures state files, generates audit logging for monitoring security, and includes RBAC controls & API access permissions, SSO, and tightly integrates with other HashiCorp products such as HashiCorp Vault & 3rd party systems such as GitHub & GitLab
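To audit control C3.1, a state file can be scanned for attributes whose names suggest a secret. The sketch below assumes the version 4 JSON state layout, in which `resources` hold `instances` with an `attributes` map. The name pattern and the sample state are hypothetical, and only top-level attributes are inspected, so treat the output as a starting point rather than a complete inventory.

```python
import re

# Hypothetical pattern for attribute names that are likely to hold secrets.
SUSPECT = re.compile(r"password|secret|token|private_key", re.IGNORECASE)


def find_secret_attributes(state):
    """Return addresses of non-empty attributes whose names look like secrets."""
    findings = []
    for resource in state.get("resources", []):
        address = f"{resource['type']}.{resource['name']}"
        for instance in resource.get("instances", []):
            for key, value in (instance.get("attributes") or {}).items():
                if SUSPECT.search(key) and value not in (None, ""):
                    findings.append(f"{address}.{key}")
    return findings


# Hypothetical, heavily trimmed state file content.
sample_state = {
    "version": 4,
    "resources": [
        {
            "type": "aws_db_instance",
            "name": "main",
            "instances": [
                {"attributes": {"username": "admin", "password": "example-only"}}
            ],
        },
        {
            "type": "aws_s3_bucket",
            "name": "logs",
            "instances": [{"attributes": {"bucket": "example-logs"}}],
        },
    ],
}

print(find_secret_attributes(sample_state))
```

Any finding means that anyone with read access to the state file can read that value, which ties this check back to the state access controls in C2.2.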

Business Continuity & Disaster Recovery Risks & Controls

Risks (R4)

R4.1. Does TF OSS manage business-critical resources? If so:

R4.1.1. Is Vendor Support required?

R4.1.2. Has a maximum allowable downtime (aka Recovery Time Objective or RTO) for a pipeline/TF provisioning execution been defined?

R4.1.3. How is the incident and change management process integrated into the workflow (e.g. access, transparency, review, break/fix)?

R4.1.4. Have there been incidents/issues with Terraform OSS that have impacted operational delivery?

R4.1.5. How could past incidents have been avoided?

R4.1.6. Is there a Service Level Agreement (SLA) for Terraform (e.g. TF OSS platform available 99.9% of the time or Terraform Cloud/Enterprise)?

R4.1.7. Will TF OSS become increasingly important to strategic, forward-looking technology delivery? How will you insure the organization against heightened exposure?

R4.1.8. What Disaster Recovery practices for TF OSS are implemented? Do they align with other critical, business-facing systems or processes?

Controls (C4)

C4.1. Perform a risk assessment for TF OSS provisioning practices

C4.2. Manage TF OSS as an Enterprise application and assign governance practices like those existing in other critical business functions

C4.3. Secure financial & human resources to ensure TF OSS is managed as an Enterprise application

C4.4. Implement Disaster Recovery practices and test them against Corporate Standards

C4.5. Implement internal Service Level Agreements (SLAs)

C4.6. Terraform Cloud/Enterprise is vendor-supported with guaranteed Service Level Agreements (SLAs). Self-managed deployments can take advantage of architectural options like Active-Active mode to minimize operational impact and align to HashiCorp resiliency best practices.

Operations & Change Management Risks & Controls

Risks (R5)

R5.1. What policies are in place to ensure only approved TF modules and versions are used?

R5.2. Is a lack of automated module enforcement creating security risks, bottlenecks or technical debt?

R5.3. What policies are in place to ensure that only approved OS/Container images are allowed to be provisioned?

R5.4. Are there frequent issues with Terraform code versions in use and cross-compatibility?

R5.5. Has the Organization taken on significant technical debt because TF code is not being updated?

R5.6. What is the estimated size of the total technical debt, including human and financial investment and the average time to remediate?

R5.7. Has Terraform Open Source been assessed as part of your Supply Chain Attack risk assessment program?

R5.7.1. Do Developers manage multiple TF OSS binaries on their laptops?

R5.7.2. How are security risks remediated and binaries managed?

R5.8. How are security issues enforced and remediated?

Controls (C5)

C5.1. Invest time, money, and controls in managing all aspects of TF code management to avoid technical debt, security risks and bottlenecks. This usually requires a holistic set of automated processes and program management established around TF OSS

C5.2. Investigate other 3rd Party tools to manage TF code & pipelines

C5.3. Leverage HashiCorp HCP Packer to ensure that only approved OS/Container images are provisioned

C5.4. Leverage Policy as Code to ensure that only approved OS/Container images are provisioned

C5.5. Terraform Cloud/Enterprise allows teams to leverage Sentinel for Policy as Code to avoid creating new technical debt, enforce module versions, enforce approved OS/container images, identify security risks and/or bottlenecks, and provide built-in audit logging for change management controls. It also centralizes binary storage and versioning, eliminating the risk of developers hosting disparate, insecure binaries.
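For TF OSS, the module enforcement asked about in R5.1 can be approximated with a check against an allow list. The sketch below assumes the JSON plan output exposes module calls under `configuration.root_module.module_calls` with `source` and `version_constraint` fields, which should be verified against the Terraform version in use. The allow list, module names, and sources are hypothetical. Only modules called from the root module are checked, and versions are compared as exact strings.

```python
# Hypothetical allow list: module source -> approved exact version pins.
APPROVED_MODULES = {
    "terraform-aws-modules/vpc/aws": {"5.1.2"},
}


def unapproved_modules(plan):
    """Return names of root module calls that are not on the allow list."""
    calls = (
        plan.get("configuration", {})
        .get("root_module", {})
        .get("module_calls", {})
    )
    findings = []
    for name, call in calls.items():
        approved_versions = APPROVED_MODULES.get(call.get("source"), set())
        if call.get("version_constraint") not in approved_versions:
            findings.append(name)
    return sorted(findings)


# Hypothetical sample shaped like the configuration part of a JSON plan.
sample_plan_config = {
    "configuration": {
        "root_module": {
            "module_calls": {
                "vpc": {
                    "source": "terraform-aws-modules/vpc/aws",
                    "version_constraint": "5.1.2",
                },
                "legacy": {
                    "source": "git::https://example.com/legacy.git",
                },
            }
        }
    }
}

print(unapproved_modules(sample_plan_config))
```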

Conclusion

Terraform Open Source is a highly valuable starting point in the provisioning automation journey for most organizations. However, there are inherently Enterprise-grade security requirements that companies would need to implement themselves. Auditors “don’t know what they don’t know” and this questionnaire is a starting point for Auditors looking to enable and further secure their organizations and infrastructure teams.

For additional information on HashiCorp Cloud & Enterprise Solutions:

Learn Terraform

Terraform Cloud Security Model

Terraform Enterprise Security Model

Using Terraform to Improve Infrastructure Security Model

Terraform Cloud & Enterprise Sentinel (Policy as Code) Framework
