Practical Security - Infrastructure as Code in a Secure Azure Deployment

Pierre Louis Sergent
ELCA IT
Jul 25, 2023

How the ELCA Data Engineering team uses Azure Private Link and Azure DevOps agents on VMSS to ensure a streamlined, automated, and secure data platform deployment with Terraform. We will focus on strategies to secure the platform and its data movements on Azure, reducing the attack surface exposed to intruders and protecting against data leaks.

As businesses increasingly migrate to the Cloud, IT teams are empowered with new technologies that have a positive impact on security, while shifting concerns to new areas. In this article, we will explore some of these new challenges and the need to find the right trade-off between robustness and ease of use.

For instance, Cloud solutions dealing with data usually require additional security measures, as they are expected to receive and handle sensitive or Personally Identifiable Information (PII).

As a first layer of security, and to better deploy and manage infrastructure in the Cloud, companies now follow the Infrastructure as Code (IaC) approach, primarily using Terraform. However, the stateful nature of this tool brings its own problems: we end up with a single file describing the whole Cloud infrastructure, filled with the customer’s private and sensitive information and stored in plain text.

In this article, we will address these two major concerns: data movements within a Cloud-hosted data platform, and the security of the Terraform state file in such an infrastructure.

Keywords: Azure, Azure DevOps, CI/CD, Security, Storage account, Terraform, Terraform state

Data movements

Moving a data platform to the Cloud increases the attack surface of the whole solution. By default, most resources in Azure have public access: although they are protected by Azure’s authentication and authorization layer, attackers could theoretically reach them (of course, Azure also implements other principles such as data encryption at rest and secure communication protocols using TLS).

To emphasize this point, let’s consider a basic design pattern using a “staging” area.

The staging area is temporary storage where data arrives from multiple sources in raw format. The goal is then to ingest this data into our system and transform it, for instance by loading it into a Databricks or Synapse cluster. This means we’ll have data moving between multiple Azure resources.

As mentioned earlier, this data can be very sensitive for our customers, and we want to maintain the highest possible degree of security around it. The fact that Azure resources are publicly accessible by default can be concerning for our customers, since they don’t have complete control over the infrastructure. How is the data stored? How is it transferred between resources? Who has access to it?

Even with public access, the security provided by Azure should be enough to protect your data. But in this case, we don’t really have any control over how requests travel over the network, and we rely solely on the authentication/authorization mechanism. Depending on a single security layer is far from ideal.

Terraform state file security

To keep track of the deployed resources, Terraform uses a state file that can be stored locally or remotely. Let’s have a look at how to handle this file in the context of Continuous Deployment.

There are three main reasons why you would want to maintain high security standards for your state file.

  • 🔒 Data protection: the state can contain passwords, secrets, SSH/SFTP keys, connection strings, etc.
  • 🏅 Competitive disadvantage: it can reveal the company’s strategy to potential competitors
  • 💻 Exposure to cyber threats: with a detailed view of the infrastructure, cybercriminals can design more targeted and efficient attacks

To begin with, the state file should always be stored remotely: it not only allows for collaboration but also helps with security. Terraform calls this a backend, and it can exploit underlying features of the chosen storage, such as state locking, to avoid concurrent edits of the infrastructure.

Put simply, an infrastructure team working on Terraform would typically store the state file in an Azure Storage Account (blob object) or any equivalent (GCP Bucket, AWS S3, …).
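As an illustration, here is a minimal sketch of such a backend block for an Azure Storage Account. The resource group, storage account, container and key names are placeholders, not values from our actual setup.

```hcl
terraform {
  backend "azurerm" {
    # Hypothetical names: replace with your own resource group, account and container
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "stterraformstate"
    container_name       = "tfstate"
    key                  = "dataplatform.tfstate"

    # Authenticate against the blob with Azure AD instead of storage access keys
    use_azuread_auth = true
  }
}
```

With use_azuread_auth enabled, access to the state blob goes through Azure AD and RBAC (the pipeline’s identity typically needs a role such as Storage Blob Data Contributor on the container), which lines up with the identity/role-based access control recommended below.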

Generally, you can consider that using a remote state alongside an identity/role access control is sufficient because:

“Storing state remotely can provide better security. As of Terraform 0.9, Terraform does not persist state to the local disk when remote state is in use, and some backends can be configured to encrypt the state data at rest.” (Terraform documentation)

For the same reasons cited above, Terraform also does not recommend encrypting the state file yourself.

Thus the main recommendations are:

  • Use remote state storage with an identity/role-based access control
  • Encrypt the data at rest (default feature for most Cloud object stores)
  • Don’t commit the state file to your repo

Despite these recommendations, it is important to acknowledge that they may not provide absolute protection. Later in this article, we’ll see that the solution we came up with to secure data movements also required an additional mechanism that ultimately enhances the security of the state storage.

Our optimal solution

The following architecture aims to address the two main concerns discussed in the previous parts:

  • Data movements
  • Terraform state file security

The big picture

Overview of the Azure solution featuring a staging area and a data platform (Databricks, Synapse…) managed by Terraform

Here is a simplified schema of the infrastructure we typically use at ELCA. The primary goal is to keep all communications inside a single Azure Virtual Network (VNet) and avoid public internet access. This is made possible by Private Endpoints and Private Link. Only two entities are allowed to connect to the platform: the data producers pushing data to the staging account (whitelisted by IP), and the Azure DevOps agents running the IaC CI/CD.

We’ll go into more detail as we move forward in this article.

Private Endpoint & Private Link

Why do Private Endpoints help us here?

A Private Endpoint is a network interface that uses a private IP address from your Virtual Network. When accessed, this interface can connect you directly to an Azure service/resource using Azure Private Link. By using a Private Endpoint, Azure ensures all traffic going to the endpoint stays in the Virtual Network, and all traffic over Private Link stays within the Microsoft backbone.

In other words:

  • Public network access can be removed
  • The resource can be accessed only within the private network
  • We have more control over the network layer and over how requests are performed

We can set up a private endpoint for the following resources:

  • Storage account
  • Key Vault

Note that for storage accounts, each sub-resource you want to reach needs its own endpoint, e.g. one for blob and one for dfs.

Additionally, you’ll need to create private DNS zone groups so your services can resolve the endpoints: one for each type of endpoint. It’s worth noting that the underlying private DNS zones could be part of a shared infrastructure that only handles networking (e.g. Hub & Spoke).
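As an illustration, here is a hedged Terraform sketch of a Private Endpoint for the blob sub-resource of a storage account, together with the matching private DNS zone and its link to the VNet. All names, and the referenced resource group, VNet, subnet and storage account resources, are assumptions for the example and would live elsewhere in your configuration.

```hcl
# Private DNS zone for the "blob" endpoint type (one zone per endpoint type,
# e.g. privatelink.dfs.core.windows.net for dfs, privatelink.vaultcore.azure.net for Key Vault)
resource "azurerm_private_dns_zone" "blob" {
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = azurerm_resource_group.platform.name
}

# Make the zone resolvable from our VNet
resource "azurerm_private_dns_zone_virtual_network_link" "blob" {
  name                  = "blob-dns-link"
  resource_group_name   = azurerm_resource_group.platform.name
  private_dns_zone_name = azurerm_private_dns_zone.blob.name
  virtual_network_id    = azurerm_virtual_network.platform.id
}

# Private Endpoint for the data lake storage account (blob sub-resource)
resource "azurerm_private_endpoint" "datalake_blob" {
  name                = "pe-datalake-blob"
  location            = azurerm_resource_group.platform.location
  resource_group_name = azurerm_resource_group.platform.name
  subnet_id           = azurerm_subnet.endpoints.id

  private_service_connection {
    name                           = "datalake-blob"
    private_connection_resource_id = azurerm_storage_account.datalake.id
    subresource_names              = ["blob"] # a second endpoint with ["dfs"] is needed for ADLS Gen2
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "datalake-blob"
    private_dns_zone_ids = [azurerm_private_dns_zone.blob.id]
  }
}
```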

Using Private Endpoints with a data platform

Here is a simplified example of the infrastructure we deploy for our customers at ELCA.

Here we have a total of 4 Private Endpoints (PE) for:

  • DataLake storage account
  • Staging storage account
  • Terraform state storage account (blob)
  • Key Vault (not present on the diagram)

To allow external data to be loaded into the staging storage account, we make an exception and whitelist specific IP addresses. The other storage accounts can have their public access completely disabled (see below the Azure interface with the available options).
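For illustration, here is a hedged Terraform sketch of these two network configurations: an IP whitelist on the staging account and fully disabled public access on the data lake account. The account names, replication settings and CIDR ranges are placeholders, not our actual values.

```hcl
# Staging account: public network access stays on, but only whitelisted producer IPs get through
resource "azurerm_storage_account" "staging" {
  name                     = "ststagingexample" # placeholder
  resource_group_name      = azurerm_resource_group.platform.name
  location                 = azurerm_resource_group.platform.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  network_rules {
    default_action = "Deny"
    ip_rules       = ["203.0.113.10", "198.51.100.0/24"] # placeholder data producer IPs
    bypass         = ["AzureServices"]
  }
}

# Data lake account: public access disabled entirely, reachable only via its Private Endpoint
resource "azurerm_storage_account" "datalake" {
  name                          = "stdatalakeexample" # placeholder
  resource_group_name           = azurerm_resource_group.platform.name
  location                      = azurerm_resource_group.platform.location
  account_tier                  = "Standard"
  account_replication_type      = "LRS"
  is_hns_enabled                = true # ADLS Gen2
  public_network_access_enabled = false
}
```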

As mentioned before, a DNS zone group is defined for each type of Private Endpoint.

How does it work?

Data arrives from authorized sources into the staging storage account (using an IP whitelist or VPN access). Thanks to the Private Endpoint, the data platform retrieves the data over Azure Private Link. It can then perform multiple transformation steps using the computing power of its clusters. The transformed data can then be stored in and retrieved from the data lake storage account, again privately thanks to the associated Private Endpoint. Note that you are free to implement additional security measures such as Azure Firewall.

Terraform state & CI/CD

In the previous part, we saw that by implementing Private Endpoints for various resources, we were able to remove their public access, regain control over the network layer, and only allow requests coming from the private VNet while blocking all others.

However, doing so restricts Terraform’s ability to manage these resources from anywhere: once they are deployed, Terraform requests that do not come from the private VNet (i.e. through Private Link) are blocked. We addressed this issue with a mechanism that ultimately also enhances the Terraform state file security.

Who should be able to access the state?

When working with Terraform, you ideally want to integrate the deployment process (terraform apply) into a pipeline and follow a Continuous Deployment approach. This means that you will never need to access the Terraform state file from your machine with your own user: only the pipeline and its technical user need this access.

Azure DevOps

We are using Azure DevOps to launch our deployment pipelines. This tool’s main advantage is the ability to leverage Azure Virtual Machine Scale Set (VMSS) instances to run our jobs. This ensures that the Virtual Machines (VMs) used are within our dedicated Virtual Network, which lets the Terraform pipelines reach the Private Endpoints and manage deployed resources whose access has been restricted.
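Below is a hedged Terraform sketch of what such a scale set could look like; the VM size, image, SSH key and subnet reference are assumptions for the example, not our production values.

```hcl
# VMSS used as Azure DevOps scale set agents, placed in the platform VNet
resource "azurerm_linux_virtual_machine_scale_set" "devops_agents" {
  name                = "vmss-devops-agents"
  resource_group_name = azurerm_resource_group.platform.name
  location            = azurerm_resource_group.platform.location
  sku                 = "Standard_D2s_v3"
  instances           = 0 # Azure DevOps scales the pool up and down on demand
  admin_username      = "azureuser"

  # Azure DevOps manages the instances, so overprovisioning and automatic upgrades are disabled
  overprovision = false
  upgrade_mode  = "Manual"

  admin_ssh_key {
    username   = "azureuser"
    public_key = file("~/.ssh/id_rsa.pub") # placeholder key
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  network_interface {
    name    = "nic-devops-agents"
    primary = true

    ip_configuration {
      name      = "internal"
      primary   = true
      subnet_id = azurerm_subnet.agents.id # same VNet as the Private Endpoints
    }
  }
}
```

The scale set is then registered in Azure DevOps as an agent pool of type “Azure virtual machine scale set” (under the agent pool settings) and referenced by the pipelines through its pool name; Azure DevOps takes care of installing the agent and scaling the instances.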

In addition, prior to running the first Terraform pipelines, we also deploy a Private Endpoint for the storage account containing the Terraform state. This way, the instances running the jobs reach this storage account through the Private Endpoint, which allows us to completely remove its public access.
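A hedged sketch of this bootstrap piece could look as follows; it reuses the placeholder resource group, subnet and DNS zone names from the earlier examples, and the account and container are assumed to be created before the main pipelines start relying on this backend.

```hcl
# Storage account holding the Terraform state: public network access removed entirely
resource "azurerm_storage_account" "tfstate" {
  name                          = "sttfstateexample" # placeholder
  resource_group_name           = azurerm_resource_group.platform.name
  location                      = azurerm_resource_group.platform.location
  account_tier                  = "Standard"
  account_replication_type      = "LRS"
  public_network_access_enabled = false
}

# The VMSS agents reach the state through this Private Endpoint only
resource "azurerm_private_endpoint" "tfstate_blob" {
  name                = "pe-tfstate-blob"
  location            = azurerm_resource_group.platform.location
  resource_group_name = azurerm_resource_group.platform.name
  subnet_id           = azurerm_subnet.endpoints.id

  private_service_connection {
    name                           = "tfstate-blob"
    private_connection_resource_id = azurerm_storage_account.tfstate.id
    subresource_names              = ["blob"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "tfstate-blob"
    private_dns_zone_ids = [azurerm_private_dns_zone.blob.id]
  }
}
```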

Put simply this mechanism lets us:

  • Use Terraform from the pipelines to manage resources whose public access has been disabled (through Private Endpoints in the same VNet as the VMSS)
  • Secure the storage account containing the Terraform state file at the network layer, by removing its public access thanks to a Private Endpoint

See also how to orchestrate Terraform deployments in Azure DevOps in our previous article “Terraform Lasagna — How to layer your deployment”.

Wrapping it up

Using Private Endpoints and Private Link enabled us to ensure a highly secure data platform deployment. They provide private communication between our Virtual Network and other services through the Microsoft backbone, removing the uncertainties around the network layer. In addition, even after removing our resources’ public access, we’ve been able to keep managing our infrastructure with Terraform and to improve the state file security. To do so, we relied on a Virtual Machine Scale Set, sitting in the same VNet, to run our Azure DevOps jobs. This strategy not only guarantees that Terraform pipelines still have access to the deployed resources whose access has been restricted, but also allows us to put a Private Endpoint in front of the storage account containing the state file. Ultimately we can remove its public access, and thus improve its security.

Security is always a big deal for us at ELCA. We strive to deliver optimal value to our customers while ensuring the highest level of security. This architectural model has been instrumental in helping us accomplish this objective for our data projects.
