Pulumi end-to-end on Azure, part one
At Glasswall SRE, we’ve recently been investigating using Pulumi as our Infrastructure as Code (IaC) solution, to speed up infrastructure deployments, reduce errors when creating new environments, and create a more stable platform.
This blog post is the first part in a guide to how we’re running Pulumi with a self-managed backend on Azure, complete with using Azure Key Vault as Pulumi’s encryption provider. It includes a short tutorial of bootstrapping a self-managed backend and automating Pulumi infrastructure deployments using Azure Pipelines.
Future parts will go into more detail about how to use Pulumi on a self-managed backend, and specifics about how we have implemented our SaaS infrastructure on Azure with Pulumi.
The story so far
For our FileTrust for Email SaaS product, most of our infrastructure is managed through IaC. Our infrastructure is hosted on Azure, and some of the tech we use on/with it is:
- Kubernetes
- Helm
- .NET Core
- Docker
- Python
- SQL Server
- Redis
- Prometheus
The Glasswall Site Reliability Engineering (SRE) Team is responsible for managing, maintaining, and supporting our infrastructure and the services that run on it in production. Our team is about five months old, made up mainly of new hires, having inherited the existing CI/CD workflows from contractors before our time.
We use Azure DevOps as our CI/CD solution, utilising both build and release pipelines to package and deploy our microservices and infrastructure.
Our inherited IaC solution is written in Python, using the subprocess module to call Azure CLI commands. This worked well enough until recently, when we experienced a few growing pains as we scaled:
- Using subprocess to call out to the Azure CLI can be brittle; if the CLI changes too much between versions then your IaC is broken. This is amplified by the fact that some infrastructure deployments are run very infrequently. Everything is fine until a critical piece of infrastructure breaks and you need to redeploy it, only to find your IaC is also broken due to CLI changes since it was last run.
- Logging errors during a CD run is troublesome when using subprocess. The error causing the run to fail will be a wordy Python exception, indicating that the subprocess exited unsuccessfully, with the actual CLI error hidden away in the logs above.
- The core infrastructure is fairly rigid, with the inability to pass outputs from one section of infrastructure as inputs into another. An example of this is our virtual networking. If we wanted to change a subnet prefix, we would have to modify many instances of it being hard coded (no doubt missing some and breaking deployments), as opposed to changing it in one place and having that propagate outwards. This won’t scale as we onboard new services and products.
- It’s a bespoke system, and difficult for new joiners to learn and adjust to. Documentation is sparse, and normally only says what something does as opposed to why it is doing it. There is no standardisation in terms of what is documented, where, and how.
- Best practices for writing software are generally not followed. Most of the code utilises Python's globals() in order to read/modify global state. This leads to an almost complete lack of pure functions, numerous errors due to side effects, and code that is incredibly difficult to debug.
- Due to using the raw Azure CLI for a lot of functionality, there's a tonne of boilerplate in place for ensuring idempotency (being able to run something multiple times without changing the end result beyond the initial run), and in some places operations are just not idempotent.
- It only creates infrastructure. It cannot destroy it.
So, for these reasons, we were interested in seeing if we could find a more scalable solution for infrastructure deployments. Enter: Pulumi.
Halloumi?
Pulumi. In its own words, it is “Modern Infrastructure as Code”. Basically, it lets you codify your infrastructure in a commonly-used programming language (JavaScript, TypeScript, Python, Go, or .NET), as opposed to learning a domain-specific language for it (like Terraform’s HCL).
Pulumi is hooked up to a backend (they offer a SaaS cloud solution, but we have opted for a self-managed backend using Azure Blob Storage) which stores the state of resources as you deploy/destroy things. This allows it to be run idempotently, and ensures that when you make a change to your IaC, only the minimal set of changes to your cloud infrastructure is made, speeding up build time! It also automatically detects the optimal order of operations, and even runs some steps in parallel for maximum speed.
Below are some screenshots of code. The first is our current IaC solution for creating an Azure Container Registry (ACR) in Python using subprocess, and the second is the Pulumi version, also written in Python.
As can be seen, the Pulumi version is a lot simpler; it’s self-documenting, the infrastructure configuration exists directly within the declaration, and we can pass outputs from one stage as inputs to the next using their object model.
How do you then deploy the infrastructure? Using the old version, you’d have to ensure you ran the Python script on one of our build agents as part of an Azure Pipeline, making sure you were authenticating using the correct service principal in Azure AD.
With Pulumi, you just sign in with the Azure CLI, log in to the Pulumi backend, and run pulumi up. That’s it! Pulumi stores the state of the deployed resources in your storage backend, so you can run it on any machine and the results are entirely deterministic.
It’s the exact same process for CD pipelines too, except instead of logging in with the Azure CLI, you provide service principal credentials in environment variables.
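To make both flows concrete, here’s a rough sketch. The backend URL and resource names assume the setup we bootstrap later in this post (a container called pulumi in the stpulumi storage account), and the ARM_* variables are the ones Pulumi’s Azure provider reads for service principal authentication; adjust everything for your own environment.

```shell
# Local development: piggyback your own Azure CLI login.
az login
export AZURE_STORAGE_ACCOUNT="stpulumi"
export AZURE_STORAGE_KEY="$(az storage account keys list \
  --account-name stpulumi \
  --resource-group rg-pulumi \
  --query "[0].value" -o tsv)"
pulumi login azblob://pulumi
pulumi up

# CD pipeline: authenticate with a service principal instead, via the
# environment variables Pulumi's Azure provider reads.
export ARM_CLIENT_ID="{service-principal-app-id}"
export ARM_CLIENT_SECRET="{service-principal-password}"
export ARM_TENANT_ID="{tenant-id}"
export ARM_SUBSCRIPTION_ID="{subscription-id}"
pulumi up --yes
```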
So, without any further hesitation — let’s get stuck in!
Bootstrapping a Pulumi backend
You’ll need to set up two things on Azure in order for Pulumi to work:
- A storage account and blob container to store state files.
- A key vault and encryption key to encrypt/decrypt secrets.
This guide assumes you already have:
- An Azure AD account for yourself with permissions to create/destroy Azure resources.
- Azure DevOps set up for you to create Git repositories and pipelines.
- An Azure Pipelines service connection to your Azure subscription (via Azure Resource Manager), to allow build agents to create/destroy Azure resources.
Unfortunately, we can’t use Pulumi to create the Pulumi backend. It’s not a chicken/egg situation; the backend must be created naively first before we can harness the full power of Pulumi.
We’ll make a simple shell script to bootstrap Pulumi, and then put this in an Azure Pipeline with a CD trigger so we can deploy changes.
Create a bash script with the following contents:
#!/bin/bash

# make sure we exit on any errors
set -o errexit
set -o pipefail

readonly PULUMI_LOC=westeurope
This is just a bash script that’ll exit immediately if any command fails (including a command within a pipeline). I also define a constant for the Azure location our Pulumi backend will be hosted in. If you want to change the location of the backend then change the constant.
As you follow this bootstrapping guide, add any code snippets given to the bash script. I’ll explain them along the way.
Resource group
In order to group our Pulumi backend cloud resources together, we need to create a resource group. Add this to your bash script:
az group create \
  --name rg-pulumi \
  --location "${PULUMI_LOC}"
It just creates an Azure resource group called rg-pulumi in our Azure subscription.
Storage
Now let’s set up the blob storage Pulumi will need to store its state files in. Add this to your bash script:
if ! az storage account show --name stpulumi \
  --resource-group rg-pulumi; then
  az storage account create \
    --name stpulumi \
    --resource-group rg-pulumi \
    --location "${PULUMI_LOC}" \
    --sku "Standard_LRS" \
    --encryption-services "blob" \
    --kind "StorageV2"
fi
This creates a storage account called stpulumi if it didn’t already exist. Enclosing it in the if statement like that ensures that it’s idempotent.
Now add this:
export AZURE_STORAGE_ACCOUNT="stpulumi"
export AZURE_STORAGE_KEY=$(az storage account keys list \
  --account-name stpulumi \
  --resource-group rg-pulumi \
  --query "[0].value" \
  -o tsv)

az storage container create \
  --name pulumi \
  --public-access off
We’re creating environment variables to store storage account credentials for the Azure CLI to be able to access it, then we’re creating a blob storage container for Pulumi to store its state files.
Key vault
This section is optional, as Pulumi has other methods of securely storing secret information, but this is what we used. If you’re gonna be using a different encryption provider, then feel free to skip this section and the one after it.
Sometimes you want to store secret information in your Pulumi projects — this is achieved through a Pulumi ‘encryption provider’. We’re using Azure Key Vault as ours. The basic principle is that you create an Azure Key Vault with an encryption key inside it. Pulumi will use this key for encrypting and decrypting secret values when they are stored in state files in the storage backend. This is an extra layer of security, and ensures that secrets don’t fall into the wrong hands, even if your state files happen to be compromised.
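As a sketch of how this wires up once the backend exists: when you create a Pulumi stack, you point it at the Key Vault key via the --secrets-provider flag. The vault and key names below match the ones we create later in this guide, but treat the exact values as assumptions for your environment.

```shell
# Tell Pulumi to reuse the Azure CLI's credentials for Key Vault access.
export AZURE_KEYVAULT_AUTH_VIA_CLI=true

# Create a stack whose secrets are encrypted with the 'pulumi' key
# inside the 'kvpulumi' vault (names assumed from this guide).
pulumi stack init dev \
  --secrets-provider="azurekeyvault://kvpulumi.vault.azure.net/keys/pulumi"
```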
Add the following to your bash script:
az keyvault create \
  --name kvpulumi \
  --resource-group rg-pulumi \
  --location "${PULUMI_LOC}" \
  --enabled-for-template-deployment true \
  --sku "standard"

if ! az keyvault key show --vault-name kvpulumi \
  --name pulumi; then
  az keyvault key create \
    --name pulumi \
    --vault-name kvpulumi \
    --kty "RSA" \
    --protection "software" \
    --size 2048
fi
This creates an Azure Key Vault in rg-pulumi with name kvpulumi, then creates an encryption key inside it called pulumi that Pulumi will use to encrypt/decrypt secret values. Again, it has an if statement to ensure idempotency, otherwise the key would be recreated each time (which would be quite bad if we wanted to decrypt existing values in the state file, as the key would no longer work!).
Authorisation
Finally, we need to make sure our user account and Azure Pipelines service connection principal have the correct access policies to encrypt/decrypt using the key we just created.
For this step you’ll need to know the object IDs of both your user account, and the Azure Pipelines service connection principal.
You can get the object ID of your user account by running this command:
$ az ad user show --id {your-email-address} --query objectId -o tsv
I haven’t been able to find an easy way of getting the object ID of the service connection principal, but you should be able to find it by searching your Azure DevOps project name and Azure subscription GUID in the ‘enterprise applications’ section of Azure AD in the Portal.
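One approach that may work from the CLI, assuming you know the display name Azure DevOps gave the principal (typically the organisation name, project name, and subscription GUID joined with hyphens — this naming pattern is an assumption, so fall back to the Portal if it returns nothing):

```shell
# Look up the service connection principal by its display name and
# extract the object ID. The display name format is an assumption;
# check the 'enterprise applications' blade if this finds nothing.
az ad sp list \
  --display-name "{your-org}-{your-project}-{subscription-guid}" \
  --query "[0].objectId" \
  -o tsv
```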
Now, add the following to your bash script:
az keyvault set-policy \
  --name kvpulumi \
  --object-id {your-user-account-object-id} \
  --key-permissions get list encrypt decrypt

az keyvault set-policy \
  --name kvpulumi \
  --object-id {service-connection-object-id} \
  --key-permissions get list encrypt decrypt
Here we’re simply adding the key permissions we need to the Azure AD objects that need them. As Pulumi ‘piggybacks’ your Azure credentials when you’re running it on your machine, you need to make sure your account is authorised to perform the required operations using keys in the vault. It’s the same story with the service principal.
Deployment
That’s the bash script done! If you run it (after running az login and ensuring you’re on the correct subscription, of course), it should create the backend for Pulumi!
Please note that this script is quite naive; if you change any of the values within it, it won’t know to tear down the existing infrastructure, as there is no state file. It also doesn’t let you destroy the infrastructure you have created. To do that you’ll have to manually delete the rg-pulumi resource group.
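If you do need to tear the backend down completely (and you’re sure nothing depends on the state files inside it), deleting the resource group removes everything the script created:

```shell
# Destroys the storage account, state files, key vault, and encryption
# key in one go. Irreversible: any secrets encrypted with the key can't
# be decrypted after this.
az group delete --name rg-pulumi --yes
```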
Deploying the Pulumi backend on Azure Pipelines
Now, let’s set up a CD pipeline to deploy our Pulumi backend.
Put your bash script in an Azure DevOps Git repository, and add a file called ‘azure-pipelines.yml’ somewhere in that repo. Inside this file, put:
name: $(BuildDefinitionName)-$(Date:yyyyMMdd)$(Rev:.r)

pool:
  vmImage: ubuntu-latest

trigger:
- master

steps:
- task: AzureCLI@2
  inputs:
    azureSubscription: '{your-service-connection}'
    scriptType: 'bash'
    scriptLocation: 'scriptPath'
    scriptPath: 'path/to/bootstrap.bash'
  displayName: "Deploy Pulumi backend"
Make sure you put the name of your Azure Pipelines service connection in the azureSubscription field, and the path to your bootstrap script in the scriptPath field.
Create a new pipeline in Azure DevOps from an existing YAML file in your repo, and point it at your ‘azure-pipelines.yml’ file (Pipelines / New Pipeline / Azure Repos Git / Your repo / Existing Azure Pipelines YAML file).
Now, every time you make a push to the master branch of your git repo, the Pulumi backend will be deployed. You shouldn’t have to do this very often, if at all, as the infrastructure should stay fairly static.
Wrap up
That’s it for part one! We’ve managed to codify the infrastructure required for Pulumi, as well as automate its deployment. Believe it or not, this is actually the hardest part — the rest is a breeze!
Look out for part two, where I go through how to start creating cloud infrastructure with Pulumi on our self-managed backend.