Infrastructure pipelines with Azure DevOps

Konstantin Shilovskiy
Published in DataReply · 6 min read · Jun 27, 2023



Azure DevOps provides services that cover the whole development lifecycle. It has all the functionality you would expect from a DevOps Platform like GitLab or GitHub. With Azure DevOps you can track tasks, host git repos, store artifacts and build CI/CD pipelines.

If you look at the documentation for Azure DevOps, you will find descriptions of CI/CD pipelines that run tests and deploy applications. However, if you want to automate resource provisioning, such as creating a virtual network or a Kubernetes cluster, there is significantly less information available. With this blog post I would like to bridge that gap and demonstrate how Azure resources can be deployed with Azure DevOps pipelines.

I will start with an introduction of the project structure for the infrastructure-as-code scripts using Terraform and Terragrunt. Then, I will define the features for the pipeline to apply those scripts. Finally, I will show how the pipeline can be implemented via Azure Pipelines templates.

You can find all the code used in this article here.

Prerequisites

To run the pipeline from this post, you need to have the following settings configured in your DevOps project:

  1. Install the TerraformInstaller task from the DevOps Marketplace.
  2. Create a service connection with permissions to Azure resources: in the DevOps portal, navigate to Project settings -> Service connections -> Azure Resource Manager and add the connection.

In addition to the DevOps services, I will be using Terraform and Terragrunt as the main tools for planning and applying infrastructure changes.

While the examples I will be demonstrating can be done with standard Terraform, I have chosen Terragrunt to keep the configuration DRY.

Terragrunt offers convenient features like running commands on all modules, hooks and auto-retry. It is a lightweight and practical tool. I highly recommend checking it out!

Before I jump into the pipeline code, let us explore the project structure.

Terragrunt project structure

I typically split infrastructure projects into two main parts:

  1. “modules” — contain Terraform code.
  2. “envs” — contain “live configurations” (values specific to an environment).

You can think of modules as scripts with function definitions and of envs as function calls.

Here is what the project structure looks like:

Terragrunt project structure
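The structure appeared as an image in the original post. Based on the modules described next, a sketch of the layout could look like this (the exact folder names under envs, beyond the module names, are assumptions):

```
.
├── modules
│   ├── core       # resource group and other shared resources
│   ├── network    # vnet and subnets
│   └── vm         # virtual machines
└── envs
    └── dev
        ├── core
        │   └── terragrunt.hcl
        ├── network
        │   └── terragrunt.hcl
        └── vm
            └── terragrunt.hcl
```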

The core module describes resources common to all other modules (e.g., resource group); network defines vnet and subnets; vm defines virtual machines.

In the example above, both the envs and modules folders have a similar structure. However, envs may contain multiple folders for the same module (e.g., dev, staging and prod).

With this structure, each live configuration gets its own Terraform state file on the backend.

This makes modules easier to review, test and deploy. If you wonder why, check out an excellent post by Gruntwork describing the disadvantages of large Terraform modules.

Continuous Integration

Testing infrastructure changes is hard. The tests are usually more complicated and time-consuming than application integration tests. Quite often the changes are tested manually in the dev environment and are then propagated to staging and production. This means we need an option to review the changes before applying them to the environment.

Atlantis is a popular team collaboration tool for infrastructure development. However, it does not have a proper solution for validating changes in order.

In our example the order is core -> network -> vm, but Atlantis will not show the real changes to be used in vm until network is applied. In other words, you are not 100% sure of what values will be used when reviewing the plan in Atlantis. The solution is to have a pull request for each module, but it adds more steps to the development process.

Infrastructure Pipeline

To overcome the limitations mentioned above, the infrastructure pipeline should have the following features:

  1. All changes should be reviewed before applying them (at least in production)
  2. Changes should be applied without any manual steps except approval
  3. It should be possible to change multiple modules in one PR, but changes should be applied according to the module dependencies (core -> network -> vm)

Luckily, Azure DevOps allows us to build such a pipeline.

Implementation

First, let us create a template that installs Terraform and Terragrunt. This is required because every pipeline job runs in an isolated context.
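The template itself is not reproduced here, but a minimal sketch could look like the following (the file name install-tools.yml and the default versions are assumptions; the TerraformInstaller task comes from the Marketplace extension mentioned in the prerequisites):

```yaml
# install-tools.yml -- reusable steps template (a sketch)
parameters:
  - name: terraformVersion
    type: string
    default: '1.5.7'
  - name: terragruntVersion
    type: string
    default: '0.48.0'

steps:
  # Marketplace task installed as a prerequisite
  - task: TerraformInstaller@0
    displayName: Install Terraform
    inputs:
      terraformVersion: ${{ parameters.terraformVersion }}
  # Terragrunt has no installer task, so download the binary directly
  - bash: |
      curl -fsSL -o terragrunt \
        "https://github.com/gruntwork-io/terragrunt/releases/download/v${{ parameters.terragruntVersion }}/terragrunt_linux_amd64"
      chmod +x terragrunt && sudo mv terragrunt /usr/local/bin/
    displayName: Install Terragrunt
```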

Then we add another template for planning, validating and applying changes of a single module. Since Azure pipeline code may be overwhelming, I’ll be explaining each part separately.

First, we define parameters to make the template generic.
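As a sketch, the parameter block might look like this (module and dependsOnTfModules appear later in the article; the remaining names are assumptions):

```yaml
# terraform-module.yml -- template parameters (a sketch)
parameters:
  - name: module              # e.g. core, network or vm
    type: string
  - name: dependsOnTfModules  # parent modules whose apply must finish first
    type: object
    default: []
  - name: environment         # live configuration folder under envs/
    type: string
    default: dev
  - name: serviceConnection   # Azure Resource Manager service connection
    type: string
```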

Then comes the block for running the plan command:
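A condensed sketch of the plan job (step details and the install-tools.yml template name are assumptions):

```yaml
# Plan job of the module template (a sketch)
jobs:
  - job: plan_terraform_${{ parameters.module }}
    dependsOn:
      # wait for every parent module's apply job
      - ${{ each parent in parameters.dependsOnTfModules }}:
          - apply_terraform_${{ parent }}
    steps:
      - template: install-tools.yml   # hypothetical install template
      - bash: |
          cd envs/${{ parameters.environment }}/${{ parameters.module }}
          # -detailed-exitcode: 0 = no changes, 2 = changes, 1 = error
          terragrunt plan -out=tfplan -detailed-exitcode
          code=$?
          if [ $code -eq 2 ]; then
            echo "##vso[task.setvariable variable=HAS_PLAN;isOutput=true]true"
            echo "##vso[task.logissue type=warning]${{ parameters.module }}: changes to review"
          elif [ $code -eq 0 ]; then
            echo "##vso[task.setvariable variable=HAS_PLAN;isOutput=true]false"
          else
            exit $code
          fi
        name: plan
        displayName: terragrunt plan
      - publish: envs/${{ parameters.environment }}/${{ parameters.module }}/tfplan
        artifact: plan_${{ parameters.module }}
```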

The first job, plan_terraform_${{parameters.module}}, waits for all the parent modules to finish first. Then it runs terragrunt plan and determines whether there are any changes to be applied, storing the result in the HAS_PLAN variable. Finally, it publishes the Terraform plan as a pipeline artifact, which will later be used by the apply job.

After the plan job is complete, the job for manual validation starts.
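A sketch of the validation job (the notification address and timeouts are placeholders):

```yaml
# Manual validation job of the module template (a sketch)
  - job: manual_plan_validation_${{ parameters.module }}
    dependsOn: plan_terraform_${{ parameters.module }}
    pool: server   # ManualValidation only runs in agentless jobs
    # run only when the plan job reported pending changes
    condition: eq(dependencies.plan_terraform_${{ parameters.module }}.outputs['plan.HAS_PLAN'], 'true')
    timeoutInMinutes: 60
    steps:
      - task: ManualValidation@0
        timeoutInMinutes: 50
        inputs:
          notifyUsers: 'infra-team@example.com'   # placeholder address
          instructions: Review the plan for module ${{ parameters.module }}
          onTimeout: reject   # mark the job as failed if nobody responds
```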

The second job, manual_plan_validation_${{parameters.module}}, sends an email notification and, if there are changes to be applied, waits for user input. After receiving the notification, the user can go to the DevOps portal and approve or reject the job. If there is no user response within the timeout, the job is marked as failed. This avoids paying for a pipeline that waits indefinitely.

Finally, we add a job to apply the terraform plan generated earlier:
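A sketch of the apply job, under the same assumptions as the previous snippets:

```yaml
# Apply job of the module template (a sketch)
  - job: apply_terraform_${{ parameters.module }}
    dependsOn:
      - plan_terraform_${{ parameters.module }}
      - manual_plan_validation_${{ parameters.module }}
    # run only if everything upstream succeeded and there are changes;
    # a rejected or timed-out validation skips this job
    condition: and(succeeded(), eq(dependencies.plan_terraform_${{ parameters.module }}.outputs['plan.HAS_PLAN'], 'true'))
    steps:
      - template: install-tools.yml   # hypothetical install template
      - download: current
        artifact: plan_${{ parameters.module }}
      - bash: |
          cd envs/${{ parameters.environment }}/${{ parameters.module }}
          terragrunt apply "$(Pipeline.Workspace)/plan_${{ parameters.module }}/tfplan"
        displayName: terragrunt apply
```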

The third job, apply_terraform_${{parameters.module}}, downloads the Terraform plan produced by the plan job and applies it. The apply job only runs if the plan job finishes successfully and there is a plan to apply.

Unfortunately, the ManualValidation task can only run in agentless jobs. This means that we cannot run all the steps together and need three jobs instead. Each job runs on a 'fresh' virtual machine, so the plan and apply jobs need to install Terraform as their first step. This can be improved by running the jobs on self-hosted agents or by using the pipeline cache.

By specifying module and dependsOnTfModules properties we can build the dependency graph in the pipeline.
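Wiring the modules together might look like this (the template path is an assumption):

```yaml
# azure-pipelines.yml -- building the dependency graph (a sketch)
trigger:
  - main

jobs:
  - template: templates/terraform-module.yml
    parameters:
      module: core
      dependsOnTfModules: []
  - template: templates/terraform-module.yml
    parameters:
      module: network
      dependsOnTfModules: [core]
  - template: templates/terraform-module.yml
    parameters:
      module: vm
      dependsOnTfModules: [network]
```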

If jobs have module dependencies, they will wait for the parent modules' jobs to finish first. For example, the "vm generate" job waits for "network apply" to finish successfully. This is required for Terragrunt to use the outputs of the parent modules as values of the child modules.

Thanks to the "warning" statement defined in the generate plan job, we can see which modules require manual validation. By clicking on the warning, you can get directly to the terraform plan output.

Here is what the Terraform changes look like:

As you can see, the pipeline satisfies all the requirements we defined earlier. It applies changes to all modules in a single run while still allowing for manual review and approval of each change.

You can further speed up the pipeline by using self-hosted agents with Terraform and Terragrunt preinstalled. You can also extend the templates to add hooks or to skip manual validation for the dev environment.

There are definitely more ways to improve the current pipeline. However, I hope this blog post will be useful as the first step in defining an infrastructure pipeline in your project.

IMPORTANT: Do not forget to clean up the resources created via the pipeline. Otherwise, you will encounter charges from Azure at the end of the month.
