Getting Started with Azure Data Labs Terraform Modules

Step-by-step guide to create your template from scratch

Aitor Murguzur
Microsoft Azure
4 min read · Jul 21, 2023

Kudos to Jorge Docampo and Walter Grasselli for their contributions.

Inspired by Atomic Design

Azure Data Labs (ADL) provides 75+ Terraform modules and built-in templates for deploying data infrastructure on Azure. The built-in templates are a convenient starting point, but they may not always meet your specific requirements. When they fall short, you can create a new Terraform deployment from scratch using the existing ADL modules. Some common questions then come up:

  • What are the minimum required files for a new template?
  • How can you instantiate or reference existing Azure Data Labs modules?
  • How can you easily test the deployment?

In this blog, you will find answers to those questions and learn how to create a new template from scratch. Before we start though, let us clarify some terminology:

  • Module: A module is a reusable and self-contained unit of Terraform configuration that represents a specific infrastructure component or set of resources. It encapsulates a collection of Terraform code and configurations, allowing users to define and manage infrastructure resources in a modular and scalable manner. Modules can be referenced, shared, and reused across different templates, providing a consistent and efficient way to provision infrastructure resources.
  • Template: A template (aka Terraform root module) is the top-level configuration that serves as the main entry point for your infrastructure deployment. It acts as a composition mechanism, bringing together different Terraform modules and orchestrating their integration to define the desired infrastructure state.

> Jump directly to the GitHub repo if you want to stop reading.

Creating a New Template

To create a new template, follow the steps below. If you don’t have Terraform and the Azure CLI installed, please install those first. Alternatively, if you use VS Code and like to stay up to date with the latest features, you could use a dev container with Terraform and the Azure CLI, or try GitHub Codespaces.

Select required modules

First of all, select which modules should be part of your template, e.g. Azure Data Factory, Azure Synapse workspace or ADLS Gen2. Some modules can be deployed using either private endpoints or public endpoints, so set the is_private_endpoint variable according to your security needs.
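As a rough sketch of how that switch could be wired up, the fragment below exposes a template-level variable and passes it to a Key Vault module reference. Only is_private_endpoint comes from this post; the variable name enable_private_endpoints is hypothetical, and the module path, the rg_name argument and the module.resource_group.name output are assumptions to verify against the module's own variables.tf and outputs.tf. It also assumes the locals and the resource_group module reference shown later in this post.

```hcl
# Hypothetical template-level switch; the name is illustrative only.
variable "enable_private_endpoints" {
  type        = bool
  description = "Deploy supported modules behind private endpoints"
  default     = false
}

# Key Vault module reference. Apart from is_private_endpoint, the module path
# and argument names are assumptions; check the module's variables.tf.
module "key_vault" {
  source = "github.com/Azure/azure-data-labs-modules/terraform/key-vault"

  basename = local.basename
  location = local.location
  rg_name  = module.resource_group.name # assumed input and output names

  # false = public endpoint, true = private endpoint
  is_private_endpoint = var.enable_private_endpoints

  module_enabled = true
  tags           = local.tags
}
```

Switching to private endpoints usually requires extra network inputs (for example a subnet and private DNS zones), so treat the true case as something to check per module rather than a one-line change.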

For this example, I selected five modules, all of which are deployed using public endpoints by default.

+----------------------+----------+---------------------------------+
| Module               | Default? | Comment                         |
+----------------------+----------+---------------------------------+
| Resource Group       | Yes      | One for deployed resources      |
| Storage Account      | Yes      | ADLS Gen2 with public endpoints |
| Key Vault            | Yes      | With public endpoints           |
| Data Factory         | Yes      | With public endpoints           |
| Databricks workspace | Yes      | With public endpoints           |
+----------------------+----------+---------------------------------+

Create the template files

Once the modules are selected, create the required files for the new template. In this case, you will need three core files plus one file per module reference.

  • main.tf: the primary file, where you define the providers and data sources.
  • variables.tf: defines the input variables that allow you to parameterize your configuration. These are later used within locals.tf and/or the module reference files.
  • locals.tf: contains local values derived from variables or resources within the Terraform configuration.
  • {module_name}.tf: it’s recommended to create a separate file for each module reference. You can also group all module references within a single file, or follow another separation-of-concerns strategy, e.g. one file for all network configurations.
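A minimal sketch of the three core files could look like the following. They are shown in one block for brevity; in the template each section lives in its own file. The variable postfix and the locals basename, location and tags appear in this post; the prefix variable, the default values, the tag contents and the provider version constraint are assumptions for illustration.

```hcl
# ---- main.tf: providers and data sources ----
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.0" # assumed constraint; pin to what the modules require
    }
  }
}

provider "azurerm" {
  features {}
}

# Exposes the tenant, subscription and object ID of the signed-in identity
data "azurerm_client_config" "current" {}

# ---- variables.tf: input variables ----
variable "location" {
  type        = string
  description = "Azure region for all resources"
  default     = "westeurope"
}

variable "prefix" {
  type        = string
  description = "Prefix used in resource names (assumed variable)"
  default     = "adl"
}

variable "postfix" {
  type        = string
  description = "Postfix used in resource names"
  default     = "001"
}

# ---- locals.tf: values derived from the variables ----
locals {
  basename = "${var.prefix}-${var.postfix}"
  location = var.location

  tags = {
    Project = "adl-custom-template" # illustrative tag
  }
}
```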

For this example, I created eight files: the three core files plus five files for module references. As you can see below, an ADL module can be referenced by pointing to the existing GitHub repository. Learn more about module sources here.

# Resource group

module "resource_group" {
  source = "github.com/Azure/azure-data-labs-modules/terraform/resource-group"

  basename = local.basename
  location = local.location

  module_enabled = true
  tags           = local.tags
}
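The remaining module references follow the same pattern and typically consume outputs of the resource group module. The sketch below shows what a storage-account.tf could look like; the module path, the rg_name argument and the module.resource_group.name output are assumptions, so confirm them in the module's variables.tf and the resource group module's outputs.tf.

```hcl
# Storage account (ADLS Gen2)

module "storage_account" {
  source = "github.com/Azure/azure-data-labs-modules/terraform/storage-account"

  # Storage account names allow only lowercase letters and digits,
  # so hyphens are stripped from the shared basename.
  basename = replace(local.basename, "-", "")
  location = local.location

  # Referencing the resource group output also makes Terraform create
  # the resource group before the storage account.
  rg_name = module.resource_group.name # assumed input and output names

  module_enabled = true
  tags           = local.tags
}
```

Passing the name through the module output, rather than repeating a string, lets Terraform infer the dependency order without an explicit depends_on.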

Test the deployment

Once you create the required files (you can also clone them from here), follow these steps to test out the deployment:

  1. Run az login in your terminal, then select the subscription for the deployment with az account set --subscription <subscription_id>
  2. Go to the created (or cloned) directory
  3. Update variables.tf with your desired values. For instance, try changing postfix
  4. Run Terraform:
$ terraform init
$ terraform plan
$ terraform apply
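For step 3, an alternative to editing the defaults in variables.tf is to place overrides in a terraform.tfvars file next to the template, which Terraform loads automatically during plan and apply. The sketch below assumes the template declares prefix, postfix and location variables; only postfix is named in this post, and the values are examples.

```hcl
# terraform.tfvars: overrides for the defaults declared in variables.tf
prefix   = "adl"        # assumed variable name
postfix  = "demo01"
location = "westeurope" # assumed variable name
```

This keeps variables.tf unchanged, so each environment can carry its own values without touching the template itself.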

Once the deployment is finished, jump to portal.azure.com and you will see the resource group with the underlying resources: Databricks workspace, Key Vault, Data Factory and ADLS Gen2. You can clean up the resources by running terraform destroy in your terminal.

For any suggestions or questions, feel free to reach out :)

Aitor Murguzur
Microsoft Azure

All things data. Principal PM @Microsoft Fabric CAT Spark. PhD in Comp Sci. All views are my own. https://www.linkedin.com/in/murggu/