Introducing Azure Data Labs

75+ reusable Terraform modules and built-in templates to provision your Data Labs on Azure

Aitor Murguzur
Microsoft Azure
5 min read · May 22, 2023


Azure Data Labs can help you instantiate recommended and favorite architectures for data workloads with just a few keystrokes.

Infrastructure modernization these days often involves the use of one or more public clouds. One crucial element that enables businesses to effectively manage their cloud infrastructure in a repeatable, understandable, and secure manner is Infrastructure as Code (IaC).

When it comes to managing Azure resources, businesses have a variety of IaC tooling options, including Terraform, Bicep, ARM templates, Pulumi, and more. In 2022, Microsoft introduced three significant enhancements to simplify the use of IaC templates and streamline Terraform workflows: Azure Deployment Environments (ADE), the Terraform AzAPI provider, and Azure Terrafy.

Since their launch, ADE, Terraform AzAPI, and Terrafy have been rapidly adopted, with over 1.7M downloads since the release of AzAPI, and have proven invaluable for managing data workloads and deployments in Azure.

Why have Azure Data Labs?

Despite the usefulness of ADE, Terraform, and other IaC tooling for setting up infrastructure, data personas, both technical and non-technical (including data engineers, data scientists, and data analysts), still struggle to deploy the data infrastructure (labs) they need to start working on their use cases.

These challenges primarily stem from a lack of reusable templates:

  • Time-consuming and repetitive: deploying the same or similar infrastructure (e.g., for testing or development) takes time, especially when it has to be done repeatedly.
  • Error-prone: repeated manual deployments are prone to errors such as incorrect configurations, incomplete deployments, or security vulnerabilities, because every repetition is another chance for human error.
  • Difficulty in scaling: scaling out (deploying more labs) is hard whenever the deployment involves manual steps, whether those steps are complex or simple.
  • Lack of standardization: when infrastructure is deployed repeatedly, it is hard to keep deployments and their dependencies consistent and standardized.

To address these challenges, we created Azure Data Labs to help data personas streamline their deployment processes.

> Jump directly to the GitHub repo if you want to skip the rest of the article.

What is Azure Data Labs?

Azure Data Labs (ADL) is a solution accelerator that enables data personas to easily deploy the data infrastructure they need on Azure. It helps data teams quickly spin up Azure resources and become productive:

  • Data Labs in minutes: get the required Azure data resources up and running fast. All modules are written in Terraform and hosted on GitHub; for now, you can deploy them with Terraform or GitHub Actions.
  • Template-Centric and Extensible: ADL includes 75+ modules and built-in, extensible templates (aka blueprints). You can configure and deploy an existing template, or create your own from scratch.
  • Empower Data Personas and IT: self-service deployment with GitHub Actions and Terraform for data teams, DevOps, IT, the Cloud Center of Excellence (CCoE), and others.
  • Security and Governance: Private Endpoints are supported by default. Modules and templates can be adjusted to fit internal security frameworks, cloud policies, DevOps guidelines, and more.

“It all starts at the infrastructure layer, and how we are helping you build agility and optimize your business with Azure. Don't have to wrestle with the complexities of building and operating cloud-scale data infrastructure yourself. Spend more time creating value and less time integrating and managing your data state.” — Satya Nadella at Microsoft Ignite 2022

Getting Started with Azure Data Labs Templates

For now, you can deploy a template using either Terraform or GitHub Actions. Azure DevOps support is coming soon.

Deploy an Azure Data Labs Template Using Terraform

To use the Terraform option, please make sure you have the Azure CLI and Terraform installed. Then, follow these steps:

  • Clone the repository: clone this repository and select one of the built-in templates: Synapse, Databricks, or AzureML.
  • Update the config: update config-lab.yml with your desired values. Change prefix and postfix so that the resource names are unique.
  • Deploy the lab: go to the <lab>/infra/terraform folder of the selected template and run the commands below.
$ terraform init
$ terraform plan
$ terraform apply
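The Terraform steps above can also be scripted. The following is a minimal Python sketch, not part of Azure Data Labs: it adds a pre-flight check that a name built from prefix and postfix satisfies Azure's storage account naming rule (3 to 24 characters, lowercase letters and digits only), then runs init, plan, and apply in <lab>/infra/terraform, saving the plan so that apply executes exactly what was reviewed. The naming pattern <prefix>st<postfix>, the helper names, the "synapse" folder, and the random hex postfix are hypothetical assumptions; check the ADL module code for how names are actually composed.

```python
import re
import secrets
import subprocess
from pathlib import Path

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
STORAGE_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def random_postfix(n_bytes: int = 2) -> str:
    """Return a random lowercase hex postfix (2 bytes -> 4 characters)."""
    return secrets.token_hex(n_bytes)


def storage_account_name(prefix: str, postfix: str) -> str:
    """Compose a name with a HYPOTHETICAL '<prefix>st<postfix>' pattern and validate it."""
    name = f"{prefix}st{postfix}"
    if not STORAGE_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return name


def terraform_commands(plan_file: str = "tfplan") -> list[list[str]]:
    """init -> plan (saved to a file) -> apply that saved plan, all non-interactive."""
    return [
        ["terraform", "init", "-input=false"],
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],
        ["terraform", "apply", "-input=false", plan_file],
    ]


def deploy_lab(lab_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Run (or, by default, only return) the Terraform commands for <lab>/infra/terraform."""
    workdir = Path(lab_dir) / "infra" / "terraform"
    commands = terraform_commands()
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError and stops at the first failing step
            subprocess.run(cmd, cwd=workdir, check=True)
    return commands


if __name__ == "__main__":
    # "adl" prefix and "synapse" lab folder are illustrative values only
    print(storage_account_name("adl", random_postfix()))
    for cmd in deploy_lab("synapse"):
        print(" ".join(cmd))
```

With dry_run=True (the default) nothing is executed, so you can review the sequence first; passing dry_run=False actually deploys, which assumes terraform is on your PATH and you are logged in with the Azure CLI.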

Deploy an Azure Data Labs Template Using GitHub Actions

To deploy an ADL template using GitHub Actions instead, follow three steps (see the detailed step-by-step guide here):

  • Prerequisites: follow the step-by-step instructions for the prerequisites, which include creating a Service Principal and the required secrets. See here.
  • Update lab parameters: as with Terraform, customize the deployment by modifying the parameters in config-lab.yml.
  • Deploy from GitHub Actions: go to the Actions tab of the repository, select the lab, and click Run workflow.
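Clicking Run workflow sends a workflow_dispatch event, which can also be sent through GitHub's REST API, for example from another pipeline. The standard-library Python sketch below builds that request; it is not part of Azure Data Labs. It assumes the lab workflow declares a workflow_dispatch trigger (which the Run workflow button requires) and that a GITHUB_TOKEN environment variable holds a token allowed to run workflows in the repository. The owner, repository, and workflow file names are placeholders.

```python
import json
import os
import urllib.request
from typing import Optional

GITHUB_API = "https://api.github.com"


def build_dispatch_request(owner: str, repo: str, workflow_file: str,
                           ref: str = "main", inputs: Optional[dict] = None,
                           token: str = "") -> urllib.request.Request:
    """Build a POST for GitHub's REST 'create a workflow dispatch event' endpoint."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = {"ref": ref}  # branch or tag containing the workflow file
    if inputs:
        body["inputs"] = inputs  # only valid if the workflow declares these inputs
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def trigger_lab_workflow(owner: str, repo: str, workflow_file: str, ref: str = "main") -> int:
    """Send the dispatch request using the token in the GITHUB_TOKEN environment variable."""
    token = os.environ["GITHUB_TOKEN"]
    request = build_dispatch_request(owner, repo, workflow_file, ref, token=token)
    with urllib.request.urlopen(request) as response:
        return response.status  # GitHub replies 204 No Content on success
```

For example, trigger_lab_workflow("your-org", "azure-data-labs", "lab-synapse.yml") would start a hypothetical lab-synapse.yml workflow in your fork; the real workflow files live under .github/workflows in the repository.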

Contributing

Azure Data Labs modules and templates will evolve to cover new Azure data resources and more diverse architectures (e.g., those from the Azure Architecture Center and other sources). Note that Azure Data Labs is not officially supported by Microsoft.

This is a community-driven initiative, and we encourage you to contribute your own modules and templates. If you'd like to add more resources, or if you run into any issues with Azure Data Labs templates or modules, feel free to open a pull request or an issue.

Visit our GitHub repositories, explore our templates and modules, and contribute today at:

[Co-authored with Jorge Docampo. Kudos to Walter Grasselli, Ignacio Alonso, Willie Ahlers and all contributors]

All things data. Principal PM @Microsoft Fabric CAT Spark. PhD in Comp Sci. All views are my own. https://www.linkedin.com/in/murggu/