Published in CodeX
Continuous Deployment for Azure Data Factory


tl;dr

CI/CD in general

Continuous Integration

Continuous Deployment

The general workflow for CI/CD and prerequisites

  1. An Azure account. Obviously, if you do not yet have a Microsoft account with an Azure subscription to create Azure resources, go to www.azure.com and sign up. New accounts get $100 of free credit for 30 days (no ad).
  2. An Azure DevOps organization that is linked to the aforementioned Azure subscription with a git repository for the development environment of the data factory.
  3. Not strictly necessary, but I recommend setting up a local working environment with the Azure CLI (az), git, and Visual Studio Code. The last one is more or less a necessity, though any editor will do.

How code should flow

  1. Create a main collaboration branch in the development Data Factory.
  2. As a developer, create a new feature branch to make the changes you want.
  3. Create a pull request for your feature branch to the main branch.
  4. Once the PR is accepted and merged into the main branch, a new build of the ARM template for the Data Factory should be triggered, and the current state of the main branch should be deployed to the various Data Factory environments you use.
  5. The current state of the Data Factory as defined in the main branch should be published to the development Data Factory.
  6. After approval, the new version should also be deployed to the next stage, whether it’s UAT or Production. In the following example, it’ll be published to Production right away.

Setting up the infrastructure

Code repository first

Set up the Dev Data Factory using bicep
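The development factory can be declared in a bicep file along these lines. This is a minimal sketch: the factory name, DevOps organization, project, and repository names are placeholders for your own values.

```bicep
// Minimal sketch: a development Data Factory with its git repository
// wired to Azure DevOps. All names below are placeholder assumptions.
param factoryName string = 'adf-dev-example'
param location string = resourceGroup().location

resource dataFactory 'Microsoft.DataFactory/factories@2018-06-01' = {
  name: factoryName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    repoConfiguration: {
      type: 'FactoryVSTSConfiguration'
      accountName: 'my-devops-org'   // assumption: your DevOps organization
      projectName: 'my-project'      // assumption: your DevOps project
      repositoryName: 'adf-repo'     // assumption: the git repository
      collaborationBranch: 'main'
      rootFolder: '/'
    }
  }
}
```

Note that only the development factory gets the `repoConfiguration` block; the downstream factories receive their state exclusively through the deployment pipeline.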

Required files to build the ARM templates

NPM package configuration file
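The package.json only needs the ADFUtilities dependency and a build script pointing at it; the version shown here is an example, so pin whatever is current:

```json
{
  "scripts": {
    "build": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
  },
  "dependencies": {
    "@microsoft/azure-data-factory-utilities": "^1.0.0"
  }
}
```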

The CI Part for the Azure Data Factory

  1. Before the actual build steps, the pipeline defines a trigger that starts a build for every commit to the main branch; if you like, you can change this to any other release branch. It also uses the latest Ubuntu image for the build process and two variables: the working directory and the subscription to use.
  2. The pipeline starts with the steps of the build stage.
  3. In steps 1 and 2, Node.js and npm are installed.
  4. Step 3 validates the artifacts using the ADFUtilities NPM package.
  5. Step 4 creates the ARM templates to be deployed, again using the ADFUtilities NPM package.
  6. Steps 5 and 6 use bicep to create the data factory based on the ARM template and the bicep file.
  7. After the artifacts have been created in the artifacts folder, the development stage deploys them to the development data factory.
  8. Last but not least, the latest version of the main branch is also deployed to the production ADF. Note that the factory name to deploy to is defined via a variable in the stage, which refers to the parameter given in arm_templates_parameters.json.
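The build steps above can be sketched as follows. This assumes `workingDir` and `subscriptionId` pipeline variables, and the resource group and factory names are placeholders:

```yaml
# Sketch of the build stage: install tooling, validate the factory
# resources, export the ARM template, and publish it as an artifact.
- task: NodeTool@0                 # steps 1 & 2: install Node.js and npm
  inputs:
    versionSpec: '18.x'
- task: Npm@1
  inputs:
    command: 'install'
    workingDir: '$(workingDir)'
- task: Npm@1                      # step 3: validate every factory resource
  inputs:
    command: 'custom'
    workingDir: '$(workingDir)'
    customCommand: 'run build validate $(workingDir) /subscriptions/$(subscriptionId)/resourceGroups/rg-adf-dev/providers/Microsoft.DataFactory/factories/adf-dev-example'
- task: Npm@1                      # step 4: export the deployable ARM template
  inputs:
    command: 'custom'
    workingDir: '$(workingDir)'
    customCommand: 'run build export $(workingDir) /subscriptions/$(subscriptionId)/resourceGroups/rg-adf-dev/providers/Microsoft.DataFactory/factories/adf-dev-example "ArmTemplate"'
- task: PublishPipelineArtifact@1  # hand the template to the release stages
  inputs:
    targetPath: '$(workingDir)/ArmTemplate'
    artifact: 'ArmTemplates'
```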

The CD Part for the Azure Data Factory

Download the artifact, deploy to development, and finally, after approval, deploy to production:
  1. Download the artifacts
  2. Publish the current main branch to the development data factory
  3. Publish the current main branch to the production data factory
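The three steps above map onto two deployment stages. A sketch, assuming environments named Development and Production (the approval gate is configured on the Production environment in Azure DevOps, not in the YAML):

```yaml
# Sketch of the release stages; stage wiring and environment
# names are assumptions to adapt to your setup.
- stage: DeployDev
  dependsOn: Build
  jobs:
    - deployment: DeployToDev
      environment: 'Development'
      strategy:
        runOnce:
          deploy:
            steps:
              - download: current        # fetch the ARM template artifact
                artifact: 'ArmTemplates'
- stage: DeployProd
  dependsOn: DeployDev
  jobs:
    - deployment: DeployToProd
      environment: 'Production'          # add the approval check here
      strategy:
        runOnce:
          deploy:
            steps:
              - download: current
                artifact: 'ArmTemplates'
```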
Tasks in the Pipeline

Download the artifacts

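Fetching the ARM template produced by the build stage can be done with the download-artifact task. A sketch; the artifact name must match the one used when publishing:

```yaml
# Download the ARM template artifact into the pipeline workspace.
- task: DownloadPipelineArtifact@2
  inputs:
    buildType: 'current'
    artifactName: 'ArmTemplates'
    targetPath: '$(Pipeline.Workspace)/ArmTemplates'
```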

Deploy to Development and Production

Predeployment configuration setup
Configuration for the actual deployment
Configuration for the post deployment script
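The three configuration parts above can be sketched like this. Assumptions: a `serviceConnection` variable, placeholder resource names, and that Microsoft's sample PrePostDeploymentScript.ps1 (which stops triggers before deployment and restarts them afterwards) has been added to the published artifact:

```yaml
# Predeployment: stop the factory's active triggers.
- task: AzurePowerShell@5
  inputs:
    azureSubscription: '$(serviceConnection)'
    ScriptType: 'FilePath'
    ScriptPath: '$(Pipeline.Workspace)/ArmTemplates/PrePostDeploymentScript.ps1'
    ScriptArguments: '-armTemplate "$(Pipeline.Workspace)/ArmTemplates/ARMTemplateForFactory.json" -ResourceGroupName "rg-adf-prod" -DataFactoryName "adf-prod-example" -predeployment $true -deleteDeployment $false'
    azurePowerShellVersion: 'LatestVersion'
# The actual deployment of the exported ARM template.
- task: AzureResourceManagerTemplateDeployment@3
  inputs:
    deploymentScope: 'Resource Group'
    azureResourceManagerConnection: '$(serviceConnection)'
    subscriptionId: '$(subscriptionId)'
    resourceGroupName: 'rg-adf-prod'
    location: 'westeurope'
    csmFile: '$(Pipeline.Workspace)/ArmTemplates/ARMTemplateForFactory.json'
    csmParametersFile: '$(Pipeline.Workspace)/ArmTemplates/ARMTemplateParametersForFactory.json'
    overrideParameters: '-factoryName "adf-prod-example"'
# Postdeployment: rerun the same script with the flags flipped
# (-predeployment $false -deleteDeployment $true) to restart the
# triggers and clean up resources removed from the template.
```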

Wrap up

Sources

The full pipeline

Otrek Wilke

Data Engineering made easy. Writing about things learned in data engineering, data analytics, and agile product development.