ADF Pipelines + CICD, the YAML approach

Jos Eilers
Wortell
May 27, 2020

Recently I wrote a blog post about managing Key Vault secrets in ADF pipelines and promised to write a follow-up about the actual CICD (Continuous Integration / Continuous Delivery) we used for ADF. That time has come!

In our projects we use the CICD approach with DTAP for development and release, but so far we had only used it for web applications and Azure Functions. This time we needed to see how we could use DTAP for Azure Data Factory and Azure SQL Database, preferably of course with YAML, since that also gives you version control of the deployment process itself.

Microsoft made CICD with YAML generally available in February 2020, which means you can now go ahead and use it for the CD part as well (don't use the classic editor anymore). Just like in the first CICD blog post, I ran into several challenges that needed to be sorted out!

The CICD process itself is not new and is pretty well documented on the internet; look for example at Microsoft's tutorials and documentation on CI/CD for Data Factory.

The first challenge came when I looked at the possible ways to do CI/CD with Azure Data Factory. According to Microsoft there are basically two possibilities:

  • Manually upload a Resource Manager template using Data Factory UX integration with Azure Resource Manager.
  • Automated deployment using Data Factory’s integration with Azure Pipelines

As we want full CI/CD with no manual actions besides approvals, the first option cannot be used. The second option is promising, but the Microsoft example is still based on the classic version of the release pipeline and not the YAML version.

I decided to split the pipeline into two parts: a build pipeline which produces the Azure Data Factory artifact, and a release pipeline. You could also add the build pipeline part as an initial stage of the release pipeline; this all depends on your process requirements. In our case the requirement was to store every published ADF version in a separate artifact for traceability.

If you remember, I also mentioned an Azure SQL database, which needs to be updated with the latest schemas to match the ADF. In our case we use Visual Studio to design the database layout; if you build a SQL project there, a DACPAC file gets created by default.

In order to have a DACPAC file available in the CICD pipeline, another artifact needs to be created every time the project gets updated, in our case whenever a commit is done to the develop branch. This YAML file is very straightforward and is based on the default ASP.NET build template. Select Pipelines->Pipelines->New Pipeline, give the pipeline a name and add the following code to it.
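A minimal sketch of such a build pipeline, assuming a solution that contains the SQL project and the 'windows-latest' hosted agent (the trigger branch, paths and variable values will differ in your project):

```yaml
# Build pipeline for the SQL database project, based on the default ASP.NET template.
trigger:
  - develop                     # build a new DACPAC on every commit to develop

pool:
  vmImage: 'windows-latest'

variables:
  solution: '**/*.sln'
  buildPlatform: 'Any CPU'
  buildConfiguration: 'Release'

steps:
  - task: NuGetToolInstaller@1

  - task: NuGetCommand@2
    inputs:
      restoreSolution: '$(solution)'

  # Building the solution also builds the SQL project, which produces the .dacpac
  - task: VSBuild@1
    inputs:
      solution: '$(solution)'
      platform: '$(buildPlatform)'
      configuration: '$(buildConfiguration)'

  # Copy the generated .dacpac to the artifact staging directory
  - task: CopyFiles@2
    inputs:
      Contents: '**/bin/$(buildConfiguration)/**/*.dacpac'
      TargetFolder: '$(Build.ArtifactStagingDirectory)'

  # Publish the artifact under the default name 'drop'
  - task: PublishBuildArtifacts@1
    inputs:
      PathtoPublish: '$(Build.ArtifactStagingDirectory)'
      ArtifactName: 'drop'
```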

You can use the Assistant available at the right side of the editor screen to quickly generate the code blocks; just type in the task name, for example 'Publish Build Artifacts', and fill in the parameters.

The above code produces an artifact, in this case with the standard name 'drop'. Remember this, because we need it in the next section.

The build pipeline part triggers whenever a publish action is performed in ADF (the 'Publish' button). The publish causes the adf_publish branch to be updated with new ARM configuration templates, which is out-of-the-box behaviour when ADF is linked to a Git repository. Your adf_publish branch will contain a folder, named after your data factory, with the generated ARM templates.

Next to those templates, the branch also contains a YAML file, and that one is the build pipeline for the ADF publish! It is placed in the adf_publish branch so it will trigger whenever a publish is performed. Follow the same procedure as before to create a build pipeline, just make sure that this pipeline is created in the adf_publish branch! The YAML code is as follows:
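A minimal sketch of that pipeline; the data factory folder name, the artifact names ('adf-templates' and 'database') and the placeholder ids are assumptions to replace with your own values:

```yaml
# Build pipeline stored in the adf_publish branch.
trigger:
  - adf_publish                  # runs whenever ADF 'Publish' updates this branch

pool:
  vmImage: 'windows-latest'

steps:
  # Publish the ARM templates that ADF generated for this publish
  - task: PublishBuildArtifacts@1
    inputs:
      PathtoPublish: '$(Build.SourcesDirectory)/<YourDataFactoryName>'   # folder ADF creates in adf_publish
      ArtifactName: 'adf-templates'

  # Download the latest DACPAC artifact ('drop') from the SQL project build pipeline
  - task: DownloadBuildArtifacts@0
    inputs:
      buildType: 'specific'
      project: '<project-guid>'               # generated for you by the task assistant
      pipeline: '<sql-build-definition-id>'   # the SQL build pipeline
      buildVersionToDownload: 'latest'
      downloadType: 'single'
      artifactName: 'drop'
      downloadPath: '$(System.ArtifactsDirectory)'

  # Re-publish the DACPAC so both artifacts are bundled with this ADF build
  - task: PublishBuildArtifacts@1
    inputs:
      PathtoPublish: '$(System.ArtifactsDirectory)/drop'
      ArtifactName: 'database'
```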

The first publish task takes care of creating an artifact with the ADF ARM templates.

Now maybe you remember I also talked about that SQL database; often you want this to be updated together with ADF at the same time. This is why we first download the latest build artifact from the SQL database project and then publish it again as a new artifact. This way we bundle those artifacts together and avoid the problem of using a DACPAC file which is newer or older than the ADF published files. Of course it is still possible that a developer forgets to commit the latest SQL project changes, but that is what the testing environment is meant for, right?

Don't forget to select the correct project id and pipeline (definition) in the download task. You can retrieve these by using the task assistant on the right side and selecting 'Download Build Artifacts' as a task. After this you can select the project that contains the 'drop' artifact which needs to be downloaded; the assistant will automatically generate the correct project id.

After this, you have the artifacts ready for the release part.

Now for the release part. We decided to create a testing environment and one or more production environments, and we also wanted to be able to approve any deployment, no matter which environment it targets. To do that, first create the environments you want to have in your pipeline via the menu Pipelines->Environments->New environment. Give the environment a name, then select the three dots at the top-right corner.

Select 'Approvals and checks' and add approvers in the next menu; these can be individual users or teams. Close all windows after this. The next step is to create the actual release pipeline: select Pipelines->Pipelines->New Pipeline.

The YAML code for the initial part is as follows:
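A sketch of that initial part; the variable group names are assumptions, and each group is expected to hold the environment-specific values (Data Factory name, Key Vault name, SQL server and database names, service connection, and so on) under its own variable names:

```yaml
# Release pipeline: no CI trigger, it is started manually (or via your own trigger of choice)
trigger: none

pool:
  vmImage: 'windows-latest'

variables:
  # One library group per environment, e.g. TestDataFactoryName / ProdDataFactoryName, etc.
  - group: 'adf-release-test'
  - group: 'adf-release-production'
```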

In this part we reference two variable groups, one for each environment. The variable groups are defined at Pipelines->Library and contain the variables specific to your product; in our case, for example, they hold the Data Factory name, database name, Key Vault name and so on, all of which are different per environment.

In the next part we define the first of our stages, which looks like this:
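A sketch of that stage, assuming the ADF build pipeline from the previous section produced the 'adf-templates' and 'database' artifacts (the project guid and pipeline id are placeholders to fill in via the task assistant):

```yaml
stages:
  - stage: GetLatestArtifact
    displayName: 'Get latest build artifacts'
    jobs:
      - job: RepublishArtifacts
        steps:
          # Download all artifacts of the latest ADF build (ARM templates + DACPAC)
          - task: DownloadBuildArtifacts@0
            inputs:
              buildType: 'specific'
              project: '<project-guid>'
              pipeline: '<adf-build-definition-id>'
              buildVersionToDownload: 'latest'
              downloadType: 'specific'
              itemPattern: '**'
              downloadPath: '$(System.ArtifactsDirectory)'

          # Re-publish them inside this run, so rerunning a later stage reuses
          # exactly these versions instead of whatever is newest by then
          - task: PublishBuildArtifacts@1
            inputs:
              PathtoPublish: '$(System.ArtifactsDirectory)/adf-templates'
              ArtifactName: 'adf-templates'

          - task: PublishBuildArtifacts@1
            inputs:
              PathtoPublish: '$(System.ArtifactsDirectory)/database'
              ArtifactName: 'database'
```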

Over here the artifacts are retrieved from the build and published again inside this pipeline. The reason for this is that it makes it possible to rerun a stage with the correct artifact versions, thus creating the option of falling back to an earlier version. In the classic release pipeline this functionality is already there (you can revert to an earlier release), but for the YAML version you still need to handle this yourself (at least, I didn't see an alternative way).
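The 'Test' stage can then look something like the sketch below; the template file name (deploy-adf.yml) and the variable names taken from the test library group are assumptions:

```yaml
  - stage: DeployTest
    displayName: 'Deploy to Test'
    dependsOn: GetLatestArtifact
    jobs:
      - deployment: DeployADFAndDatabase
        environment: 'Test'          # the approvals configured on this environment gate the stage
        strategy:
          runOnce:
            deploy:
              steps:
                # All deployment logic lives in a reusable template file in the repository
                - template: deploy-adf.yml
                  parameters:
                    azureServiceConnection: '$(TestServiceConnection)'
                    resourceGroupName: '$(TestResourceGroupName)'
                    dataFactoryName: '$(TestDataFactoryName)'
                    keyVaultName: '$(TestKeyVaultName)'
                    sqlServerName: '$(TestSqlServerName)'
                    sqlDatabaseName: '$(TestSqlDatabaseName)'
```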

In this part you see the first of the DTAP environments being deployed. There are a couple of things here I want to mention:

  • The 'dependsOn' is set to the stage GetLatestArtifact, so this stage will wait until that stage is finished.
  • The 'environment' is set to 'Test', which means the approval process configured on that environment will run before this stage starts.
  • A template file is called with variables from one of the library groups. The names of the variables should match the ones defined in the library group to be used properly.

The next stage could be something like this:
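A sketch, mirroring the Test stage but using the production values from the second library group (names again assumed):

```yaml
  - stage: DeployProduction
    displayName: 'Deploy to Production'
    dependsOn: DeployTest                # only runs after the Test stage succeeded
    jobs:
      - deployment: DeployADFAndDatabase
        environment: 'Production'        # approvals on this environment gate the stage
        strategy:
          runOnce:
            deploy:
              steps:
                - template: deploy-adf.yml
                  parameters:
                    azureServiceConnection: '$(ProdServiceConnection)'
                    resourceGroupName: '$(ProdResourceGroupName)'
                    dataFactoryName: '$(ProdDataFactoryName)'
                    keyVaultName: '$(ProdKeyVaultName)'
                    sqlServerName: '$(ProdSqlServerName)'
                    sqlDatabaseName: '$(ProdSqlDatabaseName)'
```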

As you can see, the basic part stays the same: dependsOn now checks that the Test stage finished first, and this stage again makes use of the approval process and is linked to the production variables in the library group.

Let's take a look at how the deployment template file looks. Again, this is a YAML file that is part of the repository (it doesn't need to be defined in a pipeline editor):

The first part of the template file is about defining the parameters which are needed in this file; depending on your situation you will need different parameters here.
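For the parameters used in the stage sketches above, the first part of the template (here called deploy-adf.yml, a hypothetical name) would look like this:

```yaml
# deploy-adf.yml - parameters consumed by this step template
parameters:
  - name: azureServiceConnection
    type: string
  - name: resourceGroupName
    type: string
  - name: dataFactoryName
    type: string
  - name: keyVaultName
    type: string
  - name: sqlServerName
    type: string
  - name: sqlDatabaseName
    type: string
```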

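The next part of the template contains the deployment steps themselves. A sketch, using only the parameters defined above (the Key Vault task, the artifact download, the trigger stop and the ARM deployment; the region and the override list are assumptions to adapt to your factory):

```yaml
steps:
  # Fetch the secrets this deployment needs (for example SQL credentials) from Key Vault
  - task: AzureKeyVault@1
    inputs:
      azureSubscription: '${{ parameters.azureServiceConnection }}'
      KeyVaultName: '${{ parameters.keyVaultName }}'
      SecretsFilter: '*'

  # Download the ADF ARM templates and the DACPAC published earlier in this run
  - task: DownloadBuildArtifacts@0
    inputs:
      buildType: 'current'
      downloadType: 'specific'
      itemPattern: '**'
      downloadPath: '$(System.ArtifactsDirectory)'

  # Stop all ADF triggers before deploying
  - task: AzurePowerShell@5
    inputs:
      azureSubscription: '${{ parameters.azureServiceConnection }}'
      azurePowerShellVersion: 'LatestVersion'
      ScriptType: 'InlineScript'
      Inline: |
        Get-AzDataFactoryV2Trigger -ResourceGroupName '${{ parameters.resourceGroupName }}' `
            -DataFactoryName '${{ parameters.dataFactoryName }}' |
          ForEach-Object {
            Stop-AzDataFactoryV2Trigger -ResourceGroupName '${{ parameters.resourceGroupName }}' `
              -DataFactoryName '${{ parameters.dataFactoryName }}' -Name $_.Name -Force
          }

  # Deploy the generated ARM template, overriding the environment-specific parameters.
  # Add further overrides (for example the Key Vault linked service baseUrl) as needed.
  - task: AzureResourceGroupDeployment@2
    inputs:
      azureSubscription: '${{ parameters.azureServiceConnection }}'
      resourceGroupName: '${{ parameters.resourceGroupName }}'
      location: 'West Europe'                      # assumption: use your own region
      csmFile: '$(System.ArtifactsDirectory)/adf-templates/ARMTemplateForFactory.json'
      csmParametersFile: '$(System.ArtifactsDirectory)/adf-templates/ARMTemplateParametersForFactory.json'
      overrideParameters: '-factoryName "${{ parameters.dataFactoryName }}"'
      deploymentMode: 'Incremental'
```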
A couple of things happen here:

  • First, the Azure Key Vault secrets required in this template are retrieved.
  • The ADF and DACPAC artifacts are downloaded to the local workspace to make them available in this stage.
  • The Azure Data Factory triggers are stopped.
  • The ADF ARM templates from the ADF artifact are deployed to Azure Data Factory. Notice that some parameters are overridden here, for example the Azure Key Vault name and the Azure Data Factory name. Depending on your situation you will have different parameters which need to be updated in this step.

The next three big parts are about cleaning up ADF: removing components such as pipelines or linked services which are not present anymore.

It is pretty straightforward: any component that isn't part of the published ADF JSON template will be removed. The example is taken from Microsoft's sample clean-up script and converted to YAML:
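A condensed sketch of what such a clean-up task can look like as an inline Azure PowerShell step; Microsoft's full sample script handles more resource types and edge cases, this only shows the pattern for pipelines (the same pattern repeats for datasets, linked services and triggers):

```yaml
  # Remove ADF components that are no longer part of the published ARM template
  - task: AzurePowerShell@5
    inputs:
      azureSubscription: '${{ parameters.azureServiceConnection }}'
      azurePowerShellVersion: 'LatestVersion'
      ScriptType: 'InlineScript'
      Inline: |
        $rg      = '${{ parameters.resourceGroupName }}'
        $factory = '${{ parameters.dataFactoryName }}'
        $template = Get-Content '$(System.ArtifactsDirectory)/adf-templates/ARMTemplateForFactory.json' -Raw | ConvertFrom-Json

        # Names of pipelines that are still present in the published template
        $templatePipelines = $template.resources |
          Where-Object { $_.type -eq 'Microsoft.DataFactory/factories/pipelines' } |
          ForEach-Object { ($_.name -split '/')[-1].TrimEnd("')]") }

        # Delete every deployed pipeline that is no longer in that list
        Get-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $factory |
          Where-Object { $templatePipelines -notcontains $_.Name } |
          ForEach-Object { Remove-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $factory -Name $_.Name -Force }
```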

This is the last part of the template, where the following is performed (a sketch follows the list):

  • The ADF triggers are started again.
  • The DACPAC file is deployed to the SQL database. Make sure you are pointing to the correct location of the DACPAC file, which will be different in your project (check your artifact first to find out). Also, whenever you make a change to the SQL database schema where data can potentially be lost (for example a column is dropped), this step will end in a failure, which is the default behaviour by design. Usually in this case you want to check first whether it is okay to perform this deployment and, for example, manually remove or change the columns in question (or just deploy the DACPAC manually for this rare occasion).
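A sketch of that final part, again as steps of the same template; the DACPAC path and the SQL credential secret names ($(SqlUserName), $(SqlPassword)) are assumptions, the latter expected to come from the Key Vault task at the start of the template:

```yaml
  # Start the ADF triggers again now that the new definitions are in place
  - task: AzurePowerShell@5
    inputs:
      azureSubscription: '${{ parameters.azureServiceConnection }}'
      azurePowerShellVersion: 'LatestVersion'
      ScriptType: 'InlineScript'
      Inline: |
        Get-AzDataFactoryV2Trigger -ResourceGroupName '${{ parameters.resourceGroupName }}' `
            -DataFactoryName '${{ parameters.dataFactoryName }}' |
          ForEach-Object {
            Start-AzDataFactoryV2Trigger -ResourceGroupName '${{ parameters.resourceGroupName }}' `
              -DataFactoryName '${{ parameters.dataFactoryName }}' -Name $_.Name -Force
          }

  # Deploy the DACPAC to the environment's Azure SQL database
  - task: SqlAzureDacpacDeployment@1
    inputs:
      azureSubscription: '${{ parameters.azureServiceConnection }}'
      ServerName: '${{ parameters.sqlServerName }}.database.windows.net'
      DatabaseName: '${{ parameters.sqlDatabaseName }}'
      SqlUsername: '$(SqlUserName)'       # assumed Key Vault secret names
      SqlPassword: '$(SqlPassword)'
      deployType: 'DacpacTask'
      DeploymentAction: 'Publish'
      DacpacFile: '$(System.ArtifactsDirectory)/database/**/*.dacpac'   # check your artifact for the exact path
```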

This finalizes the deployment of ADF + SQL database; if needed you can of course add your own additional deployment steps.

Now you can create a new release by selecting the release pipeline and choosing 'Run Pipeline'. A new release will be created and the artifacts are downloaded.

When the GetLatestArtifact stage is ready, you can perform the approval step.

Select the 'Review' button, perform the review and approve it; this will start the 'Test' deployment stage.

After a successful deployment of the 'Test' stage, the 'Production' deployment stage is waiting for approval. In order to perform a rollback to an earlier release, select that specific release, go to the stage you want to redeploy and select the two-triangles symbol in that stage.

Now you have the option to 'Rerun stage', which basically means you deploy this specific version again. As long as you don't rerun the 'GetLatestArtifact' stage, you will be using the artifact versions that were retrieved when the release was first created.

That is it! With this approach you can publish Azure Data Factory + SQL Database from a CICD pipeline, in YAML! Once you have the template YAML files ready, it is quite easy to reuse them for new projects, which saves a lot of time during development and testing when they are used at the start of a project.

Keep developing!

Jos Eilers, Technical Advisor, Data & AI
