Managing ADF Pipeline KeyVault Secrets, the CICD Approach

Jos Eilers
Published in Wortell
Apr 20, 2020

Lately I have been involved in integrating Azure Data Factory pipelines into our CICD environment, and I came across several interesting challenges. I searched online for people who had already used Data Factory in combination with Azure DevOps; surprisingly, there was not much detailed information available. I encountered two challenges and decided to share the solutions in some blog posts. Today I will show you how to handle Key Vaults + ADF in a CICD pipeline!

Azure Key Vault + Azure Data Factory = Safe

A very nice feature of Azure Data Factory is its integration with Azure Key Vault: passwords, client secrets and connection strings stored there can be used to quickly create connections to, for example, SQL databases, Storage Accounts and various other services.

For example, to set up a Linked Service that uses Key Vault secrets to maintain a connection to Blob Storage, you can reach the Key Vault through Managed Service Identity (MSI). In the Linked Service creation screen, the connection string is then taken from Azure Key Vault instead of being typed in.

In this case the secret ‘SecretConnection’ is retrieved from the Key Vault, and the Pipeline can use the Linked Service for data retrieval and storage. For more information and a tutorial, see the Microsoft documentation.
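To make the idea concrete, here is a minimal sketch of what such a Linked Service definition could look like, written as a Python dict that is dumped to JSON. The names ‘ExampleBlobStorage’ and ‘ExampleKeyVault’ are hypothetical, and the property layout is an assumption based on how Data Factory typically serializes Key Vault references, so compare it with the JSON your own factory exports before relying on it.

```python
import json


def blob_linked_service(key_vault_linked_service: str, secret_name: str) -> dict:
    """Build a Blob Storage Linked Service whose connection string is a Key Vault reference."""
    return {
        "name": "ExampleBlobStorage",  # hypothetical Linked Service name
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {
                # Instead of a literal connection string, point at a secret.
                "connectionString": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        # Name of the Key Vault Linked Service (hypothetical here).
                        "referenceName": key_vault_linked_service,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                }
            },
        },
    }


definition = blob_linked_service("ExampleKeyVault", "SecretConnection")
print(json.dumps(definition, indent=2))
```

The connection string itself never appears in the definition; only the secret name and the reference to the Key Vault Linked Service do.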

In a CICD approach with Azure DevOps, you would change the AKV Linked Service depending on the stage you are in, so the correct Blob Storage is used whether you are in a Development, Testing or Production environment. How to set up CICD with Data Factory, including dynamic connections, is a challenge we will discuss in another blog post :-D

Using Key Vault secrets in a Pipeline

Another common approach with Data Factory is to use Key Vault secrets inside a Pipeline, for example to call a REST API which requires a token, or in some custom scenario where you want to use secrets in a Pipeline that is not covered by Linked Services. Again, there are already solutions out there. Basically, they come down to using a Web Activity with MSI authentication to retrieve the Key Vault secret.
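As a rough illustration of what that Web Activity does, the sketch below describes the request and pulls the secret out of a response body. It makes no network call. The vault name ‘example-vault’, the helper names and the sample response are all made up for illustration; the only parts taken from the Key Vault REST API are the URL shape and the ‘value’ field of the response.

```python
import json


def build_get_secret_request(vault_name: str, secret_name: str,
                             api_version: str = "7.0") -> dict:
    """Describe the GET request a Web Activity would send to Key Vault."""
    return {
        "method": "GET",
        "url": (f"https://{vault_name}.vault.azure.net/secrets/"
                f"{secret_name}?api-version={api_version}"),
        # Resource for which the managed identity requests a token.
        "token_resource": "https://vault.azure.net",
    }


def extract_secret_value(response_body: str) -> str:
    """Return the 'value' field of a Key Vault 'get secret' response."""
    return json.loads(response_body)["value"]


request = build_get_secret_request("example-vault", "SecretConnection")

# Made-up response body; a real one contains more fields.
sample_response = json.dumps({"value": "not-a-real-secret"})
secret = extract_secret_value(sample_response)
print(request["url"])
```

Note that the vault name is a fixed argument here, which is exactly the hard link that becomes a problem once the same Pipeline runs in several environments.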

In a standard situation with a single Data Factory, the above approaches work well. However, I was facing a situation where I needed to retrieve Key Vault secrets in a Pipeline that was also deployed to multiple environments using DevOps. The problem with the standard approaches was that the Web Activity was hard-linked to the Key Vault secret location, which now needed to be dynamic depending on the DTAP environment.

Dynamic Approach

To make the link to the Key Vault secret dynamic, I made use of the fact that resource naming is usually standardized in a CICD environment. An example of a development Key Vault name could be WE-Dev-Keyvault, while a production name would be WE-Prod-Keyvault.

I will give a full example here of how to retrieve a secret dynamically depending on the environment. Follow these steps:

1. Create a Data Factory resource in Azure and name it, for example, ‘ExampleDevFactory01’.

2. Create a Key Vault resource in Azure and name it, for example, ‘ExampleDevVault01’.

3. In the Key Vault, open the Secrets tab and add a secret named ‘SecretConnection’.

4. In the Access policies, add the Data Factory with ‘Add policies’. For this tutorial you can select Key, Secret & Management access (retrieving a secret only requires the Get permission on secrets).

5. Now go to the Data Factory resource and select ‘Author & Monitor’, select the ‘Author’ option on the left side and create a new empty Pipeline.

6. Select the newly created Pipeline and open the ‘Variables’ tab. Create a variable; for this tutorial just name it ‘Secret’ :-D

7. Create a new Web Activity and name it ‘Get Secret’. Make sure to also enable ‘Secure Input’ and ‘Secure Output’, so that secrets are not visible during debugging sessions or in logging.

8. The ‘Settings’ tab is where the magic happens. Remember that in this tutorial the name of the Data Factory is ‘ExampleDevFactory01’ and the name of the Key Vault is ‘ExampleDevVault01’. We are going to make use of the fact that a Pipeline can retrieve the name of its Data Factory: we reuse the first part of that string (ExampleDev) and append the string Vault01 to it, thus deriving the Key Vault name from the factory name. In other environments, the Key Vault name would be ExampleProdVault01 or ExampleTestVault01. This string replacement can be achieved with the ‘Add dynamic content’ option, which you select in the ‘URL’ field. In this case the dynamic content should look like this:

@concat('https://', substring(pipeline().DataFactory, 0, indexof(pipeline().DataFactory, 'Factory01')), 'Vault01.vault.azure.net/secrets/SecretConnection?api-version=7.0')

Notice the substring replacement of Factory01 by Vault01, also notice the link to the secret ‘SecretConnection’. Don’t forget to add ‘?api-version=7.0’ to the end!
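To check the naming logic outside Data Factory, the expression can be mirrored in a few lines of Python. This is only a sketch: the function name and its parameters are made up for illustration, and Python's find() is case-sensitive, which may differ from how indexof() matches inside Data Factory.

```python
def key_vault_secret_url(factory_name: str,
                         factory_suffix: str = "Factory01",
                         vault_suffix: str = "Vault01",
                         secret_name: str = "SecretConnection",
                         api_version: str = "7.0") -> str:
    """Derive the Key Vault secret URL from the Data Factory name."""
    # Mirrors indexof(pipeline().DataFactory, 'Factory01').
    marker = factory_name.find(factory_suffix)
    if marker < 0:
        raise ValueError(
            f"{factory_name!r} does not follow the naming convention")
    # Mirrors substring(pipeline().DataFactory, 0, <index>).
    prefix = factory_name[:marker]
    # Mirrors the concat(...) of scheme, prefix and the fixed tail.
    return (f"https://{prefix}{vault_suffix}.vault.azure.net/secrets/"
            f"{secret_name}?api-version={api_version}")


for factory in ("ExampleDevFactory01", "ExampleTestFactory01", "ExampleProdFactory01"):
    print(factory, "->", key_vault_secret_url(factory))
```

A factory name that does not contain ‘Factory01’ raises an error in this sketch, which is a useful reminder that the whole approach depends on the naming convention being followed in every environment.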

9. Set the Method to GET. In the ‘Advanced’ section, set Authentication to MSI and enter https://vault.azure.net as the Resource.

10. Now add a ‘Set Variable’ activity to the Pipeline and connect it after ‘Get Secret’. In its ‘Variables’ tab, set Name to ‘Secret’ and set Value to the dynamic content @activity('Get Secret').output.value
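Put together, the two activities could end up looking roughly like the Pipeline definition below, again written as a Python dict and dumped to JSON. Treat it as a sketch: the Pipeline name ‘ExamplePipeline’ and the activity name ‘Set Secret’ are hypothetical, and the property layout is an assumption modeled on typical exported Data Factory JSON, so compare it with the code view of your own Pipeline.

```python
import json

# The dynamic URL expression from the Web Activity settings.
VAULT_URL_EXPRESSION = (
    "@concat('https://', "
    "substring(pipeline().DataFactory, 0, "
    "indexof(pipeline().DataFactory, 'Factory01')), "
    "'Vault01.vault.azure.net/secrets/SecretConnection?api-version=7.0')"
)

pipeline = {
    "name": "ExamplePipeline",  # hypothetical name
    "properties": {
        "variables": {"Secret": {"type": "String"}},
        "activities": [
            {
                "name": "Get Secret",
                "type": "WebActivity",
                # Keep the secret out of debug output and logs.
                "policy": {"secureInput": True, "secureOutput": True},
                "typeProperties": {
                    "url": {"value": VAULT_URL_EXPRESSION, "type": "Expression"},
                    "method": "GET",
                    "authentication": {
                        "type": "MSI",
                        "resource": "https://vault.azure.net",
                    },
                },
            },
            {
                "name": "Set Secret",  # hypothetical name
                "type": "SetVariable",
                # Run only after the secret was retrieved successfully.
                "dependsOn": [
                    {"activity": "Get Secret",
                     "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {
                    "variableName": "Secret",
                    "value": {
                        "value": "@activity('Get Secret').output.value",
                        "type": "Expression",
                    },
                },
            },
        ],
    },
}

print(json.dumps(pipeline, indent=2))
```

Because nothing in this definition names a specific environment, the same Pipeline can be deployed unchanged to Development, Testing and Production.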

That is it! With this approach you can retrieve Key Vault secrets and set them as Pipeline variables to be used however you want. This helped me a lot to standardize Data Factory Pipelines and to get CICD up and running with Data Factory. Keep in mind to always use ‘Secure Input’ and ‘Secure Output’ whenever you are using secrets in a Pipeline. Be safe. In a next blog I will show you how to set up a full CICD with Data Factory, in YAML!

Keep developing!

Jos Eilers, Technical Advisor, Data & AI

Technical Advisor at Wortell. My interests are Azure, A.I. / M.L / DevOps / C# / Robotics / IOT