Azure Data Factory — Retrieve Key Vault secrets in your pipelines at runtime

Francesco Milano
4 min read · Mar 25, 2020

Azure Data Factory and Azure Key Vault: better together

One of the most useful out-of-the-box integrations in Azure Data Factory is undoubtedly the one with Azure Key Vault.
If you develop with security in mind (who doesn’t? ^_^), orchestrating external services can be a little worrying: keys, passwords, certificates and connection strings must be kept secure while, at the same time, you have to use them to connect to and consume such services. Moreover, since you’re versioning your code (who doesn’t? ^_^), you have to be sure you’re not versioning your secrets too.

Luckily, a good friend of ours comes to the rescue: decoupling.

Azure Key Vault’s main role is to keep sensitive information secure, and your customers can also choose to encrypt it using their own key, so everybody’s happy.

Azure Data Factory, on its side, can leverage a built-in linked service to connect to Key Vault. Authentication between the two relies on MSI — Managed Service Identity.
Here’s a quick summary of the steps involved when you have to securely connect to a service or data store:

  1. Retrieve the data factory’s Managed Identity
  2. Grant the Managed Identity access to your Azure Key Vault
  3. Create a Linked Service pointing to your Azure Key Vault
  4. Create the data store Linked Service and, inside it, reference the corresponding secret stored in Key Vault

For a more in-depth overview and tutorial read here, while additional information about MSI can be found here.
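
Steps 3 and 4 boil down to two small JSON definitions. Here is a minimal Python sketch that builds them as dictionaries, assuming an Azure SQL Database data store. The names KeyVaultLS, SqlLS, my-vault and sql-connection-string are made up for illustration, and the property layout reflects my understanding of the linked service JSON format, so double-check it against the code view of your own factory.

```python
import json


def key_vault_linked_service(name: str, vault_name: str) -> dict:
    """Linked service that points Data Factory at a Key Vault (step 3)."""
    return {
        "name": name,
        "properties": {
            "type": "AzureKeyVault",
            "typeProperties": {"baseUrl": f"https://{vault_name}.vault.azure.net"},
        },
    }


def sql_linked_service(name: str, key_vault_ls: str, secret_name: str) -> dict:
    """Data store linked service whose connection string is a Key Vault reference (step 4)."""
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {
                # No secret here: only a pointer to where the secret lives.
                "connectionString": {
                    "type": "AzureKeyVaultSecret",
                    "store": {"referenceName": key_vault_ls, "type": "LinkedServiceReference"},
                    "secretName": secret_name,
                }
            },
        },
    }


# Hypothetical names, used only for illustration.
kv_ls = key_vault_linked_service("KeyVaultLS", "my-vault")
sql_ls = sql_linked_service("SqlLS", "KeyVaultLS", "sql-connection-string")
print(json.dumps(sql_ls, indent=2))
```

The point of the design is that the versioned definition only carries the secret name, never its value.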

Using Web Activity to retrieve a Key Vault secret at runtime

What has been described so far is the most common type of integration between ADF and Key Vault, and it covers pretty much all the cases you’ll face.

However, sometimes you need to retrieve a particular secret value not when defining the linked service, but inside an activity. This is typical of Web Service calls where you need to pass an API key or a token (e.g. a bearer token) assigned to you. These keys are usually part of the request headers, and you have to configure them in the activity’s settings tab.

Problem is, activities and datasets unfortunately have no direct integration with Key Vault. You may think about using expressions in some fancy way to circumvent the problem, but you can’t access your linked service properties through them.

The extreme versatility of Data Factory (and of the Azure environment in general) makes it possible to use the Web Activity to solve the situation.
Let’s discover how.

A perhaps lesser-known property of Key Vault secrets is their identifier, which is practically a URI of the form https://{vault-name}.vault.azure.net/secrets/{secret-name}/{secret-version}.

With this URI, along with the proper authentication (the MSI mentioned above), we can issue a GET request in our pipeline and get the secret’s value in the response, so we can store that value in a variable or use it directly in the subsequent activity.
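
To make the idea concrete outside Data Factory, here is a rough Python sketch of the same GET request and of how the secret is read from the JSON response. The vault name, secret name, token and sample response body are all placeholders; inside a pipeline, the Web Activity obtains the token for you through the managed identity. The request is only built here, not sent.

```python
import json
import urllib.request

API_VERSION = "7.0"


def build_secret_request(secret_uri: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a Key Vault secret."""
    url = f"{secret_uri}?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def extract_secret_value(response_body: str) -> str:
    """The secret itself sits in the 'value' field of the JSON response."""
    return json.loads(response_body)["value"]


# Placeholder identifier and token, for illustration only.
request = build_secret_request(
    "https://my-vault.vault.azure.net/secrets/my-api-key", "<token>"
)

# Hand-written sample of what a response body looks like.
sample_body = (
    '{"value": "s3cr3t", '
    '"id": "https://my-vault.vault.azure.net/secrets/my-api-key/abc123"}'
)
secret = extract_secret_value(sample_body)
```

Omitting the version segment from the identifier, as done here, asks Key Vault for the latest version of the secret.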

It is important to note that this approach does not require you to have a Key Vault linked service already in place. In fact, you’re making a REST call to a Key Vault service your Data Factory is authorized to access. Other than the Web activity itself, you just need to set up security accordingly.

The whole process is described in depth in the docs, but let’s recap the key steps:

  1. Authorize your Data Factory MSI to access the target Key Vault
  2. Get the identifier of the secret you need to retrieve
  3. Add and configure a Web Activity specifying
    a) the Secure Output field — should always be true
    b) the URL field — your secret URI value (plus the ?api-version=7.0 query parameter at the end)
    c) the Method field — That’s GET
    d) the Authentication field — MSI
    e) the Resource field — https://vault.azure.net

Please note, the Secure Output field is extremely important, since you want to avoid logging the Web activity response in plain text: it will contain your secret value!
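
Put together, the settings in step 3 correspond to a fairly small activity definition. The Python sketch below builds it as a dictionary. The activity name GetSecret and the secret URI are invented for illustration, and the exact property layout is my reading of what the authoring UI generates, so compare it with the code view of your own pipeline.

```python
import json


def get_secret_activity(name: str, secret_uri: str) -> dict:
    """Web Activity that reads one Key Vault secret using the factory's MSI."""
    return {
        "name": name,
        "type": "WebActivity",
        # a) Secure Output: keep the response (and the secret) out of the run logs.
        "policy": {"secureOutput": True},
        "typeProperties": {
            # b) secret identifier plus the api-version query parameter
            "url": f"{secret_uri}?api-version=7.0",
            # c) plain GET
            "method": "GET",
            # d) and e) MSI authentication against the Key Vault resource
            "authentication": {"type": "MSI", "resource": "https://vault.azure.net"},
        },
    }


# Hypothetical activity name and secret identifier.
activity = get_secret_activity(
    "GetSecret", "https://my-vault.vault.azure.net/secrets/my-api-key"
)
print(json.dumps(activity, indent=2))
```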

https://docs.microsoft.com/en-us/azure/data-factory/how-to-use-azure-key-vault-secrets-pipeline-activities

4. Reference the Web Activity output where you need it using an expression. You could also store it in a pipeline variable for later use.
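
As a hedged illustration of step 4, the sketch below builds the expression that reads the secret from a Web Activity’s output and places it in a request header of a downstream Web Activity. The names GetSecret, CallApi, x-api-key and the URL are hypothetical, and the JSON layout (including the secureInput policy, which keeps the header value out of the logged activity input) is an assumption to verify in your own pipeline’s code view.

```python
import json


def secret_expression(activity_name: str) -> str:
    """ADF expression that reads the 'value' field of the activity's output."""
    return f"@activity('{activity_name}').output.value"


def call_api_activity(name: str, url: str, header_name: str, secret_activity: str) -> dict:
    """Downstream Web Activity that sends the secret as a request header."""
    return {
        "name": name,
        "type": "WebActivity",
        # Run only after the secret has been retrieved successfully.
        "dependsOn": [
            {"activity": secret_activity, "dependencyConditions": ["Succeeded"]}
        ],
        # The secret is now part of this activity's input, so hide that too.
        "policy": {"secureInput": True},
        "typeProperties": {
            "url": url,
            "method": "GET",
            "headers": {
                header_name: {
                    "value": secret_expression(secret_activity),
                    "type": "Expression",
                }
            },
        },
    }


# Hypothetical names and URL, for illustration only.
api_call = call_api_activity(
    "CallApi", "https://example.com/api/orders", "x-api-key", "GetSecret"
)
print(json.dumps(api_call, indent=2))
```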

That’s it, elegant and simple.

Wrapping up

Cloud environments usually come with the benefit of an extensive set of exposed APIs, and Azure is no exception. This gives you flexibility and power when interacting with its services, enabling developers to sort out what may appear to be a complex problem with straightforward solutions.

Azure Key Vault makes it easy to protect your sensitive information, and Azure Data Factory’s wide range of out-of-the-box connectors and activities cuts the time needed to make things work together in the right way.
On top of that, security is often built in and transparent to the developer, removing the hassle of understanding how services talk to each other behind the scenes. Obviously, such understanding should only be postponed a little, not taken for granted (who doesn’t? ^_^).

Thanks for reading, and happy secure orchestration!

Francesco Milano

BI & Data Architect, spare time drummer, rubber chicken with a pulley in the middle