Computing password-digest Auth in Azure data factory using KeyVault and Azure function app

Yoann Dupe · Published in xgeeks · 6 min read · Jul 22, 2021

This article was co-written by Yoann Dupe, Maher Deeb, Riccardo Bove and João Sousa, from the xgeeks team.

Photo by Jason Pofahl on Unsplash

Azure Data Factory provides a convenient way to build data pipelines quickly and efficiently using drag and drop. Combining Data Factory with an Azure Functions app takes the pipeline to the next level: the data engineer can retrieve data from endpoints requiring sophisticated authentication methods and manipulate it before dumping it into its final destination. In this article, we walk you through the steps we follow to build an end-to-end pipeline for retrieving data from secure endpoints. The pipeline consists of several Azure services connected following best practices. Performance, security, and quality are the keywords that motivate our choices. We start by explaining how we use an Azure Functions app to call the API. Next, we demonstrate how we integrate Key Vault into the Azure function to secure sensitive data. Finally, we put all the components together to build the entire pipeline.

What is a Password-Digest?

Password-digest-based authentication is used to secure web services that serve sensitive data through APIs. A password digest with WSSE authentication over SSL is the method we employ in one of our projects to keep the data endpoints secure. The user generates an X-WSSE header token by combining some secrets and keys and hashing the result before passing it with the final request. The hashed token is called the Password-Digest. Using the date as part of the hashed token forces the hash to be recomputed for every request. Generally, the digest is valid for a given period before it expires.

Generating a new password digest on-demand with Azure function

An Azure Functions app gives developers a balance between flexibility and convenience. On the one hand, it is possible to deploy complete modules and packages in many supported programming languages. On the other hand, it eliminates the overhead of choosing, spinning up, and scaling servers up and down. Yet it remains an affordable choice compared to many other services.

An Azure Functions app is an appropriate choice for many API services that require authentication over SSL to keep the data secure. In our case, creating the password digest requires joining several components to generate a custom header for every request. In a nutshell, we call the Azure function and it returns the password digest, which we then use as an additional header in the request.

We create the password digest by joining:

  1. Nonce: A random value ensuring that the request is unique
  2. Timestamp: The current timestamp in ISO8601 format
  3. Secret: The API user credential secret

We hash the combination following the Secure Hash Algorithm 1 (sha1). In the following code snippet, we demonstrate how to implement the password digest using a class and a data model to pass the data to the class. We use Python for this example.
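A minimal sketch of such a class in Python, assuming a WSSE-style digest of Base64(SHA-1(nonce + created + secret)); the class and field names here are illustrative rather than the exact ones from our project:

import base64
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DigestInput:
    # Data model passed to the class: the API secret, plus an optional
    # pre-generated nonce and timestamp (both are created on demand otherwise).
    secret: str
    nonce: str = None
    created: str = None

class PasswordDigest:
    def __init__(self, data: DigestInput):
        self.secret = data.secret
        self.nonce = data.nonce or secrets.token_hex(16)
        self.created = data.created or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    def compute(self) -> str:
        # WSSE-style digest: Base64(SHA-1(nonce + created + secret))
        raw = (self.nonce + self.created + self.secret).encode("utf-8")
        return base64.b64encode(hashlib.sha1(raw).digest()).decode("utf-8")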

The function

For the Azure function we use the PasswordDigest class above to get the password digest, and we put the other variables, such as the username, nonce, and timestamp, directly into the header.

The function that creates the header is called from the main function listening for an HTTP request. For that we use the Azure Functions library. The function then returns our header, which becomes available further down the line in our Azure Data Factory pipeline.
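A sketch of that HTTP-triggered entry point, assuming the PasswordDigest class above lives in a password_digest module and that the username and secret are exposed as the app settings API_USERNAME and API_SECRET (both names are illustrative):

import base64
import json
import os

import azure.functions as func

# Assumes the PasswordDigest / DigestInput sketch above is importable.
from password_digest import DigestInput, PasswordDigest

def build_wsse_header(username: str, secret: str) -> str:
    # Assemble the X-WSSE header value from the username, digest, nonce and timestamp.
    digest = PasswordDigest(DigestInput(secret=secret))
    nonce_b64 = base64.b64encode(digest.nonce.encode("utf-8")).decode("utf-8")
    return (
        f'UsernameToken Username="{username}", '
        f'PasswordDigest="{digest.compute()}", '
        f'Nonce="{nonce_b64}", Created="{digest.created}"'
    )

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Recompute the digest on every call so the returned header is always fresh.
    header = build_wsse_header(os.environ["API_USERNAME"], os.environ["API_SECRET"])
    return func.HttpResponse(
        json.dumps({"WSSE": header}),
        mimetype="application/json",
        status_code=200,
    )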

Finally, we need to configure our function to accept the GET method and return our header. We can do that in the function.json file.
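A function.json along these lines does the trick; the binding names below are the defaults generated by the Azure Functions Python tooling, so adjust them to your project:

{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": ["get"]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}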

Secure credentials with KeyVault

Azure's Key Vault service stores secrets that need tight access control, such as passwords, certificates, or API keys. In the case of the password digest, it can hold the user and password information used to compute the digest.

Using Python's libraries for Azure Key Vault
To use Azure Key Vault secrets from the Python app, the azure-keyvault-secrets package is needed to access the secrets and azure-identity to authenticate. In addition, the application must be granted the right access permissions on the Key Vault service in Azure.
Below is a small example showing how to access an Azure Key Vault secret. First, install the required packages:

pip install azure-identity
pip install azure-keyvault-secrets
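With the two packages installed, a minimal sketch looks like this (the vault URL and secret name are placeholders):

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential picks up a managed identity when running in Azure,
# or your local developer credentials (Azure CLI, VS Code, ...) when run locally.
credential = DefaultAzureCredential()
client = SecretClient(vault_url="https://myvault.vault.azure.net/", credential=credential)

secret = client.get_secret("mysecret")
print(secret.value)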

Keep in mind as well that the application consuming the secrets must be assigned the correct access policies. This is easy to do by going into the Key Vault menu > Access Policies > +Add Access Policy.

In this case, since we are using DefaultAzureCredential(), the access policy will be linked to the service principal. The service principal can be one of two types: an application or a managed identity. The application type is the local representation of an application instance; however, this can be problematic when using something like Terraform that routinely destroys and re-creates applications. For those cases, a managed identity is a better solution, since it provides an identity that isn't dependent on the app itself.

How to handle missing secrets
In some circumstances, it is necessary to handle missing secrets. For us, this use case presented itself when we implemented integration tests that consumed secrets only available in certain environments, so we wrap the secret lookup in a try-catch.
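A sketch of that handling, reusing the client from the snippet above; ResourceNotFoundError is what the SDK raises when the secret does not exist:

from azure.core.exceptions import ResourceNotFoundError

try:
    wsse_secret = client.get_secret("mysecret").value
except ResourceNotFoundError:
    # The secret only exists in some environments; fall back gracefully.
    wsse_secret = None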

Linking OS Variables to Secrets

It is also possible to link the secret directly in the function app configuration, to avoid having to access the Key Vault from the code. This is done by creating an application setting with the following structure:

@Microsoft.KeyVault({referenceString})

referenceString should be replaced by one of the following:

@Microsoft.KeyVault(SecretUri=https://myvault.vault.azure.net/secrets/mysecret/)

The secret identifier (SecretUri) can be obtained by going to the key vault > Secrets > Current Version > Secret Identifier.

or

@Microsoft.KeyVault(VaultName=myvault;SecretName=mysecret)
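Once the reference is in place, the Functions runtime resolves it at startup and the code reads the secret like any other app setting; for example, assuming a setting named API_SECRET holds the reference:

import os

# The Key Vault reference is resolved by the platform, so the environment
# variable already contains the secret value, not the reference string.
api_secret = os.environ["API_SECRET"]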

Using password digest for authentication in the Azure data factory

After defining the components of the pipeline, it is time to connect the dots. We use Azure Data Factory to connect those components and establish an end-to-end authentication pipeline using the password digest. Azure Data Factory is a convenient way to create ETL without having to script the entire pipeline. It is designed to ingest all types of data from many sources thanks to its connectors, and it lets users integrate an Azure function at any point in the pipeline. This integration with Azure Functions makes it very flexible and powerful. Because our API requires a header containing the password digest, we call our function every time we send a request. To integrate the function, simply drag and drop the "Azure Function activity" into the pipeline, then set up the Azure Function linked service and provide some information such as the name, method, headers, and body.

In the Azure Function settings, point to the right function name and use the right method; GET is appropriate in this scenario. The function will return the password digest each time we trigger it via an HTTP GET.

Adding the additional header from the Azure function

Once the function is set up, we use a Copy Data activity in our pipeline to test our authentication by calling the API and expecting a 200 OK HTTP status code, meaning that the request was successful.

First, we need to create a source and a sink dataset. The source is a REST linked service calling the API, and the sink is an Azure Storage linked service in charge of copying the response from the API into Azure Storage. Since the API authorises the connection only if we pass an additional header containing our password digest, we need to get it from our function. Let's do it!

To connect the Azure function to the Copy Data activity, simply link them in the pipeline; an arrow now appears going from the function to the Copy Data activity. Once they are connected, the Copy Data activity can access the password digest returned by the function. We add it as an additional header with "WSSE" as the name, and as the value we use the output of our function. In the interface, click on "Add dynamic content" and select it, or type the following.

@activity('function_name').output

This is dynamic content, and it will be re-evaluated and sent each time we use the Copy Data activity linked to the function.

The following schema summarises the data flow:

If you enjoy working on large-scale projects with global impact and if you like a real challenge, feel free to reach out to us at xgeeks! We are growing our team and you might be the next one to join this group of talented people 😉

Check out our social media channels if you want to get a sneak peek of life at xgeeks! See you soon!
