Deploying managed Airflow on Azure Data Factory using REST APIs

How to deploy the Azure Managed Airflow service using the REST API and Postman. No more clicking in the portal and no more manually setting up connections in the admin panel.

DataFairy
Towards Data Engineering
5 min read · Oct 27, 2023


What is Managed Airflow on Azure?

At the beginning of 2023, Microsoft announced that it was offering Apache Airflow as a managed service in Data Factory. The service started out as a preview and has been generally available for a few months.

Deploying Managed Airflow

To deploy Managed Airflow in Data Factory, you go to the “Manage” tab in your Data Factory UI, choose to create a new Airflow instance, fill out the configuration, and wait.

If everything goes well the instance will be available after a few minutes.

That’s all pretty straightforward. The only annoying part is that you have to do these steps manually. As far as I have seen, there is no Bicep template that includes Airflow in Data Factory.

All the configurations and connections in Airflow have to be set up manually. If an instance breaks (this happens more often than you might think, due to Python dependency issues for example), you will have to do it all over again.

Managed Airflow using REST API

The alternative to manual configuration on every deployment or re-deployment is to use the REST API. There is documentation on how to deploy an Airflow instance: REST APIs for the Managed Airflow integrated runtime — Azure Data Factory | Microsoft Learn

In the following, I describe a step-by-step guide on how to set up a Managed Airflow instance using Postman, including how to add connections such as Azure storage account connections.

How to call a REST API to talk to Azure resources

Prerequisites:

  • Azure subscription and resources
  • Service principal with Contributor rights on the subscription/resource group level
  • Client id and secret of the service principal
  • Postman account/login

To get started with the REST API in Azure, I followed the link below. I did have to log in to Postman to be able to use the collections and the JavaScript code.

Azure REST APIs with Postman (2021) | Jon Gallant

To get information on a specific resource group, you can use the following GET request URL:

https://management.azure.com/subscriptions/{{subscriptionId}}/resourcegroups/{{resourceGroupName}}?api-version=2021-04-01

This should result in something like this:

{
  "id": "/subscriptions/xxx/resourceGroups/xxx",
  "name": "xxx",
  "type": "Microsoft.Resources/resourceGroups",
  "location": "westeurope",
  "tags": {},
  "properties": {
    "provisioningState": "Succeeded"
  }
}

If you managed to get this running, you should now have a valid bearer token for the other requests.
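If you prefer to script these calls instead of using Postman, the token exchange and the resource-group lookup can be sketched in Python with only the standard library. The tenant, client, and subscription values are placeholders:

```python
import json
import urllib.parse
import urllib.request

AZURE_MGMT = "https://management.azure.com"


def resource_group_url(subscription_id: str, resource_group: str) -> str:
    """ARM URL for the resource-group GET shown above."""
    return (f"{AZURE_MGMT}/subscriptions/{subscription_id}"
            f"/resourcegroups/{resource_group}?api-version=2021-04-01")


def get_bearer_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Client-credentials flow against Azure AD, scoped to ARM."""
    data = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://management.azure.com/.default",
    }).encode()
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        return json.load(resp)["access_token"]


def get_resource_group(token: str, subscription_id: str,
                       resource_group: str) -> dict:
    """GET the resource group with the bearer token."""
    req = urllib.request.Request(
        resource_group_url(subscription_id, resource_group),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (all values are placeholders):
# token = get_bearer_token("<tenant-id>", "<client-id>", "<client-secret>")
# print(get_resource_group(token, "<subscription-id>", "<resource-group>"))
```

This mirrors what the Postman pre-request script does for you: fetch a token with the service principal, then pass it as a `Bearer` header.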

REST API for Managed Airflow

REST APIs for the Managed Airflow integrated runtime — Azure Data Factory | Microsoft Learn

The request for creating an Airflow environment looks like this in Postman:

PUT https://management.azure.com/subscriptions/{{subscriptionId}}/resourcegroups/{{resourceGroupName}}/providers/Microsoft.DataFactory/factories/{{datafactoryName}}/integrationruntimes/{{airflowEnvName}}?api-version=2018-06-01

Variables:

  • subscriptionId
  • resourceGroupName
  • datafactoryName
  • airflowEnvName

The request body to create an instance with GitSync looks like the following:

{
  "name": "sample-git",
  "properties": {
    "type": "Airflow",
    "typeProperties": {
      "computeProperties": {
        "location": "East US",
        "computeSize": "Large",
        "extraNodes": 0
      },
      "airflowProperties": {
        "airflowVersion": "2.6.3",
        "airflowEnvironmentVariables": {
          "AIRFLOW__TEST__TEST": "test"
        },
        "airflowRequirements": [
          "apache-airflow-providers-microsoft-azure",
          "azure-datalake-store"
        ],
        "enableAADIntegration": true,
        "userName": null,
        "password": null,
        "airflowEntityReferences": [],
        "gitSyncProperties": {
          "gitServiceType": "ADO",
          "gitCredentialType": "PAT",
          "repo": "https://Org@dev.azure.com/Org/Project/_git/Repo",
          "branch": "branch_name",
          "username": "Org",
          "credential": "PAT_value"
        }
      }
    }
  }
}

Don’t forget to add the new variables to your collection and save them in Postman.
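For repeatable deployments, the same call can be scripted. The following Python sketch (standard library only) builds the URL and the GitSync body shown above and sends it as a PUT, which is what ARM uses for create-or-update calls; all names and credentials are placeholders:

```python
import json
import urllib.request

API_VERSION = "2018-06-01"


def airflow_ir_url(subscription_id: str, resource_group: str,
                   factory: str, ir_name: str) -> str:
    """ARM URL for the Airflow integration runtime."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory}"
        f"/integrationruntimes/{ir_name}"
        f"?api-version={API_VERSION}"
    )


def airflow_ir_body(name: str, repo: str, branch: str,
                    org: str, pat: str) -> dict:
    """Request body for an Airflow runtime with GitSync, as in the article."""
    return {
        "name": name,
        "properties": {
            "type": "Airflow",
            "typeProperties": {
                "computeProperties": {
                    "location": "West Europe",
                    "computeSize": "Large",
                    "extraNodes": 0,
                },
                "airflowProperties": {
                    "airflowVersion": "2.6.3",
                    "airflowRequirements": [
                        "apache-airflow-providers-microsoft-azure",
                        "azure-datalake-store",
                    ],
                    "enableAADIntegration": True,
                    "gitSyncProperties": {
                        "gitServiceType": "ADO",
                        "gitCredentialType": "PAT",
                        "repo": repo,
                        "branch": branch,
                        "username": org,
                        "credential": pat,
                    },
                },
            },
        },
    }


def create_airflow_instance(token: str, url: str, body: dict) -> dict:
    """Send the create-or-update PUT with the bearer token."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With this in place, redeploying after a broken instance is one function call instead of a round of portal clicks.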

When you run the request above, you should get output like the following:

{
  "id": "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.DataFactory/factories/xxx/integrationruntimes/Airflow-with-API",
  "name": "Airflow-with-API",
  "type": "Microsoft.DataFactory/factories/integrationruntimes",
  "properties": {
    "type": "Airflow",
    "state": "Initial",
    "typeProperties": {
      "computeProperties": {
        "location": "West Europe",
        "computeSize": "Large",
        "extraNodes": 0
      },
      "airflowProperties": {
        "airflowVersion": "2.6.3",
        "pythonVersion": "3.8",
        "airflowEnvironmentVariables": {
          "AIRFLOW__TEST__TEST": "test"
        },
        "airflowWebUrl": "",
        "airflowRequirements": [
          "apache-airflow-providers-microsoft-azure",
          "azure-datalake-store"
        ],
        "airflowEntityReferences": [],
        "enableAADIntegration": true,
        "enableTriggerers": false,
        "gitSyncProperties": {
          "gitServiceType": "ADO",
          "gitCredentialType": "PAT",
          "repo": "https://Org@dev.azure.com/Org/Project/_git/Repo",
          "branch": "branch_name",
          "username": "Org",
          "encryptedCredential": "xxx"
        }
      }
    }
  },
  "etag": "xxx"
}

Here the Python version is set to 3.8, which appears to be hardcoded and not changeable. Hopefully there will be more options in the future.

With this setup you should be able to see a new Airflow instance starting in your Azure Data Factory.

Create Airflow connections using REST API

Prerequisites:

  • Airflow instance with Basic Auth enabled
  • Postman

To get started, open your Airflow instance’s documentation and click on REST API. There you should be able to find the connection endpoints and the respective API calls you can make.

REST API call:

POST https://xxx.westeurope.airflow.svc.datafactory.azure.com/api/v1/connections

In Postman it should look like this:

You want to disable the pre-request script stored in the collection for this request.
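The same call can also be scripted against the Airflow stable REST API using Basic Auth. Here is a sketch in Python; the endpoint, user, and payload values are placeholders, and `wasb` is the connection type the Microsoft Azure provider uses for Blob Storage:

```python
import base64
import json
import urllib.request


def create_airflow_connection(endpoint: str, user: str, password: str,
                              connection: dict) -> dict:
    """POST a connection to the Airflow 2.x stable REST API (Basic Auth)."""
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{endpoint}/api/v1/connections",
        data=json.dumps(connection).encode(),
        headers={"Authorization": f"Basic {auth}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example payload for a storage account connection; the id, account
# name, and key below are placeholders:
storage_connection = {
    "connection_id": "azure_blob_default",
    "conn_type": "wasb",
    "login": "mystorageaccount",
    "password": "storage-account-key",
}

# Usage (placeholders):
# create_airflow_connection(
#     "https://xxx.westeurope.airflow.svc.datafactory.azure.com",
#     "airflow-user", "airflow-password", storage_connection)
```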

The result should then show up in the Airflow Connections list:

I also tried the list connections API call, but ran into a “Bad Request 404: Request body must not be empty” error. This might be a Postman issue as well; creating new connections worked for me.

Managed Airflow connections for instances with Azure Active Directory instead of Basic Auth

In this case you won’t be able to work with the REST API, because your browser will prompt for another authentication step when logging in to Airflow from Data Factory. The best option here is to store connections in Key Vault.

Enable Azure Key Vault for airflow — Azure Data Factory | Microsoft Learn

This is by far the best way to store your connections. They won’t be lost if your instance happens to crash or is deleted.
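If you go this route, the Key Vault secrets backend can be enabled through the environment variables in the deployment body shown earlier. A sketch of the relevant `airflowEnvironmentVariables` fragment, assuming the `AzureKeyVaultBackend` class from the `apache-airflow-providers-microsoft-azure` package; the vault URL and secret-name prefixes below are placeholders:

```json
{
  "airflowEnvironmentVariables": {
    "AIRFLOW__SECRETS__BACKEND": "airflow.providers.microsoft.azure.secrets.key_vault.AzureKeyVaultBackend",
    "AIRFLOW__SECRETS__BACKEND_KWARGS": "{\"connections_prefix\": \"airflow-connections\", \"variables_prefix\": \"airflow-variables\", \"vault_url\": \"https://<your-vault>.vault.azure.net/\"}"
  }
}
```

With this in place, Airflow resolves a connection id like `my_conn` from a Key Vault secret named `airflow-connections-my-conn`, so the connections survive a crashed or deleted instance.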

If you found this article useful, please follow me.


Senior Data Engineer, Azure Warrior, PhD in Theoretical Physics, The Netherlands. I write about Data Engineering, Machine Learning and DevOps on Azure.