Deploying managed Airflow on Azure Data Factory using REST APIs
How to deploy the Azure Managed Airflow service using the REST API with Postman. No more clicking through the portal and no more manual setup of connections in the admin panel.
What is Managed Airflow on Azure?
At the beginning of 2023, Microsoft announced that they were offering Apache Airflow as a managed service in Data Factory. It started out as a preview version and has now been generally available (GA) for a few months.
Deploying Managed Airflow
To deploy Managed Airflow in Data Factory, you go to the “Manage” tab in the Data Factory UI, choose to create a new Airflow instance, fill out the configuration and wait.
If everything goes well the instance will be available after a few minutes.
That’s all pretty straightforward. The only annoying part is that you have to perform these steps manually. As far as I have seen, there is no Bicep template that includes Airflow in Data Factory.
All the configurations and connections in Airflow have to be done manually as well. If an instance breaks (which happens more often than you might think, due to Python dependency issues for example), you will have to do it all over again.
Managed Airflow using REST API
The alternative to manual configuration on every deployment or re-deployment is to use the REST API. There is documentation on how to deploy an Airflow instance: REST APIs for the Managed Airflow integrated runtime — Azure Data Factory | Microsoft Learn
In the following, I describe a step-by-step guide on how to set up a Managed Airflow instance using Postman, including adding connections such as Azure storage account connections.
How to call a REST API to talk to Azure resources
Prerequisites:
- Azure subscription and resources
- Service principal with Contributor rights at the subscription/resource group level
- Client id and secret of the service principal
- Postman account/login
To get started with the REST API in Azure, I followed the link below. Note that I had to log in to Postman to be able to use the collections and the JavaScript pre-request code.
Azure REST APIs with Postman (2021) | Jon Gallant
To get information on a specific resource group you can use the following GET request url:
https://management.azure.com/subscriptions/{{subscriptionId}}/resourcegroups/{{resourceGroupName}}?api-version=2021-04-01
This should result in something like this:
{
  "id": "/subscriptions/xxx/resourceGroups/xxx",
  "name": "xxx",
  "type": "Microsoft.Resources/resourceGroups",
  "location": "westeurope",
  "tags": {},
  "properties": {
    "provisioningState": "Succeeded"
  }
}
If you managed to get this running, you now have a valid bearer token to run further requests.
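If you prefer scripting over Postman, the same two steps (fetching a bearer token via the client-credentials flow, then calling the resource group endpoint) can be sketched in Python with only the standard library. All IDs and names below are placeholders for your own values:

```python
import json
import urllib.parse
import urllib.request

ARM = "https://management.azure.com"


def resource_group_url(subscription_id: str, resource_group: str) -> str:
    """Build the same GET URL used in the Postman request."""
    return (f"{ARM}/subscriptions/{subscription_id}"
            f"/resourcegroups/{resource_group}?api-version=2021-04-01")


def get_bearer_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Fetch an ARM bearer token via the OAuth2 client-credentials flow,
    mirroring what the Postman pre-request script does."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://management.azure.com/.default",
    }).encode()
    req = urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data=body)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]


def get_resource_group(token: str, subscription_id: str, resource_group: str) -> dict:
    """GET the resource group; returns the JSON document shown above."""
    req = urllib.request.Request(
        resource_group_url(subscription_id, resource_group),
        headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The token is valid for roughly an hour, so you can reuse it across the requests in the rest of this guide.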
REST API for Managed Airflow
REST APIs for the Managed Airflow integrated runtime — Azure Data Factory | Microsoft Learn
The request for creating an Airflow environment looks like this in Postman (note that ARM resource creation uses PUT):
PUT https://management.azure.com/subscriptions/{{subscriptionId}}/resourcegroups/{{resourceGroupName}}/providers/Microsoft.DataFactory/factories/{{datafactoryName}}/integrationruntimes/{{airflowEnvName}}?api-version=2018-06-01
Variables:
- subscriptionId
- resourceGroupName
- datafactoryName
- airflowEnvName
The request body to create an instance with GitSync looks like the following:
{
  "name": "sample-git",
  "properties": {
    "type": "Airflow",
    "typeProperties": {
      "computeProperties": {
        "location": "East US",
        "computeSize": "Large",
        "extraNodes": 0
      },
      "airflowProperties": {
        "airflowVersion": "2.6.3",
        "airflowEnvironmentVariables": {
          "AIRFLOW__TEST__TEST": "test"
        },
        "airflowRequirements": [
          "apache-airflow-providers-microsoft-azure",
          "azure-datalake-store"
        ],
        "enableAADIntegration": true,
        "userName": null,
        "password": null,
        "airflowEntityReferences": [],
        "gitSyncProperties": {
          "gitServiceType": "ADO",
          "gitCredentialType": "PAT",
          "repo": "https://Org@dev.azure.com/Org/Project/_git/Repo",
          "branch": "branch_name",
          "username": "Org",
          "credential": "PAT_value"
        }
      }
    }
  }
}
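The same create request can be scripted outside Postman. Below is a minimal Python sketch using only the standard library; it assumes you already have a valid bearer token (as obtained for the resource group request), and all variable values are placeholders:

```python
import json
import urllib.request


def airflow_ir_url(subscription_id: str, resource_group: str,
                   factory: str, airflow_env: str) -> str:
    """Build the integration-runtime URL from the Postman variables."""
    return ("https://management.azure.com"
            f"/subscriptions/{subscription_id}"
            f"/resourcegroups/{resource_group}"
            f"/providers/Microsoft.DataFactory/factories/{factory}"
            f"/integrationruntimes/{airflow_env}"
            "?api-version=2018-06-01")


def create_airflow_instance(token: str, url: str, request_body: dict) -> dict:
    """PUT the request body from the article against the ARM endpoint
    and return the response JSON (the 'state' starts out as 'Initial')."""
    req = urllib.request.Request(
        url,
        data=json.dumps(request_body).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

You would pass the GitSync JSON body above as `request_body`, after filling in your own repo and PAT.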
Don’t forget to add the new variables to your Postman collection and save it.
When you run the command above you should get the following output:
{
  "id": "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.DataFactory/factories/xxx/integrationruntimes/Airflow-with-API",
  "name": "Airflow-with-API",
  "type": "Microsoft.DataFactory/factories/integrationruntimes",
  "properties": {
    "type": "Airflow",
    "state": "Initial",
    "typeProperties": {
      "computeProperties": {
        "location": "West Europe",
        "computeSize": "Large",
        "extraNodes": 0
      },
      "airflowProperties": {
        "airflowVersion": "2.6.3",
        "pythonVersion": "3.8",
        "airflowEnvironmentVariables": {
          "AIRFLOW__TEST__TEST": "test"
        },
        "airflowWebUrl": "",
        "airflowRequirements": [
          "apache-airflow-providers-microsoft-azure",
          "azure-datalake-store"
        ],
        "airflowEntityReferences": [],
        "enableAADIntegration": true,
        "enableTriggerers": false,
        "gitSyncProperties": {
          "gitServiceType": "ADO",
          "gitCredentialType": "PAT",
          "repo": "https://Org@dev.azure.com/Org/Project/_git/Repo",
          "branch": "branch_name",
          "username": "Org",
          "encryptedCredential": "xxx"
        }
      }
    }
  },
  "etag": "xxx"
}
Note that the Python version is set to 3.8, which appears to be hardcoded and not configurable. Hopefully more options will become available in the future.
With this setup you should be able to see a new Airflow instance starting in your Azure Data Factory.
Create Airflow connections using REST API
Prerequisites:
- Airflow instance with Basic Auth enabled
- Postman
To get started, open the documentation of your Airflow instance and click on REST API. There you should find the connections endpoints and the respective API calls you can make.
REST API call:
POST https://xxx.westeurope.airflow.svc.datafactory.azure.com/api/v1/connections
In Postman, the request is the POST call above with a JSON body describing the connection. You want to disable the pre-request script stored in the collection for this request, since the Airflow API uses Basic Auth rather than the ARM bearer token. Afterwards, the result should show up under Admin → Connections in the Airflow UI.
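For reference, here is what the same call could look like in Python. The endpoint URL, credentials and connection body are illustrative placeholders; the `extra` payload for a `wasb` (Azure Blob Storage) connection in particular may differ depending on your Azure provider version:

```python
import base64
import json
import urllib.request

# Placeholder: the URL of your own Managed Airflow instance.
AIRFLOW_ENDPOINT = "https://xxx.westeurope.airflow.svc.datafactory.azure.com"


def create_connection(endpoint: str, user: str, password: str,
                      connection: dict) -> dict:
    """POST a new connection to the Airflow stable REST API using Basic Auth."""
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{endpoint}/api/v1/connections",
        data=json.dumps(connection).encode(),
        headers={"Authorization": f"Basic {auth}",
                 "Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Illustrative body for an Azure Blob Storage connection.
storage_connection = {
    "connection_id": "azure_storage",
    "conn_type": "wasb",
    "extra": json.dumps({"connection_string": "<your-connection-string>"}),
}
```

Calling `create_connection(AIRFLOW_ENDPOINT, "admin_user", "admin_password", storage_connection)` would then create the connection just like the Postman request.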
I also tried the list connections API call, but ran into a “Bad Request — Request body must not be empty” error. This might be a Postman issue; in any case, creating new connections worked fine for me.
Managed Airflow connections for instances with Azure Active Directory instead of Basic Auth
In this case you won’t be able to work with the REST API, because the browser prompts an additional authentication step when logging in to Airflow from Data Factory. The best option here is to store connections in Key Vault.
Enable Azure Key Vault for airflow — Azure Data Factory | Microsoft Learn
This is by far the best way to store your connections: they won’t be lost if your instance crashes or is deleted.
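Following the linked documentation, the Key Vault backend is enabled through Airflow environment variables, which you can set in the `airflowEnvironmentVariables` section of the create request shown earlier. A sketch of such a fragment, with a placeholder vault URL and prefix:

```json
"airflowEnvironmentVariables": {
    "AIRFLOW__SECRETS__BACKEND": "airflow.providers.microsoft.azure.secrets.key_vault.AzureKeyVaultBackend",
    "AIRFLOW__SECRETS__BACKEND_KWARGS": "{\"connections_prefix\": \"airflow-connections\", \"vault_url\": \"https://<your-vault>.vault.azure.net/\"}"
}
```

This assumes `apache-airflow-providers-microsoft-azure` is listed in `airflowRequirements` (it already is in the GitSync example above) and that the Airflow instance has access to the vault.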
If you found this article useful, please follow me.