Create DBT Job in Databricks Workflow Integrated with Azure DevOps

Pavithran Lakshmanan
BI3 Technologies · Mar 22, 2024

Introduction:

This guide illustrates the process of creating and scheduling a DBT job that integrates with Azure DevOps within a Databricks workflow. It covers both scenarios:

a) When the Databricks workspace and Azure DevOps are housed in the same tenant

b) When they reside in separate tenants.

Step 1:

1. Set up the Azure DevOps repository (a CLI alternative is sketched after this list). Visit https://aex.dev.azure.com/ and log in. If there are no existing projects in the organization, create one: enter the project name and click “Create Project”.

2. Open the newly created project and navigate to Repos.

3. Select the repository name and choose the option to create a new repository.

4. Enter the repository name and unselect the “Add a README” option.

5. Copy the command shown in the repository: git remote add origin <url>
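
If you prefer the command line, the project and repository can also be created with the Azure CLI. This is an optional sketch, assuming the azure-devops extension is installed (az extension add --name azure-devops) and you are logged in with az login; the organization, project, and repository names mirror the examples used later in this guide.

# Sketch: create the Azure DevOps project and repository from the CLI.
az devops configure --defaults organization=https://dev.azure.com/test
az devops project create --name DataAnalytics
az repos create --name repo_name --project DataAnalytics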

Step 2:

1. Navigate to the DBT project folder on the local machine in the command prompt and initialize a Git repository:
git init

2. Use the copied remote URL to connect the local repository to the Azure DevOps repo, then stage, commit, and push the project (see the optional .gitignore note after this step).

Example:

git remote add origin https://test@dev.azure.com/test/DataAnalytics/_git/repo_name  # link the local repo to Azure DevOps
git add .                       # stage all project files
git commit -m "initial commit"  # commit the staged files
git push origin master          # push to the master branch

3. Choose the linked account if a pop-up appears.

4. The files will be loaded into the repository.

Note: Verify that the commands executed properly in the CLI.
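
One optional housekeeping step before the initial commit: dbt build artifacts (compiled models, installed packages, logs) are usually kept out of version control. A minimal sketch, assuming the standard dbt project layout; newer dbt versions generate a similar .gitignore automatically.

# Keep dbt build artifacts out of the repository (run before "git add .").
cat >> .gitignore <<'EOF'
target/
dbt_packages/
logs/
EOF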

Step 3:

Note: If Databricks and Azure DevOps sit under different Microsoft Entra ID tenants, skip Step 3, complete Steps 6 and 7 instead, and then continue with Step 4.

1. Access Databricks, log in, and select the profile icon in the top-right corner.

2. Choose User Settings.

3. In the Linked accounts section, choose Azure DevOps Services (Azure Active Directory) as the Git provider for the integration and click the link option.

4. The connection between Databricks and Azure DevOps Services will be established via Azure Active Directory, and the linked status will be displayed below.

Step 4:

1. To create a Databricks DBT job, log in to Databricks, go to the Workflows section, and select “Create Job.”

2. Enter the task name (trigger_dbt) and choose DBT as the Type.

3. Edit the Source section: enter the repository URL, select Azure DevOps Services as the Git provider, and enter the branch name.

4. Enter the DBT commands to be executed, in run order. Select the cluster, schema, catalog, and SQL warehouse settings for the DBT project, and choose any dependent libraries (or leave the default if none are needed). The same configuration can also be expressed through the Jobs API, as sketched after this list.

5. Select “Create” to generate the job and establish the flow.

6. The outcome of each run will be displayed in the logs.
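
For reference, an equivalent job can be defined through the Databricks Jobs API (2.1). The sketch below is illustrative rather than exact: the host, token, SQL warehouse ID, repository URL, and command order are placeholders to substitute with your own values, and cluster settings are omitted for brevity.

# Sketch: create an equivalent dbt job via the Jobs API 2.1.
# DATABRICKS_HOST / DATABRICKS_TOKEN and the <placeholders> are assumptions to adapt.
curl -s -X POST "$DATABRICKS_HOST/api/2.1/jobs/create" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "trigger_dbt",
    "git_source": {
      "git_url": "https://dev.azure.com/test/DataAnalytics/_git/repo_name",
      "git_provider": "azureDevOpsServices",
      "git_branch": "master"
    },
    "tasks": [{
      "task_key": "trigger_dbt",
      "dbt_task": {
        "commands": ["dbt deps", "dbt run", "dbt test"],
        "warehouse_id": "<sql-warehouse-id>"
      },
      "libraries": [{"pypi": {"package": "dbt-databricks"}}]
    }]
  }'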

Step 5:

1. Click “Add trigger” within the Schedule & Triggers section to schedule the job.

2. Choose “Schedule” as the trigger type, set the desired schedule time and time zone, and click Save. The same schedule can also be attached through the API, as sketched after this list.

3. If Databricks and Azure DevOps belong to distinct Microsoft Entra ID tenants, proceed with the steps outlined below.
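
A hedged sketch of attaching the schedule via the Jobs API, where the job ID, cron expression, and time zone are placeholders (Databricks uses Quartz cron syntax):

# Sketch: attach a daily 6:00 schedule to the job via the Jobs API 2.1.
curl -s -X POST "$DATABRICKS_HOST/api/2.1/jobs/update" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": <job-id>,
    "new_settings": {
      "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED"
      }
    }
  }'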

Step 6:

1. Visit https://aex.dev.azure.com/ and log in to generate an access token. In the upper-right corner, click the user settings icon.

2. Select Personal access tokens and click New Token.

3. Enter a name for the access token (dbt-access-token), set the token’s expiration date, choose Full access for the scopes, and click Create.

4. Be sure to copy the token, as it will not be displayed again.


Step 7:

1. Log in to Databricks and choose the profile icon in the upper-right corner.

2. Select User Settings.

3. In Linked accounts under Git integration, select Azure DevOps Services (Personal access token) as the Git provider. Enter the username or email address of the Azure DevOps account, paste the generated full-access token, and click “Save.” The same credential can also be registered through the API, as sketched after this list.

4. This establishes the connection between Databricks and the Azure DevOps account. The linked account will be shown below.

5. At this point, the flow can be triggered and scheduled as needed, even though Databricks and Azure DevOps sit under different Entra ID tenants.
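
A minimal sketch of the programmatic alternative using the Databricks Git Credentials API, where the username and token are placeholders (never commit a real token to source control):

# Sketch: register the Azure DevOps PAT via the Git Credentials API.
curl -s -X POST "$DATABRICKS_HOST/api/2.0/git-credentials" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "git_provider": "azureDevOpsServices",
    "git_username": "user@example.com",
    "personal_access_token": "<dbt-access-token>"
  }'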

Conclusion:

Setting up the DBT task in Databricks for daily triggers is crucial. Running the dbt CLI on a job cluster is more economical than using an all-purpose compute cluster. The DBT run starts the SQL warehouse automatically, so there is no need to spin it up manually. Additionally, the workflow can be initiated from ADF via the Jobs API, as sketched below.
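
For example, an ADF Web activity (or any REST client) could trigger the job with a single call; a minimal sketch with the job ID as a placeholder:

# Sketch: trigger the job on demand, e.g. from an ADF Web activity.
curl -s -X POST "$DATABRICKS_HOST/api/2.1/jobs/run-now" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"job_id": <job-id>}'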

About Us:

Bi3 has been recognized as one of the fastest-growing companies in Australia. Our team has delivered substantial and complex projects for some of the largest organizations around the globe, and we’re quickly building a brand that is well-known for superior delivery.

Website: https://bi3technologies.com/

Follow us on,
LinkedIn: https://www.linkedin.com/company/bi3technologies
Instagram: https://www.instagram.com/bi3technologies/
Twitter: https://twitter.com/Bi3Technologies
