Converting Databricks notebook into a job and getting the notebook output from ADF

Dhanasri Mahendramani
BI3 Technologies
Published in
4 min read · Jul 11, 2022

This blog demonstrates how to convert a notebook into a job, run that job from Azure Data Factory (ADF), and retrieve the notebook output in ADF.

The blog covers the following three functionalities in detail:

  • Converting a notebook into a Job in Databricks
  • Executing Databricks Jobs using Azure Data Factory
  • Obtaining Databricks Job Output from ADF

Converting a notebook into a Job in Databricks

Step 1: Create a Databricks notebook with the necessary logic.
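For the job output to be retrievable from ADF later, the notebook should return a value via dbutils.notebook.exit. A minimal sketch of that closing step (the result payload and its keys are hypothetical) might look like the following; note that the data field is itself JSON-encoded, which is what the double json() parse used later in the pipeline expects:

```python
import json

# Hypothetical payload the notebook hands back to the caller.
result = {"rows_processed": 42, "status": "success"}

# "data" holds a JSON string, so the caller parses the exit value twice:
# once to reach "data", and again to decode the payload inside it.
exit_value = json.dumps({"data": json.dumps(result)})

# In the actual Databricks notebook this would be:
# dbutils.notebook.exit(exit_value)
```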

Step 2: In the left sidebar of Azure Databricks, select the Workflows tab.

Step 3: On the Workflows page, click ‘Create Job’.

Step 4: In the dialog box that appears, enter the necessary details such as the task name and notebook path, then click Create. The corresponding job will be created.

Executing Databricks Jobs using Azure Data Factory

Step 1: To execute a Databricks job, go to ADF and create a new pipeline with three variables, as shown below.

Step 2: The next step is to add one Web activity and configure its values as shown below.

URL: <Databricks URL>/api/2.0/jobs/run-now
Method: POST
Headers: Authorization
Value: Bearer <Databricks access token>
Body: If any parameters are already specified in the Databricks notebook, add their values in the body, as shown below.
Authentication: None
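The Web activity is equivalent to a plain REST call to the run-now endpoint. The sketch below builds that request in Python; the workspace URL, token value, job ID, and parameter names are placeholder assumptions:

```python
import json
import urllib.request

# Placeholder workspace URL and personal access token.
DATABRICKS_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

def build_run_now_request(job_id, notebook_params):
    """Build the POST request that the ADF Web activity sends to run-now."""
    body = json.dumps({"job_id": job_id, "notebook_params": notebook_params})
    return urllib.request.Request(
        url=f"{DATABRICKS_URL}/api/2.0/jobs/run-now",
        data=body.encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# The response JSON carries the run_id that later activities use:
# resp = urllib.request.urlopen(build_run_now_request(123, {"env": "dev"}))
# run_id = json.loads(resp.read())["run_id"]
```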

Step 3: Add an Until activity that will run once the Web activity succeeds.

Step 4: Within the Until activity, add further Web activities to collect the API status and get the job output. Then store the output in a variable using a Set Variable activity, as shown below.

URL: <Databricks URL>/api/2.0/jobs/runs/get-output?run_id=@{activity('<Previous Activity>').output.run_id}
Method: GET
Headers: Authorization
Value: Bearer <Databricks access token>

Until activity condition:

@or(equals(variables('life_cycle_state'), 'TERMINATED'), equals(variables('life_cycle_state'), 'INTERNAL_ERROR'))

Step 5: After the Web activity, add one Set Variable activity with the variable name life_cycle_state and the variable value shown below.

Variable Value:

@activity('CHECK JOB RUN STATUS').output.metadata.state.life_cycle_state
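Steps 3–5 together form a polling loop: call the runs/get-output endpoint, record the run's life_cycle_state, and repeat until it reaches a terminal value. A rough Python equivalent of what the Until activity does (workspace URL and token are placeholders, and the polling interval is an assumption) is:

```python
import json
import time
import urllib.request

# Placeholder workspace URL and personal access token.
DATABRICKS_URL = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# The same terminal states checked by the Until condition in the pipeline.
TERMINAL_STATES = {"TERMINATED", "INTERNAL_ERROR"}

def is_terminal(life_cycle_state):
    """Mirror of the Until activity's exit condition."""
    return life_cycle_state in TERMINAL_STATES

def poll_run(run_id, interval_seconds=15):
    """Poll runs/get-output until the run reaches a terminal state."""
    url = f"{DATABRICKS_URL}/api/2.0/jobs/runs/get-output?run_id={run_id}"
    while True:
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {TOKEN}"})
        payload = json.loads(urllib.request.urlopen(req).read())
        if is_terminal(payload["metadata"]["state"]["life_cycle_state"]):
            return payload
        time.sleep(interval_seconds)
```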

Obtaining Databricks Job Output from ADF

Step 1: After that Set Variable activity, add another Set Variable activity to store the job output, with the variable name Job_Output and the variable value shown below.

Variable Value:

@json(json(activity('CHECK JOB RUN STATUS').output.notebook_output.result).data)
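This expression parses the notebook's exit string twice: the outer notebook_output.result is parsed once to reach the data field, and data itself holds a JSON string that is parsed again. The Python equivalent, using an illustrative sample value, would be:

```python
import json

# Illustrative sample of notebook_output.result as returned by
# runs/get-output: the "data" field is itself a JSON-encoded string.
notebook_output_result = '{"data": "{\\"rows_processed\\": 42}"}'

# Equivalent of @json(json(...notebook_output.result).data) in ADF.
job_output = json.loads(json.loads(notebook_output_result)["data"])
```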

Step 2: Now run the entire pipeline. When the pipeline runs, the outputs of all activities will look like the following.

The final output of web activity (CHECK JOB RUN STATUS):

The final output of the Set Variable activities (SET STATUS, JOB OUTPUT):

Job output will be stored in the variables.

Finally, by following the above instructions, a user can obtain a Databricks notebook's output in ADF, which would otherwise be generated by manually executing the notebook in Databricks, as seen in the above image.

About Us

Bi3 has been recognized for being one of the fastest-growing companies in Australia. Our team has delivered substantial and complex projects for some of the largest organizations around the globe and we’re quickly building a brand that is well known for superior delivery.

Website: https://bi3technologies.com/

Follow us on,
LinkedIn: https://www.linkedin.com/company/bi3technologies
Instagram: https://www.instagram.com/bi3technologies/
Twitter: https://twitter.com/Bi3Technologies
