Submit Databricks Job using REST API — launched by runs submit API

Chetanya-Patil · Published in CT Engineering · Nov 21, 2023

# Create and trigger a one-time run

This endpoint allows you to submit a one-time workload directly to a Databricks cluster, without first creating a job definition.

Sample request body (job_config):

job_config = {
    "tasks": [
        {
            "task_key": "Dummy Job",
            "description": "Testing job using rest api endpoints",
            "timeout_seconds": 86400,
            # Use either an existing cluster or a new job cluster, not both
            "existing_cluster_id": "0712-070515-nwwdcq2g",
            "new_cluster": {
                # Set either num_workers or autoscale, not both
                "num_workers": 10,
                "autoscale": {
                    "min_workers": 2,
                    "max_workers": 10
                },
                "cluster_name": "dummy_cluster",
                "spark_version": "13.3.x-scala2.12",  # Databricks Runtime version key
                "spark_conf": {},
                "init_scripts": [],
                "instance_pool_id": "The optional ID of the instance pool to which the cluster belongs."
            },
            # A task should define only ONE of the following task types;
            # all of them are shown here just for reference
            "notebook_task": {
                "notebook_path": "/Users/user.name@databricks.com/Match",
                "base_parameters": {
                    "name": "jhon",
                    "age": "35"
                }
            },
            "spark_jar_task": {
                "main_class_name": "",
                "parameters": ["parameter1", "parameter2"]
            },
            "spark_python_task": {
                "python_file": "",
                "parameters": []
            },
            "spark_submit_task": {},
            "libraries": [
                {
                    "jar": "dbfs:/mnt/databricks/library.jar"
                }
            ]
        }
    ],
    "run_name": "An optional name for the job. The default value is Untitled",
    "timeout_seconds": 86400,
    "email_notifications": {
        "on_start": ["user.name@databricks.com"],
        "on_success": ["user.name@databricks.com"],
        "on_failure": ["user.name@databricks.com"]
    }
}

Responses we get back:

On successful submission, the response contains the identifier of the newly submitted run, the run_id.

This method may return the following HTTP status codes: 400, 401, 403, 404, 429, 500.

Error responses have the following format:

{
"error_code": "Error code",
"message": "Human-readable error message that describes the cause of the error."
}

For the full list of possible error codes, see:
https://docs.databricks.com/api/azure/workspace/jobs/submit
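As a minimal sketch of how you might handle these responses with the requests library (check_submit_response is my own helper name, not part of the API; it assumes you hold on to the Response object returned by the submit call shown later in this post):

import requests

def check_submit_response(response: requests.Response) -> int:
    """Return the run_id on success, otherwise raise with the API's error details."""
    if response.status_code == 200:
        return response.json()["run_id"]
    error = response.json()  # body follows the error format shown above
    raise RuntimeError(
        f"runs/submit failed (HTTP {response.status_code}): "
        f"{error.get('error_code')} - {error.get('message')}"
    )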

So let’s start by understanding how we can submit a job to a Databricks cluster using the REST API, which we commonly refer to as a run launched by the runs submit API.

Practical use cases:

  1. Submitting a job from Airflow code, for example via a python_callable in a PythonOperator (see the sketch after this list):
     https://chetanyapatil.medium.com/submit-databricks-job-using-rest-api-in-airflow-553a2aba0deb
  2. Using it in an in-house framework we have developed, to submit jobs programmatically.
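For the Airflow use case, here is a minimal sketch of what the python_callable could look like (assuming Airflow 2.4+; the DAG id, task id, and the submit_databricks_run helper are hypothetical names of mine, the host and token are placeholders, and job_config is the payload shown above):

from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def submit_databricks_run():
    # Hypothetical helper: posts the job_config payload to runs/submit.
    host = "https://<your-workspace>.azuredatabricks.net"  # placeholder
    token = "<your-personal-access-token>"                 # placeholder
    response = requests.post(
        f"{host}/api/2.1/jobs/runs/submit",
        headers={"Authorization": f"Bearer {token}"},
        json=job_config,  # the payload defined earlier in this post
    )
    response.raise_for_status()
    return response.json()["run_id"]

with DAG(
    dag_id="databricks_runs_submit_demo",  # hypothetical DAG id
    start_date=datetime(2023, 11, 1),
    schedule=None,
    catchup=False,
) as dag:
    submit_task = PythonOperator(
        task_id="submit_databricks_run",
        python_callable=submit_databricks_run,
    )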

Let’s get practical:

We will use the requests library from Python. The JSON payload we pass here contains the different task types (notebook_task, spark_jar_task, spark_python_task and spark_submit_task) for reference.

import requests

token = "dapicb8813**f945ef23a"
host = "https://adb-1783**186047.7.azuredatabricks.net"
api_endpoint = "/api/2.1/jobs/runs/submit"
submit_url = f"{host}{api_endpoint}"

submit_job = requests.post(
    submit_url,
    headers={"Authorization": f"Bearer {token}"},
    json=job_config,
)

run_id = submit_job.json()["run_id"]
print(run_id)
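Once we have the run_id, we can track the run with the Jobs API runs/get endpoint until it reaches a terminal state. A minimal polling sketch, reusing the host and token variables from above (the wait_for_run name and the 30-second poll interval are my own choices, not part of the original post):

import time

def wait_for_run(run_id: int, poll_seconds: int = 30) -> str:
    """Poll /api/2.1/jobs/runs/get until the run reaches a terminal state."""
    get_url = f"{host}/api/2.1/jobs/runs/get"
    while True:
        response = requests.get(
            get_url,
            headers={"Authorization": f"Bearer {token}"},
            params={"run_id": run_id},
        )
        response.raise_for_status()
        state = response.json()["state"]
        if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            # result_state (e.g. SUCCESS, FAILED) is only present once the run has finished
            return state.get("result_state", state["life_cycle_state"])
        time.sleep(poll_seconds)

print(wait_for_run(run_id))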

Depending on the requirement, we modify the JSON payload. For example, if we want to submit a spark_jar_task job, we keep only the spark_jar_task object in the payload, with its required information. [Use case 1]
Example: spark_jar_task

job_config = {
    "tasks": [
        {
            "task_key": "Dummy Job",
            "description": "Testing job using rest api endpoints",
            "timeout_seconds": 86400,
            # Use either an existing cluster or a new job cluster, not both
            "existing_cluster_id": "0712-070515-nwwdcq2g",
            "new_cluster": {
                "num_workers": 10,
                "autoscale": {
                    "min_workers": 2,
                    "max_workers": 10
                },
                "cluster_name": "dummy_cluster",
                "spark_version": "13.3.x-scala2.12",  # Databricks Runtime version key
                "spark_conf": {},
                "init_scripts": [],
                "instance_pool_id": "The optional ID of the instance pool to which the cluster belongs."
            },
            "spark_jar_task": {
                "main_class_name": "",
                "parameters": ["parameter1", "parameter2"]
            },
            "libraries": [
                {
                    "jar": "dbfs:/mnt/databricks/library.jar"
                }
            ]
        }
    ],
    "run_name": "An optional name for the job. The default value is Untitled",
    "timeout_seconds": 86400,
    "email_notifications": {
        "on_start": ["user.name@databricks.com"],
        "on_success": ["user.name@databricks.com"],
        "on_failure": ["user.name@databricks.com"]
    }
}
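In the same way, if we wanted to run a Python file instead, we would keep only spark_python_task in the task. A minimal sketch (the python_file path and parameters below are placeholders of my own, not from the original payload):

job_config = {
    "tasks": [
        {
            "task_key": "Dummy Job",
            "timeout_seconds": 86400,
            "existing_cluster_id": "0712-070515-nwwdcq2g",
            "spark_python_task": {
                "python_file": "dbfs:/mnt/databricks/scripts/match.py",  # placeholder path
                "parameters": ["--env", "dev"]  # placeholder parameters
            }
        }
    ],
    "run_name": "spark_python_task example",
    "timeout_seconds": 86400
}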

Video format Explanation:
<Add video link here>

Thanks for Reading!

If you like my work and want to support me…

  1. The BEST way to support me is by following me on Medium.
  2. I share content about #dataengineering. Let’s connect on LinkedIn.
  3. Feel free to give claps so I know how helpful this post was for you.

#databricks-api #rest api #dataengineering #apachespark
