Submit Databricks Job using REST API — launched by runs submit API

Chetanya-Patil · Published in CT Engineering · Nov 21, 2023

# Create and trigger a one-time run

This endpoint allows you to submit a one-time workload directly to a Databricks cluster, without first creating a job definition.

Sample request body (job_config):

job_config = {
    "tasks": [
        {
            "task_key": "Dummy Job",
            "description": "Testing job using rest api endpoints",
            "timeout_seconds": 86400,
            # Use either an existing cluster or a new job cluster, not both
            "existing_cluster_id": "0712-070515-nwwdcq2g",
            "new_cluster": {
                # Set either num_workers or autoscale, not both
                "num_workers": 10,
                "autoscale": {
                    "min_workers": 2,
                    "max_workers": 10
                },
                "cluster_name": "dummy_cluster",
                "spark_version": "13.3.x-scala2.12",  # Databricks Runtime version key
                "spark_conf": {},
                "init_scripts": [],
                "instance_pool_id": "The optional ID of the instance pool to which the cluster belongs."
            },
            # A task should define only ONE of the following task types;
            # all of them are shown here just for reference
            "notebook_task": {
                "notebook_path": "/Users/user.name@databricks.com/Match",
                "base_parameters": {
                    "name": "jhon",
                    "age": "35"
                }
            },
            "spark_jar_task": {
                "main_class_name": "",
                "parameters": ["parameter1", "parameter2"]
            },
            "spark_python_task": {
                "python_file": "",
                "parameters": []
            },
            "spark_submit_task": {},
            "libraries": [
                {
                    "jar": "dbfs:/mnt/databricks/library.jar"
                }
            ]
        }
    ],
    "run_name": "An optional name for the job. The default value is Untitled",
    "timeout_seconds": 86400,
    "email_notifications": {
        "on_start": ["user.name@databricks.com"],
        "on_success": ["user.name@databricks.com"],
        "on_failure": ["user.name@databricks.com"]
    }
}

Responses we get back:

On successful submission, the response contains the identifier of the newly submitted run, the run_id.

This method may return the following HTTP status codes: 400, 401, 403, 404, 429, 500.

Error responses have the following format:

{
"error_code": "Error code",
"message": "Human-readable error message that describes the cause of the error."
}

For the full list of possible error codes, see:
https://docs.databricks.com/api/azure/workspace/jobs/submit
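As a minimal sketch of how you might handle these responses with the requests library (check_submit_response is my own helper name, not part of the API; it assumes you hold on to the Response object returned by the submit call shown later in this post):

import requests

def check_submit_response(response: requests.Response) -> int:
    """Return the run_id on success, otherwise raise with the API's error details."""
    if response.status_code == 200:
        return response.json()["run_id"]
    error = response.json()  # body follows the error format shown above
    raise RuntimeError(
        f"runs/submit failed (HTTP {response.status_code}): "
        f"{error.get('error_code')} - {error.get('message')}"
    )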

So let’s start by understanding how we can submit a job to a Databricks cluster using the REST API, which we commonly refer to as a run launched by the runs submit API.

Practical use cases:

  1. Submitting a job from Airflow code, for example via a python_callable in a PythonOperator (see the sketch after this list):
     https://chetanyapatil.medium.com/submit-databricks-job-using-rest-api-in-airflow-553a2aba0deb
  2. Using it in an in-house framework we have developed, to submit jobs programmatically.
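For the Airflow use case, here is a minimal sketch of what the python_callable could look like (assuming Airflow 2.4+; the DAG id, task id, and the submit_databricks_run helper are hypothetical names of mine, the host and token are placeholders, and job_config is the payload shown above):

from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def submit_databricks_run():
    # Hypothetical helper: posts the job_config payload to runs/submit.
    host = "https://<your-workspace>.azuredatabricks.net"  # placeholder
    token = "<your-personal-access-token>"                 # placeholder
    response = requests.post(
        f"{host}/api/2.1/jobs/runs/submit",
        headers={"Authorization": f"Bearer {token}"},
        json=job_config,  # the payload defined earlier in this post
    )
    response.raise_for_status()
    return response.json()["run_id"]

with DAG(
    dag_id="databricks_runs_submit_demo",  # hypothetical DAG id
    start_date=datetime(2023, 11, 1),
    schedule=None,
    catchup=False,
) as dag:
    submit_task = PythonOperator(
        task_id="submit_databricks_run",
        python_callable=submit_databricks_run,
    )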

Let’s get practical:

We will use the requests library from Python. The JSON payload we pass here contains the different task types (notebook_task, spark_jar_task, spark_python_task and spark_submit_task) for reference.

import requests

token = "dapicb8813**f945ef23a"
host = "https://adb-1783**186047.7.azuredatabricks.net"
api_endpoint = "/api/2.1/jobs/runs/submit"
submit_url = f"{host}{api_endpoint}"

submit_job = requests.post(
    submit_url,
    headers={"Authorization": f"Bearer {token}"},
    json=job_config,
)

run_id = submit_job.json()["run_id"]
print(run_id)
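Once we have the run_id, we can track the run with the Jobs API runs/get endpoint until it reaches a terminal state. A minimal polling sketch, reusing the host and token variables from above (the wait_for_run name and the 30-second poll interval are my own choices, not part of the original post):

import time

def wait_for_run(run_id: int, poll_seconds: int = 30) -> str:
    """Poll /api/2.1/jobs/runs/get until the run reaches a terminal state."""
    get_url = f"{host}/api/2.1/jobs/runs/get"
    while True:
        response = requests.get(
            get_url,
            headers={"Authorization": f"Bearer {token}"},
            params={"run_id": run_id},
        )
        response.raise_for_status()
        state = response.json()["state"]
        if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            # result_state (e.g. SUCCESS, FAILED) is only present once the run has finished
            return state.get("result_state", state["life_cycle_state"])
        time.sleep(poll_seconds)

print(wait_for_run(run_id))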

Depending on the requirement, we modify the JSON payload. For example, if we want to submit a spark_jar_task job, we keep only the spark_jar_task object in the payload, with its required information. [Use case 1]
Example: spark_jar_task

job_config = {
    "tasks": [
        {
            "task_key": "Dummy Job",
            "description": "Testing job using rest api endpoints",
            "timeout_seconds": 86400,
            # Use either an existing cluster or a new job cluster, not both
            "existing_cluster_id": "0712-070515-nwwdcq2g",
            "new_cluster": {
                "num_workers": 10,
                "autoscale": {
                    "min_workers": 2,
                    "max_workers": 10
                },
                "cluster_name": "dummy_cluster",
                "spark_version": "13.3.x-scala2.12",  # Databricks Runtime version key
                "spark_conf": {},
                "init_scripts": [],
                "instance_pool_id": "The optional ID of the instance pool to which the cluster belongs."
            },
            "spark_jar_task": {
                "main_class_name": "",
                "parameters": ["parameter1", "parameter2"]
            },
            "libraries": [
                {
                    "jar": "dbfs:/mnt/databricks/library.jar"
                }
            ]
        }
    ],
    "run_name": "An optional name for the job. The default value is Untitled",
    "timeout_seconds": 86400,
    "email_notifications": {
        "on_start": ["user.name@databricks.com"],
        "on_success": ["user.name@databricks.com"],
        "on_failure": ["user.name@databricks.com"]
    }
}
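In the same way, if we wanted to run a Python file instead, we would keep only spark_python_task in the task. A minimal sketch (the python_file path and parameters below are placeholders of my own, not from the original payload):

job_config = {
    "tasks": [
        {
            "task_key": "Dummy Job",
            "timeout_seconds": 86400,
            "existing_cluster_id": "0712-070515-nwwdcq2g",
            "spark_python_task": {
                "python_file": "dbfs:/mnt/databricks/scripts/match.py",  # placeholder path
                "parameters": ["--env", "dev"]  # placeholder parameters
            }
        }
    ],
    "run_name": "spark_python_task example",
    "timeout_seconds": 86400
}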

Video format Explanation:
<Add video link here>

Thanks for Reading!

If you like my work and want to support me…

  1. The BEST way to support me is by following me on Medium.
  2. I share content about #dataengineering. Let’s connect on LinkedIn.
  3. Feel free to give claps so I know how helpful this post was for you.

#databricks-api #rest api #dataengineering #apachespark
