Migrating Databricks Tasks from Prefect 1 to Prefect 2

From Task Library to Databricks Collection

Andrew Huang
The Prefect Blog
4 min read · Aug 25, 2022



Are you orchestrating Databricks jobs in your Prefect 1.0 flows? Excited about the Prefect 2 Databricks Collection? Worried about migrating? Don’t fret: this guide walks through the migration one section at a time!

side by side code
An overview of the changes needed — it’s mostly just imports!

If you weren’t using the Databricks tasks from the Prefect 1.0 Task Library, this guide can help you get a sense of how to use the Databricks Collection, too!

Installation

To install the prefect-databricks Collection, run pip install prefect-databricks in the Python environment where your flows run.

And that’s it!

Databricks Notebook

To set the stage, let’s say you have a notebook on Databricks, named example.ipynb:

  • Line 1 accepts one base parameter, namely name.
  • Line 2 uses name to format a message.
  • Line 3 prints the formatted message.
An example notebook saved as example.ipynb within Databricks
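
The notebook itself is not reproduced here, so the following is only a minimal sketch of what it might contain. It assumes the base parameter is read with dbutils.widgets.get, which exists only inside a Databricks notebook; the get_base_parameter helper, the local fallback value, and the message text are illustrative assumptions rather than the original notebook's contents. The last three lines correspond to the three steps listed above.

```python
def get_base_parameter(key, default):
    """Return a notebook base parameter, or `default` when run outside Databricks."""
    try:
        # `dbutils` is injected by the Databricks notebook runtime.
        return dbutils.widgets.get(key)  # noqa: F821
    except NameError:
        # Hypothetical fallback so the sketch also runs on a local machine.
        return default


name = get_base_parameter("name", "Marvin")  # accept the base parameter
message = f"Hello, {name}!"                  # use name to format a message
print(message)                               # print the formatted message
```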

Prefect 1 Flow

And this is your Prefect 1 code to execute the Databricks notebook.

  • Section 1 imports the required packages.
  • Section 2 fetches the secret from your local system.
  • Section 3 is a task that creates the required job task settings.
  • Section 4 is a flow that accepts two parameters, notebook_path and base_parameters, and runs both the create_job_task_settings and DatabricksSubmitMultitaskRun tasks.
  • Section 5 runs the flow with desired values for the parameters.
An example of a Prefect 1 flow using Databricks tasks

Databricks Output

Here’s what the output looks like on Databricks!

databricks ui screenshot

Prefect 2 Migration

Let’s walk through each section, step by step, to see how easy it is to migrate to Prefect 2.

Section 1 — Package Imports

Before:

After:

  • Replace Flow with flow - In Prefect 2, a flow is a plain Python function wrapped by the @flow decorator!
  • Remove Parameter - Inputs to your flow function are automatically treated as parameters of your flow in Prefect 2.
  • Replace from prefect.client.secrets import Secret with from prefect_databricks.credentials import DatabricksCredentials - Secrets are no longer generic and unstructured; they are now structured Blocks that cater to each service’s authentication requirements.
  • Replace from prefect.tasks.databricks import DatabricksSubmitMultitaskRun with from prefect_databricks.flows import jobs_runs_submit_and_wait_for_completion - Tasks from the Prefect 1 Task Library are now broken out into their own repositories as Prefect Collections.
  • Replace from prefect.tasks.databricks.models with from prefect_databricks.models.jobs - The models are now grouped into more specific submodules because many more models will be added in the future!

Section 2 — Service Secrets

Before:

After:

  • Replace databricks_conn = Secret("DATABRICKS_CONNECTION_STRING").get() with databricks_credentials = DatabricksCredentials.load("my-block") - As mentioned above, secrets are now structured Blocks that cater to each service’s authentication requirements. A block must already be saved under the name my-block before it can be loaded.

Section 3 — Job Task Settings

Before & After:

  • Surprise! No replacement needed — while we did make some breaking changes for the sake of improved maintainability and extensibility, we didn’t make changes without good reason!
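
For orientation, the settings built in this section end up as one entry in the tasks list of a Databricks one-time run submission. The sketch below shows that shape as a plain dictionary rather than through the JobTaskSettings model, so it runs without Prefect installed. The field names follow the Databricks Jobs API; the build_job_task_settings helper, the task key, the cluster values, and the notebook path are placeholder assumptions.

```python
import json


def build_job_task_settings(notebook_path, base_parameters):
    """Return a dict shaped like one task entry of a Databricks run submission."""
    return {
        "task_key": "prefect-task",  # placeholder identifier for this task
        "new_cluster": {
            # Placeholder cluster values; choose ones valid for your workspace.
            "spark_version": "10.4.x-scala2.12",
            "node_type_id": "m4.large",
            "num_workers": 1,
        },
        "notebook_task": {
            "notebook_path": notebook_path,
            # Forwarded to the notebook as its base parameters.
            "base_parameters": dict(base_parameters),
        },
    }


settings = build_job_task_settings(
    "/Users/someone@example.com/example.ipynb",  # placeholder path
    {"name": "Marvin"},
)
print(json.dumps(settings, indent=2))
```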

Section 4 — Flow Tasks

Before:

After:

  • Replace with Flow("Databricks Flow") as flow: with @flow(name="Databricks Flow") - A flow is now simply a decorated version of the Python functions that you know and love!
  • Add databricks_credentials = DatabricksCredentials.load("my-block") - Although the block can be loaded outside the flow, it’s best practice to load it within, so the credentials are fetched when the flow runs rather than when the script is imported!
  • Replace the two lines containing Parameter with def jobs_runs_submit_flow(notebook_path, **base_parameters): - As stated before, inputs to your flow function are automatically treated as parameters of your flow in Prefect 2; we really want it to feel like writing native Python!
  • Keep the create_job_task_settings line - We didn’t change it!
  • Replace the DatabricksSubmitMultitaskRun lines with a call to jobs_runs_submit_and_wait_for_completion, using updated keyword arguments - Unlike the class-based tasks from the Prefect 1 Task Library, tasks and flows from Prefect 2 collections are functions!

Section 5 — Execute Flow

Before:

After:

  • Replace flow.run with jobs_runs_submit_flow - We’re no longer using a Flow class method; we’re just calling a decorated function!
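
Because the flow is now an ordinary function, any keyword arguments passed at call time are collected by **base_parameters. The sketch below shows just that mechanism; it deliberately leaves out the @flow decorator so that it runs without Prefect installed, and the notebook path and name value are placeholders.

```python
def jobs_runs_submit_flow(notebook_path, **base_parameters):
    """Undecorated stand-in for the flow, showing only how arguments are collected."""
    # In the real script this function is decorated with @flow(name="Databricks Flow")
    # and submits the Databricks run; here it simply returns what it received.
    return {"notebook_path": notebook_path, "base_parameters": base_parameters}


received = jobs_runs_submit_flow(
    "/Users/someone@example.com/example.ipynb",  # placeholder path
    name="Marvin",
)
print(received["base_parameters"])
```

Every extra keyword argument, such as name here, becomes a key in the dictionary that is forwarded to the notebook as its base parameters.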

That’s all! Here’s the new Prefect 2 Databricks script!

Prefect 2 Flow

An example of a Prefect 2 flow using Databricks tasks
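
The original script is not reproduced here, so what follows is a hedged reconstruction assembled from the steps above, not the author's exact code. The import paths, the my-block name, the flow name, and the function names come from this guide. The cluster values, the task key, the run name, and the keyword arguments passed to jobs_runs_submit_and_wait_for_completion are assumptions to check against the prefect-databricks documentation. The try/except around the imports is also an addition, so the sketch can be loaded where the packages are not installed.

```python
try:
    from prefect import flow, task
    from prefect_databricks.credentials import DatabricksCredentials
    from prefect_databricks.flows import jobs_runs_submit_and_wait_for_completion
    from prefect_databricks.models.jobs import JobTaskSettings, NewCluster, NotebookTask

    PREFECT_AVAILABLE = True
except ImportError:
    # Lets the sketch be loaded even where the packages are missing.
    PREFECT_AVAILABLE = False

if PREFECT_AVAILABLE:

    @task
    def create_job_task_settings(notebook_path, base_parameters):
        """Describe the notebook task and the cluster it should run on."""
        return JobTaskSettings(
            task_key="prefect-task",  # placeholder task key
            new_cluster=NewCluster(
                # Placeholder cluster values; choose ones valid for your workspace.
                spark_version="10.4.x-scala2.12",
                node_type_id="m4.large",
                num_workers=1,
            ),
            notebook_task=NotebookTask(
                notebook_path=notebook_path,
                base_parameters=base_parameters,
            ),
        )

    @flow(name="Databricks Flow")
    def jobs_runs_submit_flow(notebook_path, **base_parameters):
        """Submit the notebook run and wait for it to finish."""
        # Load the saved block inside the flow, as recommended above.
        databricks_credentials = DatabricksCredentials.load("my-block")
        job_task_settings = create_job_task_settings(notebook_path, base_parameters)
        return jobs_runs_submit_and_wait_for_completion(
            databricks_credentials=databricks_credentials,
            tasks=[job_task_settings],
            run_name="prefect-job",  # placeholder run name
        )
```

With a DatabricksCredentials block saved as my-block, calling jobs_runs_submit_flow with the notebook path and name="Marvin" would submit the run; that call is left out of the sketch because it needs a live Databricks workspace.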

Congrats on the smooth migration! If you enjoy using Prefect 2, drop a star on GitHub, and feel free to join our Slack or Discourse community!

Happy engineering!

