Migrating Databricks Tasks from Prefect 1 to Prefect 2
From Task Library to Databricks Collection
Are you orchestrating Databricks jobs in your Prefect 1.0 flows? Excited about the Prefect 2 Databricks Collection? Worried about migrating? Don’t fret, this guide will help you migrate seamlessly!
If you weren’t using the Databricks tasks from the Prefect 1.0 Task Library, this guide can help you get a sense of how to use the Databricks Collection, too!
Installation
To install the prefect-databricks Collection:
pip install prefect-databricks
And that’s it!
Databricks Notebook
To set the stage, let’s say you have a notebook on Databricks, named example.ipynb:
- Line 1 accepts one base parameter, namely name.
- Line 2 uses name to format a message.
- Line 3 prints the formatted message.
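A minimal sketch of what such a notebook could contain follows. The message text, the default value, and the get_base_parameter helper are assumptions made for illustration. The only Databricks-specific call is dbutils.widgets.get, which is available inside the Databricks notebook runtime; the helper falls back to a default so the same three steps can also be tried locally.

```python
def get_base_parameter(key: str, default: str) -> str:
    """Return a notebook base parameter, or `default` when run outside Databricks."""
    try:
        # `dbutils` is injected by the Databricks notebook runtime only.
        return dbutils.widgets.get(key)  # noqa: F821
    except NameError:
        # Not running on Databricks: use the (hypothetical) default instead.
        return default


# Line 1: accept the base parameter called "name".
name = get_base_parameter("name", default="Marvin")
# Line 2: format a message with it (the wording here is made up).
message = f"Hello {name}, welcome to prefect-databricks!"
# Line 3: print the formatted message.
print(message)
```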
Prefect 1 Flow
And this is your Prefect 1 code to execute the Databricks notebook.
- Section 1 imports the required packages.
- Section 2 fetches the secret from your local system.
- Section 3 is a task that creates the required job task settings.
- Section 4 is a flow that accepts two parameters, notebook_path and base_parameters, and runs both the create_job_task_settings and DatabricksSubmitMultitaskRun tasks.
- Section 5 runs the flow with desired values for the parameters.
Databricks Output
Here’s what the output looks like on Databricks!
Prefect 2 Migration
Let’s walk through each section, step by step, to see how easy it is to migrate to Prefect 2.
Section 1 — Package Imports
Before:
After:
- Replace Flow with flow
  - Because in Prefect 2, flow is simply a decorator!
- Remove Parameter
  - Inputs to your flow function are automatically treated as parameters of your flow in Prefect 2.
- Replace from prefect.client.secrets import Secret with from prefect_databricks.credentials import DatabricksCredentials
  - Secrets are no longer generic and unstructured; they are now structured Blocks that cater to the specific service’s authentication requirements.
- Replace from prefect.tasks.databricks import DatabricksSubmitMultitaskRun with from prefect_databricks.flows import jobs_runs_submit_and_wait_for_completion
  - Tasks from the Prefect 1 Task Library are now broken out into their own repositories as Prefect Collections.
- Replace prefect.tasks.databricks.models with prefect_databricks.models.jobs
  - The models are now grouped into more specific submodules because, in the future, there will be many more models!
Section 2 — Service Secrets
Before:
databricks_conn = Secret("DATABRICKS_CONNECTION_STRING").get()
After:
databricks_credentials = DatabricksCredentials.load("my-block")
- Replace databricks_conn = Secret("DATABRICKS_CONNECTION_STRING").get() with databricks_credentials = DatabricksCredentials.load("my-block")
  - As mentioned above, secrets are no longer generic and unstructured; they are now structured blocks that cater to the specific service’s authentication requirements.
Section 3 — Job Task Settings
Before & After:
- Surprise! No replacement needed — while we did make some breaking changes for the sake of improved maintainability and extensibility, we didn’t make changes without good reason!
Section 4 — Flow Tasks
Before:
After:
- Replace the context manager line, with Flow("Databricks Flow") as flow:, with the decorator @flow(name="Databricks Flow")
  - Flows are now simply decorated Python functions, the kind that you know and love!
- Add databricks_credentials = DatabricksCredentials.load("my-block")
  - Although the block can be loaded outside the flow, it’s best practice to load it within!
- Replace the two lines containing Parameter with def jobs_runs_submit_flow(notebook_path, **base_parameters):
  - As stated before, inputs to your flow function are automatically treated as parameters of your flow in Prefect 2; we really want it to feel like writing native Python!
- Keep the create_job_task_settings line
  - We didn’t change it!
- Replace the DatabricksSubmitMultitaskRun lines with jobs_runs_submit_and_wait_for_completion, using updated keyword arguments
  - Unlike tasks from the Prefect 1 Task Library, tasks and flows from Prefect 2 collections are functions!
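To see what the new signature does with its inputs, here is a plain-Python stand-in for the flow function. It is only an illustration: the @flow decorator and the Databricks calls are left out so it runs without Prefect installed, and the notebook path and the name value are made up.

```python
def jobs_runs_submit_flow(notebook_path, **base_parameters):
    # The real flow carries @flow(name="Databricks Flow") and submits a job;
    # this stand-in only returns what it received, to show how inputs are collected.
    return {"notebook_path": notebook_path, "base_parameters": base_parameters}


# Any extra keyword argument is gathered into the base_parameters dict.
received = jobs_runs_submit_flow(
    "/Users/username@example.com/example.ipynb",  # hypothetical notebook path
    name="Marvin",
)
print(received["base_parameters"])
```

The first argument plays the role of the old notebook_path Parameter, and every additional keyword argument becomes an entry in base_parameters, which is what the notebook reads as its base parameters.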
Section 5 — Execute Flow
Before:
After:
- Replace flow.run with jobs_runs_submit_flow
  - We’re no longer using a Flow class method, we’re just calling a decorated function!
That’s all! Here’s the new Prefect 2 Databricks script!
Prefect 2 Flow
Congrats on the smooth migration! If you enjoy using Prefect 2, drop a star on GitHub, and feel free to join our Slack or Discourse community!
Happy engineering!