Schedule & orchestrate dbt Cloud jobs with Prefect
Modular Data Stack with dbt Cloud Prefect block
⚠️ Note: I no longer work at Prefect. This post may be completely outdated. Refer to Prefect docs and website to stay up-to-date.
This short post will walk you through how to set up dbt Cloud jobs and orchestrate them with Prefect. It assumes that you have already signed up for dbt Cloud and know how to use dbt.
dbt Cloud setup
First, you need to retrieve the dbt Cloud account ID and create an API key. To do that, go to Account Settings:
In the URL, you should now see the account ID. Copy it and paste it into DBT_CLOUD_ACCOUNT_ID in your .env file. Now, from the same Account Settings page, go to the API Access section:
Create and copy the API key:
Programmatic block creation
Now, paste this key into DBT_CLOUD_API_KEY in your .env file. Your .env file should now contain those two environment variables:
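For illustration, a minimal .env file could look like this (both values are placeholders, not real credentials):

```shell
# .env (placeholder values; substitute your own account ID and API key)
DBT_CLOUD_ACCOUNT_ID=12345
DBT_CLOUD_API_KEY=abc123def456
```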
The .env file is only needed if you want to create a DbtCloudCredentials block programmatically. The easiest way is to follow the “How to Build a Modular Data Stack — Data Platform with Prefect, dbt and Snowflake” blog post series and the prefect-dataplatform GitHub repository. You can directly leverage one of the automated deploy scripts, e.g., the local execution setup shown in the deploy_locally.py script.
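As a rough sketch of what that programmatic setup boils down to, the script below reads the two variables and saves a DbtCloudCredentials block. It assumes prefect-dbt is installed, that the variables from your .env file are exported into the environment, and that "default" is just a placeholder block name:

```python
import os


def load_dbt_cloud_settings() -> tuple:
    """Read the dbt Cloud API key and account ID from environment variables."""
    return os.environ["DBT_CLOUD_API_KEY"], int(os.environ["DBT_CLOUD_ACCOUNT_ID"])


def create_dbt_cloud_credentials_block(block_name: str = "default") -> None:
    """Save a DbtCloudCredentials block under the given (placeholder) name."""
    from prefect_dbt.cloud import DbtCloudCredentials  # pip install prefect-dbt

    api_key, account_id = load_dbt_cloud_settings()
    DbtCloudCredentials(api_key=api_key, account_id=account_id).save(
        block_name, overwrite=True
    )
```

Calling create_dbt_cloud_credentials_block() once is enough; afterwards the block can be loaded by name from any flow.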
Block creation from the UI
Alternatively, you can use the dbt Cloud API key and account ID to configure a block from the Prefect UI:
After configuring the credentials, we need to set up our dbt Cloud project. If you already have one, you can skip the section below.
Set up your dbt Cloud project
First, go to your dbt Cloud account and create a new project:
Follow the guided onboarding and fill in your data warehouse credentials:
At the very end, you’ll need to select your Git repository:
Then, you can start developing in the IDE:
Running a simple dbt compile can be helpful to validate that everything is working as expected:
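Assuming you run it from the IDE command bar (or locally with dbt Core), the command is simply:

```shell
# Compiles all models to SQL without executing them against the warehouse
dbt compile
```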
Create a dbt Environment
Before we can create jobs, we need to configure an Environment.
You can name it corresponding to your dev/prod Snowflake data warehouse or schema. You can also point it to a given Git branch corresponding to your dev/prod environment:
Once the environment is created, dbt Cloud will encourage you to create a new job:
Create a dbt job
Follow the “Create New Job” wizard and make sure to disable the schedule so that you can orchestrate that job with Prefect:
Copy the Job ID from the URL
The URL contains the job ID that we will need to trigger that run:
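For example, with hypothetical account, project, and job IDs, the URL has roughly this shape; the final numeric segment is the job ID:

```
https://cloud.getdbt.com/#/accounts/12345/projects/67890/jobs/111213/
```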
Run a dbt Cloud job with Prefect
We are now ready to trigger a dbt Cloud job from Prefect. Here is a simple flow that loads the credentials block (which securely stores the dbt Cloud account ID and API key) and triggers a run for that specific job ID (adjust the job ID to match yours):
Once you run this Python script, you’ll be able to follow the logs either from the terminal:
Or from dbt Cloud:
Or from the Prefect Cloud UI:
The logs in the Prefect Cloud UI make it easy to navigate from Prefect to the logs in the dbt Cloud dashboard:
This was a short demo showing how to set up a dbt Cloud job and orchestrate it with Prefect blocks and the prefect-dbt collection. To learn more about building a data platform with Prefect, dbt, and Snowflake, including how to schedule this dbt Cloud flow, check out our full tutorial series:
How to Build a Modular Data Stack — Data Platform with Prefect, dbt and Snowflake
Build a data platform with Snowflake & dbt, and use Prefect to observe and coordinate your data stack
If anything discussed in this post is unclear, feel free to tag me when asking a question in the Prefect Community Slack or Prefect Discourse.
Thanks for reading, and happy engineering!