Deploy Prefect Pipelines with Python: Perfect! 🐍
How to quickly make a Prefect Python deployment file
Prefect is an open-source dataflow coordination solution. It’s like air traffic control for your dataflows, providing you with observability and orchestration.
In this post, you’ll see how to orchestrate your Prefect decorated functions using a new feature — a deployment file written in Python. 🎉
Prefect makes it easy to add automatic retries, caching, and logging to your Python functions. Just decorate your code with flow and task decorators and you’re flying. 🛫
Here’s an example that downloads stock information from Yahoo Finance and saves it to a Parquet file. The Prefect decorators give you automatic logging, caching for 30 seconds, and three retries with the API call in case it isn’t responsive.
Not too many lines for a lot of benefits, right? However, building all this functionality from scratch would involve writing gobs of error-prone custom code. Prefect takes care of that pain so you can focus on writing code to do the things you want. 👍
Prefect makes it easy to do all kinds of things with your dataflow, including
- add Slack and email notifications
- use cloud providers such as AWS, GCP, Azure, or Snowflake
- integrate with Docker or Kubernetes infrastructure
- connect to popular data transformation and ingestion tools such as dbt and Airbyte
- parallelize your code across multiple machines with Dask, Ray, or Databricks
Once you’ve sprinkled in your flow and task decorators, it’s time to move to deploying your flows for orchestration with — wait for it — deployments! 😉
Deployments
By deploying your code to Prefect Cloud or your local server, you can easily:
- schedule your flows
- collaborate with other users for GUI-based orchestration
- filter flows to be run by different agents
- create flow runs with custom parameters from the GUI
- store your flow code remotely in locations such as AWS or GitHub
- turn your flow into an API
Deployments are powerful. How do you make one?
Doing it
If you haven’t already, run `pip install prefect` in your virtual environment.
You could run a local server with `prefect orion start`. But when your computer is off, your flows won’t run. So for more robust orchestration, let’s use the managed Prefect Cloud server. ☁️
The free plan works great. If you want more workspaces or enterprise features, those options are available for a fee.
Sign up at https://app.prefect.cloud and click on your name in your profile to create an API key. When you make the key, you’ll be given a code snippet to paste into your terminal. Run that snippet, and you’ll be connected to your Prefect Cloud account.
Deployments from a Python file
Prefect 2.1 added the ability to quickly create and apply deployments by writing them in Python code. That’s the big idea we’re exploring in this post. 🎉
Here’s an example Python deployment file we’ll name my_deployment.py.
Note that we import the `Deployment` class and call `build_from_flow` with the required `flow` and `name` parameters. The `apply` method is called when the file is executed.
What happens when you run `python my_deployment.py` in the terminal? Your deployment is registered on the server. 🎉
If you look at your browser displaying the Prefect UI, you can click on Deployments. You should see something like the following:
Click on Python Deployment Example to see details about your new deployment.
Cool! You have a deployment on the server. Now you can schedule your deployment to run a flow.
Scheduling flow runs
When a flow is scheduled to run, that flow goes into a work queue where it sits until it is picked up by an agent running on your infrastructure.
Run a flow on demand by clicking on Run on the top right. Select custom.
We didn’t add a default value for the ticker argument to our flow, and we didn’t set it when we defined our deployment, so we need to add it now. Scroll to Parameters and enter a stock ticker symbol string. Hit Save and Run.
Alright, now that your flow has been scheduled, you need to fire up an agent to run it.
Agent
Start an agent on your infrastructure to run your flow code by entering `prefect agent start -q default` in a terminal window.
Your agent will pick up any scheduled flow runs that are in your default work queue. You should see logs from your agent running your flow. 😎
More deployment options
Let’s create a second deployment and give it some more bells and whistles. 🔔
S3 Storage
Let’s assume we’ve set up an AWS S3 bucket to save our flow code to when we create a deployment. The agent will then fetch the flow code from the AWS bucket when it’s time to run the deployment.
You need the s3fs package installed in your environment to interact with S3. If you don’t already have it, run `pip install s3fs`.
Prefect ships with an S3 integration called a block.
Blocks
Blocks are an awesome Prefect invention that allows you to save the configuration and share it throughout your workspace. I have another post on blocks in the works that I’ll share soon, so follow me to make sure you don’t miss it! 👍
Once we’ve made our S3 block, we can import and use it. Prefect even provides a code snippet for you when you create your block.
Schedules
Let’s add a schedule so your flow runs every minute. ⏱
Prefect ships with a number of scheduler options. Here’s how you import the schedule classes: `from prefect.orion.schemas.schedules import IntervalSchedule, RRuleSchedule, CronSchedule`.
Then you can pass the `schedule` parameter an instance of your class. For example, you can set your deployment to run every minute with `IntervalSchedule(interval=60)`.
Tags
Let’s add the tag `extract` to keep things organized, in case you decide to get wild and start adding lots of deployments. 🤘
Note that tags aren’t for filtering anymore. You can use work queues for that.
Putting it all together
Here’s how this more advanced Python deployment file looks:
Let’s save the file as my_second_deployment.py and run the code with `python my_second_deployment.py`.
Then, you can see your new deployment in the GUI.
Parameters
Here are the Deployment class’s build_from_flow method parameters with brief commentary.
Required arguments:
- flow: The flow this deployment encapsulates.
- name: A name for the deployment.
Optional arguments:
- version: An optional version for the deployment. Defaults to the flow’s version.
- output: If provided, the full deployment specification will be written as a YAML file in the location specified by `output`. You don’t need to output a YAML file, but you can.
- skip_upload: If True, deployment files are not automatically uploaded to remote storage. If you don’t want to re-upload files, this is a handy setting.
- apply: If True, the deployment is automatically registered with the API. Personally, I’d rather call `apply` in the `if __name__ == "__main__"` block.
Optional keyword-only arguments:
- description: An optional description of the deployment. Defaults to the flow’s description.
- tags: An optional list of tags to associate with this deployment; note that tags are used only for organizational purposes. For delegating work to agents, see work_queue_name.
- schedule: A schedule to run this deployment on. Prefect offers several scheduling formats.
- work_queue_name: The work queue that will handle this deployment’s runs.
- parameters: A dictionary of parameter values to pass to runs created from this deployment. If you didn’t specify default arguments for your flow, this is a good place to do so.
- infrastructure: An optional infrastructure block (DockerContainer, KubernetesJob, or Process) used to configure infrastructure for runs. If not provided, runs default to Agent subprocesses.
- infra_overrides: A dictionary of dot-delimited infrastructure overrides that will be applied at runtime; for example, `env.CONFIG_KEY=config_value` or `namespace='prefect'`. Often useful when working with K8s.
- storage: An optional remote storage block used to store and retrieve this workflow. If not provided, will default to referencing this flow by its local path.
- path: The path to the working directory for the workflow, relative to remote storage or, if stored on a local filesystem, an absolute path. This allows you to use the same infrastructure block across multiple deployments.
- entrypoint: The path to the entrypoint for the workflow, always relative to the path. You might find this option helpful if your flow code is in a subfolder in your remote storage.
How does using a Python file for Deployments compare to using the CLI?
Deployments from the command line
Prefect 2 introduced the ability to create a deployment from the command line. Anna Geller wrote about the benefits of this approach in this post.
When you use the command line to build a deployment, a YAML file is created. You can then push the deployment to the Prefect server by applying the file.
In the example below, your flow named pipeline is defined in demo.py. Here’s how you can build and apply your pipeline in two lines of code:
These commands will create a deployment named cool_deployment on your Prefect server.
Now with Prefect 2.3.0, you can combine those two steps into one by using the `--apply` flag with the build command. 🔥
Which approach should I use?
With Prefect 2.3, there are more similarities than differences between the two approaches. Now both options can create deployment definition YAML files that you can version control.
Help and niceties
If you are using the Python deployment approach, VSCode and other IDEs let you take advantage of type hints and autocompletion. These features can be helpful when editing a Python file.
In contrast, the CLI approach has a `--help` option for every level of Prefect command. For example, `prefect deployment build --help` will hook you up with all the flags you can use when building a deployment from the CLI.
Of course, the docs are there to help you, too.
However, it’s nice to get benefits as you type. ✨
CI/CD
If you want to grab a git hash for your CI/CD workflow, the CLI approach can let you pass that hash to your deployment version.
This approach is much easier than trying to get the hash into your Python deployment file.
In the end, both the CLI and Python file approaches work great. Use whichever works best for your use case. 🙂
Wrap 🌯
The Python Deployment option was added in response to user feedback. Thank you to everyone in the Prefect community of 20,000 Slack users and 10,000 GitHub stargazers who has helped make Prefect such a joy to use. Please keep the feedback coming! 🎉
In this guide, you’ve seen how easy it is to use a Python file to create a deployment. Feel free to grab the Python code from this GitHub repo.
If you haven’t tried the Python deployment file approach yet, take it for a spin and let us know how it goes! 🚀
Happy building!