Deploy Prefect Pipelines with Python: Perfect! 🐍
How to quickly make a Prefect Python deployment file
Prefect is an open-source dataflow coordination solution. It’s like air traffic control for your dataflows, providing you with observability and orchestration.
In this post, you’ll see how to orchestrate your Prefect decorated functions using a new feature — a deployment file written in Python. 🎉
Prefect makes it easy to add automatic retries, caching, and logging to your Python functions. Just decorate your code with flow and task decorators and you’re flying. 🛫
Here’s an example that downloads stock information from Yahoo Finance and saves it to a Parquet file. The Prefect decorators give you automatic logging, caching for 30 seconds, and three retries with the API call in case it isn’t responsive.
Not too many lines for a lot of benefits, right? However, building all this functionality from scratch would involve writing gobs of error-prone custom code. Prefect takes care of that pain so you can focus on writing code to do the things you want. 👍
Prefect makes it easy to do all kinds of things with your dataflow, including
- add Slack and email notifications
- use cloud providers such as AWS, GCP, Azure, or Snowflake
- integrate with Docker or Kubernetes infrastructure
- connect to popular data transformation and ingestion tools such as dbt and Airbyte
- parallelize your code across multiple machines with Dask, Ray, or Databricks
Once you’ve sprinkled in your flow and task decorators, it’s time to move to deploying your flows for orchestration with — wait for it — deployments! 😉
Deployments
By deploying your code to Prefect Cloud or your local server, you can easily:
- schedule your flows
- collaborate with other users for GUI-based orchestration
- filter flows to be run by different agents
- create flow runs with custom parameters from the GUI
- store your flow code remotely in locations such as AWS or GitHub
- turn your flow into an API
Deployments are powerful. How do you make one?
Doing it
If you haven’t already, run `pip install prefect` in your virtual environment.
You could run a local server with `prefect orion start`. But when your computer is off, your flows won’t run. So for more robust orchestration, let’s use the managed Prefect Cloud server. ☁️
The free plan works great. If you want more workspaces or enterprise features, those options are available for a fee.
Sign up at https://app.prefect.cloud and click on your name in your profile to create an API key. When you make the key, you’ll be given a code snippet to paste into your terminal. Run that snippet, and you’ll be connected to your Prefect Cloud account.
Deployments from a Python file
Prefect 2.1 added the ability to quickly create and apply deployments by writing them in Python code. That’s the big idea we’re exploring in this post. 🎉
Here’s an example Python deployment file we’ll name my_deployment.py.
Note that we import the `Deployment` class and call `build_from_flow` with the required `flow` and `name` parameters. The `apply` method is called when the file is executed.
What happens when you run `python my_deployment.py` in the terminal? Your deployment is registered on the server. 🎉
If you look at your browser displaying the Prefect UI, you can click on Deployments. You should see something like the following:
Click on Python Deployment Example to see details about your new deployment.
Cool! You have a deployment on the server. Now you can schedule your deployment to run a flow.
Scheduling flow runs
When a flow is scheduled to run, that flow goes into a work queue where it sits until it is picked up by an agent running on your infrastructure.
Run a flow on demand by clicking on Run on the top right. Select custom.
We didn’t add a default value for the ticker argument to our flow, and we didn’t set it when we defined our deployment, so we need to add it now. Scroll to Parameters and enter a stock ticker symbol string. Hit Save and Run.
Alright, now that your flow has been scheduled, you need to fire up an agent to run it.
Agent
Start an agent on your infrastructure to run your flow code by entering `prefect agent start -q default` in a terminal window.
Your agent will pick up any scheduled flow runs that are in your default work queue. You should see logs from your agent running your flow. 😎
More deployment options
Let’s create a second deployment and give it some more bells and whistles. 🔔
S3 Storage
Let’s assume we’ve set up an AWS S3 bucket to save our flow code to when we create a deployment. The agent will then fetch the flow code from the AWS bucket when it’s time to run the deployment.
You need the s3fs package installed in your environment to interact with S3. If you don’t already have it, run `pip install s3fs`.
Prefect ships with an S3 integration called a block.
Blocks
Blocks are an awesome Prefect invention that allows you to save the configuration and share it throughout your workspace. I have another post on blocks in the works that I’ll share soon, so follow me to make sure you don’t miss it! 👍
Once we’ve made our S3 block, we can import and use it. Prefect even provides a code snippet for you when you create your block.
Schedules
Let’s add a schedule so your flow runs every minute. ⏱
Prefect ships with a number of scheduler options. Here’s how you import the schedule classes: `from prefect.orion.schemas.schedules import IntervalSchedule, RRuleSchedule, CronSchedule`.
Then you can pass the `schedule` parameter an instance of your class. For example, you can set your deployment to run every minute with `IntervalSchedule(interval=60)`.
Tags
Let’s add the tag `extract` to keep things organized, in case you decide to get wild and start adding lots of deployments. 🤘
Note that tags aren’t for filtering anymore. You can use work queues for that.
Putting it all together
Here’s how this more advanced Python deployment file looks:
Let’s save the file as my_second_deployment.py and run the code with `python my_second_deployment.py`.
Then, you can see your new deployment in the GUI.
Parameters
Here are the Deployment class’s build_from_flow method parameters with brief commentary.
Required arguments:
- flow: The flow this deployment encapsulates.
- name: A name for the deployment.
Optional arguments:
- version: An optional version for the deployment. Defaults to the flow’s version.
- output: If provided, the full deployment specification will be written as a YAML file in the location specified by `output`. You don’t need to output a YAML file, but you can.
- skip_upload: If True, deployment files are not automatically uploaded to remote storage. If you don’t want to re-upload files, this is a handy setting.
- apply: If True, the deployment is automatically registered with the API. Personally, I’d rather call `apply` in the `if __name__ == "__main__"` block.
Optional keyword-only arguments:
- description: An optional description of the deployment. Defaults to the flow’s description.
- tags: An optional list of tags to associate with this deployment; note that tags are used only for organizational purposes. For delegating work to agents, see work_queue_name.
- schedule: A schedule to run this deployment on. Prefect offers several scheduling formats.
- work_queue_name: The work queue that will handle this deployment’s runs.
- parameters: A dictionary of parameter values to pass to runs created from this deployment. If you didn’t specify default arguments for your flow, this is a good place to do so.
- infrastructure: An optional infrastructure block (DockerContainer, KubernetesJob, or Process) used to configure infrastructure for runs. If not provided, runs default to Agent subprocesses.
- infra_overrides: A dictionary of dot-delimited infrastructure overrides that will be applied at runtime; for example, `env.CONFIG_KEY=config_value` or `namespace='prefect'`. Often useful when working with K8s.
- storage: An optional remote storage block used to store and retrieve this workflow. If not provided, will default to referencing this flow by its local path.
- path: The path to the working directory for the workflow, relative to remote storage or, if stored on a local filesystem, an absolute path. This allows you to use the same infrastructure block across multiple deployments.
- entrypoint: The path to the entrypoint for the workflow, always relative to the path. You might find this option helpful if your flow code is in a subfolder in your remote storage.
How does using a Python file for Deployments compare to using the CLI?
Deployments from the command line
Prefect 2 introduced the ability to create a deployment from the command line. Anna Geller wrote about the benefits of this approach in this post.
When you use the command line to build a deployment, a YAML file is created. You can then push the deployment to the Prefect server by applying the file.
In the example below, your flow named pipeline is defined in demo.py. Here’s how you can build and apply your pipeline in two lines of code:
These commands will create a deployment named cool_deployment on your Prefect server.
Now with Prefect 2.3.0, you can combine those two steps into one by using the `--apply` flag with the build command. 🔥
Which approach should I use?
With Prefect 2.3, there are more similarities than differences between the two approaches. Now both options can create deployment definition YAML files that you can version control.
Help and niceties
If you are using the Python deployment approach, VSCode and other IDEs let you take advantage of type hints and autocompletion. These features can be helpful when editing a Python file.
In contrast, the CLI approach has a `--help` option for every level of Prefect command. For example, `prefect deployment build --help` will hook you up with all the flags you can use when building a deployment from the CLI.
Of course, the docs are there to help you, too.
However, it’s nice to get benefits as you type. ✨
CI/CD
If you want to grab a git hash for your CI/CD workflow, the CLI approach can let you pass that hash to your deployment version.
This approach is much easier than trying to get the hash into your Python deployment file.
In the end, both the CLI and Python file approaches work great. Use whichever works best for your use case. 🙂
Wrap 🌯
The Python Deployment option was added in response to user feedback. Thank you to everyone in the Prefect community of 20,000 Slack users and 10,000 GitHub stargazers who has helped make Prefect such a joy to use. Please keep the feedback coming! 🎉
In this guide, you’ve seen how easy it is to use a Python file to create a deployment. Feel free to grab the Python code from this GitHub repo.
If you haven’t tried the Python deployment file approach yet, take it for a spin and let us know how it goes! 🚀
Happy building!