Run tasks in parallel using Prefect and Dask

Amandeep Singh
3 min read · Jun 21, 2023


Overview of essential components in Prefect:

Prefect has 2 components that are essential to running a workflow: flows and tasks. A task is a specific piece of code, such as a function or method, that you may want to run multiple times. A flow is the function that calls the tasks you have defined.

Before we move on and learn how to write code with Prefect, we need to understand Dask and how it is used in Prefect.

What is Dask?

Dask is a robust Python library for parallel computing that lets you carry out data processing and analysis across multiple cores or machines. Prefect supports Dask through the prefect-dask integration, which we can use to run tasks in parallel.

To use Dask, specify DaskTaskRunner as the task_runner in the flow decorator.

Let's put this into action.

Step 1: Adding all the dependencies for running a Prefect workflow with Dask

Create a new Python file with any name of your choice. In this tutorial we will call it parallel_processing.py

Let's create a virtual environment and install the dependencies.

# Create the virtual environment
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate

# Install Prefect
pip install prefect

# Install the Prefect Dask integration (this also installs Dask)
pip install prefect-dask

Step 2: Write code with Prefect and Dask.

Let's implement an example to understand parallel processing. For this tutorial we will use a simple one: squaring all the numbers in a list.

from prefect import task, flow  #-Line 1
from prefect_dask.task_runners import DaskTaskRunner  #-Line 2

# Print the square of a number
@task  #-Line 3
def print_squares(element):  #-Line 4
    square = element ** 2
    print(square)

# The flow that is going to call the tasks
# This function will be the entry point for our script
@flow(task_runner=DaskTaskRunner())  #-Line 5
def my_flow(elements):  #-Line 6
    for element in elements:
        print_squares.submit(element)  #-Line 7

if __name__ == "__main__":
    elements = [1, 2, 3, 4, 5]
    my_flow(elements)

Let's break down this code to understand it:

  1. We import from the prefect and prefect_dask.task_runners libraries (lines 1 and 2). These provide what we need to define and run Prefect flows and tasks, along with DaskTaskRunner.
  2. The @flow decorator on my_flow (line 5) takes a task_runner parameter. Passing DaskTaskRunner() tells Prefect to run submitted tasks on Dask, which allows them to execute in parallel.
  3. The function my_flow (line 6) iterates over the elements and calls print_squares (line 4) to print the square of each element.
  4. The @task decorator (line 3) is used to designate a function as a task. The flow creates 5 task runs of print_squares, one for each of the 5 elements in the list.
  5. We call print_squares with the submit() method (line 7). This method does 2 things:
  • It submits the task to the task runner.
  • It returns a PrefectFuture, a Prefect object that includes any data returned by the task function as well as a State, a Prefect object that represents the task run's current state.

Here is the output on the terminal:

Output of the code

As you can see in this example, the flow creates several task runs at almost the same time, while other task runs are still in progress. This indicates that Prefect is executing the tasks in parallel.

Step 3: Start the Prefect server to see the dashboard.

Let's start the Prefect server so we can view the flow in the Prefect UI. Use the command below to start it.

prefect server start
Output of the above command.

To access the dashboard, open the following link: http://127.0.0.1:4200. This will open the dashboard on your localhost. In the dashboard we can see our flow on the Flow Runs page.

The flow with 5 tasks running in parallel.

Happy coding!

Let's connect on LinkedIn: aman-tech.

If you enjoyed this tutorial, please click the 👏 button and share to help others find it! Feel free to leave a comment below.
