Parallel Prefect: A bit about Dask, real fast
The problem: I heard Prefect can run Python in parallel, what is the simplest way to achieve this?
The solution: Executors
TL;DR:
Back in the early days, in 2019, Chris White wrote a great article about getting up and running with Prefect and Dask, using an external Dask cluster. In this brief post I wanted to provide a very clear instruction how to easily achieve parallelism and speed, in Prefect.
Executors specify how the tasks
within a Prefect flow are to be executed. We’re talking about the LocalDaskExecutor
in this article, we’ll get to the DaskExecutor
later. Specifically there are three executors as of publishing:
LocalExecutor
: the default, no frills executor. All tasks are executed in a single thread, parallelism is not supported.
LocalDaskExecutor
: an executor that runs ondask
primitives with a using either threads or processes.
DaskExecutor
: the most feature-rich of the executors, this executor runs ondask.distributed
and has support for distributed execution.
Under the hood prefect uses Dask primitives, which handle the appropriate controls around events, locking, coordination and waiting for task completion that is required to run parallel tasks on a single node.
The application of the executor is incredibly simple. We set the executor attribute equal to one of the three above executors, and let it rip. Below is a full example with the functional API:
A note about selecting parameters from Prefect:
Selecting a
scheduler
andnum_workers
You should use
scheduler="threads"
if:Your tasks are often IO bound (e.g. API requests, uploading/downloading data, database calls, etc…). Tasks like these can sometimes benefit from having more threads than cores, but usually not more than by a factor of 2–4 (e.g. if you have 4 cores available, set
num_workers=16
at most).Your tasks make use of separate processes (e.g. ShellTask).
Your tasks make use of libraries like
numpy
,pandas
, orscikit-learn
that release the global interpreter lock (GIL). The default value fornum_workers
is likely sufficient - tasks like these are CPU bound and won't benefit from multiple threads per core.You should use
scheduler="processes"
in most other cases. These tasks are also usually CPU bound, so the default value ofnum_workers
should be sufficient.
Stay tuned for the second installment, Prefect x Kubernetes x Dask
SLATE is a multi-disciplined engineering firm with a focus on modern approaches to data architecture and software development. SLATE delivers custom solutions that allow organizations to gain insights to drive effective decision making.