ML Workflow Orchestration

Deploy and orchestrate your machine learning pipeline

Haythem tellili
6 min read · Nov 4, 2022
Photo by Thomas M. Evans on Unsplash

Motivation

Data and Machine Learning pipelines are extensively utilized and have become critical for business growth and scalability, offering a wide range of use cases that extend beyond recommendations, predictions, and data transformations.

Every day, businesses execute a significant number of batch processes to meet a variety of business demands.
As a result, orchestrating these workflows in a solid and reliable way is essential to the growth of the business.

In this blog post, we will walk you through Prefect, an open-source dataflow automation platform that aims to reduce negative engineering and simplify workflow orchestration compared to existing tools like Airflow.

Workflow Orchestration

Workflow orchestration is a set of tools for scheduling and monitoring the work you want to complete, for example, scheduling ML model training.
Its purpose is to reduce mistakes and to fail gracefully when something goes wrong.

Prerequisites

We need to install Prefect for orchestration along with a few other Python libraries, ideally inside a virtual environment.

pip install -U "prefect==2.3.2"
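The other supporting libraries aren't pinned in the original; a plausible set for this example (an assumption on my part) is pandas and scikit-learn for the modeling, plus SQLAlchemy and a Postgres driver for loading the data:

pip install pandas scikit-learn sqlalchemy psycopg2-binary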

Overview

We will build and orchestrate a simple ML workflow using Prefect and Docker Compose, as seen in the image below.

Image by Author

Our flow consists of five tasks:

  • Connect to the Postgres database and load the data
  • Perform feature engineering and data cleaning
  • Apply a simple validation strategy by splitting the data into training and validation sets
  • Train a Gradient Boosting Regressor model
  • Evaluate the model

We want our flow to run every five minutes and to be robust: if connecting to the Postgres database fails, it should retry automatically, with a configurable number of retries and a delay between them.

Getting Started with Prefect in Practice

Docker Compose

Let’s take a look at our docker-compose file.

We have three services running:

  • pgdatabase: the Postgres database we will use to store and load the data.
  • pgadmin: a web-based graphical user interface (GUI) for managing Postgres.
  • orion: runs the Prefect Orion server locally.

We use volumes so that our flow runs, deployments, work queues, and Postgres data are not lost when we restart the services.
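Below is a minimal sketch of what the compose file could look like (the original was embedded as a gist). The image tags, credentials, database name, and ports here are illustrative assumptions, not values from the original project:

version: "3.8"
services:
  pgdatabase:
    image: postgres:14
    environment:
      POSTGRES_USER: root          # placeholder credentials
      POSTGRES_PASSWORD: root
      POSTGRES_DB: ml_db           # placeholder database name
    ports:
      - "5432:5432"
    volumes:
      - pg_data:/var/lib/postgresql/data    # keeps Postgres data across restarts
  pgadmin:
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@admin.com
      PGADMIN_DEFAULT_PASSWORD: root
    ports:
      - "8080:80"
  orion:
    image: prefecthq/prefect:2.3.2-python3.9
    command: prefect orion start --host 0.0.0.0
    ports:
      - "4200:4200"
    volumes:
      - prefect_data:/root/.prefect         # keeps flow runs, deployments, work queues
volumes:
  pg_data:
  prefect_data: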

Now, to run our services we use this command:

docker compose up -d
  • -d, for detached mode: the containers run in the background of the terminal.

Create tasks and a flow

First, let’s understand tasks and flows in Prefect:

  • A flow is the basis of all Prefect workflows. A flow is a Python function decorated with a @flow decorator.
  • A task is a Python function decorated with a @task decorator. Tasks represent distinct pieces of work executed within a flow.

With Prefect decorators, we can customize retries on failure, as we do in this example by setting three retries with a five-second delay between them when connecting to Postgres.

We can also cache results, which is very useful when a task is time- and resource-intensive.
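For example, here is a minimal sketch of caching in Prefect 2 (cache_key_fn and cache_expiration are standard decorator options, but the task itself is hypothetical):

from datetime import timedelta

from prefect import task
from prefect.tasks import task_input_hash

@task(cache_key_fn=task_input_hash, cache_expiration=timedelta(hours=1))
def expensive_step(data):
    # Runs with identical inputs within one hour reuse the cached result
    # instead of recomputing it.
    return data  # placeholder for the expensive computation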

We have five tasks in the code below: loading the data, preprocessing it, splitting it, training the model, and finally validating it.
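The original tasks were embedded as a gist; the sketch below reconstructs them under assumptions of mine (the function names, SQLAlchemy-based loading, dropna cleaning, and 80/20 split are illustrative, not the original code):

import pandas as pd
from prefect import task
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sqlalchemy import create_engine

@task(retries=3, retry_delay_seconds=5)
def load_data(connection_string: str, table: str) -> pd.DataFrame:
    # Retried up to three times, five seconds apart, if Postgres is unreachable
    engine = create_engine(connection_string)
    return pd.read_sql_table(table, engine)

@task
def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Minimal cleaning: drop rows with missing values
    return df.dropna()

@task
def split_data(df: pd.DataFrame, target: str):
    # Hold out 20% of the data for validation
    X = df.drop(columns=[target])
    y = df[target]
    return train_test_split(X, y, test_size=0.2, random_state=42)

@task
def train_model(X_train, y_train) -> GradientBoostingRegressor:
    model = GradientBoostingRegressor()
    model.fit(X_train, y_train)
    return model

@task
def evaluate_model(model, X_val, y_val) -> float:
    # Root mean squared error on the validation set
    return mean_squared_error(y_val, model.predict(X_val), squared=False)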

Now we’re going to use these tasks to make a flow.
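A minimal sketch of the flow, again with a placeholder connection string, table name, and target column:

from prefect import flow

@flow(name="model_training_validation")
def model_training_validation():
    df = load_data(
        "postgresql://root:root@localhost:5432/ml_db",  # placeholder credentials
        table="training_data",                          # placeholder table name
    )
    clean = preprocess(df)
    X_train, X_val, y_train, y_val = split_data(clean, target="target")
    model = train_model(X_train, y_train)
    rmse = evaluate_model(model, X_val, y_val)
    print(f"Validation RMSE: {rmse:.3f}")

if __name__ == "__main__":
    model_training_validation()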

Now that we have our flow, we can begin deploying it.

We will create a deployment.py file:
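Here is a minimal sketch of what it could contain (the original was embedded as a gist; the import assumes the flow is defined in a module named flow.py):

from datetime import timedelta

from prefect.deployments import Deployment
from prefect.orion.schemas.schedules import IntervalSchedule

from flow import model_training_validation  # assumed module name

deployment = Deployment.build_from_flow(
    flow=model_training_validation,
    name="model_training_validation",
    schedule=IntervalSchedule(interval=timedelta(minutes=5)),
    work_queue_name="ml",
)

if __name__ == "__main__":
    deployment.apply()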

Note that we import the Deployment class and call build_from_flow with the required flow and name parameters.
We define a schedule that runs the flow every five minutes, specify a work_queue_name that we will use when we create an agent later, and finally call the apply method to register the deployment.

Now, to register the deployment on the server, we just need to run:
python deployment.py

Open the Orion UI at http://0.0.0.0:4200/.

1. Click on Deployments: you will find a deployment named model_training_validation.
Image by Author

2. Click on Work Queues to see the queue named ml that we created earlier.

Image by Author

Now, let’s understand work queues and agents.
In a nutshell, work queues organize flow runs into queues for execution, while agents pick up work from one or more queues and execute the runs.

This is an important feature for data scientists because it allows us to tag a flow as Prod or Dev and route it to development resources for experimentation and testing, or to production resources.

Similarly, when building a machine learning model, we can tag flows as GPU for training and CPU for inference, then route each flow from its work queue to the appropriate agent.

3. Let’s explore Flow Runs

As shown in the image below, scheduled flow runs appear in yellow, completed runs in green, and failed runs in red.
We can also filter them by name, tags, and other metadata to find the information we need quickly.

Image by Author

Agent

Let’s now launch an agent, which will automatically pick up flow runs from the work queue and execute them according to the schedule.
We just have to open a terminal and run this command:

prefect agent start -q ml

The -q ml option tells the agent to pull work from the ml work queue.

Image by Author

Now, let’s pick a completed flow and look at it.

As you can see, the logs tell us in real time when each task is created and executed, and whether we encountered any issues.

Image by Author

We can also access other information, such as the parameters used and any subflows.

Image by Author

We can also view our pipeline in the Radar plot: a circular chart that shows the tasks executed and the relationships between them.

Image by Author

Conclusion

We have seen how to schedule and run our workflow using Prefect.

As a next step, we can try more advanced Prefect features, such as running a flow in a remote execution environment like a Docker container or a Kubernetes cluster.
These deployments require storage and infrastructure blocks that specify where our flow code is stored and how the flow run execution environment should be configured.

This project is available in my GitHub repository.

I hope this blog post was helpful. If you would like to get in touch and discuss further, you can connect with me on LinkedIn.

Reference & Additional Reading

https://docs.prefect.io/tutorials/
