MLflow, Hyperopt, Prefect, Evidently, and Grafana: The Ultimate Guide to Building, Tracking, Orchestrating, and Monitoring Machine Learning Pipelines
The steps are the following:
- Introduction
- Set up the Environment
- Configure MLflow
- Load and split the data
- Train and tune the model
- Choose the best model
- Promote the best model for production
- Deploy Grafana Dashboard and Postgres database using Docker Compose
- Setting up the Postgres Database
- Monitor model performance and serve model
- Orchestrate the pipeline using Prefect
- Simulating a Production Environment
Introduction
In this comprehensive guide, we explore the seamless integration of MLflow, Hyperopt, Prefect, Evidently, and Grafana. You’ll discover how these tools empower you to:
- Build Robust Models: MLflow simplifies model development, allowing you to experiment and iterate efficiently, while Hyperopt optimizes model hyperparameters for peak performance.
- Track and Version Models: Keep meticulous records of your models with MLflow’s tracking capabilities, ensuring reproducibility and collaboration across teams.
- Orchestrate Workflows: Prefect enables you to design, schedule, and automate complex ML workflows, ensuring your models are trained and deployed systematically.
- Monitor Model Performance: Evidently helps you gain deep insights into model behavior and detect issues early, ensuring models remain reliable in production.
- Visualize and Alert: Grafana provides real-time visualization and alerting, giving you the tools to continuously monitor your ML pipelines and respond to anomalies swiftly.
In this age of data-driven decision-making, these tools are your allies, streamlining the ML lifecycle from development and experimentation to deployment and monitoring. Our guide equips you with the knowledge to harness their potential, elevating your machine-learning projects to new heights of efficiency and reliability.
Set up the Environment
1- Install the libraries. In cmd, run:
```
pip install mlflow
pip install hyperopt
pip install xgboost
pip install prefect
pip install evidently
```
2- Launch the MLflow server. In cmd, run:
```
mlflow server --backend-store-uri sqlite:///backend.db --default-artifact-root ./mlruns
```
This command starts an instance of the MLflow server with the following configurations:
- `--backend-store-uri sqlite:///backend.db`: specifies the backend store URI where the MLflow server should persist metadata related to experiments, runs, parameters, metrics, and artifacts. In this case, the backend store is an SQLite database file named `backend.db`.
- `--default-artifact-root ./mlruns`: specifies the default artifact store location where the MLflow server should store artifacts generated by runs. In this case, it is the `./mlruns` directory relative to the current working directory.
3- Launch the Prefect server. In cmd, run:
```
prefect server start
```
The `prefect server start` command starts the Prefect Server, a central daemon that provides a variety of features for managing and executing Prefect flows, including:
- Flow execution: The Prefect Server can be used to execute flows, both locally and in a distributed fashion.
- Flow monitoring: The Prefect Server can be used to monitor the execution of flows, providing information such as the status of each task, the logs for each task, and the metrics for each task.
- Flow scheduling: The Prefect Server can be used to schedule the execution of flows, either on a recurring basis or on demand.
- Flow versioning: The Prefect Server can be used to version flows, providing a way to track changes to flows over time.
To view the Prefect UI, open a web browser and navigate to http://127.0.0.1:4200/
4- Install Docker
- Download the Docker Desktop for Windows installer from the Docker website.
- Run the installer and follow the installation wizard.
- Ensure that you have enabled virtualization in your BIOS settings if required.
Configure MLflow
This Python code defines a Prefect task, marked with `@task`, that sets up the environment for using MLflow, a tool for managing machine learning experiments. It does the following:
- Sets the MLflow tracking URI to "http://127.0.0.1:5000", assuming a local MLflow tracking server is running on that address.
- Specifies the active experiment by name. If the experiment doesn't exist, it creates one. All subsequent MLflow operations within this task are associated with this experiment.
- Retrieves and returns the ID of the experiment, which can be useful for further interactions.
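A minimal sketch of such a task, assuming the experiment name is passed in as a parameter, might look like this:

```python
import mlflow
from prefect import task

@task
def init_mlflow(experiment_name: str) -> str:
    # Point MLflow at the local tracking server started earlier
    mlflow.set_tracking_uri("http://127.0.0.1:5000")
    # Activate the experiment, creating it if it does not exist yet
    mlflow.set_experiment(experiment_name)
    experiment = mlflow.get_experiment_by_name(experiment_name)
    return experiment.experiment_id
```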
Load and split the data
This code defines two Prefect tasks within a workflow for handling a machine learning dataset:

`load_data` Task:
- This task loads a dataset using Scikit-learn's `datasets.load_digits()` function. The dataset contains handwritten digit images.
- It extracts the features (pixel values) from the dataset and the corresponding target labels.
- The data is organized into a Pandas DataFrame for further processing.
- The task returns this DataFrame.

`split_data` Task:
- This task takes the DataFrame returned by the `load_data` task as input.
- It splits the dataset into training and testing subsets using Scikit-learn's `train_test_split` function. This is a common step in preparing data for machine learning.
- The split is performed with 80% of the data used for training and 20% for testing, and a random seed is set for reproducibility.
- The task returns four variables: `x_train`, `x_test`, `y_train`, and `y_test`, representing the training features, testing features, training labels, and testing labels, respectively.
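A sketch of these two tasks, assuming the label column is named `target` and a seed of 42, could look like this:

```python
import pandas as pd
from prefect import task
from sklearn import datasets
from sklearn.model_selection import train_test_split

@task
def load_data() -> pd.DataFrame:
    # Load the handwritten digits dataset and organize it as a DataFrame
    digits = datasets.load_digits()
    df = pd.DataFrame(digits.data)
    df["target"] = digits.target
    return df

@task
def split_data(df: pd.DataFrame):
    # 80/20 train/test split with a fixed seed for reproducibility
    x = df.drop(columns=["target"])
    y = df["target"]
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=42
    )
    return x_train, x_test, y_train, y_test
```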
Train and tune the model
Hyperparameter Search Space Definition:
- This task defines a search space for hyperparameter optimization. It specifies different hyperparameters and their possible values as options for optimization. These hyperparameters include `learning_rate`, `max_depth`, `gamma`, `colsample_bytree`, `reg_alpha`, `reg_lambda`, and `seed`.

Objective Function:
- An objective function (`objective`) is defined within the `train_hyperparameter_tuning` task. This function takes a set of hyperparameters as input.
- Inside the objective function:
  - A new MLflow run is started to log parameters and metrics.
  - An XGBoost classifier is created with the given hyperparameters.
  - The classifier is trained on the training data (`x_train`, `y_train`) and evaluated on the testing data (`x_test`, `y_test`).
  - Metrics like accuracy and F1 score are calculated and logged to MLflow.
  - The trained model is also logged to MLflow as an artifact.
  - The objective function returns a dictionary with the negative accuracy value, which Hyperopt tries to minimize.

Hyperparameter Optimization:
- The `fmin` function from Hyperopt is called to perform Bayesian hyperparameter optimization (`tpe.suggest`) using the defined search space.
- The optimization aims to find the hyperparameters that minimize the negative accuracy (i.e., maximize accuracy).

Return Best Result:
- The best set of hyperparameters found by Hyperopt is returned as `best_result`.
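A sketch of the tuning task under these assumptions (the search ranges, 20 evaluation rounds, and macro-averaged F1 are illustrative choices, not the article's exact values):

```python
import mlflow
import mlflow.xgboost
import xgboost as xgb
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
from hyperopt.pyll import scope
from prefect import task
from sklearn.metrics import accuracy_score, f1_score

@task
def train_hyperparameter_tuning(x_train, x_test, y_train, y_test):
    # Search space: ranges here are illustrative assumptions
    search_space = {
        "learning_rate": hp.loguniform("learning_rate", -3, 0),
        "max_depth": scope.int(hp.quniform("max_depth", 3, 12, 1)),
        "gamma": hp.uniform("gamma", 0, 5),
        "colsample_bytree": hp.uniform("colsample_bytree", 0.5, 1.0),
        "reg_alpha": hp.loguniform("reg_alpha", -5, 1),
        "reg_lambda": hp.loguniform("reg_lambda", -5, 1),
        "seed": 42,
    }

    def objective(params):
        with mlflow.start_run():
            mlflow.log_params(params)
            model = xgb.XGBClassifier(**params)
            model.fit(x_train, y_train)
            preds = model.predict(x_test)
            accuracy = accuracy_score(y_test, preds)
            f1 = f1_score(y_test, preds, average="macro")
            mlflow.log_metric("accuracy", accuracy)
            mlflow.log_metric("f1_score", f1)
            # Log the trained model as an artifact of this run
            mlflow.xgboost.log_model(model, artifact_path="model")
        # Hyperopt minimizes, so return the negative accuracy
        return {"loss": -accuracy, "status": STATUS_OK}

    best_result = fmin(
        fn=objective,
        space=search_space,
        algo=tpe.suggest,
        max_evals=20,
        trials=Trials(),
    )
    return best_result
```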
Choose the best model
MLflow Client Setup:
- The task sets up an MLflow client to interact with an MLflow tracking server located at "http://127.0.0.1:5000".

Retrieve the Best Run:
- It searches for runs within a specific MLflow experiment (`experiment_id`).
- The runs are sorted by accuracy in descending order (`order_by=["metrics.accuracy DESC"]`), and the top run with the highest accuracy is selected as the best run.

Get the Run ID and Model URI:
- The run ID of the best run is extracted.
- A model URI is constructed based on the run ID.

Search for Model Versions:
- The task searches for model versions associated with the specific run.
- It constructs a filter string to search for versions linked to the identified run.
- The results are returned as a list.

Return the Best Model Version and Model URI:
- The version number of the best model (`model_version`) is obtained from the search results.
- The constructed model URI (`model_uri`) and the model version are returned as a tuple.
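A sketch of this task; the explicit `mlflow.register_model` call is an assumption about where registration happens (the article only says a registered version exists for the run):

```python
import mlflow
from mlflow.tracking import MlflowClient
from prefect import task

@task
def choose_best_model(experiment_id: str, model_name: str):
    client = MlflowClient(tracking_uri="http://127.0.0.1:5000")
    # Pick the run with the highest logged accuracy
    best_run = client.search_runs(
        experiment_ids=[experiment_id],
        order_by=["metrics.accuracy DESC"],
        max_results=1,
    )[0]
    run_id = best_run.info.run_id
    model_uri = f"runs:/{run_id}/model"
    # Register the best run's model, then look up its version
    # via a filter string on the run ID
    mlflow.set_tracking_uri("http://127.0.0.1:5000")
    mlflow.register_model(model_uri, model_name)
    versions = client.search_model_versions(f"run_id='{run_id}'")
    model_version = versions[0].version
    return model_uri, model_version
```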
Promote the best model for production
Define the New Stage:
- The task specifies the target stage to which the model version will be promoted, which is set to "Production" as indicated by `new_stage`.

MLflow Client Setup:
- It initializes an MLflow client to interact with an MLflow tracking server located at "http://127.0.0.1:5000".

Promote the Model Version:
- The task calls the `transition_model_version_stage` method of the MLflow client to change the stage of a specific model version (`model_version`) associated with a given model name (`model_name`) to the new stage specified.
- The `archive_existing_versions` parameter is set to `False`, which means that existing versions of the model will not be archived when promoting this version.
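A minimal sketch of the promotion task, following the client calls described above:

```python
from mlflow.tracking import MlflowClient
from prefect import task

@task
def promote_model(model_name: str, model_version: str):
    new_stage = "Production"
    client = MlflowClient(tracking_uri="http://127.0.0.1:5000")
    # Move the chosen version to Production without archiving older versions
    client.transition_model_version_stage(
        name=model_name,
        version=model_version,
        stage=new_stage,
        archive_existing_versions=False,
    )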
Deploy Grafana Dashboard and Postgres database using Docker Compose
1- Check container status
This step defines a helper that checks the status of specified Docker containers by their names.
- It uses the `docker.from_env()` method to create a Docker client to interact with the Docker daemon running on the local system.
- The function takes a list of container names (`container_names`) as input.
- It initializes a counter `running_containers` to zero, which will keep track of how many of the specified containers are currently running.
- For each container name in the input list:
  - It uses the Docker client to list containers that match the provided name using the `client.containers.list()` method.
  - If there is at least one container with that name, it checks if the first container in the list (assuming no duplicate names) is running by inspecting its state with `client.api.inspect_container()`.
  - If the inspected container is running, it increments the `running_containers` counter.
  - After each container check, it waits for 1 second using `time.sleep(1)` before checking the next container.
- Finally, it closes the Docker client connection with `client.close()` and returns the count of running containers (`running_containers`).
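A sketch of this helper, using the `docker` SDK for Python as described:

```python
import time
import docker

def container_status(container_names):
    client = docker.from_env()
    running_containers = 0
    for name in container_names:
        # Match containers by name (includes stopped ones)
        matches = client.containers.list(all=True, filters={"name": name})
        if matches:
            state = client.api.inspect_container(matches[0].id)["State"]
            if state.get("Running"):
                running_containers += 1
        time.sleep(1)  # brief pause between checks
    client.close()
    return running_containers
```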
2- Build Docker
- It takes a list of container names (`container_names`) as input.
- It calls the `container_status` function to check the current status of the specified containers. The result is stored in `running_containers`.
- It checks if the count of running containers (`running_containers`) is not equal to the expected count. This condition is used to determine whether the desired containers are already running.
- If the count of running containers is not equal to the expected count (indicating that the desired containers are not running), it uses the `docker-compose` command to start the Docker containers defined in the `docker-compose.yml` file, potentially rebuilding the associated Docker images if necessary.
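A sketch of this task, reusing the `container_status` helper sketched above and assuming `docker-compose up --build -d` as the start command:

```python
import os
from prefect import task

@task
def build_docker(container_names):
    # Rebuild and start only if the expected containers are not all running
    running_containers = container_status(container_names)
    if running_containers != len(container_names):
        # --build rebuilds images if needed; -d runs in detached mode
        os.system("docker-compose up --build -d")
```

The containers themselves are defined in the following `docker-compose.yml`: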
```yaml
version: '3.7'

volumes:
  grafana_data: {}

networks:
  front-tier:
  back-tier:

services:
  db:
    container_name: postgres
    image: postgres
    restart: always
    environment:
      POSTGRES_PASSWORD: example
    ports:
      - "5432:5432"
    networks:
      - back-tier

  adminer:
    container_name: adminer
    image: adminer
    restart: always
    ports:
      - "8080:8080"
    networks:
      - back-tier
      - front-tier

  grafana:
    container_name: grafana
    image: grafana/grafana
    user: "472"
    ports:
      - "3000:3000"
    volumes:
      - ./config/grafana_datasources.yaml:/etc/grafana/provisioning/datasources/datasource.yaml:ro
    networks:
      - back-tier
      - front-tier
    restart: always
```
This Docker Compose file defines a multi-container application. Here's a brief explanation of what it does:
- Version: Specifies the version of the Docker Compose file format being used, which is '3.7' in this case.
- Volumes: Defines a Docker volume named `grafana_data`. Volumes are used to persist data generated by containers.
- Networks: Defines two Docker networks named `front-tier` and `back-tier`. These networks can be used to isolate and connect containers.
- Services: Specifies the different services (containers) that make up the application:
  - db (PostgreSQL): This service uses the official PostgreSQL image, sets a password, maps port 5432 to the host, and connects it to the `back-tier` network. It's named "postgres".
  - adminer: This service uses the Adminer image, maps port 8080 to the host, and connects it to both the `back-tier` and `front-tier` networks. It's named "adminer".
  - grafana: This service uses the Grafana image, sets a user, maps port 3000 to the host, mounts a configuration file, and connects it to both the `back-tier` and `front-tier` networks. It's named "grafana".
3- Wait for the containers to start running.
- It uses the `docker.from_env()` method to create a Docker client to interact with the Docker daemon running on the local system.
- The task takes a list of container names (`container_names`) as input.
- For each container name in the input list, it enters a `while` loop.
- Inside the loop, it repeatedly checks the status of the specified container: it lists containers that match the provided name using the `client.containers.list()` method. If there is exactly one container with that name and it's in a running state, the loop exits with `break`.
- If the container is not found or not in a running state, the task sleeps for 120 seconds (`time.sleep(120)`) before checking the container status again.
- After the loop exits, the Docker client connection is closed with `client.close()`.
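A sketch of this waiting task:

```python
import time
import docker
from prefect import task

@task
def wait_for_containers(container_names):
    client = docker.from_env()
    for name in container_names:
        while True:
            # list() without all=True returns only running containers
            matches = client.containers.list(filters={"name": name})
            if len(matches) == 1 and matches[0].status == "running":
                break
            time.sleep(120)  # wait two minutes before re-checking
    client.close()
```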
Setting up the Postgres Database
- It defines a SQL statement `create_table_statement` that drops a table if it exists and creates a new table named "predictions_metrics" with specified columns.
- It establishes a connection to a PostgreSQL database running locally with connection details such as the host, user, password, database name, and port.
- It sets the isolation level to `ISOLATION_LEVEL_AUTOCOMMIT`, ensuring that database operations like creating a new database can be executed.
- It executes a SQL query to check if a database named 'test' exists.
- If the 'test' database does not exist, it creates it using a SQL query.
- It establishes a connection to the newly created 'test' database and creates the 'predictions_metrics' table within it.
- It loads a machine learning model specified by the `model_uri` using MLflow's `mlflow.pyfunc.load_model` method.
- It uses the loaded model to make predictions on the `x_train` dataset and adds these predictions as a new column called 'prediction' in `x_train`.
- It establishes a connection to a PostgreSQL database running locally with connection details such as the username ('postgres'), password ('example'), host ('localhost'), and port ('5432').
- It stores the `x_train` dataset with the added 'prediction' column into a table named 'reference' in the 'test' database. If the 'reference' table already exists, it is replaced.
- It also stores the `x_test` dataset in a table named 'production' in the same 'test' database, again replacing it if it already exists.
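A sketch of this setup task; the metrics table schema (a timestamp plus a drift value) and the SQLAlchemy connection string are assumptions:

```python
import mlflow
import psycopg2
from prefect import task
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
from sqlalchemy import create_engine

# Assumed schema: one row of drift metrics per prediction batch
create_table_statement = """
DROP TABLE IF EXISTS predictions_metrics;
CREATE TABLE predictions_metrics (
    "timestamp" timestamp,
    prediction_drift float
);
"""

@task
def prep_db(model_uri, x_train, x_test):
    # Create the 'test' database if it does not exist
    conn = psycopg2.connect(host="localhost", port=5432,
                            user="postgres", password="example")
    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    with conn.cursor() as cur:
        cur.execute("SELECT 1 FROM pg_database WHERE datname = 'test'")
        if cur.fetchone() is None:
            cur.execute("CREATE DATABASE test")
    conn.close()

    # Create the metrics table inside 'test'
    with psycopg2.connect(host="localhost", port=5432, dbname="test",
                          user="postgres", password="example") as conn:
        with conn.cursor() as cur:
            cur.execute(create_table_statement)

    # Build reference data: training features plus model predictions
    model = mlflow.pyfunc.load_model(model_uri)
    x_train = x_train.copy()
    x_train["prediction"] = model.predict(x_train)

    engine = create_engine("postgresql://postgres:example@localhost:5432/test")
    x_train.to_sql("reference", engine, if_exists="replace", index=False)
    x_test.to_sql("production", engine, if_exists="replace", index=False)
```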
Monitor model performance and serve model
It uses the `os.system` function to execute two separate commands:
- The first command (`start killport 8000`) is intended to release or "kill" port 8000 if it's already in use. This ensures that the specified port is available for the subsequent FastAPI service.
- The second command (`start uvicorn main:app`) starts the Uvicorn web server, serving the FastAPI app defined in the `main` module. This effectively launches the model-serving API.
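A minimal sketch of that task (assuming the commands run on Windows, where `start` opens a new shell window, and that the `killport` utility is installed):

```python
import os
from prefect import task

@task
def serve_model():
    # Free port 8000 if something is already bound to it
    os.system("start killport 8000")
    # Serve the FastAPI app defined in main.py
    os.system("start uvicorn main:app")
```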
This code defines a FastAPI endpoint for making predictions using a machine learning model, recording prediction drift metrics, and storing data in a PostgreSQL database. Here's a concise explanation:
- The code sets the MLflow tracking URI and initializes a FastAPI application.
- Utility functions are defined:
  - `load_model`: Loads an MLflow model from a specified URI.
  - `get_data_from_db`: Retrieves data from a PostgreSQL database.
  - `calculate_metrics_postgresql`: Calculates drift metrics between reference and current data and stores them in the database.
- An input data model `InputData` is defined for the POST request. It includes fields for input data.
- The `/predict/` endpoint receives POST requests with input data, processes the data, and makes predictions using the loaded MLflow model.
- Prediction drift metrics are calculated by comparing the input data's predictions to reference data stored in the database.
- The prediction and metrics are returned as a response from the endpoint.
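A sketch of the service; the registered model name, the `InputData` fields, and the metrics table schema are assumptions carried over from the earlier sketches, and computing drift on a single incoming row is purely illustrative (in practice you would accumulate a window of requests):

```python
import datetime

import mlflow
import pandas as pd
import psycopg2
from evidently.metrics import ColumnDriftMetric
from evidently.report import Report
from fastapi import FastAPI
from pydantic import BaseModel
from sqlalchemy import create_engine

mlflow.set_tracking_uri("http://127.0.0.1:5000")
app = FastAPI()

# Assumption: the model was registered as "xgboost_model" and promoted earlier
model = mlflow.pyfunc.load_model("models:/xgboost_model/Production")
engine = create_engine("postgresql://postgres:example@localhost:5432/test")

class InputData(BaseModel):
    features: list[float]  # one row of pixel values

def get_data_from_db(table_name: str) -> pd.DataFrame:
    # Read a whole table from the 'test' database
    return pd.read_sql(f"SELECT * FROM {table_name}", engine)

def calculate_metrics_postgresql(current: pd.DataFrame) -> float:
    # Compare current predictions against the stored reference data
    reference = get_data_from_db("reference")
    report = Report(metrics=[ColumnDriftMetric(column_name="prediction")])
    report.run(reference_data=reference, current_data=current)
    drift = report.as_dict()["metrics"][0]["result"]["drift_score"]
    with psycopg2.connect(host="localhost", dbname="test",
                          user="postgres", password="example") as conn:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO predictions_metrics VALUES (%s, %s)",
                        (datetime.datetime.now(), drift))
    return drift

@app.post("/predict/")
def predict(data: InputData):
    row = pd.DataFrame([data.features])
    prediction = int(model.predict(row)[0])
    current = row.copy()
    current["prediction"] = prediction
    drift = calculate_metrics_postgresql(current)
    return {"prediction": prediction, "drift_score": drift}
```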
Orchestrate the pipeline using Prefect
This code defines a Prefect flow and deploys it with a schedule for a machine learning pipeline. Here's a concise explanation:
The `main` function is the core of the flow. It orchestrates the following steps:
- Initializes an MLflow experiment and retrieves its ID.
- Loads data and splits it into training and testing sets.
- Performs hyperparameter tuning and selects the best model.
- Promotes the best model for production use.
- Starts the Docker containers specified in `container_names`.
- Waits for the containers to be up and running.
- Prepares a PostgreSQL database.
- Prepares reference data and serves the machine learning model using FastAPI.
The `if __name__ == "__main__":` block builds a Prefect deployment:
- It constructs a Prefect `Deployment` object, specifying the `main` function as the flow, with parameters for the experiment name, model name, and container names.
- A schedule is set using a Cron expression (e.g., every Thursday at 12:00 AM) and a specific timezone.
- The deployment is named "model_training_and_tuning_weekly" and given a version.
- It specifies the work queue name as "ml".
- Finally, the deployment is applied, meaning the flow will execute as scheduled.
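A sketch of the flow and deployment, wiring together the tasks sketched earlier; the Cron expression, timezone, and parameter values are illustrative, and `Deployment.build_from_flow` corresponds to the Prefect 2.x API:

```python
from prefect import flow
from prefect.deployments import Deployment
from prefect.server.schemas.schedules import CronSchedule

@flow
def main(experiment_name: str, model_name: str, container_names: list):
    # End-to-end pipeline built from the tasks sketched above
    experiment_id = init_mlflow(experiment_name)
    df = load_data()
    x_train, x_test, y_train, y_test = split_data(df)
    train_hyperparameter_tuning(x_train, x_test, y_train, y_test)
    model_uri, model_version = choose_best_model(experiment_id, model_name)
    promote_model(model_name, model_version)
    build_docker(container_names)
    wait_for_containers(container_names)
    prep_db(model_uri, x_train, x_test)
    serve_model()

if __name__ == "__main__":
    deployment = Deployment.build_from_flow(
        flow=main,
        name="model_training_and_tuning_weekly",
        version=1,
        # "0 0 * * 4" = every Thursday at 12:00 AM
        schedule=CronSchedule(cron="0 0 * * 4", timezone="UTC"),
        work_queue_name="ml",
        parameters={
            "experiment_name": "digits_experiment",
            "model_name": "xgboost_model",
            "container_names": ["postgres", "adminer", "grafana"],
        },
    )
    deployment.apply()
```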
Run the following command in cmd:
```
prefect agent start --pool default-agent-pool --work-queue ml
```
This command starts a Prefect agent to manage the execution of Prefect flows on a specific pool and work queue.
The `prefect agent start` command starts a new Prefect agent process. The `--pool` flag specifies the name of the pool that the agent should use for executing flows; in this case, the pool is named `default-agent-pool`. The `--work-queue` flag specifies the name of the work queue that the agent should use for receiving work; in this case, the work queue is named `ml`.
Run the following command in cmd:
```
python app.py
```
Since the `app.py` script contains a Prefect flow defined with the `@flow` decorator, running it registers the deployment with the Prefect backend and schedules it to run according to its defined schedule.
Open a web browser and navigate to http://127.0.0.1:4200/ to follow the flow runs in the Prefect UI.
Simulating a Production Environment
`get_data_from_db(table_name)`:
- This function connects to a PostgreSQL database on the local host with specified credentials.
- It reads data from a table named `table_name` and returns it as a Pandas DataFrame.

`simulate_production()`:
- This function simulates a production environment by sending data to a specified API URL for predictions.
- It first retrieves production data from the 'production' table in the database using `get_data_from_db`.
- Then, it iterates over the rows of the production data, converts each row to a JSON-like dictionary, and sends it to the API using a POST request.
- The predictions received from the API response are printed, and there's a sleep of 120 seconds (2 minutes) between each request.

The `if __name__ == "__main__":` block executes the `simulate_production()` function when the script is run.
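A sketch of such a `test.py`, consistent with the `InputData` model assumed earlier (the API URL is an assumption):

```python
import time

import pandas as pd
import requests
from sqlalchemy import create_engine

API_URL = "http://127.0.0.1:8000/predict/"  # assumed serving address

def get_data_from_db(table_name: str) -> pd.DataFrame:
    # Connect to the local 'test' database and read the whole table
    engine = create_engine("postgresql://postgres:example@localhost:5432/test")
    return pd.read_sql(f"SELECT * FROM {table_name}", engine)

def simulate_production():
    production_data = get_data_from_db("production")
    for _, row in production_data.iterrows():
        payload = {"features": row.tolist()}  # matches the InputData model
        response = requests.post(API_URL, json=payload)
        print(response.json())
        time.sleep(120)  # two minutes between requests

if __name__ == "__main__":
    simulate_production()
```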
Run this script in cmd:
```
python test.py
```
Refresh Adminer at http://localhost:8080/ to see the `predictions_metrics` table being populated.
Open Grafana (http://localhost:3000/) to build a dashboard.
Choose the Postgres database connection, then choose the columns you need to build the graph.
Conclusion
In conclusion, this blog has introduced a powerful combination of tools for building, tracking, orchestrating, and monitoring machine learning pipelines. MLflow, Hyperopt, Prefect, Evidently, and Grafana offer a comprehensive solution for improving the efficiency, reproducibility, and performance of your machine learning projects. By implementing these tools in your workflow, you can enhance collaboration, automate pipeline management, and ensure the ongoing reliability of your machine-learning models in production environments. This ultimate guide equips you with the essential tools and knowledge to elevate your machine-learning pipelines to a new level of sophistication and effectiveness.
You can find the full code in the GitHub repo.