ML Lifecycle with MLFlow — MLDevOps

Ritik Jain · Published in Crux Intelligence · Mar 25, 2022

Building an ML model already requires deep knowledge of ML and its components. But executing the entire lifecycle, from experimentation to deploying models in a production environment, while keeping track of every experiment, is even harder.

The ML engineer develops the model, evaluates it, retrains it, and releases it for testing. After all that experimentation, comparing and evaluating the different models and choosing the best one is unachievable without an explicit automated manager that keeps track of everything.

Traditional ML Lifecycle

MLFlow

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides a simple API to perform tasks like reproducing models, logging and tracking hyper-parameters, monitoring model performance, and deploying models to production.

The core of MLFlow has 4 major components:

The components of MLflow — taming end-to-end ML lifecycle management.
  • MLFlow Tracking
    This component provides an API and a UI. The API lets you log parameters, metrics, code versions, and artifacts while running your machine learning code. The UI provides various visualisations of the results, logged parameters, other metrics, and model comparisons. MLFlow Tracking can run in any environment, such as a standalone script or a notebook, and log results to a local directory or a centralised database. It also gives teams a centralized access point to compare results across users.
  • MLFlow Projects
    It provides a standard packaging mechanism for storing the ML environment setup and code base. In MLFlow, each project is simply a directory of code or a git repository, plus a descriptor file that specifies its dependencies and how to run the code. When you use the MLFlow tracking API in a project, MLFlow automatically remembers the project version. Projects can also easily be chained into multi-step workflows.
  • MLFlow Models
    It offers a convention for packaging machine learning models in various flavors (different formats). Each model is saved as a directory containing the model binaries and a descriptor file listing the flavors in which the model can be used. For example, a PyTorch model can be loaded as a PyTorch binary, or as a generic Python function to apply to input data. MLFlow provides tools to deploy many common model types to diverse platforms: any model supporting the “python function” flavor can be deployed to a Docker-based REST server, or to cloud platforms like AWS SageMaker and Azure ML.
  • MLFlow Registry
    It offers a centralized model store, a set of APIs, and a UI to collaboratively manage the entire lifecycle of an MLFlow model. The Registry provides features like model stage transitions, model versioning, model lineage, and model annotations. A minimal sketch touching these core APIs follows this list.
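
To make these components concrete, here is a minimal sketch (not from the original post) that touches Tracking, Models, and the Registry using scikit-learn; the experiment name, parameters, and registered model name are illustrative placeholders.

import mlflow
import mlflow.pyfunc
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Tracking: group runs under a named experiment (name is illustrative)
mlflow.set_experiment("component-demo")

X, y = load_iris(return_X_y=True)

with mlflow.start_run() as run:
    mlflow.log_param("max_iter", 200)                     # Tracking: log a hyper-parameter
    clf = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", clf.score(X, y))  # Tracking: log a metric
    mlflow.sklearn.log_model(clf, "model")                # Models: saved with sklearn and pyfunc flavors

model_uri = f"runs:/{run.info.run_id}/model"

# Models: anything exposing the "python function" flavor loads the same way
loaded = mlflow.pyfunc.load_model(model_uri)
print(loaded.predict(X[:5]))

# Registry: promote the logged model to the centralized store
# (requires a database-backed tracking server, not the default local file store)
mlflow.register_model(model_uri, "iris-classifier")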

Installing MLFlow

  1. Set up the Anaconda environment (follow the documentation)
  2. Use conda install -c conda-forge mlflow to install mlflow

If the installation worked fine, run

mlflow --version

the output will be,

mlflow, version 1.24.0

Depending on when you install and which version you use, the output may differ.

Quickstart with MLFlow

Let's perform an experiment and train a classification model on the IRIS dataset. I am going to use the scikit-learn and NumPy libraries for dataset loading, transformation, and model training.

Let’s get started.

Code without MLFlow APIs
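
The original code embed is not reproduced here, so below is a minimal sketch, assuming scikit-learn, of what the plain training script without any MLFlow calls might look like; the classifier choice and parameters are illustrative.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load and split the IRIS dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a simple classifier (illustrative parameters)
params = {"n_estimators": 100, "max_depth": 5}
clf = RandomForestClassifier(**params).fit(X_train, y_train)

# Evaluate on the held-out split
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {accuracy:.3f}")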

Simple MLFlow Workflow for Scikit-Learn

  1. Start a run using mlflow.start_run(), which switches the context of your existing model code to enable MLFlow tracking.
  2. Identify all the important parameters to log into MLFlow Tracking.
  3. After model training and evaluation, log the model using mlflow.sklearn.log_model(). Since I'm using scikit-learn to train the model, I use the sklearn package of mlflow to log it. MLFlow has a wide range of similar packages for other frameworks. (find more here)

Code with MLFlow API integrated

I have added a few self-explanatory lines for starting the run, logging the parameters, and logging the model details, as sketched below.
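
The embedded gist is not reproduced here either, so the following is a sketch of the same illustrative script with the three steps above woven in; the experiment name is a placeholder.

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("iris-classification")  # placeholder experiment name

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

params = {"n_estimators": 100, "max_depth": 5}

with mlflow.start_run():                      # step 1: switch context to an MLFlow run
    mlflow.log_params(params)                 # step 2: log the important parameters
    clf = RandomForestClassifier(**params).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, clf.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)   # step 2: log the evaluation metric
    mlflow.sklearn.log_model(clf, "model")    # step 3: log the trained scikit-learn model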

MLFlow UI

After completing the model training and logging, we can track the model's progress using the MLFlow UI.

To launch the tracking UI, navigate to the current project directory in the terminal and use the command below

mlflow ui

Output,

[2022-03-14 13:04:07 +0530] [65940] [INFO] Starting gunicorn 20.1.0
[2022-03-14 13:04:07 +0530] [65940] [INFO] Listening at: http://127.0.0.1:5000 (65940)
[2022-03-14 13:04:07 +0530] [65940] [INFO] Using worker: sync
[2022-03-14 13:04:07 +0530] [65942] [INFO] Booting worker with pid: 65942

It will host the UI on localhost at port 5000.

MLFlow Tracking Board

As shown in the UI, runs are sectioned by experiment. Within each experiment you'll see every passed or failed iteration of model training, along with the parameters used and their metrics. Clicking on a row in the table redirects to the selected run and shows all of its details, as below. For reusability, the run keeps all the parameters together with auto-generated code snippets for loading the model and making predictions.

MLFlow Run Details

Conclusion

  • MLFlow is an open-source platform for managing the machine learning lifecycle.
  • It has 4 major components that work together to provide a pipeline experience.
  • MLFlow provides a simple user interface for navigating, understanding, and tracking model progress.
  • It supports over 10 popular machine learning frameworks and different deployment plugins.
  • It provides modern API interfaces like REST API and gRPC.
  • It makes model registry and production deployment easy.


Interested in seeing more like this? Buy me a coffee.
