Using mlflow to manage your Machine Learning models

May 7, 2023

Ever heard of mlflow? I’ll show you how it simplifies day-to-day management of the model lifecycle and how to use it in practice.

⚠️ Spoiler alert: you save the model with one line of code and load it with another. 🤭

Introduction

MLflow is an open source tool for managing the lifecycle of machine learning models, from the experimentation stage through to deployment in production.

When developing machine learning models, it is common for models to need to be trained multiple times with different configurations and parameters to find the best solution to a problem. Without a machine learning model lifecycle management tool, this can become a cumbersome and chaotic task. It is difficult to track the different model versions, the parameters used in each training and the performance metrics associated with each model.

Also, without a tool like mlflow, it can be difficult to share models and results with teammates or other stakeholders. This can result in rework, inefficiency, and a lack of transparency in the machine learning model development process.

Mlflow helps solve these problems by providing a unified framework for managing machine learning experiments, tracking the parameters of each training, tracking performance metrics, and recording the resulting models. It also provides tools for deploying models into production and monitoring their performance over time.

It is also worth mentioning that mlflow supports several machine learning frameworks, including scikit-learn, XGBoost, PySpark, TensorFlow, PyTorch, and Keras, so model developers can keep working with their favorite frameworks.

Code — Hands-on

Today, I’m going to show you how to save an example model in mlflow so that it can be shared and reused in other environments.

Part 1 — Saving the model in mlflow in a simple way

To use mlflow, you first need to install the package. In the terminal, run the following command:

pip install mlflow

Next, we are going to use a Python script that trains a simple logistic regression model on the Iris dataset, logging the parameters used in training and the evaluation metrics, and saving the trained model!

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)

# Init an mlflow experiment
mlflow.set_experiment("Experimento de exemplo")

# Define some hyperparameters
params = {"C": 0.5, "random_state": 42}

# Start the MLflow run context
with mlflow.start_run() as run:

    # Create the model
    model = LogisticRegression(**params)

    # Train the model
    model.fit(X_train, y_train)

    # Evaluate the model (mean accuracy on the test set)
    score = model.score(X_test, y_test)

    # Log the metrics and parameters
    mlflow.log_params(params)
    mlflow.log_metrics({"score": score})

    # Save the model
    mlflow.sklearn.log_model(model, "my_sample_model")
    print("Load the model using:")
    print(f"runs:/{run.info.run_id}/my_sample_model")

In this example, the Iris dataset is loaded with load_iris() and split into training and test sets with train_test_split().

Next, an mlflow experiment is selected with mlflow.set_experiment() and a run is started with mlflow.start_run().

It is worth placing the model training code inside the mlflow.start_run() block: in addition to letting you save parameters, metrics and models, mlflow records the start and end time of the run, so you can also compare which model ran faster.

😉
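To make the timing comparison concrete, here is a minimal sketch of how run durations could be read back. It assumes the runs were logged to the default local tracking store and that the experiment name from Part 1 is reused. The helper names run_duration_seconds and durations_for_experiment are hypothetical and not part of mlflow; MlflowClient, get_experiment_by_name and search_runs are existing mlflow calls.

```python
from typing import Dict, Optional


def run_duration_seconds(start_time_ms: int, end_time_ms: Optional[int]) -> Optional[float]:
    """Convert MLflow's millisecond timestamps into a duration in seconds.

    MLflow stores run.info.start_time and run.info.end_time as milliseconds
    since the epoch; end_time is None while a run is still active.
    """
    if end_time_ms is None:
        return None
    return (end_time_ms - start_time_ms) / 1000.0


def durations_for_experiment(experiment_name: str) -> Dict[str, Optional[float]]:
    """Map each run ID in the experiment to its duration in seconds (hypothetical helper)."""
    # Imported here so the pure helper above works without a tracking store.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        # No experiment with that name has been created yet.
        return {}

    runs = client.search_runs([experiment.experiment_id])
    return {
        run.info.run_id: run_duration_seconds(run.info.start_time, run.info.end_time)
        for run in runs
    }
```

Calling durations_for_experiment("Experimento de exemplo") after a few training runs should return one entry per run, which you can sort to see which run finished fastest.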

The model is created and trained, the evaluation metrics are logged with mlflow.log_metrics() and the parameters with mlflow.log_params(). Finally, the model is saved with mlflow.sklearn.log_model().

At the end, for simplicity, we print the run ID, taken from run.info.run_id, which we will use to load the model in Part 2.

Part 2 — Using the saved model to make the prediction

With the model already trained, we can reuse it and share it with the team, all with just one more line of code!

To load the model, we will use mlflow.sklearn.load_model("runs:/<run_id>/my_sample_model"), inserting the run_id and the model name that we chose in Part 1.

The run_id is dynamically generated, so yours will be different from the one in the code snippet below:

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the same data (to use as example)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)

# Load the trained model 
# runs:/<run_id>/my_sample_model
model = mlflow.sklearn.load_model("runs:/0b719b54777041ab90f66d04d13b5893/my_sample_model") 

# Predict using the saved model (the result is displayed in a notebook or REPL)
model.predict(X_test)

You should get a result of predictions like:

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0])

Done! Now that you have trained a model, saved it with mlflow and loaded it in another file, you can share it with your team!
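Since the run_id changes on every run, pasting it by hand gets tedious. Below is a minimal sketch of one way to avoid that, assuming the experiment name and model name from Part 1 and that the most recent run in the experiment is the one that logged the model. The helpers build_model_uri and load_latest_model are hypothetical names for illustration; mlflow.search_runs and mlflow.sklearn.load_model are existing mlflow functions.

```python
def build_model_uri(run_id: str, artifact_path: str = "my_sample_model") -> str:
    """Build the runs:/<run_id>/<artifact_path> URI used to load a logged model."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    return f"runs:/{run_id}/{artifact_path}"


def load_latest_model(experiment_name: str = "Experimento de exemplo"):
    """Load the model logged by the most recent run (hypothetical helper).

    Assumption: the newest run in the experiment logged a model under
    "my_sample_model", as in Part 1.
    """
    # Imported here so build_model_uri can be used on its own.
    import mlflow
    import mlflow.sklearn

    # search_runs returns a pandas DataFrame with one row per run.
    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=["attributes.start_time DESC"],
        max_results=1,
    )
    if runs.empty:
        raise LookupError(f"No runs found for experiment {experiment_name!r}")

    latest_run_id = runs.iloc[0]["run_id"]
    return mlflow.sklearn.load_model(build_model_uri(latest_run_id))
```

With this in place, model = load_latest_model() could replace the hard-coded URI in Part 2. If the newest run failed before saving the model, loading will raise an error, so for anything beyond a quick experiment it is safer to pick the run explicitly.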

Bonus — Graphical Interface

MLflow also provides a web UI for viewing and comparing runs, allowing users to view the metrics, parameters, charts, saved models, and other artifacts associated with each run.

This makes it easy to share and collaborate on machine learning projects, allowing users to share their experiments with others and see the results of different experiments quickly and easily.

To start it, open the terminal, run mlflow server, and then open http://127.0.0.1:5000 in your browser.

In the example, we can see that the score stays the same even when the C parameter changes. In other words, changing this parameter is not affecting the model's result, which is an opportunity to investigate why the score is constant. 🤓
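To reproduce this kind of comparison yourself, the sketch below trains the same model with several values of C, logging each one as a separate run so they show up side by side in the UI. It reuses the dataset, split and experiment name from Part 1. The functions scores_vary and sweep_c are hypothetical helpers, and the list of C values you pass in is your own choice, not something taken from the original experiment.

```python
from typing import Dict, Iterable


def scores_vary(scores: Iterable[float], tolerance: float = 1e-9) -> bool:
    """Return True if the scores differ from each other by more than tolerance."""
    values = list(scores)
    return bool(values) and (max(values) - min(values)) > tolerance


def sweep_c(c_values: Iterable[float],
            experiment_name: str = "Experimento de exemplo") -> Dict[float, float]:
    """Train one logistic regression per C value and log each as its own run."""
    # Imported here so scores_vary stays usable on its own.
    import mlflow
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Same data and split as in Part 1.
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=42)

    mlflow.set_experiment(experiment_name)

    scores = {}
    for c in c_values:
        params = {"C": c, "random_state": 42}
        # One run per value of C, named so it is easy to spot in the UI.
        with mlflow.start_run(run_name=f"C={c}"):
            model = LogisticRegression(**params)
            model.fit(X_train, y_train)
            score = model.score(X_test, y_test)
            mlflow.log_params(params)
            mlflow.log_metrics({"score": score})
            scores[c] = score
    return scores
```

For example, scores = sweep_c([0.01, 0.5, 10.0]) followed by scores_vary(scores.values()) tells you whether the score moved at all across those runs. If it did not, the test set may simply be too small or too easy for the parameter to matter.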

Final thoughts

MLflow is an essential tool for managing the lifecycle of machine learning models, an evolution of the old handcrafted, non-standardized ways of saving models, metrics and parameters.

Simplicity of use is a notable advantage of mlflow. The code change needed to use it is minimal: import the package, start the experiment and save everything you need. The same applies to loading the saved model. This helps make the development of machine learning models more efficient, collaborative and transparent.

I hope the sample code helped you learn how to use mlflow in a simple way, but don’t stop there. MlFlow offers more benefits, so check the official documentation to continue your learning journey: https://mlflow.org/docs/latest/index.html

Let’s stay connected

Did you like the content? Let’s have a coffee, add me on LinkedIn to exchange ideas and share knowledge!

https://www.linkedin.com/in/iagobrandao

References

https://mlflow.org/docs/latest/index.html