MLflow — a modern MLOps tool for data project collaboration

Hong Nguyen
SFU Professional Computer Science
16 min read · Feb 11, 2023

Authors: Crystal Dias, Tyler D’Silva, Hoi Fai Lam, Hong Dung Nguyen, Bhavya Singh

This blog is written and maintained by students in the Professional Master’s Program in the School of Computing Science at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit sfu.ca/computing/mpcs.

Introduction

The machine learning lifecycle is a cyclic, iterative process with instructions and best practices for developing machine learning solutions to business problems. The lifecycle defines distinct phases that give structure to a successful machine learning project, typically spanning data collection and preparation, model training, evaluation, deployment, and monitoring.

With the increased adoption of this lifecycle, enterprises want to move projects from experimentation to production faster. This requires tooling that lets data scientists collaborate and share results to find the solution that best achieves the business goal. In other words, a tool is needed to manage different models across the various stages of the lifecycle.

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It was developed by Databricks and is now a Linux Foundation project. MLflow provides a number of tools to assist with various aspects of the machine learning workflow. By using MLflow, you can streamline the machine learning development process and make it easier to collaborate with others, whether you’re working on a small personal project or a large-scale enterprise project.

MLflow Concepts

Let’s briefly look at the concepts of MLflow that help us manage and monitor the different phases of our ML model’s lifecycle.

1. Tracking — MLflow Tracking allows us to log, store, and visualize model runs and their results. A run captures the model files, data, and parameters, which helps us keep track of experiments and compare results. With this comparison, we can make an informed decision about which model is best.

2. Project — MLflow Projects give us a way to organize ML code and files in a structured manner, so the next time we need to run the same model, we have the same environment set up with all the required dependencies and parameters.

3. Model — MLflow Models provide a simple API to package and load machine learning models. This makes it convenient for engineers to version these models and deploy them to production.

4. Registry — Since the MLflow Model Registry provides shared storage for all models, versions, data, and experiment artifacts, it makes it easier to compare models, deploy them, and collaborate with others working on the same model.

Together, these concepts form a comprehensive MLOps platform for managing the ML lifecycle. They make it easier to track experiments, reproduce results across the team, and deploy different versions of a model. With the help of MLflow, we spend less time and effort getting a model from development into production.
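
To make these concepts concrete, here is a minimal sketch of how the tracking and model APIs fit together, using the standard MLflow Python API with made-up parameter and metric values; the Registry and Projects features build on the same run metadata:

import mlflow

# Tracking: everything logged inside the context manager is attached to one run
with mlflow.start_run(run_name="demo-run") as run:
    mlflow.log_param("max_depth", 3)        # a parameter of this experiment
    mlflow.log_metric("accuracy", 0.95)     # a result we want to compare later
    # mlflow.sklearn.log_model(model, "model")  # Models: persist a trained model

# Models: a logged model can be reloaded later from its run
# loaded = mlflow.sklearn.load_model(f"runs:/{run.info.run_id}/model")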

Traditional methodology vs MLflow

Before MLflow, machine learning workflows were managed through ad-hoc methods. These methods had several limitations:

  • Lack of Experiment Tracking: In traditional methods, experiments are often performed manually, and results are recorded in spreadsheets or custom scripts. This makes it difficult to track the exact parameters, code versions, and metrics of each experiment, leading to difficulties in reproducing results.
  • No Standardization: Different teams use different tools, scripts, and methods, leading to inconsistencies in the way models are built and evaluated. As a result, the likelihood of error and bias increases, making it challenging to achieve reliable and consistent results.
  • Lack of Collaboration: Traditional methods of machine learning lack collaboration features, making it challenging for team members to work together on the same project and reproduce each other’s results.

In contrast, MLflow offers several advantages over the traditional method, including:

  • Shared Workspace: MLflow provides a centralized workspace in the form of a web-based interface and REST API, which allows team members to share parameters, models, and metrics, compare results between different experiments, and keep track of who made the changes.
  • Deployment Tracking: MLflow provides several methods for deployment tracking, which help to ensure that the proper version of a model is deployed in production and that the model’s performance can be monitored over time. MLflow includes a Model Registry that stores and manages different versions of models and tracks their lineage, including the parameters and metrics used to train them. It allows users to easily compare different versions of models and deploy the proper version in production. MLflow also provides integration with popular cloud services like AWS SageMaker, Google Cloud AI Platform, and Microsoft Azure Machine Learning, allowing the model to be deployed directly into these services. Additionally, MLflow has a REST API that allows developers to programmatically deploy models, making it easier to automate the deployment process.
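
As a small illustration of the shared workspace and Model Registry, here is a sketch using MLflow’s Python client; the tracking server URL and model name are hypothetical placeholders:

import mlflow
from mlflow.tracking import MlflowClient

# point the client at the shared tracking server (hypothetical URL)
mlflow.set_tracking_uri("http://your-mlflow-server:5000")
client = MlflowClient()

# list every version of a registered model and where each one came from
for mv in client.search_model_versions("name='iris-classifier'"):
    print(mv.version, mv.current_stage, mv.run_id, mv.source)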

Quick Start on Local

Installing MLflow locally

If you are using a Windows or Linux-based platform, you can install MLflow by running:

pip install mlflow

If you want to install extra ML libraries and tools, you can also run:

pip install mlflow[extras]

Once you are done with the installation, you can test whether MLflow is installed properly by running:

mlflow ui

It should run a server locally on port 5000 (http://localhost:5000). Once you connect to localhost, you will see something like this:

Screenshot of locally connected MLflow

Now, let’s fill it up!

Working on the model

While working on an ML model, we need to keep track of some key components: model versions, datasets, hyperparameters/parameters, evaluation metrics, and the output (trained) model files. Using MLflow, we can store and track all of these components with just a few lines of code. Here’s an extract from our codebase for logging parameters, metrics, and the model to your MLflow dashboard.

# log parameters
mlflow.log_param('leaf_nodes', params['max_leaf_nodes'])
mlflow.log_param('max_depth', params['max_depth'])
# log metrics
mlflow.log_metric('test_accuracy', test_accuracy)
mlflow.log_metric('test_f1_score', test_f1_score)
# log model
mlflow.sklearn.log_model(clf, "model")

For this demo, we decided to train a decision tree model on the Iris dataset. We used sklearn’s DecisionTreeClassifier(), swept over a grid of max_leaf_nodes and max_depth values, and logged one MLflow run per combination to find the parameters that work best. The model was evaluated with sklearn’s accuracy score and F1 score. Here’s the code for this implementation:

import mlflow
import mlflow.sklearn
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# load the Iris features and labels
X, Y = load_iris(return_X_y=True)

no_of_runs = 10
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=10
)
count = 1
for i in range(no_of_runs):
    for j in range(no_of_runs):
        with mlflow.start_run(run_name=f"Run #{count}"):
            count += 1
            # the minimum value of max_leaf_nodes is 2 and max_depth is 1
            params = {
                "max_leaf_nodes": i + 2,
                "max_depth": j + 1,
            }
            clf = DecisionTreeClassifier(
                random_state=42,
                max_leaf_nodes=params["max_leaf_nodes"],
                max_depth=params["max_depth"],
            )
            # training
            clf.fit(X_train, Y_train)
            Y_pred = clf.predict(X_test)
            test_accuracy = metrics.accuracy_score(Y_test, Y_pred)
            test_f1_score = metrics.f1_score(Y_test, Y_pred, average="weighted")
            test_metrics = (test_accuracy, test_f1_score)
            # log parameters
            mlflow.log_param("leaf_nodes", params["max_leaf_nodes"])
            mlflow.log_param("max_depth", params["max_depth"])
            # log metrics
            mlflow.log_metric("test_accuracy", test_accuracy)
            mlflow.log_metric("test_f1_score", test_f1_score)
            # log model
            mlflow.sklearn.log_model(clf, "model")

In this snippet, we can see that adding those few lines of code we talked about earlier, to track parameters, metrics, and log the model, really was that simple.

Running this code will automatically create a directory called mlruns in your working directory. All the information about the model runs is logged there.
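
By default, the MLflow client writes to this local mlruns directory, which is also what mlflow ui reads. If you later want your runs to go to a tracking server instead (for example, the remote server we deploy later in this post), you can point the client at it before starting any runs; a minimal sketch:

import mlflow

# send runs to a tracking server instead of the local ./mlruns directory
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("iris-decision-tree")  # the experiment name is our choice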

What exactly IS on the MLflow dashboard?

Once you refresh http://localhost:5000, you will notice that a bunch of model runs are displayed. Along with them, you get a brief overview of how long each run took, the saved model files, the accuracy scores, and the selected parameters.

But we can do more than just look at these numbers. In this section, we will walk through some examples of the insights that can be drawn once our model runs are logged to MLflow.

Since we did multiple runs, it makes sense to pick the runs for which we had the best results and compare them. Here’s how we could do this:

selecting all the model runs to compare
Parallel Coordinates Plot visualization for the model runs

With this Parallel Coordinates Plot, we were able to observe that every run we selected reaches the highest accuracy, and this optimum was attained by setting the parameter leaf_nodes=4.

Next, we filtered all those runs where the parameter leaf_nodes=4 and then compared only those models:

filtering only model runs with parameter leaf_nodes=’4’
visualization of filtered model runs

In this Parallel Coordinates Plot, we could observe an anomaly: one counter case (max_depth=2) does not reach the highest score. This told us the range of max_depth values that gives the best results.
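
The same comparison can also be done programmatically. Here is a minimal sketch using mlflow.search_runs, assuming the runs were logged as in the code above (parameter values come back as strings):

import mlflow

# fetch the logged runs as a pandas DataFrame, best test accuracy first
runs = mlflow.search_runs(order_by=["metrics.test_accuracy DESC"])

# keep only the runs logged with leaf_nodes=4
best = runs[runs["params.leaf_nodes"] == "4"]
print(best[["run_id", "params.max_depth", "metrics.test_accuracy", "metrics.test_f1_score"]])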

MLflow deployment in AWS

With the local setup described above, individual developers such as yourselves can try out MLflow and gain a better understanding of its capabilities. In this section, we would like to show how MLflow could be deployed in a production-like environment, where it will be used as a standard platform for teams to collaborate on building, training, testing and deploying ML models.

Specifically, we will carry out the following steps:

  1. Prepare MLflow Container Image in AWS Elastic Container Registry (ECR).
  2. Set up AWS Relational Database Service (RDS) instance for MLflow data store and AWS Simple Storage Service (S3) bucket for MLflow artifact store.
  3. Deploy MLflow as container(s) in AWS Elastic Container Service (ECS).

Prepare MLflow Container Image

Since we will host MLflow on AWS ECS using the serverless Fargate launch type, we need to prepare an MLflow container image to deploy. We will use the official image and add some customization to install the driver libraries needed for MySQL and S3 access and to configure the correct entry point.

# Build command:
# docker build -f Dockerfile -t mlflow:v2.1.1 .

FROM ghcr.io/mlflow/mlflow:v2.1.1

RUN pip install pymysql boto3

CMD mlflow server \
--host 0.0.0.0 \
--serve-artifacts \
--backend-store-uri mysql+pymysql://${SQL_USERNAME}:${SQL_PASSWORD}@${SQL_HOST}:${SQL_PORT}/${SQL_DATABASE} \
--default-artifact-root s3://${S3_BUCKET} \
--artifacts-destination s3://${S3_BUCKET}

AWS Elastic Container Registry (ECR) stores container images that can be later pulled and used in ECS (or other container environments, like Elastic Kubernetes Service — EKS). We will push the built image from the previous step to ECR.

Follow Creating a public repository for instructions on creating a repository and Authenticate to your default registry for instructions on authenticating your Docker CLI with AWS ECR. After that, we are ready to push!

docker tag mlflow:v2.1.1 public.ecr.aws/<repo-id>/mlflow:v2.1.1
docker push public.ecr.aws/<repo-id>/mlflow:v2.1.1

Note: If preferred, you can also push the image to Docker Hub or any trusted public, Internet-accessible container registry.

Create a MySQL instance

We are also creating an AWS RDS instance that will serve as the back-end store for MLflow. Instead of writing to the local file system as in the local setup, we will configure MLflow to use a proper database to store its logged data. Advantages of using an RDS instance include scalability and reliability, and automatic database backups can be scheduled to avoid losing our valuable ML experiment data.

For the purpose of this demo, we choose the MySQL engine with the Free Tier template and all default options, except that Public access should be set to Publicly accessible. Do not forget the master password, as you will need it to create a database for MLflow in a later step.

Note: To avoid connection issues, make sure the instance has a public IP address and is assigned to a VPC Security Group that allows inbound traffic from your machine.

Now we can connect to the instance and create a database and user reserved for MLflow.

CREATE DATABASE IF NOT EXISTS mlflow;
CREATE USER IF NOT EXISTS 'mlflow'@'%' IDENTIFIED BY '<mlflow_user_password>';
GRANT ALL PRIVILEGES ON mlflow.* TO 'mlflow'@'%';
SHOW GRANTS FOR 'mlflow'@'%';
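
Before wiring MLflow to the database, it can be worth verifying connectivity from your machine with the same driver the container will use. A minimal sketch, with the RDS endpoint and password as placeholders:

import pymysql

# connect with the mlflow user created above; host is the RDS endpoint
conn = pymysql.connect(
    host="<rds-endpoint>.us-west-2.rds.amazonaws.com",
    port=3306,
    user="mlflow",
    password="<mlflow_user_password>",
    database="mlflow",
)
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")
    print(cur.fetchone())
conn.close()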

Create an S3 bucket

An S3 bucket is suitable blob storage for the arbitrary artifacts that can be generated by our ML pipelines and logged to MLflow. In particular, MLflow will store the logged model files in the configured S3 bucket, as well as any extra artifacts we decide to store during our ML runs.

Go ahead and create a bucket, and note down its URI, as we will need to refer to it when configuring MLflow in the next step.
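
If you prefer code over the console, a minimal boto3 sketch is shown below; the bucket name and region are placeholders, and bucket names must be globally unique:

import boto3

s3 = boto3.client("s3", region_name="us-west-2")
# outside us-east-1, the region must be passed as a LocationConstraint
s3.create_bucket(
    Bucket="my-mlflow-artifact-store",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)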

Prepare an ECS Task Definition

In AWS ECS, a task definition specifies the options for how ECS should run a container. These options include the container image, environment variables, port mappings, required resources (CPU, memory, storage volumes), and so on.

Navigate to AWS ECS > Task definitions. We will create an mlflow task definition, as specified below.

  • Task definition family: mlflow
  • Image URI: URI of the MLflow image pushed in the previous step
  • Container port: 5000 (the tracking server default port)
  • Environment variables: enter the MySQL and S3-related values created in the previous steps (the variable names expected by our image are shown in the boto3 sketch below).
  • Task role: select an IAM role that allows access to S3 (ECSFullS3 in this case; you may need to create such a role if one does not exist).
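
For reference, the same task definition can also be registered with boto3. This is only a sketch; the ARNs, sizing, and placeholder values need to match your own account, and the environment variable names are the ones our Dockerfile above expects:

import boto3

ecs = boto3.client("ecs", region_name="us-west-2")
ecs.register_task_definition(
    family="mlflow",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="1024",
    memory="2048",
    executionRoleArn="arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
    taskRoleArn="arn:aws:iam::<account-id>:role/ECSFullS3",
    containerDefinitions=[
        {
            "name": "mlflow",
            "image": "public.ecr.aws/<repo-id>/mlflow:v2.1.1",
            "portMappings": [{"containerPort": 5000, "protocol": "tcp"}],
            "environment": [
                {"name": "SQL_USERNAME", "value": "mlflow"},
                {"name": "SQL_PASSWORD", "value": "<mlflow_user_password>"},
                {"name": "SQL_HOST", "value": "<rds-endpoint>"},
                {"name": "SQL_PORT", "value": "3306"},
                {"name": "SQL_DATABASE", "value": "mlflow"},
                {"name": "S3_BUCKET", "value": "<s3-bucket-name>"},
            ],
        }
    ],
)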

Create an ECS Cluster

As misleading as its name may sound, an ECS Cluster is not a group of actual VMs that run containers, but rather a logical grouping of ECS Services and Tasks. We will create a cluster that runs on AWS Fargate — AWS’s own serverless container platform.

  • Cluster name: MLflowCluster, an arbitrary cluster name
  • Networking: select the VPC and subnet(s) in which cluster resources will reside.
  • Infrastructure: select only AWS Fargate (serverless).
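
The equivalent boto3 call is short; a sketch using the cluster name above:

import boto3

ecs = boto3.client("ecs", region_name="us-west-2")
# a Fargate-only cluster; networking is selected per service, not here
ecs.create_cluster(clusterName="MLflowCluster", capacityProviders=["FARGATE"])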

Deploy an ECS Service for MLflow

Phew, that was a lot of steps just for preparation. Now that we have both stores (RDS and S3) as well as an ECS Task Definition and a Cluster, we can proceed to create an ECS Service that starts the actual container(s) running our MLflow tracking server.

Navigate to AWS ECS > Task definitions and select the mlflow task definition created previously.

Click the Deploy dropdown button and select Create service.

  • Existing cluster: select MLflowCluster.
  • Compute configuration: select Launch type, with FARGATE as launch type value.
  • Deployment configuration: enter mlflow-svc under Service name and keep other default options.
  • VPC and Subnets: select the same VPC/subnets as in ECS Cluster creation step.
  • Security group name: keep the default security group.
  • Public IP: ensure it is enabled.
  • Load balancing: select Application Load Balancer.
  • Load Balancer name: enter mlflow-lb.
  • Target group name: enter mlflow-tg.
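
For completeness, the same service could be created with boto3; in this sketch the subnet, security group, and target group identifiers are placeholders, and the load balancer and target group are assumed to already exist:

import boto3

ecs = boto3.client("ecs", region_name="us-west-2")
ecs.create_service(
    cluster="MLflowCluster",
    serviceName="mlflow-svc",
    taskDefinition="mlflow",
    desiredCount=1,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-<id>"],
            "securityGroups": ["sg-<id>"],
            "assignPublicIp": "ENABLED",
        }
    },
    loadBalancers=[
        {
            "targetGroupArn": "arn:aws:elasticloadbalancing:us-west-2:<account-id>:targetgroup/mlflow-tg/<id>",
            "containerName": "mlflow",
            "containerPort": 5000,
        }
    ],
)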

Configure Load Balancer

The Load Balancer (LB) for the MLflow service was created together with the service itself in the previous step. We do, however, need to configure its VPC Security Group to include a firewall rule allowing public access.

Navigate to AWS EC2 > Load Balancers; you should see mlflow-lb. Select it.

Under the Details panel, you can find the public DNS address of the LB; this is the URL of our MLflow service.

Under the Security tab, the default security group is added by default. Click Edit > Create new security group.

We will create a new Security Group called mlflow-lb-sg with the firewall rules listed below (a boto3 sketch of the ingress rule follows the list). After creation, the LB can be assigned to mlflow-lb-sg, allowing access to it from the public Internet.

  • Inbound: Allow public access to the LB on TCP port 80.
  • Outbound: None
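
Here is a hedged boto3 sketch of creating this security group and its inbound rule (the VPC id is a placeholder; the console flow described above achieves the same thing):

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")
sg = ec2.create_security_group(
    GroupName="mlflow-lb-sg",
    Description="Public HTTP access to the MLflow load balancer",
    VpcId="vpc-<id>",
)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 80,
            "ToPort": 80,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "allow public HTTP"}],
        }
    ],
)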

All done! Now try hitting the LB address in your browser. Voila 🙌 🙌, now we have a remote MLflow server that can be shared with the team!

Integrate MLflow with AWS SageMaker Pipeline

SageMaker is AWS’s fully managed machine learning service. It provides features that help data science teams develop, train, test, and deploy ML models. SageMaker Pipelines lets data scientists build complete Continuous Integration/Continuous Delivery (CI/CD) pipelines around their ML models.

MLflow and SageMaker Pipelines complement each other and integrate nicely to provide a complete solution for building, tracking, monitoring, and deploying ML models. In this section, we will use the same scikit-learn Iris model to illustrate how a SageMaker Pipeline can be created and integrated with MLflow. Specifically, we will:

  1. Create a SageMaker Pipeline to build/train Iris models with hyper-parameter optimization.
  2. Add MLflow tracking capabilities into the pipeline to enable visibility on model development progress, including run parameters, scores, and trained models.
  3. Deploy selected models in MLflow Model Registry to SageMaker Endpoints to run future inferences.

Pre-requisite: Creating SageMaker Project

Technically speaking, an individual SageMaker Pipeline instance can be created and run programmatically using the AWS SageMaker SDK, even from a Jupyter Notebook. This allows data scientists to set up a Pipeline for quick experiments. However, in production, Pipelines should be created and maintained as part of SageMaker Projects. To work on our Pipeline, we first need to create a Project following the instructions.

SageMaker — Project components: Repositories, Pipelines, Experiments, Model groups, and Endpoints.

After the project is created using the MLOps template for model building, training, and deployment, we observe that two AWS CodeCommit repositories are created: model-build and model-deploy. Aptly named after their purposes, model-build serves as the main repository for our ML code and also hosts utilities to trigger model-building pipeline executions, while model-deploy contains the code for the model-deploying pipeline, which deploys by default to SageMaker Endpoints.

More importantly, the repositories are integrated by default with AWS EventBridge and CodePipeline to allow automatic execution of pipelines upon code commits to the main branch. This helps us follow GitOps principles and makes it easier to manage the CI/CD of our projects.

EventBridge — Default rules created for SageMaker Project
EventBridge — Event definition to trigger SageMaker model-build Pipeline upon commit to “main” branch

The model-build pipeline (aka the CI part)

Note: refer to the GitHub repos build and deploy for the example code discussed in this section.

The model-build pipeline, integrated with MLflow

In pipeline.py, we define a two-step build pipeline. In the first step, PrepareIrisData, we simply prepare the Iris train and test datasets. In the second step, IrisTuning, we use SageMaker’s HyperparameterTuner to train Iris models and pick the one that maximizes the accuracy score.

SageMaker Pipeline to train SKLearn Iris model

import os

from sagemaker.sklearn.estimator import SKLearn
from sagemaker.tuner import HyperparameterTuner, IntegerParameter

estimator = SKLearn(
    entry_point='train.py',
    source_dir=os.path.join(BASE_DIR, 'source_dir'),
    role=role,
    metric_definitions=metric_definitions,
    hyperparameters=hyperparameters,
    instance_count=1,
    instance_type=training_instance_type,
    framework_version='1.0-1',
    base_job_name=f"{base_job_prefix}/sklearn-iris-train",
    sagemaker_session=pipeline_session,
    disable_profiler=True
)

hyperparameter_ranges = {
    'max-leaf-nodes': IntegerParameter(2, 3),
    'max-depth': IntegerParameter(2, 3),
}

objective_metric_name = 'accuracy'
objective_type = 'Maximize'

hp_tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name=objective_metric_name,
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=metric_definitions,
    max_jobs=4,
    max_parallel_jobs=4,
    objective_type=objective_type,
    base_tuning_job_name=f"{base_job_prefix}/sklearn-iris-tune",
)

Notice the entry point to the model training code, train.py: this is where we plug in MLflow to start tracking model training runs. train.py is invoked with additional MLflow arguments and logs data to the remote MLflow server. The code itself is very similar to that of the local setup, except for an additional line to set the remote tracking server URI.

import argparse

import mlflow
import mlflow.sklearn

parser = argparse.ArgumentParser()

# MLflow related parameters
parser.add_argument("--mlflow-tracking-uri", type=str)
parser.add_argument("--mlflow-experiment-name", type=str)
# ... (other training arguments elided)
args = parser.parse_args()

# set remote mlflow server
mlflow.set_tracking_uri(args.mlflow_tracking_uri)
mlflow.set_experiment(args.mlflow_experiment_name)
# ...

# Start MLflow run
with mlflow.start_run():
    params = {
        "max-leaf-nodes": args.max_leaf_nodes,
        "max-depth": args.max_depth,
    }
    # Log used params
    mlflow.log_params(params)

    # Log additional tags to identify runs
    mlflow.set_tag("commit", args.source_commit)
    mlflow.set_tag("trigger", args.source_trigger)
    # ... (training and evaluation elided)

    # Log metrics
    mlflow.log_metric('accuracy', test_accuracy)
    mlflow.log_metric('f1', test_f1_score)
    # ...

    # Log model with conditional registration
    if test_accuracy > 0.9:
        result = mlflow.sklearn.log_model(
            sk_model=classifier,
            artifact_path='model',
            registered_model_name=args.mlflow_model_name,
        )
    else:
        result = mlflow.sklearn.log_model(
            sk_model=classifier,
            artifact_path='model',
        )

As expected, when the model-build pipeline starts running (upon a code push to the main branch), we can see the MLflow UI start reporting data for these runs.

MLflow — run params, metrics, and model logged from the SageMaker model-build pipeline

As can be seen from the screenshot above, we also apply conditional registration: only trained models whose accuracy score is above a threshold (0.9) are registered. The registered models can be further inspected under the Models tab in the MLflow UI. The conditions for including a trained model in the Model Registry could be extended to cover multiple evaluation scores, or to create multiple tiers of model quality depending on their scores.

MLflow — Model registry with model registered from SageMaker Pipeline.

The model-deploy pipeline (aka the CD part)

After qualified models are put into the MLflow Model Registry, the next step is to decide which model to deploy for future inferences (predictions). In SageMaker, models are typically served via SageMaker Endpoints, and the deployment process is automated with the help of the model-deploy pipeline.

Since the trained MLflow models are already registered in the Model Registry at the end of the model-build cycle, model-deploy pulls the model files from the Registry and deploys them to a SageMaker Endpoint. The decision of which registered model version to deploy lies with the data science team, based on their own inspection and evaluation. The selected version is put in the buildspec.yml of the model-deploy repository; upon a commit push, the rest is handled automatically through the CodePipeline workflow.

pre_build:
  commands:
    - TRACKING_URI="http://mlflow-lb-<random-lb-id>.us-west-2.elb.amazonaws.com"
    - MODEL_NAME="sagemaker-mlflow-iris-model"
    - MODEL_VERSION="2"

The model-deploy pipeline, with MLflow integration
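
Under the hood, the deploy step can resolve this name and version to concrete model artifacts through the registry. A minimal sketch of how such a lookup could be done with the MLflow client, using the tracking URI and model name from the buildspec above:

import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://mlflow-lb-<random-lb-id>.us-west-2.elb.amazonaws.com")

client = MlflowClient()
mv = client.get_model_version("sagemaker-mlflow-iris-model", "2")
print(mv.source)  # S3 location of the logged model artifacts

# or load the registered version directly via the models:/ URI scheme
model = mlflow.sklearn.load_model("models:/sagemaker-mlflow-iris-model/2")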

Once the pipeline finishes, we can observe that a SageMaker Endpoint is deployed under the Project. We can then send some data to this endpoint and get back class predictions. Let’s try that now.

SageMaker — MLflow model deployed to SageMaker Endpoint

Here we are sending a list of feature sets, each consisting of four features (Sepal Length, Sepal Width, Petal Length, Petal Width), and getting back the Iris class predicted by our trained model: 0 for Iris Setosa, 1 for Iris Versicolour, and 2 for Iris Virginica. Our end-to-end CI/CD for model building, training, and deployment with MLflow integration is now complete 👏👏👏.

SageMaker — Getting inferences from Model Endpoint API
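
Here is a hedged boto3 sketch of calling the endpoint; the endpoint name is a placeholder, and the exact JSON payload format depends on how the model is packaged and served (MLflow’s scoring server accepts, among other formats, an "inputs" field with a list of feature rows):

import json

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-west-2")
payload = {"inputs": [[5.1, 3.5, 1.4, 0.2], [6.3, 3.3, 6.0, 2.5]]}

response = runtime.invoke_endpoint(
    EndpointName="<sagemaker-endpoint-name>",
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(response["Body"].read().decode())  # e.g. predicted classes for each row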

Conclusion

MLflow is a powerful and modern MLOps tool that offers significant advantages over traditional methods for data project collaboration. With its centralized workspace, model registry, deployment integrations, and REST API, MLflow provides a streamlined and efficient way to manage the entire machine learning lifecycle. Whether you are starting a new machine learning project or looking for a more efficient way to manage your existing projects, MLflow is a modern MLOps tool that is worth considering.
