Published in Censius

How to Use MLflow to Track and Structure Machine Learning Projects

Machine learning is an expensive, experimental process. Every step has to be carefully planned, and every input should have a meaningful effect on the output. A project can take weeks or months, and once it is under way it rarely has a clear end point: models are retrained on new data points that were not available before, so their behavior can change at any moment.

MLflow is one of the tools designed specifically to support this process, so that teams can build machine learning models that not only perform well but are also developed efficiently and economically.

What is Experiment Tracking, and Why is it important?

Experiment tracking is the practice of recording the relevant information about each experiment you run while building a machine learning model, for example:

  • Various ML models
  • Hyperparameters
  • Configuration files for the environment
  • Data versions used for training and evaluation
  • Performance visualizations and a lot more

By tracking ML model trials in an organized way, data scientists can identify the factors that affect model performance, compare results, and choose the best version.

Collecting and preparing training data, choosing a model, and training it on the prepared data are common steps in developing an ML model. A slight change in the training data, hyperparameters, model type, or experiment code can significantly affect model performance. Many open-source and enterprise-level MLOps tools and platforms can help you track your machine learning experiments. MLflow is a popular open-source option used by many data scientists and ML engineers.

What is MLflow?

MLflow is an open-source platform that helps developers and data scientists manage their machine learning work. It covers the entire machine learning lifecycle: experimentation, reproducibility, deployment, and a central model registry. Let’s look at some of the features MLflow offers before moving on to its key components.

  • It is compatible with a wide range of machine learning frameworks, languages, and codebases.
  • It packages an ML model in a common format that downstream tools can use.
  • It provides a model store with APIs and a user interface for managing the MLOps lifecycle.
Graphic shows Key Components of MLflow (Source: medium.com/pytorch)

MLflow Tracking

When you run your machine learning code, the MLflow Tracking component offers an API and a UI for logging parameters, code versions, metrics, and output files, and for viewing the results later. You can log and query runs through its Python or REST API.

It helps you keep track of many experiments and iterations on the data, and lets you retrieve the hyperparameters, characteristics, and analyses for a particular iteration.

Some important functions of MLflow Tracking:

mlflow.start_run() -- starts a new run and makes it the active run.
mlflow.end_run() -- ends the currently active run.
mlflow.log_artifacts() -- logs all the files in a given directory as artifacts of the run.
...

MLflow Projects

An MLflow Project is a convention-based framework for packaging data science code in a reusable and repeatable workflow. The Projects component also offers an API and command-line utilities for executing projects, allowing you to create workflows by chaining projects together.

MLflow Projects can run in several kinds of environments: Docker containers, Conda environments, or the existing system environment.

The image shows how MLflow Projects work (Image source: infoq.com).

MLflow Models

A machine learning model is packaged as an MLflow Model, which can be used by several downstream tools, such as real-time serving over a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different “flavors” that these tools can understand. MLflow makes it easy to package models from many popular machine learning libraries in the MLflow Model format, with plenty of customization options.

Model Registry

The MLflow Model Registry component provides a centralized model store, a set of APIs, and a UI for collaboratively managing the full lifecycle of an MLflow Model. It includes model lineage, versioning, and annotations.

Recommended Reading: Data Version control: MLflow vs DVC

The registry also provides strong governance and control: through CI/CD workflow integration, you can track stage transitions, review changes, and approve them.

Recommended Reading: MLflow Tracking Docs

Benefits Of Using MLflow

Let’s take a look at some of MLflow’s benefits.

  • It is an open-source MLOps tool.
  • It supports many tools and frameworks.
  • It is highly customizable.
  • It is well suited to data science projects.
  • It covers the entire machine learning lifecycle.
  • It works with any ML library.
  • It supports custom visualizations.

Let’s look at how you can use MLflow to keep track of your machine learning and deep learning projects.

Recommended Reading: MLflow Best Practices

Tracking ML Experiments using MLflow

This section covers the basic steps for integrating MLflow into your machine learning application or project, and then shows how to use the MLflow UI to explore the tracked runs.

UI Workflow

Install MLflow:

pip install mlflow

Now, open your machine learning project/ML pipeline code file.

First, import MLflow:

import mlflow
import mlflow.sklearn

Now, name the experiment you are going to track.

mlflow.set_experiment(experiment_name="MLflow demo")

Then specify what you want to track, for example:

mlflow.log_param("max_depth", max_depth)  # hyperparameter logging
mlflow.log_metric("accuracy", model_accuracy)  # metric logging
mlflow.log_metric("precision", precision)  # metric logging
mlflow.sklearn.log_model(model, "model")  # model logging
...

Now open a terminal and run:

mlflow ui

You should see output similar to “Serving on http://127.0.0.1:5000”. Open that address in a browser to view your tracked runs.

The image shows the MLflow UI.

To learn more, download the quickstart code: clone the MLflow repository with git clone and cd into its examples subdirectory.

API Workflow

MLflow also provides a lower-level Tracking Service API for managing experiments and runs directly, accessible through the client SDK in the mlflow.tracking module. It lets you search data from previous runs, log extra information about them, create experiments, tag runs, and more.

from mlflow.tracking import MlflowClient

After importing MlflowClient, create a client and use it to work with experiments and runs:

client = MlflowClient()
experiments = client.search_experiments()  # list of mlflow.entities.Experiment; replaces list_experiments(), removed in MLflow 2.0
run = client.create_run(experiments[0].experiment_id)  # returns mlflow.entities.Run
client.log_param(run.info.run_id, "hello", "world")  # log a parameter on that run
client.set_terminated(run.info.run_id)  # mark the run as finished

You can find more examples in the examples directory of MLflow’s GitHub repository.

Recommended Reading: Be more efficient to produce ML models with MLflow

Some Highlights of MLflow

The MLflow API is well designed, and new features are released regularly, so it is worth keeping up with new features and updates. A few characteristics of MLflow deserve particular attention:

  • MLflow includes autologging. It is easy to use: enabling it captures the parameters, metrics, and models that a supported library exposes, without explicit logging calls. Keras, TensorFlow, XGBoost, and Spark all have autolog support.
  • Many task orchestration platforms are available, but MLflow is designed specifically to improve the machine learning lifecycle. It helps you run experiments and track their outcomes, as well as package and deploy the resulting models.
  • Autologging is especially useful for deep learning models, because training them involves many parameters and hyperparameters that would be tedious to log by hand.
  • MLflow can be customized to meet your specific requirements, and it can handle large volumes of data.
  • The MLflow API supports not just Python but also Java and R.
  • It is open source, so you can get good community support.
  • It can be used to deploy many kinds of machine learning models, each saved as a directory that can contain any number of files.
  • With MLflow, data scientists no longer need to record the parameters of each run manually.

Conclusion

We have seen what MLflow is, how it fits into the machine learning lifecycle, and how it helps with experiment tracking and monitoring. With only a few lines of code, MLflow provides a robust way to track, package, and reproduce models, which makes it a valuable tool in any machine learning toolkit.

Censius is an AI observability platform that continuously monitors models, analyzes their performance, and provides explainability so that businesses derive better AI outcomes.

Harshil Patel

Software Developer and Technical Writer.
