Tracking Experiments with MLflow on Databricks

Follow the best practices and surprise yourself with the time saved


Surprise yourself with the time saved! (Photo by Pixabay on Pexels)

Model Development is arguably the most exciting step in the Data Science life cycle. However, as Data Scientists, we often end up in an iterative loop during Model Development, optimizing the chosen metrics for the business problem's success. Experimentation is vital, but finding the best solution alone is insufficient: we also need to outline the process undertaken to arrive at the best-performing solution and ensure that the solution is reproducible.

While experimenting is inevitable, we can still have an effective mechanism to track experiments and models throughout the data science life cycle.

The Obvious Choice: Databricks and MLflow

Our choice. (Photo by Databricks)

At OCTAVE, we deal with data that is high in volume and variety. Our Data Engineers outlined the entire architecture and ensured a smooth workflow for the business and the data science team. As data scientists, we enjoy exploratory programming and notebooks. What better choice than Databricks, which unifies all our requirements into one! If you have not heard of Databricks, it is a unified data analytics platform for massive-scale data engineering and collaborative data science, available on cloud platforms such as Microsoft Azure.

Now that we have some context on our platform of choice, let us get down to the problem at hand. Before exploring tools to track experiments, we listed our requirements. Some of the necessities were:

  • Minimal changes to the existing code.
  • Works well with our existing platform — Databricks.
  • Platform independent — we like to reduce dependencies as much as possible.
  • Preferably open-source and freely available.

MLflow met all these requirements and offered even more. Simply put, MLflow is an open-source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. Yes, you read that right: not just experiment tracking but the complete machine learning lifecycle. Some of the features it offers are:

  • MLflow Tracking — Tracking experiments to record and compare parameters and results.
  • MLflow Projects — Packaging ML code in a reusable, reproducible form to share with other data scientists or transfer to production.
  • MLflow Models — Managing and deploying models from various ML libraries to various model serving and inference platforms.
  • MLflow Model Registry — Central model store to collaboratively manage an MLflow Model’s full lifecycle, including model versioning, stage transitions, and annotations.

Though MLflow has all these advanced capabilities, in this article, we will focus on MLflow Tracking and how it can help us track our ML experiments in an organized manner.

This article also assumes you have already set up your clusters on Databricks; if not, you can follow the official documentation to do so. Selecting the Databricks Runtime for Machine Learning when creating a cluster pre-installs all the widely used machine learning packages, including MLflow. Either way, you can easily install it from PyPI.
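
For example, in a Databricks notebook, a notebook-scoped install from PyPI is a one-liner (shown here as a sketch; skip it if your cluster uses the ML runtime, which already bundles MLflow):

```python
# Notebook-scoped install of MLflow from PyPI
# (not needed on the Databricks Runtime for Machine Learning)
%pip install mlflow
```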

With those prerequisites out of the way, we will first try to understand an existing simple workflow and then integrate MLflow into it, with minimal changes to the code and within minutes.

Understanding a Simple Workflow

Iris flower dataset. (Photo by Zdeněk Macháček on Unsplash)

What better way to understand the typical workflow than a walkthrough of a well-known example? We will use the famous iris flower dataset and build a classification algorithm to predict the class of the iris plant. There is no particular reason for choosing this dataset other than that it is readily available in the scikit-learn library, so you can try it instantly.

Please note that this example is only for explanatory purposes, and problems in the real world can get complex and challenging. Needless to say, the process remains the same regardless of the nature of the dataset or the problem.

Let us dive into the code.
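
The code itself was embedded as a gist in the original article; the snippet below is a minimal reconstruction based on the breakdown that follows. The function names load_iris_data() and train_predict_evaluate_dtree() come from that breakdown, while the specific hyperparameters and the weighted F1 averaging are assumptions, and the layout is arranged so that the line references below roughly match.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

def load_iris_data(test_size=0.3, random_state=42):
    # Load the iris dataset and split it into train and test sets
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target,
        test_size=test_size, random_state=random_state)
    return X_train, X_test, y_train, y_test

def train_predict_evaluate_dtree(X_train, X_test, y_train, y_test, params):
    # Build a decision tree classifier with the given parameters
    model = DecisionTreeClassifier(**params)
    model.fit(X_train, y_train)
    # Predict on the test data and evaluate accuracy and F1 score
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average="weighted")
    print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
    # Return the trained model and the evaluated metrics
    return model, {"test_accuracy": accuracy, "test_f1_score": f1}

params = {"max_depth": 3, "random_state": 42}
X_train, X_test, y_train, y_test = load_iris_data()
model, metrics = train_predict_evaluate_dtree(X_train, X_test, y_train, y_test, params)
```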

A breakdown of the simple code for the classification problem.

  • Lines 1–4: we import all the required libraries.
  • Lines 6–12: we define a function that loads the iris data, performs the train-test split, and returns it.
  • Lines 14–24: we define a function that takes the data and the model parameters as input, builds a decision tree classifier, makes predictions on the test data, and evaluates the accuracy and the F1 score. The function finally returns the model and the evaluated metrics.
  • Lines 26–28: we set the parameters for the model, load the data by calling the load_iris_data() function, and pass the data and parameters to the train_predict_evaluate_dtree() function.

With this old workflow, if we had run multiple experiments to find the best-performing model, there was no option but to let the functions run in a loop and manually enter the parameters and metrics in an Excel sheet. Everyone understands that this would be inefficient. Now that we have gone through an existing model development workflow, let us see how we can integrate MLflow into it and change it for the better.

Integrating MLflow in Minutes

We have promised minimal changes throughout the article, and in our example it is just 6 extra lines of code. The team was pleasantly surprised when we discovered this, and now there is no going back.

Let us dive into the code.
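
As before, the gist is not reproduced here; the sketch below extends the earlier snippet with the six extra lines described in the breakdown that follows. The placement, metric names, and autologging call reflect that breakdown, but the details are assumptions rather than the author's exact code.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score
import mlflow
import mlflow.sklearn
mlflow.sklearn.autolog()  # automatically log params, training metrics, and the model

def load_iris_data(test_size=0.3, random_state=42):
    # Load the iris dataset and split it into train and test sets
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target,
        test_size=test_size, random_state=random_state)
    return X_train, X_test, y_train, y_test

def train_predict_evaluate_dtree(X_train, X_test, y_train, y_test, params):
    with mlflow.start_run():
        # Build a decision tree classifier with the given parameters
        model = DecisionTreeClassifier(**params)
        model.fit(X_train, y_train)
        # Predict on the test data and evaluate accuracy and F1 score
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred, average="weighted")
        print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
        # Log the custom test metrics to the active MLflow run
        mlflow.log_metric("test_accuracy", accuracy)
        mlflow.log_metric("test_f1_score", f1)
        return model, {"test_accuracy": accuracy, "test_f1_score": f1}

params = {"max_depth": 3, "random_state": 42}
X_train, X_test, y_train, y_test = load_iris_data()
model, metrics = train_predict_evaluate_dtree(X_train, X_test, y_train, y_test, params)
```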

What has changed?

  • Line 5: We import the mlflow library.
  • Line 6: Here, we also import the relevant mlflow.sklearn module. This depends entirely on which package the model is built with. Some alternatives are mlflow.tensorflow, mlflow.pytorch, mlflow.xgboost, and so on. The complete list of available modules can be found in the official MLflow Python API documentation.
  • Line 7: Autologging is a recently introduced experimental feature that makes the MLflow integration hassle-free. This function automatically logs all the parameters and metrics and saves the model artifacts in one place, which enables us to reproduce a run or retrieve already-trained model files for later use. It is simple to use, and the documentation is definitely worth a look.
  • Line 18: MLflow Tracking is organized around the concept of runs, which are executions of some piece of data science code. Suppose you want to find the best parameters to feed into the classifier model. You create multiple runs with varying parameters to find out which run gave you the best results. What this line essentially does is start a run using the mlflow.start_run() function, so that everything you log is associated with that particular run.
  • Lines 28–29: autolog() currently logs only the metrics associated with training, such as the training accuracy. However, in our example, we would also like to compare test metrics such as the test accuracy and test F1 score. Hence we make use of the log_metric() function, which enables logging of any custom metric.

Whether you do multiple runs or multiple experiments is entirely up to you. You can run the code multiple times as needed or execute several runs at once. Something simple would be to create a small loop over the parameters you want to experiment with and run it all at once. It looks something like this.
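
(The loop was shown as a gist in the original article; the sketch below is an illustrative stand-in that reuses the functions defined earlier, with an assumed list of max_depth values.)

```python
# Sweep over a few candidate parameter sets; each call starts its own MLflow run,
# so every combination shows up as a separate entry in the tracking UI.
X_train, X_test, y_train, y_test = load_iris_data()
for max_depth in [2, 3, 5, 10]:
    params = {"max_depth": max_depth, "random_state": 42}
    train_predict_evaluate_dtree(X_train, X_test, y_train, y_test, params)
```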

And we are good to go!

A Dashboard to Track All Your ML experiments

Everything you have logged is accessible to your team through the MLflow UI. Click Experiment at the upper right of the notebook, and the Experiments sidebar appears. This sidebar displays the parameters and metrics for each run of the notebook. Click the expand icon to open the full dashboard in a new browser tab. And voila, every experiment you run will appear there!

The MLflow UI Dashboard has all experiments in one place. (Screenshot by Author)
We can compare the experiments to pick the best model. (Screenshot by Author)

You can compare several runs, visually analyze how a change in parameters affects the metrics, and even download the data as a CSV file in case you need to submit it as a report. The model files and the YAML environment specifications are logged as artifacts for later use. Take your time navigating the dashboard; it is straightforward and genuinely helpful.

You might have to spend a bit of time going through the MLflow concepts to understand experiments, runs, artifacts, autologging, and other features, but you will eventually get the hang of it. We spent roughly a week (mostly researching and exploring the features) when we first integrated it into our workflow, but now our data science team does it within minutes.

MLflow has definitely made us follow best practices and saved us a lot of time. We hope you can upgrade your workflow too!

Written by Arunn Thevapalan, Senior Data Scientist at OCTAVE.


OCTAVE — John Keells Group

OCTAVE, the John Keells Group Centre of Excellence for Data and Advanced Analytics, is the cornerstone of the Group’s data-driven decision making.