Managing ML Training Models using ModelDB

Version, Track, Collaborate On, and Query Your Machine Learning Training Experiment Runs

Anuj Kumar
Jun 28 · 5 min read

Building and improving a machine learning (ML) model's performance is an iterative process that often involves trial and error. Some of the most common adjustments are changing the classifier, adding preprocessing steps, and tuning hyperparameters.

For example, after 20–30 training runs from a certain starting point, it can be hard to keep track of what that starting point was, which changes were made in each run, and what impact each change had. In short, without some level of experiment management, it's highly likely that key insights from those runs will be lost.

Thankfully, there is a tool that helps in solving these challenges — it’s ModelDB.

In this post, I'll take a look at ModelDB, the benefits and use cases it caters to, and a sample implementation. To get the most value out of it, I recommend a basic knowledge of ML concepts and familiarity with toolkits such as scikit-learn, Jupyter Notebook, and Docker, which will help with the setup and sample code walkthrough below.

ModelDB — What is it?

ModelDB makes it easier to manage ML training models by providing the ability to store, share, query, visualize, and version them, which allows model development to be tracked over time. It is especially helpful when debugging and tuning models: it stores the metadata and metrics of all of a model's experimental runs, which can easily be referred back to and compared using ModelDB's visualization features.

It is best used for use cases such as:

  • Tracking modeling experiments
  • Versioning models
  • Ensuring reproducibility
  • Visually exploring models and results
  • Collaboration

ModelDB Architecture

Source: https://mitdbg.github.io/modeldb

Setting Up ModelDB

The easiest way to get ModelDB going locally is a Docker-based setup using docker-compose. Just clone the ModelDB GitHub repository, and inside the root folder of the repository run

docker-compose up

This will build Docker images for all the required services and bring them up. Refer to the docker-compose.yml file in the repository for more details.

Other setup options are described in detail in the documentation on GitHub.

What Does ModelDB Natively Support?

As of now, ModelDB has native support for scikit-learn and spark.ml. Native support makes it really easy to track the training metadata of models without any significant change to the model code: it automatically captures and stores the metadata of all the training stages, such as transform, fit, and predict, in the background.

However, with native support we lose the flexibility to structure and log metadata in a way that suits a specific use case. In addition, there is no native support for deep learning frameworks such as TensorFlow and Keras. ModelDB's Light API provides a way to overcome both limitations.

ModelDB Light API

The ModelDB Python client library provides APIs that can be used with any ML workflow to log model metadata and metrics. The workflow is simple: prepare the metadata, create a syncer object, and sync the data to ModelDB. Let's see this in detail with the help of an example.

Face Recognition Example Walkthrough

This example demonstrates training a model with scikit-learn to recognize faces from the Labeled Faces in the Wild (LFW) face recognition dataset. More details are here.

Download the LFW data via the sklearn datasets module, if it is not already on disk, and load it as NumPy arrays.

from sklearn.datasets import fetch_lfw_people

Split data into training and test sets.

from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in newer scikit-learn releases
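Since the post links out to the full code, here is a minimal, self-contained sketch of the split step, using a small synthetic array in place of the LFW data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the LFW feature matrix: 100 samples, 10 features
X = np.arange(1000).reshape(100, 10)
y = np.arange(100) % 7  # pretend there are 7 people to recognize

# Hold out 25% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)  # (75, 10) (25, 10)
```

Fixing `random_state` makes the split reproducible across runs, which matters when comparing experiment runs later.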

Compute a PCA on the face dataset for dimensionality reduction.

from time import time
from sklearn.decomposition import PCA  # RandomizedPCA was folded into PCA(svd_solver="randomized")
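A self-contained sketch of the dimensionality reduction step, with random data standing in for the flattened face images (the component count of 20 is arbitrary here; the real example would use more):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X_train = rng.rand(75, 64)  # stand-in for 75 flattened face images

# Project onto the top 20 components ("eigenfaces" in the face recognition
# setting); whitening rescales each component to unit variance
pca = PCA(n_components=20, svd_solver="randomized", whiten=True, random_state=0)
X_train_pca = pca.fit_transform(X_train)

print(X_train_pca.shape)  # (75, 20)
```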

Train an SVM classification model.

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in newer scikit-learn releases
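A runnable sketch of the training step on synthetic data; the parameter grid values are placeholders, not the ones from the article's full code:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X_train_pca = rng.rand(60, 20)        # stand-in for PCA-reduced face features
y_train = rng.randint(0, 3, size=60)  # stand-in labels for 3 people

# Search over the regularization strength C and the RBF kernel width gamma
param_grid = {"C": [1.0, 10.0, 100.0], "gamma": [0.001, 0.01, 0.1]}
clf = GridSearchCV(SVC(kernel="rbf", class_weight="balanced"), param_grid, cv=3)
clf.fit(X_train_pca, y_train)

print(clf.best_params_)  # the best combination found by cross-validation
```

After fitting, `clf` behaves like the best estimator found, so `clf.predict(...)` can be used directly in the evaluation step.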

Do a quantitative evaluation of the model quality on the test set.

y_pred = clf.predict(X_test_pca)
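A self-contained sketch of the evaluation step, with hand-made labels and predictions (and made-up names) standing in for the real test set:

```python
from sklearn.metrics import accuracy_score, classification_report

# Synthetic stand-in for true labels and model predictions on the test set
y_test = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

acc = accuracy_score(y_test, y_pred)
print(f"accuracy: {acc:.2f}")  # 5 of 6 correct -> 0.83

# Per-class precision/recall/F1, labeled with (hypothetical) person names
print(classification_report(y_test, y_pred, target_names=["alice", "bob", "carol"]))
```

These are exactly the kinds of metrics worth logging to ModelDB for each run.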

The model is trained and we have computed some evaluation metrics for it. Let's sync these to ModelDB using the Light API.

Initialize the ModelDB Syncer object.

from modeldb.basic.ModelDbSyncerBase import *
syncer_obj = Syncer.create_syncer("Face Recognition", "anuj", "LFW face recognition experiment")  # project name, author, and description are illustrative

Initialize a dataset object with the data and its metadata to store in ModelDB, and create the Model, ModelConfig, and ModelMetrics instances.

data_folder_path = "/Users/anuj/scikit_learn_data/lfw_home/lfw_funneled"
datasets = {"train": Dataset(data_folder_path, {"source": "LFW"})}  # metadata dict contents are illustrative

Sync the data to ModelDB.

syncer_obj.sync_datasets(datasets)                         # register the dataset paths and metadata
syncer_obj.sync_model("train", model_config, mdb_model)    # log the trained model and its configuration
syncer_obj.sync_metrics("test", mdb_model, model_metrics)  # log the evaluation metrics for the test set

That's it! You have now successfully trained a model to recognize faces from the LFW dataset and tracked the model's metadata and metrics in ModelDB. The complete code sample is available here.

Tune and Retrain the Model

Adjust the model parameters and retrain the model to achieve a better accuracy score — each experimental run will sync the corresponding data to ModelDB. The ModelDB UI allows exploration and comparison of the metrics and metadata for all the runs.
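A self-contained sketch of what two such experimental runs look like, using synthetic data and varying only the SVM's `C` parameter; in the article's workflow, each run would additionally sync its metrics to ModelDB for comparison in the UI:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic classification data standing in for the PCA-reduced face features
X, y = make_classification(n_samples=200, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two experimental runs with different regularization strengths
scores = {}
for C in (0.01, 10.0):
    clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    scores[C] = accuracy_score(y_test, clf.predict(X_test))

print(scores)  # one accuracy score per run, ready to compare across runs
```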


Conclusion

I hope you've enjoyed this walkthrough of ModelDB. To explore it in more detail, go ahead and play around with the ModelDB visualizations.

In general, I've found that ModelDB has a very useful feature set and is relatively easy to configure and use. It is still evolving and will continue to improve; in the near future, I hope to see native support for additional ML toolkits added to ModelDB.

HashmapInc

Innovative technologists and domain experts helping accelerate the value of Data, Cloud, IIoT/IoT, and AI/ML for the community and our clients by creating smart, flexible and high-value solutions and service offerings that work across industries. http://hashmapinc.com

