Building a fully reproducible machine learning pipeline with Comet.ml and Quilt

Classifying fruits using a Keras multi-class image classification model and Google Open Images

Cecelia Shao
May 13, 2019 · 9 min read

This post was written in collaboration with Aleksey Bilogur from the Quilt Data team.

The term machine learning ‘pipeline’ can suggest a one-way flow of data and transformations, but in reality, machine learning pipelines are cyclical and iterative. For a given project, a data scientist may try hundreds or even thousands of experiments before arriving at a champion model to put in production.

With each iteration, it becomes harder to manage subsets and variations of your data and models. Keeping track of which model iteration ran on which dataset is key to reproducibility.

Having a proper machine learning pipeline that details your data, code, and environment can not only help you easily reproduce your own model results, but also allow you to share your work with fellow data scientists or machine learning engineers who need to deploy your model.


In this article, we’ll show you how to build a simple and reproducible end-to-end machine learning pipeline using a Keras multi-class image classification model and a custom dataset crafted from Google Open Images, using Comet.ml and Quilt T4.

You can access the full tutorial in the accompanying repository. For a walk-through of the tutorial, continue reading below ⬇️.

Creating your custom dataset

The Open Images Dataset is an attractive target for building image recognition algorithms because it is one of the largest, most accurate, and most easily accessible image recognition datasets. For image recognition tasks, Open Images contains 15 million bounding boxes for 600 categories of objects on 1.75 million images. Image labeling tasks meanwhile enjoy 30 million labels across almost 20,000 categories.

The images come from Flickr and are of highly variable quality, as would be realistic in an applied machine learning setting.

Downloading the entire Google Open Images corpus is possible and potentially necessary if you want to build a general purpose image classifier or bounding box algorithm. However, downloading everything is a waste if you just want a small categorical subset of the data in the corpus. For this tutorial, we are just interested in downloading and working with fruit images.

View an interactive version of this plot on Quilt T4

The src/openimager subfolder in the provided repository contains a small module that handles downloading a categorical subset of the Open Images corpus: just the images corresponding to a user-selected group of labels, and just from the set of images with bounding box information attached. It does so by downloading the source images directly from Flickr instead of using the zipped blob files.

This script will allow you to download any subset of the 600 labels that have bounding box data. Here’s a taste of what’s possible:

football, toy, bird, cat, vase, lemon, dog, elephant, shark, flower, furniture, airplane, spoon, bench, swan, peanut, camera, flute, helmet, pomegranate, crown

For the purposes of this article, we’ll limit ourselves to just the following fruit classes:

apple, banana, cantaloupe, common_fig, grape, lemon, mango, orange, peach, pear, pineapple, pomegranate, strawberry, tomato, watermelon
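To give a sense of how you might invoke the downloader, here’s a hypothetical call; the actual function name and signature live in src/openimager in the repository and may differ from this sketch:

# Hypothetical usage of the openimager module -- check src/openimager
# in the repository for the real function name and signature.
from openimager import download

fruit_labels = [
    'apple', 'banana', 'cantaloupe', 'common_fig', 'grape', 'lemon',
    'mango', 'orange', 'peach', 'pear', 'pineapple', 'pomegranate',
    'strawberry', 'tomato', 'watermelon',
]

# Download only the Flickr source images whose bounding box annotations
# carry one of the selected labels.
download(fruit_labels)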

For more information on Open Images, check out the article introducing the dataset.

Preprocessing your data — and packaging it

The preprocessing notebook in the repository does this work. After running the notebook code, we will have an images_cropped folder on disk containing all of the cropped images.
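The exact preprocessing logic lives in that notebook; as a rough illustration of the kind of work it does, cropping a source image to its labelled bounding box with Pillow might look like this (the function name, and the use of fractional box coordinates as stored by Open Images, are our assumptions, not the notebook’s code):

from pathlib import Path
from PIL import Image

def crop_to_bbox(src_path, dest_path, x_min, x_max, y_min, y_max):
    """Crop a source image to a fractional bounding box and save the result."""
    image = Image.open(src_path)
    width, height = image.size
    # Open Images stores box coordinates as fractions of the image size,
    # so scale them back to pixels before cropping.
    box = (int(x_min * width), int(y_min * height),
           int(x_max * width), int(y_max * height))
    Path(dest_path).parent.mkdir(parents=True, exist_ok=True)
    image.crop(box).save(dest_path)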

The package of fruit class data, along with the pre-processed images, is easy to access via Quilt T4. In order to access the data, simply run this command:

!pip install t4

import t4
t4.Package.install('quilt/open_fruit', registry='s3://quilt-example', dest='some/path/some/where')

Looking closely at the fruit data, we can see that there is a class imbalance. There are over 26,000 samples of bananas, but only a few hundred labelled common fig or pear examples. This skew is important to note as we approach building our image classifier.

View this plot on Quilt T4
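If you’d like to inspect the skew yourself, a quick way is to count samples per class. The snippet below assumes the package ships a metadata CSV with a LabelName column; the actual file name and columns in quilt/open_fruit may differ:

import pandas as pd

# Hypothetical metadata file describing each cropped image and its label
boxes = pd.read_csv('some/path/some/where/open_fruit_metadata.csv')

# Count the number of samples per fruit class to expose the imbalance
class_counts = boxes['LabelName'].value_counts()
print(class_counts)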

Building your image classification model

Now that we’ve downloaded our fruit data from Quilt, we can begin building our image classification model! As with any machine learning project, we’ll go through a few experiments to try to maximize our model’s validation accuracy:

  • First, we’ll start with a baseline: a simple convolutional neural network (CNN) model.
  • Then, we’ll try to leverage a pre-trained network (a VGG architecture, pre-trained on the ImageNet dataset) whose learned features can help us reach a higher accuracy more effectively than relying on our fruits dataset alone. We’ll use transfer learning by fine-tuning the top layers of our pre-trained network.
  • Finally, we’ll do a quick overview of different approaches for optimization including changing parameters like the amount of dropout, learning rate, and weight decay to see how they could contribute to model performance.

The material for this tutorial was inspired by Francois Chollet’s excellent post ‘Building powerful image classification models using very little data’. We’ve expanded upon Chollet’s example and adjusted it to reflect our multi-class classification problem space.

Along with having proper data versioning from Quilt, we’ll also make sure to track our results, code, and environment for our different model iterations as this is critical to building a reproducible machine learning model pipeline.

Note: We’ll be using Jupyter notebooks for this tutorial, but Comet.ml has native support for both Jupyter notebooks and scripts.

Baseline model — Simple CNN

For our baseline model, we are using a small CNN with three convolution layers, each using a ReLU activation followed by a max-pooling layer. We’ll include data augmentation and fairly aggressive dropout to prevent overfitting. Remember, we’re not expecting our best accuracy here, so if you’d like to skip this section and go straight to the pre-trained model, simply proceed to the next section below.

Here’s the code, plus the corresponding Comet experiment, for our small CNN model:
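The exact code lives in the notebook and its linked Comet experiment; as a minimal sketch of the architecture described above (three convolution/max-pooling blocks with ReLU activations, aggressive dropout, and a 15-way softmax head), it might look like this. The input size and layer widths are assumptions, and data augmentation is omitted for brevity:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

NUM_CLASSES = 15  # one per fruit label

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),  # fairly aggressive dropout to limit overfitting
    Dense(NUM_CLASSES, activation='softmax'),
])

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])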

Not surprisingly, our simple CNN model did not perform that well on this multi-class classification task. The architecture was originally meant to support a binary classification task, so with several times as many classes it trivially needs more capacity to reach the same performance. Here are the metrics for one run of our model:


To log your experiment results from training, first set up your Comet.ml account. For each run of the model, we initialize the Comet Experiment object and provide our API key and project name.

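In code, that setup looks roughly like this (the project name and workspace below are placeholders for your own values):

# Import comet_ml before keras so that automatic logging hooks in correctly
from comet_ml import Experiment

experiment = Experiment(api_key="YOUR_API_KEY",
                        project_name="fruit-classifier",
                        workspace="YOUR_WORKSPACE")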

Once you run model.fit(), you’ll be able to see your different model runs in Comet.ml through the direct experiment URL. As an example for this tutorial, we have created a Comet project that you can view and interact with.

Since we’re using Keras, Comet’s auto-logging for popular machine learning frameworks allows us to automatically capture model details such as metrics like accuracy and loss, the model’s graph definition, and package dependencies — this significantly reduces the amount of manual logging we have to do from our end.

Using a pre-trained model with transfer learning: InceptionV3

A popular starting point for building image classifiers these days is to use a pre-trained network and fine-tune it with new classes of data. Let’s use this approach to build our image classifier.

There are several popular CNN architectures such as VGGNet, ResNet, and AlexNet, along with a wealth of resources for reading more about CNNs. Keras enables users to easily access these pre-trained models (i.e. their weights pre-trained on ImageNet) through the keras.applications module.

We selected InceptionV3 since it’s both a smaller model than VGGNet and documented to provide a higher accuracy on benchmark datasets. Transfer learning essentially means that we re-use the feature extraction portion of the model that has been trained on the ImageNet dataset and re-train the classification portion on our fruit dataset.

See this overview of transfer learning with an Inception V3 architecture

Here’s the code, plus the corresponding Comet experiment, for our fine-tuned InceptionV3 model:
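Again, the full code is in the notebook and the linked Comet experiment; a minimal sketch of the approach (load InceptionV3 without its ImageNet classifier head, freeze the convolutional base, and attach a new 15-way softmax head) might look like this, where the size of the new dense layer is an assumption:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

NUM_CLASSES = 15

# Load InceptionV3 pre-trained on ImageNet, without its classification head
base_model = InceptionV3(weights='imagenet', include_top=False)

# Freeze the feature-extraction layers so only the new head trains at first
for layer in base_model.layers:
    layer.trainable = False

# Attach a new classification head for our fruit classes
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(NUM_CLASSES, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])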

Once we begin training with model.fit(), we can use Comet to track how the model is performing in real-time. We can also check to make sure that we’re properly using our GPUs in the System Metrics tab. The experiment charts in Comet update with our model’s accuracy and loss metrics:


We’ll make sure to log our model weights at the end of the training process to Comet so we can reproduce the model in the future if we need to.

# save locally
model.save_weights('inceptionv3_tuned.h5')
# save to Comet Asset Tab
# you can retrieve these weights later via the REST API
experiment.log_asset(file_path='./inceptionv3_tuned.h5', file_name='inceptionv3_tuned.h5')

If you trained your model from a git repository and want to retrieve the model code, simply use the Reproduce button in the Comet experiment view.


The Reproduce dropdown will surface key pieces of information about your environment, git commit, and everything you need to reproduce your experiment, including the actual run commands or notebook file. If you have uncommitted changes, we also provide you with a patch for applying your changes later.

Evaluating the model

In order to evaluate our image classifier model, it’s useful to generate a few sample predictions and plot a confusion matrix so we can see where our model classified certain fruits correctly and incorrectly.

These images and figures would also be useful to share with teammates, so we can log them to Comet, even after the experiment is complete, using the Experiment.log_figure() and Experiment.log_image() methods (see the Comet documentation for more).
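As a rough sketch (assuming a trained model, a validation generator created with shuffle=False, and scikit-learn and matplotlib installed; the generator names below are illustrative), computing and logging a confusion matrix could look like this:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Predicted vs. true classes on the validation set
y_prob = model.predict_generator(validation_generator)
y_pred = np.argmax(y_prob, axis=1)
y_true = validation_generator.classes  # valid only if the generator does not shuffle

cm = confusion_matrix(y_true, y_pred)

fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(cm)
ax.set_xlabel('Predicted class')
ax.set_ylabel('True class')

# Log the figure to the Comet experiment so teammates can view it later
experiment.log_figure(figure_name='confusion_matrix', figure=fig)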

Here, we’ve logged some random samples from our fruit dataset. You can see this sample image is hardly a very clear image of a strawberry (in fact, there was some preprocessing involved!).

See this great resource from Jeremy Jordan on evaluating machine learning models.

Further optimizations

There are several ways we could approach improving our model. Here is a non-exhaustive list of things we could try to adjust:

  • Type of architecture — we also provide the code for VGG16
  • Number of Layers — Increase network depth to give it more capacity. Try adding more layers, one at a time, and check for improvements
  • Number of Neurons in a layer
  • Adding regularization and adjusting those parameters
  • Learning Rate — you can incorporate the Keras LearningRateScheduler through the callbacks argument (a sketch follows this list)
  • Type of optimization / back-propagation technique to use
  • Dropout rate
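For example, a simple step-decay schedule can be wired in through the LearningRateScheduler callback. The initial rate, decay factor, and generator names here are illustrative choices, not the ones used in the tutorial:

from keras.callbacks import LearningRateScheduler

def step_decay(epoch):
    # Halve the learning rate every 10 epochs, starting from 0.001
    initial_lr = 0.001
    return initial_lr * (0.5 ** (epoch // 10))

lr_scheduler = LearningRateScheduler(step_decay)

# Pass the callback into training alongside any others you use
model.fit_generator(train_generator,
                    validation_data=validation_generator,
                    epochs=50,
                    callbacks=[lr_scheduler])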

As you try these different optimizations, Comet allows you to create visualizations like bar charts and line plots to track your experiments, along with parallel coordinates charts. These experiment-level and project-level visualizations help you quickly identify your best-performing models and understand your parameter space.


Your full machine learning pipeline

If you had to share your model results or intermediate work with a fellow data scientist today, how would you do it?

The benefit of using Quilt for data versioning and Comet for model versioning is that by combining these best-in-breed tools you can simultaneously make your machine learning model experiments easily accessible, trackable, and reproducible.

Sharing a model and the code used to generate it? Link your collaborator to the Comet experiment. Sharing the data you used? Share a link to the Quilt data package.


Reproducing the result locally, or using an old experiment as the starting point for a new one? Get back to where you left off with this code:

# GET THE CODE
git clone <open_fruit repository URL>
cd open_fruit/
# GET THE DATA
python -c "import t4; t4.Package.install('quilt/open_fruit', 's3://quilt-example', dest='keras-fruit-classifier/')"
# GET THE ENVIRONMENT
# There are a *lot* of ways to do this: a pip requirements.txt, a
# conda environment.yml, a Docker container...
# Here's one cool way - cloning the Comet runtime 🙂
PY_VERSION=$(python -c "import comet_ml; print(comet_ml.API().get_experiment_system_details('01e427cedce145f8bc69f19ae9fb45bb')['python_version'])")
conda create -n my_test_env python=$PY_VERSION
conda activate my_test_env
python -c "import comet_ml; print('\n'.join(comet_ml.API().get_experiment_installed_packages('01e427cedce145f8bc69f19ae9fb45bb')))" > requirements.txt
pip install -r requirements.txt
# You can also get this from Comet.ml by clicking on the Download button
# GET DEVELOPING
jupyter notebook

Congratulations! You’ve gone beyond building a multi-class image classifier model to building a fully reproducible (and shareable) machine learning pipeline with data, code, and environment details ⭐️


Thanks to Gideon Mendels and Aleksey Bilogur
