Why it’s important to manage ML experiments

Vladimir Kochetkov
5 min read · Jan 16, 2023


Photo by alexkondratiev on Unsplash

When we develop an ML project, we cannot be sure in advance that one method will work better than another. Even though most papers provide benchmarks, and even if a paper addresses exactly your task, it is not guaranteed to reproduce the same metrics on your data and domain.

That’s why all we can do is try different options, such as:

  • model architecture (YOLO, SSD, ResNet, and so on) and its options, such as the number of layers, dropout, and feature dimension;
  • optimizer (SGD, Adam, Adagrad, Adadelta, etc.);
  • dataset version;
  • hyperparameters (learning rate, loss weights).

So each experiment consists of the following steps: choose options from the list above, run training, and then run testing.
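
As a rough illustration, such a set of choices can be written down as a plain configuration dictionary. The names and values below are purely illustrative placeholders rather than anything prescribed by a specific tool, but this is exactly the kind of information an experiment manager should record for every run:

# a sketch of one experiment's options; all keys and values are placeholders
experiment_options = {
    # model architecture and its options
    "architecture": "YOLO",          # could be SSD, ResNet, ...
    "num_layers": 53,
    "dropout": 0.1,
    "feature_dim": 256,
    # optimizer and hyperparameters
    "optimizer": "Adam",             # could be SGD, Adagrad, Adadelta, ...
    "learning_rate": 1e-3,
    "loss_weights": {"cls": 1.0, "box": 2.0},
    # data
    "dataset_version": "v1.2",
}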

The focus of this article is how to store all of these experiments, compare them, and reproduce them. For that we need an ML experiment manager. You can find more information about some of them here. In this article I will tell you about VertaAI.

In addition to experiment management, there are tools for deployment, data versioning, and monitoring, but in this article we will focus only on experiment management.

The VertaAI manager allows a developer to store hyperparameters, metrics, plots, data versions, code versions, tables, and more. Note that some of these features are available only in the enterprise edition.

VertaAI uses the following hierarchy: a project contains experiments, and each experiment contains experiment runs.

Note that it is convenient to store all your projects in one service (http://localhost:3555).

Managing an experiment

To understand it better, let’s manage an experiment step by step. First, let’s create our first project, “Car Detection”. This can be done either in the UI or in code.

from verta import Client

# connect to the VertaAI server
client = Client("http://localhost:3555")

# get project if it exists or create new one
proj = client.set_project("Car Detection")

A project has a list of experiments. An experiment is a rather abstract thing, because a researcher can associate it with a new NN model, a training method, an augmentation, or any other idea. For example, suppose we found a new unlabelled dataset and predicted labels for it using another model, and now we want to train our model on this data. Let’s call the new experiment “Pseudo labelling” (and add the tag “data”).

# get experiment if it exists or create new one
expt = client.set_experiment("Pseudo labelling")

Each experiment can have multiple runs (training or testing). A run corresponds to a particular combination of hyperparameters, dataset version, code version, and so on, so if you vary any of them, you should create a new experiment run. I use the following naming template:

  • “TrainID-{i}” for the i-th training run under the current experiment;
  • “TestID-{j}” for the j-th test run under the current experiment.

RUN_TRAIN_NAME = "TrainID-{:03d}"

# find the first unused run index for this experiment
run_id = 1
while True:
    try:
        client.get_experiment_run(RUN_TRAIN_NAME.format(run_id))
    except ValueError:
        # an experiment run with this name doesn't exist yet
        break
    run_id += 1

# create a new experiment run
description = "It's an example experiment run"
tags = ["example", "debug"]
expt_run = client.set_experiment_run(RUN_TRAIN_NAME.format(run_id),
                                     desc=description,
                                     tags=tags)

Experiment content

Now let’s go over what an experiment run can contain.

  • Name, description, owner, datetime, tags
  • Hyperparameters
hyparams = {
    'train/opt': 'Nesterov',
    'train/lr': 0.01,
    # ...
}
expt_run.log_hyperparameters(hyparams)
  • Attributes
meta_params = {
    'train_manifest': 'path/to/train_manifest.json',
    'test_manifest': 'path/to/test_manifest.json',
    'batch_size': 600,
    'random_seed': 20,
    # ...
}
for param, value in meta_params.items():
    expt_run.log_attribute(param, value, overwrite=True)
  • Artifacts (files for downloading)
artifacts = {
    'config': 'path/to/config.yaml'
}
for name, content in artifacts.items():
    expt_run.log_artifact(name, content)
  • Step plots (metrics per epoch)
metrics = {
    'Recall': 0.914,
    'FP': 0.813,
    # ...
}
epoch = 8
for metric_name, value in metrics.items():
    expt_run.log_observation(metric_name, value, epoch_num=epoch)
  • Code version
commit_hash = '192f2094...fe'
repo_url = "https://tfs.int.nt-com.ru/ASCollection/" \
           "ObjDetection/_git/ObjDetection"
expt_run.log_code(
    repo_url=repo_url,
    commit_hash=commit_hash,
    autocapture=False,
)
  • Metrics (summary metrics after training)
sum_metrics = {
    'TP': 20289,
    'Miss': 1383,
    'FA': 1193,
    # ...
}
for metric_name, value in sum_metrics.items():
    expt_run.log_metric(metric_name, value, overwrite=True)

The important point is that this single page contains all the information about the experiment run, which means we are able to reproduce it.
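
The same client API lets you read a run back later, which is the basis for reproducing it. Here is a minimal sketch, assuming the naming template from above and verta’s getter methods such as get_hyperparameters and get_attributes (check your verta version for the exact API):

# fetch a finished run back from the server by name
run = client.get_experiment_run(RUN_TRAIN_NAME.format(1))

# read back everything that was logged for this run
hyperparams = run.get_hyperparameters()   # e.g. {'train/opt': 'Nesterov', ...}
attributes = run.get_attributes()         # manifests, batch size, random seed
summary_metrics = run.get_metrics()       # e.g. TP, Miss, FA
code_version = run.get_code()             # repo URL and commit hash

With these values in hand, you can restore the exact configuration, data manifests, and code revision of the original run.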

After creating multiple experiment runs, we will get something like this.

Filtering and comparing

You can filter experiment runs by name, tag, and datetime.

If you select several experiment runs, VertaAI allows you to compare their parameters and plots on a single page.
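
The same can be done programmatically. Below is a rough sketch using verta’s query strings over a project’s runs; the field syntax here is an assumption and may differ between client versions, so verify it against the verta documentation:

# all runs of the project as a queryable collection
runs = proj.expt_runs

# filter by a logged hyperparameter and by a summary metric
low_lr_runs = runs.find("hyperparameters.train/lr == 0.01")
strong_runs = runs.find("metrics.TP > 20000")

for run in strong_runs:
    print(run.id, run.get_metrics())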

Conclusion

The main motivation for using an ML experiment manager is the ability to compare results and to reproduce them. For example, if we get good metrics during an experiment, we may later want to reproduce that experiment with additional data and achieve even better metrics.

There are many alternatives to VertaAI, so feel free to try them and share your feedback :)
