Jupyter Notebook: beyond prototyping with NBTools and Research

Alexey Kozhevin
Data Analysis Center

--

Jupyter Notebook is a very handy and ubiquitous ML prototyping tool. However, at some point you need to run multiple reproducible experiments, which usually means transferring the code into scripts and losing interactivity. To work with the same notebook at all stages, from baseline to production, our team has developed several tools:

  • NBTools: collection of instruments for running Jupyter Notebooks and interacting with them,
  • Research: a tool for multiple parallel experiments.

These tools make it possible to:

  • create a reproducible and well-described baseline solution in a Jupyter Notebook,
  • use it to run hundreds or thousands of relatively similar experiments in parallel with different parameters,
  • prepare the final Jupyter Notebook and use it in a production solution.

Baseline

Imagine that you received a new task from a customer: for example, to identify the breed of a dog from a photograph. In the course of discussions, you find out more details about the problem and get a labeled dataset. Data acquisition is a separate issue that we shall leave out of the equation; you can read about it, for example, here. Most likely, you will have to iteratively improve the dataset along with the solution.

Surely you will create a new Jupyter Notebook and write the code in it using your favorite libraries and frameworks. Even at this stage, it is important to have a well-described solution for further work. There are several reasons for this:

  • it makes it easier for your team members to collaborate,
  • it will be easier to return to the code after a while,
  • the notebook immediately serves as a report on the solution.

We try to maintain a single standard for all notebooks:

  1. Description of the dataset (number of elements, their statistics, etc.) and data loading code,
  2. Model description (model type, hyperparameters) and model build code,
  3. Training process with accompanying statistics: loss function values, available resources and their utilization,
  4. Model validation: examples of inference, metrics,
  5. (optional) Further model improvements and ideas.

You can find an example of a notebook designed according to this standard on Google Colab. We made it with our BatchFlow library, whose PyTorch wrappers use configs to describe neural network architectures. This capability is extremely useful for studying the influence of various parameters on the result.
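To give a flavor of the config-driven approach, here is an illustrative sketch; the keys below are hypothetical, and the exact config structure is defined by BatchFlow's Torch model wrappers:

    # Illustrative only: the exact keys are defined by BatchFlow's Torch
    # wrappers and may differ from this sketch
    model_config = {
        'loss': 'ce',                               # cross-entropy loss
        'optimizer': {'name': 'Adam', 'lr': 1e-3},  # optimizer and its settings
        'body/encoder/channels': (64, 128, 256),    # architecture via nested keys
        'body/attention': 'se',                     # e.g. squeeze-and-excitation blocks
    }

Changing an architectural detail then amounts to changing one config value, which is exactly what makes parameter studies cheap.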

At the beginning, you can see the necessary imports and the nbtools.set_gpus function, which selects a free available GPU for training the model. While it is easy enough to do this yourself, having it built in makes managing multiple kernels and experiments that much easier.
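A minimal sketch of how this looks at the top of the notebook (the meaning of the argument, "take one free GPU", is an assumption; check the NBTools documentation for the exact signature):

    from nbtools import set_gpus

    # Pick a free GPU and make only it visible to this kernel
    # (argument semantics are an assumption; see the nbtools docs)
    set_gpus(1)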

Then comes the description of the dataset.

If you need to preprocess the entire dataset and analyze it, it is reasonable to do so in a separate notebook and add a link to it here. Only then do we write code for lazy loading of the dataset and show a few examples for clarity.
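For instance, lazy loading can be as simple as a dataset class that touches the disk only on element access. The sketch below uses plain PyTorch with a hypothetical file layout and is not tied to any of our libraries:

    import os
    from PIL import Image
    from torch.utils.data import Dataset

    class DogBreedsDataset(Dataset):
        """Lazy dataset: images are read from disk only in __getitem__."""
        def __init__(self, root, labels, transform=None):
            self.root, self.labels, self.transform = root, labels, transform
            self.paths = sorted(os.listdir(root))  # hypothetical layout: one file per image

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, index):
            path = self.paths[index]
            image = Image.open(os.path.join(self.root, path)).convert('RGB')
            if self.transform is not None:
                image = self.transform(image)
            return image, self.labels[path]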

If a code block in the notebook is complex (e.g., loading and processing data from a specific format), move it to a separate class in a different module. The notebook will then remain structured and easy to read. This is what we did when solving the problem of detecting cancerous nodules on CT scans: we moved the resulting loaders and handlers into a separate RadIO library.

Then we describe the model architecture in a model config. Model training in cell 7 runs not just with the tqdm progress bar but with batchflow.Notifier, which lets you track resource usage and thus suggests potential optimizations (for example, if GPU memory is not utilized to the fullest, we can increase the batch size).

You can do the same even if you are not using BatchFlow directly to train the model: it is enough to wrap the training loop in batchflow.Monitor used as a context manager.
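A rough sketch of that usage, assuming Monitor is importable from batchflow and exposes a way to inspect the collected statistics (the import path, constructor arguments and method names are assumptions, and the training loop is a placeholder):

    from batchflow import Monitor  # import path is an assumption

    # Wrap any framework's training loop to record resource usage
    with Monitor() as monitor:
        for epoch in range(num_epochs):     # placeholder loop
            train_one_epoch(model, loader)  # placeholder training step

    monitor.visualize()  # method name is an assumption; shows CPU/GPU usage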

Then we provide the code to validate the model and compute metrics.

Linter for notebook

If you write code in a separate script, you can pass it through pylint to catch stylistic and other errors, but pylint does not work with notebooks. To fill this gap, we created the nbtools.pylint_notebook function, which you can run directly from a notebook. As a result, you get a full review of the code in its cells.
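Usage boils down to a single call from a cell (the notebook path is illustrative; extra options such as disabled checks follow the NBTools documentation):

    from nbtools import pylint_notebook

    # Lint the code cells of a notebook and print a pylint-style report
    pylint_notebook('dog_breeds_baseline.ipynb')  # path is illustrative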

Run notebook with arguments

Well, there is a baseline solution: everything works, the data is loaded, the model is trained and even produces a reasonable result. Although the model is one of the state-of-the-art architectures (isn't it?), its hyperparameters most likely are not chosen in the best way. And maybe we need to perform some data processing procedures (such as normalization) differently. How do we try different parameters and evaluate the results?

There may be several options:

  • add some kind of grid search to the existing notebook,
  • move all the code into a script and save all artifacts (statistics, metrics and data examples) to disk or to an experiment tracker.

Both options are reasonable, but each requires substantially rewriting the existing solution, with a chance of breaking something along the way. We propose another option.

Note that we defined some parameters as constants at the beginning of the notebook: some of them, like batch size and number of epochs, are simply fixed, while others are the very parameters we want to test. If only it were possible to pass arguments when starting a notebook…

nbtools.run_notebook does exactly that: it substitutes variables into the desired cell, executes the notebook top to bottom, and saves the executed version to the specified path. In the provided example, we define the channels sequence, attention and loss as constants, so now we will substitute different values for these variables in cell 2.

[Image: the notebook cell to be rewritten]
[Image: code to run the notebook with parameters from a config]

In addition, nbtools.run_notebook can return values from the notebook (for example, metric values). With this approach, the baseline notebook requires minimal intervention, and after choosing the most suitable configuration, the notebook that produces the best model is ready to use.
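Putting both pieces together, a single launch might look roughly like this (the notebook path, variable names and metric name are illustrative; the argument names follow the NBTools documentation as we recall it and may differ):

    from nbtools import run_notebook

    result = run_notebook(
        path='dog_breeds_baseline.ipynb',            # illustrative path
        inputs={'CHANNELS': (64, 128, 256),          # substituted into the parameters cell
                'ATTENTION': True,
                'LOSS': 'dice'},
        outputs=['accuracy'],                        # variables to read back after execution
        out_path_ipynb='runs/attention_dice.ipynb',  # where to save the executed copy
    )
    print(result)  # e.g. the value of `accuracy` from the executed notebook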

Multiple experiments

Now that we can launch notebooks with arguments, it only remains to do it in parallel, and batchflow.research helps us here. We have already written about it in that article. All combinations of the parameters under study can be conveniently described using the Domain class, and each call of nbtools.run_notebook will take one combination of parameters from it. Parallel execution of experiments on different GPUs is taken over by Research. See the full example here.
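Schematically, the combination looks like this (the Domain grid and notebook variables are illustrative, and the Research method names are an assumption that varies between batchflow versions; the linked full example shows the exact code):

    from batchflow.research import Research, Domain
    from nbtools import run_notebook

    # Cartesian grid of the studied parameters: 2 * 2 = 4 experiments
    domain = Domain(attention=[False, True], loss=['ce', 'dice'])

    def one_experiment(attention, loss):
        # One experiment = one notebook execution with one parameter combination
        return run_notebook(
            path='dog_breeds_baseline.ipynb',
            inputs={'ATTENTION': attention, 'LOSS': loss},
            outputs=['accuracy'],
        )

    # Method names below are a sketch and may differ between versions
    research = Research(domain=domain)
    research.add_callable(one_experiment)
    research.run(workers=2)  # experiments are distributed across workers/GPUs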

GPU utilization is conveniently monitored using nbwatch from the console. There we can immediately see the memory consumed by various processes and the overall resource utilization.

As a result, we get a lot of completed notebooks: you can keep them as full reports, or extract only the metric values from them. In addition, you can implement an experiment tracker within the notebook itself and, for convenience, pass the experiment ID into the notebook (Research generates the IDs).

Conclusion

This article discusses the use of Jupyter Notebooks in software development and machine learning projects. We present our instruments, NBTools and Research. They support a workflow that involves creating a reproducible baseline solution, conducting parallel experiments, and using the final notebook in a production solution. This approach reduces the labor costs of rewriting the baseline code for experiments and then rewriting it once more for production.
