Catalyst 102 — Core Trinity: Experiment, Runner, and Callback
If you are not familiar with Catalyst yet, please check out our introduction post. Long story short, Catalyst is a PyTorch ecosystem framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and codebase reuse. This means that the user can seamlessly run a training loop with metrics, model checkpointing, advanced logging, and distributed training support, all without boilerplate code.
In this tutorial, I would like to go through the first homework of our deep learning course and introduce the three main Catalyst abstractions — Experiment, Runner, and Callback. Sit back and let’s get started.
To recap, here is the overview:
- Experiment — an abstraction that contains information about the experiment — a model, a criterion, an optimizer, a scheduler, and their hyperparameters. It also has information about the data and transformations used. The Experiment knows what you would like to run.
- Runner — an abstraction that knows how to run an experiment. It contains all the logic of how to work with your model per stage, epoch, and batch.
- Callback — an abstraction that lets you customize your experiment run logic. To give users maximum flexibility and extensibility, Catalyst supports callback execution anywhere in the training loop.
You can find full information about these abstractions in our docs. But for this tutorial, let’s dive into the Kittylyst minimal example to make things clear and compare it to Catalyst's implementation.
To begin with, let’s take a look at a typical train loop and see which parts we repeat there over and over. Can we stop writing this boilerplate code again and again and extract some supporting abstractions? It seems we can, and these abstractions lie at the core of Catalyst's design.
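To make the repetition concrete, here is a framework-free sketch of that loop. The toy one-parameter model and hand-derived gradient are illustrative only; the point is the structure every project repeats: iterate over epochs and batches, run a forward pass, compute the loss, backpropagate, and step the optimizer.

```python
# A minimal, framework-free sketch of the train loop that every deep
# learning project repeats. The "model" is a toy one-parameter linear
# model trained by hand-derived gradient descent; the structure,
# not the math, is the point here.

def make_batches(xs, ys, batch_size=2):
    for i in range(0, len(xs), batch_size):
        yield xs[i:i + batch_size], ys[i:i + batch_size]

def train(num_epochs=50, lr=0.05):
    w = 0.0                      # model parameter
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]    # target function: y = 2x
    for epoch in range(num_epochs):          # epoch loop
        for xb, yb in make_batches(xs, ys):  # batch loop
            preds = [w * x for x in xb]                        # forward pass
            loss = sum((p - y) ** 2 for p, y in zip(preds, yb)) / len(xb)
            grad = sum(2 * (p - y) * x                         # backward pass
                       for p, y, x in zip(preds, yb, xb)) / len(xb)
            w -= lr * grad                                     # optimizer step
    return w
```

Swap in a different dataset or model, and everything except the forward/loss lines stays the same; that is the boilerplate we want to factor out.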
As I wrote in a previous post, each deep learning project has several main components. These primitives define what we want to use during the experiment:
- the data
- the model(s)
- the optimizer(s)
- the loss(es)
- and the scheduler(s) if we need them.
For each stage (another unique Catalyst feature, which we will discuss in more detail in the next tutorial) of our experiment, the Experiment provides interfaces to all primitives above + the callbacks (which we will also talk about later in this post).
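As a rough sketch of that idea (toy code in the spirit of Kittylyst; the class and method names here are illustrative, not the actual Catalyst API), an Experiment is essentially a per-stage factory for all these primitives:

```python
# A toy Experiment: given a stage name, it hands out the primitives the
# Runner will need. The "model", "criterion", and "optimizer" are
# deliberately simplified stand-ins, not real PyTorch objects.

class ToyExperiment:
    def __init__(self, stages=("train",)):
        self.stages = stages

    def get_model(self, stage):
        return {"w": 0.0}                    # toy "model": one parameter

    def get_criterion(self, stage):
        # per-sample squared error as a stand-in for a loss module
        return lambda pred, target: (pred - target) ** 2

    def get_optimizer(self, stage, model):
        return {"lr": 0.05}                  # stand-in for an optimizer

    def get_loaders(self, stage):
        xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
        return {"train": list(zip(xs, ys))}  # stand-in for DataLoaders

    def get_callbacks(self, stage):
        return []                            # covered later in this post
```

The Experiment never runs anything itself; it only answers the question "what should be used at this stage?", which is exactly the "what to run" part of the split.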
After we’ve highlighted what we want to train/run, let’s figure out how we’ll do it. And Runner will help us with this.
From my experience, deep learning experiments follow the same for-loop (the how to run):
- image classification — feed your image to the network, get logits back, and compute the loss
- image segmentation — the same thing, but with mask logits ;)
- image detection — thanks to anchor-free detectors, the training process nowadays looks very similar to the segmentation one
- GANs — generate some data from noise with a generator, feed true and fake examples to a discriminator, compute the losses
- text classification — tokenize your text, output logits — typical classification once again
- text segmentation (NER) — I think you get it: feed text, output mask logits, compute the loss
- seq2seq — very similar to the text segmentation approach — you need to predict the class for each token in the sequence
- recsys — encode your user features, output item-prediction logits, compute your rank metrics
As a result, the only thing that we want to change in these pipelines for new data or models is the batch handler. This is exactly what our Runner does — just look at the implementation in Kittylyst. It goes through the stages and runs a common train loop. You can find the advanced implementation in Catalyst, which brings complete reproducibility and a better for-loop decomposition.
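Schematically, the idea looks like this (a toy sketch inspired by Kittylyst, not the real implementation; method names are made up for illustration): the Runner owns the generic stage/epoch/batch for-loop, and only the batch handler knows anything about the data or model:

```python
# A toy Runner: the generic for-loop is decomposed into
# stage -> epoch -> batch methods, and handle_batch is the single
# task-specific hook that subclasses or users override.

class ToyRunner:
    def __init__(self, stages=("train",), num_epochs=2):
        self.stages, self.num_epochs = stages, num_epochs
        self.batch_metrics = []

    def handle_batch(self, batch):
        # the only task-specific piece; here: squared error of a
        # hard-coded "model" that predicts 2 * x
        x, y = batch
        self.batch_metrics.append((2.0 * x - y) ** 2)

    def _run_batch(self, batch):
        self.handle_batch(batch)

    def _run_epoch(self, loader):
        for batch in loader:                   # batch loop
            self._run_batch(batch)

    def _run_stage(self, stage, loader):
        for epoch in range(self.num_epochs):   # epoch loop
            self._run_epoch(loader)

    def run(self, loader):
        for stage in self.stages:              # stage loop
            self._run_stage(stage, loader)
        return self
```

Everything above `handle_batch` is the reusable part; every task from the list above differs only in what happens inside that one method.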
For now, we have divided the train loop into Experiment and Runner — the what and the how of the run. These two abstractions can be combined efficiently. For example, you can create a SupervisedRunner for all supervised deep learning tasks and use it with different experiments/models. Basically, for SupervisedRunner you just need to override batch_handler, which re-defines how to work with your data and model for supervised tasks. You can find an example in Kittylyst, and a more advanced one in Catalyst, which supports diverse data formats.
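A toy sketch of that pattern (illustrative code, not the actual SupervisedRunner class): the loop stays fixed, while the batch handler encodes the supervised recipe of forward pass plus loss, so one runner works with any model/criterion pair you plug in:

```python
# A toy supervised runner: handle_batch implements the generic
# "forward -> loss" recipe, and the model/criterion are injected,
# so the same runner serves different experiments.

class ToySupervisedRunner:
    def __init__(self, model, criterion):
        self.model, self.criterion = model, criterion
        self.losses = []

    def handle_batch(self, batch):
        features, target = batch
        prediction = self.model(features)    # forward pass
        self.losses.append(self.criterion(prediction, target))

    def run(self, loader):
        for batch in loader:
            self.handle_batch(batch)
        return self

# the same runner class, two different "experiments":
double = ToySupervisedRunner(model=lambda x: 2 * x,
                             criterion=lambda p, t: abs(p - t))
square = ToySupervisedRunner(model=lambda x: x * x,
                             criterion=lambda p, t: abs(p - t))
loader = [(2.0, 4.0), (3.0, 6.0)]
double.run(loader)
square.run(loader)
```

Swapping the model changes the recorded losses but not a single line of the loop itself.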
But what if we want to change something about the experiment run outside of the model feeding? For example, we want to compute additional metrics from the model's inputs and outputs, or do gradient accumulation. We could write everything in batch_handler, for example like here in Kittylyst - act 1, or here in Catalyst. But, once again, all these extra tricks are well standardized and reusable. For such custom, reusable components we need our last abstraction: Callback. You can find an example of a tiny callback system in Kittylyst and in Catalyst.
Such a callback system allows you to quickly enable or disable metrics and other deep learning tricks, like gradient accumulation, mixup, batch overfitting, and early stopping — with only a few lines of code, during any experiment.
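As a toy illustration of such a system (not the real Catalyst callback API; the class and event names here are made up for this sketch), here is early stopping implemented entirely outside the core loop:

```python
# A toy callback system: the runner fires named events, and callbacks
# hook into them, so tricks like early stopping live outside the core
# loop and can be enabled or disabled by editing one list.

class Callback:
    def on_batch_end(self, runner): pass
    def on_epoch_end(self, runner): pass

class EarlyStopping(Callback):
    def __init__(self, patience=2):
        self.patience, self.best, self.bad_epochs = patience, float("inf"), 0

    def on_epoch_end(self, runner):
        if runner.epoch_loss < self.best:
            self.best, self.bad_epochs = runner.epoch_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                runner.need_stop = True      # ask the loop to stop

class LoopWithCallbacks:
    def __init__(self, callbacks):
        self.callbacks, self.need_stop = callbacks, False

    def run(self, epoch_losses):
        self.epochs_run = 0
        for loss in epoch_losses:            # pretend each item is one epoch
            if self.need_stop:
                break
            self.epoch_loss = loss
            self.epochs_run += 1
            for cb in self.callbacks:        # fire the event
                cb.on_epoch_end(self)
        return self
```

For example, with losses `[1.0, 0.5, 0.6, 0.7, 0.4, 0.3]` and `patience=2`, the loop stops after the fourth epoch because the loss failed to improve twice in a row; removing `EarlyStopping` from the callback list restores the full run without touching the loop.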
Putting this all together, we can create complex deep learning pipelines quickly and reproducibly without losing PyTorch's low-level flexibility. Let's look at the examples.
In this post, we have discussed a typical train loop in PyTorch and introduced the core abstractions of the Catalyst framework. If you want to learn more about deep learning with PyTorch / Catalyst, join our course and Slack.
In later posts, we will continue covering the PyTorch Catalyst framework. We will inspect stages and explain why they are so important in deep learning. We will also discuss the pros and cons of our Config API. Both stages and the Config API are unique features of the Catalyst framework and play an essential role in its design.
PS. Yup, as you have seen in the examples, Catalyst also has a bunch of user-friendly features for model selection, tracing, pruning, and quantization… we will discuss these features… in future posts :)
PS2. For a convenient Catalyst Quickstart please follow this guide.