Catalyst — A PyTorch Framework for Accelerated Deep Learning R&D
Authors: Sergey Kolesnikov — Catalyst Team Lead.
Acknowledgments: Catalyst team and collaborators.
During the last decade, progress in Deep Learning has led to a wide range of projects and frameworks. Among researchers, PyTorch has become one of the most popular. Thanks to its purely Pythonic execution model and excellent low-level design, it has gathered a lot of attention from the research community. Nevertheless, with great power comes great responsibility: because of these low-level primitives, users are likely to introduce bugs during research. Moreover, with the rise of hardware accelerators, it has become crucial to have a simple API for operating efficiently with different hardware setups.
For the last three years, Catalyst-Team has been working on Catalyst — a high-level PyTorch framework for Deep Learning Research and Development. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write yet another train loop. You get metrics, model checkpointing, advanced logging, and distributed training support without boilerplate code and low-level bugs.
“Write code with PyTorch, accelerate it with Catalyst!”
In this post, I would like to share our vision of a high-level Deep Learning framework API and show the current state of development on various examples.
Deep Learning recap
Before we start, let’s recall what a typical Deep Learning SGD train loop looks like.
You have an experiment with predefined stages, epochs, and data sources; you iterate over them, feed the model data batches, and run the SGD update. It looks very straightforward, but everything becomes complicated when the project grows and requires more deep learning tricks, like advanced metrics or hardware accelerators.
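In plain PyTorch, such a loop typically looks like the following sketch, written here with toy data and a toy model (no Catalyst involved):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# toy data, model, criterion, and optimizer
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 4, (256,)))
loaders = {
    "train": DataLoader(dataset, batch_size=32),
    "valid": DataLoader(dataset, batch_size=32),
}
model = nn.Linear(10, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# the typical SGD train loop: epochs -> loaders -> batches -> SGD update
for epoch in range(3):
    for loader_name, loader in loaders.items():
        model.train(loader_name == "train")
        for features, targets in loader:
            logits = model(features)
            loss = criterion(logits, targets)
            if loader_name == "train":
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```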
Runner
Starting from the beginning, Runner is an abstraction that holds all the logic of your deep learning experiment: the data you are using, the model you are training, the batch handling logic, and everything about the metrics and monitoring systems in use.
The Runner plays the most crucial role: it connects all the other abstractions and gathers the whole experiment logic in one place. Most importantly, it does not force you to use Catalyst-only primitives. It gives you a flexible way to choose how much of the high-level API you want to get from the framework.
For example, you could:
- Define everything in the Catalyst way with Runner and Callbacks: ML — multiclass classification example (a minimal sketch of this option follows the list).
- Write the forward-backward pass on your own, using Catalyst as a for-loop wrapper: CustomRunner — PyTorch for-loop decomposition.
- Mix these approaches: CV — MNIST GAN, CV — MNIST VAE examples.
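Here is a rough sketch of the first, Catalyst-way option on a toy multiclass classification task. It is modelled on the linked minimal example; argument and callback names follow the 21.xx API and may differ in other Catalyst versions:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# toy multiclass classification data
num_samples, num_features, num_classes = 1000, 10, 4
X = torch.rand(num_samples, num_features)
y = torch.randint(0, num_classes, (num_samples,))
loaders = {
    "train": DataLoader(TensorDataset(X, y), batch_size=32),
    "valid": DataLoader(TensorDataset(X, y), batch_size=32),
}

model = nn.Linear(num_features, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# the Runner wires data, model, callbacks, and monitoring together
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=3,
    callbacks=[
        dl.AccuracyCallback(input_key="logits", target_key="targets", topk_args=(1, 3)),
    ],
    logdir="./logs",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
)
```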
Finally, the Runner architecture does not depend on PyTorch itself, which leaves room for adopting it with Tensorflow2 or JAX.
Supported Runners are listed under the Runner API section.
Engine
The Engine is the main force behind the Runner. It defines the logic of hardware communication and the usage of different deep learning techniques, like distributed or mixed-precision training.
Thanks to the Engines’ design, it is straightforward to adapt your pipeline to different hardware accelerators. For example, you can easily support a native PyTorch distributed setup, a Nvidia-Apex setup, or an AMP distributed setup. We are also working on support for other hardware accelerators and backends, like DeepSpeed, Horovod, or TPU.
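For illustration, switching the hardware setup is roughly a matter of passing a different engine to the runner, while the rest of the pipeline stays untouched. The engine class names below follow my reading of the 21.xx API (and the snippet reuses the model, criterion, optimizer, and loaders from the classification sketch above), so they may differ in other versions:

```python
from catalyst import dl

# pick the engine that matches your hardware setup
engine = dl.DeviceEngine("cuda:0")             # single-device training
# engine = dl.DataParallelEngine()             # multi-GPU DataParallel
# engine = dl.DistributedDataParallelEngine()  # native PyTorch DDP
# engine = dl.AMPEngine()                      # torch.cuda.amp mixed precision
# engine = dl.APEXEngine()                     # Nvidia-Apex mixed precision

runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    engine=engine,  # all hardware and precision logic is encapsulated here
    num_epochs=3,
)
```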
You can watch Engines development progress under the Engine API section.
Callback
The Callback is an abstraction that helps you customize the logic during your run. Once again, you could do everything natively with PyTorch and Catalyst as a for-loop wrapper, but thanks to callbacks it is much easier to reuse typical deep learning extensions like metrics or augmentation tricks. For example, it is much more convenient to define the required metrics with them: ML — multiclass classification and ML — RecSys examples.
The Callback API mirrors the main for-loops of our train-loop abstraction:
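Schematically, a custom callback just overrides the events it cares about. The base-class and event names below follow my reading of the 21.xx Callback API and may vary slightly between releases:

```python
from catalyst import dl

class LifecycleCallback(dl.Callback):
    """A toy callback that hooks into the main for-loops of the run."""

    def __init__(self):
        # CallbackOrder defines when this callback runs relative to the others
        super().__init__(order=dl.CallbackOrder.External)

    def on_experiment_start(self, runner):
        print("experiment started")

    def on_stage_start(self, runner):
        print("stage started")

    def on_epoch_start(self, runner):
        print("epoch started")

    def on_loader_start(self, runner):
        print("loader started")

    def on_batch_end(self, runner):
        pass  # per-batch logic, e.g. metric updates, goes here

    def on_loader_end(self, runner):
        pass  # per-loader aggregation goes here

    def on_epoch_end(self, runner):
        print("epoch finished")

    def on_stage_end(self, runner):
        print("stage finished")

    def on_experiment_end(self, runner):
        print("experiment finished")
```

Such a callback is then passed to the runner next to the built-in ones through the callbacks argument shown earlier.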
You can find all supported callbacks under the Callback API section.
Metric
Speaking about reusable deep learning components, Catalyst also provides a Metric abstraction for convenient metric computation during an experiment run. Its API is quite simple:
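Schematically, a metric only needs to know how to reset, update, and compute itself. This is a simplified view of the abstraction, not the exact Catalyst source:

```python
class IMetric:
    """A schematic view of the Metric abstraction."""

    def reset(self) -> None:
        """Resets accumulated statistics, e.g. at the start of a loader."""
        ...

    def update(self, *args, **kwargs):
        """Updates internal statistics with a new batch and returns the batch value."""
        ...

    def compute(self):
        """Computes the final metric value from the accumulated statistics."""
        ...
```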
You can find all supported metrics under the Metric API section.
The Catalyst Metric API has default update and compute methods to support per-batch statistic accumulation and final computation during training. All metrics also support update and compute key-value extensions for convenient usage during the run — this gives you the flexibility to store any number of metrics or aggregations you want, with a simple communication protocol to use for their logging.
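As an illustration of this protocol, here is a hand-rolled toy mean-loss metric (not a class shipped with Catalyst) with both the plain and the key-value update/compute methods:

```python
class MeanLossMetric:
    """A toy additive metric that follows the update/compute + key-value protocol."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.total, self.count = 0.0, 0

    def update(self, loss: float, batch_size: int) -> float:
        # accumulate per-batch statistics
        self.total += loss * batch_size
        self.count += batch_size
        return loss  # the per-batch value

    def compute(self) -> float:
        # final computation over everything accumulated so far
        return self.total / max(self.count, 1)

    def update_key_value(self, loss: float, batch_size: int) -> dict:
        return {"loss": self.update(loss, batch_size)}

    def compute_key_value(self) -> dict:
        return {"loss/mean": self.compute()}


metric = MeanLossMetric()
for batch_loss, batch_size in [(0.9, 32), (0.7, 32), (0.5, 16)]:
    metric.update_key_value(batch_loss, batch_size)
print(metric.compute_key_value())  # {'loss/mean': ...}
```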
Logger
Finally, speaking about logging: with the latest Catalyst release, 21.xx, we have united support for monitoring systems into one abstraction:
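Schematically, this unified logging abstraction boils down to a handful of methods that every monitoring backend implements (a simplified view, not the exact Catalyst source):

```python
class ILogger:
    """A schematic view of the unified Logger abstraction."""

    def log_metrics(self, metrics: dict, scope: str, **kwargs) -> None:
        """Logs batch-, loader-, or epoch-level metrics to the monitoring system."""
        ...

    def log_image(self, tag: str, image, scope: str, **kwargs) -> None:
        """Logs an image artifact."""
        ...

    def log_hparams(self, hparams: dict, **kwargs) -> None:
        """Logs experiment hyperparameters."""
        ...

    def flush_log(self) -> None:
        """Flushes buffered data to the backend."""
        ...

    def close_log(self) -> None:
        """Closes the logger at the end of the run."""
        ...
```

Concrete implementations (for example, console, CSV, Tensorboard, or MLFlow loggers in the 21.xx release) are passed to the runner through its loggers argument, and each of them receives the same metrics through this interface.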
With such a simple API, we already provide integrations for the Tensorboard and MLFlow monitoring systems. More advanced loggers for Neptune and Wandb, with artifact and hyperparameter storing, are in development thanks to joint collaboration between our teams.
All currently supported loggers can be found under the Logger API section.
Examples
Combining all these abstractions, it is straightforward to write complex deep learning pipelines in a compact but user-friendly way.
PyTorch way — for-loop decomposition with Catalyst
Before the Python API examples, I would like to mention that all Catalyst abstractions are fully compatible with native PyTorch and can be used as a simple for-loop wrapper to structure your code better.
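A rough sketch of this for-loop decomposition, modelled on the CustomRunner minimal example; method and attribute names follow the 21.xx API and may differ between versions:

```python
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# toy data and model
X = torch.rand(256, 10)
y = torch.randint(0, 4, (256,))
loaders = {"train": DataLoader(TensorDataset(X, y), batch_size=32)}
model = nn.Linear(10, 4)
optimizer = torch.optim.Adam(model.parameters())


class CustomRunner(dl.Runner):
    """You write the forward-backward pass; Catalyst runs the for-loops around it."""

    def handle_batch(self, batch):
        features, targets = batch
        logits = self.model(features)
        loss = F.cross_entropy(logits, targets)
        self.batch_metrics.update({"loss": loss})

        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()


runner = CustomRunner()
runner.train(model=model, optimizer=optimizer, loaders=loaders, num_epochs=3, verbose=True)
```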
Python API — user-friendly Deep Learning R&D
All the above examples help you write fully compatible PyTorch code without any external mixins. No custom modules or datasets are required — everything works natively with the PyTorch codebase, while Catalyst links it together in a more readable and reproducible way.
For more advanced examples, like GANs, VAE, or multistage runs (another unique feature of the Catalyst), please follow our minimal examples section.
The Catalyst Python API supports various user-friendly tricks, like overfit, fp16, ddp, and more, to make it easy for you to debug and speed up your R&D. To read more about all these features, please follow our .train documentation. A minor example for your interest:
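For instance, reusing the model, criterion, optimizer, and loaders from the classification sketch above (the exact flag semantics are described in the .train documentation and may vary between releases):

```python
from catalyst import dl

runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    overfit=True,   # train on a small data subset to quickly debug the pipeline
    fp16=True,      # mixed-precision training, if the hardware supports it
    ddp=False,      # flip to True for distributed data-parallel training
    verbose=True,
)
```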
Config API — from research to production
Last but not least, Catalyst supports two advanced APIs for convenient, production-friendly deep learning R&D. With the Config API and Hydra API, Deep Learning R&D becomes fully reproducible thanks to storing hyperparameters in YAML.
Config API examples can be found here. The Config API fully mirrors the Runner specification in a YAML-based way, allowing you to change any part of your experiment without any code changes at all.
Thanks to such hyperparameter storage, it is also very easy to run hyperparameter optimization with catalyst-dl tune. You can find an example of catalyst-dl tune under the Config API minimal example section. Once again, you can tune any part of your experiment by changing only a few lines in your YAML file. That’s it; it’s that simple.
Over the last three years, we have done an enormous amount of work to accelerate Deep Learning R&D in a purely open-source ecosystem, thanks to our team and contributors. In this post, we have covered the current framework design principles and a few minimal examples, so you can speed up your Deep Learning work with Catalyst and make it fully reproducible.
If you are interested in Catalyst usage:
If you are interested in Catalyst development:
If you are motivated by our vision of the Catalyst open-source Deep Learning R&D ecosystem, you can support our initiative here or write directly to team@catalyst-team.com for collaboration.