Catalyst — A PyTorch Framework for Accelerated Deep Learning R&D

Catalyst Team
May 11, 2021


Author: Sergey Kolesnikov, Catalyst Team Lead.
Acknowledgments: Catalyst team and collaborators.

Over the last decade, progress in Deep Learning has produced many projects and frameworks. PyTorch has become one of the most popular among researchers. Thanks to its Pythonic execution model and solid low-level design, it has attracted a lot of attention from the research community. Nevertheless, with great power comes great responsibility: because the API is so low-level, users can easily introduce bugs during research. Moreover, with the rise of hardware accelerators, it has become crucial to have a simple API for working efficiently across different hardware setups.

For the last three years, the Catalyst Team has been working on Catalyst, a high-level PyTorch framework for Deep Learning Research and Development. It focuses on reproducibility, rapid experimentation, and codebase reuse, so you can create something new rather than write yet another training loop. You get metrics, model checkpointing, advanced logging, and distributed training support without boilerplate code and low-level bugs.

“Write code with PyTorch, accelerate it with Catalyst!”

In this post, I would like to share our vision of a high-level Deep Learning framework API and show the current state of development through several examples.

Deep Learning recap

Before we start, let’s recall the typical Deep Learning SGD training loop.

You have an experiment with predefined stages, epochs, and data sources. You iterate over them, feed the model batches of data, and run an SGD update on each batch. It looks very straightforward, but everything becomes complicated when the project grows and requires more deep learning tricks, like advanced metrics or hardware accelerators.
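
The nested structure described above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not Catalyst code: the one-parameter model, the `sgd_step` helper, and the toy data are hypothetical, and a real pipeline would use PyTorch tensors and optimizers.

```python
# A toy experiment: fit y = w * x with plain SGD (all names are illustrative).
def sgd_step(w, batch, lr):
    """One SGD update for the loss mean((w * x - y) ** 2) over a batch."""
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def run_experiment(stages, loaders, w=0.0):
    for stage in stages:                                  # e.g. pretraining, finetuning
        for epoch in range(stage["num_epochs"]):          # epochs within a stage
            for loader_name, batches in loaders.items():  # data sources
                for batch in batches:                     # batches within a source
                    if loader_name == "train":
                        w = sgd_step(w, batch, stage["lr"])
                    # a "valid" loader would only compute metrics here
    return w


data = [(x, 3.0 * x) for x in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)]
loaders = {"train": [data[:3], data[3:]], "valid": [data]}
stages = [{"num_epochs": 200, "lr": 0.5}, {"num_epochs": 50, "lr": 0.1}]
w = run_experiment(stages, loaders)
print(round(w, 3))
```

Even in this toy version there are four nested loops, and nothing yet handles metrics, checkpoints, logging, or devices. That is the part that grows quickly in real projects.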


To solve all the challenges above, we created Catalyst, a PyTorch framework for Deep Learning R&D focused on rapid experimentation, reproducibility, and codebase reuse. It comprises a few helpful abstractions: Runner, Engine, Callback, Metric, and Logger.


Starting from the beginning, the Runner is an abstraction that holds all the logic of your deep learning experiment: the data you are using, the model you are training, the batch handling logic, and everything about the metrics and monitoring systems in use.

Runner abstract code

The Runner plays the most crucial role: it connects all the other abstractions and defines the whole experiment logic in one place. Most importantly, it does not force you to use Catalyst-only primitives. It lets you choose how much of the high-level API you want to take from the framework.

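For example, you could rely on Catalyst callbacks for metrics and logging, or write the batch handling yourself in native PyTorch and use Catalyst only as a for-loop wrapper.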

Finally, the Runner architecture does not depend on PyTorch, which leaves room for future adoption with TensorFlow 2 or JAX.
Supported Runners are listed under the Runner API section.
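
To make the Runner's role more concrete, here is a minimal sketch of the pattern in plain Python. `MiniRunner` and `RegressionRunner` are hypothetical stand-ins written for this illustration, not the actual Catalyst classes: the base class owns the loops and the metric averaging, while the subclass only defines what happens for a single batch.

```python
class MiniRunner:
    """Hypothetical stand-in for a runner: owns the loop, delegates batch logic."""

    def __init__(self, model):
        self.model = model
        self.is_train = False
        self.batch_metrics = {}
        self.history = []  # one dict of averaged metrics per (epoch, loader)

    def handle_batch(self, batch):
        raise NotImplementedError("subclasses define what happens per batch")

    def run(self, loaders, num_epochs):
        for epoch in range(num_epochs):
            for loader_name, batches in loaders.items():
                self.is_train = loader_name.startswith("train")
                totals = {}
                for batch in batches:
                    self.batch_metrics = {}
                    self.handle_batch(batch)
                    for key, value in self.batch_metrics.items():
                        totals[key] = totals.get(key, 0.0) + value
                averaged = {k: v / len(batches) for k, v in totals.items()}
                self.history.append({"epoch": epoch, "loader": loader_name, **averaged})
        return self.history


class RegressionRunner(MiniRunner):
    """User code: only the batch handling is written by hand."""

    lr = 0.5

    def handle_batch(self, batch):
        w = self.model["w"]
        loss = sum((w * x - y) ** 2 for x, y in batch) / len(batch)
        if self.is_train:
            # gradient of the mean squared error with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            self.model["w"] = w - self.lr * grad
        self.batch_metrics["loss"] = loss


data = [(x, 2.0 * x) for x in (0.2, 0.4, 0.6, 0.8)]
loaders = {"train": [data[:2], data[2:]], "valid": [data]}
runner = RegressionRunner(model={"w": 0.0})
history = runner.run(loaders, num_epochs=100)
print(history[-1])
```

The design choice worth noticing is that the user-facing surface is a single method, so the same loop can be reused across experiments while the batch logic stays ordinary Python.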


The Engine is the main driving force of the Runner. It defines how the Runner communicates with the hardware and how techniques such as distributed or mixed-precision training are applied.

Engine abstract code

Thanks to the Engines’ design, it’s straightforward to adapt your pipeline to different hardware accelerators. For example, you could easily support a PyTorch distributed setup, an NVIDIA Apex setup, or an AMP distributed setup. We are also working on support for other backends and accelerators, such as DeepSpeed, Horovod, and TPU.
You can follow Engine development progress under the Engine API section.


The Callback is an abstraction that helps you customize the logic of your run. Once again, you could do everything natively in PyTorch and use Catalyst only as a for-loop wrapper. However, callbacks make it much easier to reuse typical deep learning extensions like metrics or augmentation tricks. For example, it's much more convenient to define the required metrics with them, as in the "ML - multiclass classification" and "ML - RecSys" examples.

The Callback API mirrors the main for-loops of our training-loop abstraction:

Callback abstract code

You can find all supported callbacks under the Callback API section.
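
As a rough illustration of how callbacks mirror the loops, the sketch below fires a hook before and after each loop level. The `Callback` base class, the hook names, and the `run` function are simplified, hypothetical stand-ins written for this post, not Catalyst's actual classes.

```python
class Callback:
    """Hypothetical base class: every hook is a no-op unless overridden."""

    def on_experiment_start(self, state): pass
    def on_epoch_start(self, state): pass
    def on_loader_start(self, state): pass
    def on_batch_start(self, state): pass
    def on_batch_end(self, state): pass
    def on_loader_end(self, state): pass
    def on_epoch_end(self, state): pass
    def on_experiment_end(self, state): pass


class EventRecorder(Callback):
    """Records epoch-level and loader-level events to show the call order."""

    def __init__(self):
        self.events = []

    def on_epoch_start(self, state):
        self.events.append(f"epoch {state['epoch']} start")

    def on_loader_start(self, state):
        self.events.append(f"loader {state['loader']} start")

    def on_loader_end(self, state):
        self.events.append(f"loader {state['loader']} end")

    def on_epoch_end(self, state):
        self.events.append(f"epoch {state['epoch']} end")


class BatchCounter(Callback):
    """Counts processed batches; a metric callback would accumulate here."""

    def __init__(self):
        self.num_batches = 0

    def on_batch_end(self, state):
        self.num_batches += 1


def fire(callbacks, hook, state):
    """Call the named hook on every callback, in order."""
    for callback in callbacks:
        getattr(callback, hook)(state)


def run(loaders, num_epochs, callbacks):
    state = {}
    fire(callbacks, "on_experiment_start", state)
    for epoch in range(num_epochs):
        state["epoch"] = epoch
        fire(callbacks, "on_epoch_start", state)
        for name, batches in loaders.items():
            state["loader"] = name
            fire(callbacks, "on_loader_start", state)
            for batch in batches:
                state["batch"] = batch
                fire(callbacks, "on_batch_start", state)
                # the forward/backward pass would go here
                fire(callbacks, "on_batch_end", state)
            fire(callbacks, "on_loader_end", state)
        fire(callbacks, "on_epoch_end", state)
    fire(callbacks, "on_experiment_end", state)


recorder, counter = EventRecorder(), BatchCounter()
run({"train": [1, 2, 3], "valid": [4]}, num_epochs=2, callbacks=[recorder, counter])
print(counter.num_batches)
print(recorder.events[:6])
```

Because each callback only overrides the hooks it cares about, independent pieces of logic can be combined in one list without knowing about each other.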


Speaking of reusable deep learning components, Catalyst also provides a Metric abstraction for convenient metric computation during an experiment run. Its API is quite simple:

Metric abstraction code

You can find all supported metrics under the Metric API section.

The Catalyst Metric API has default update and compute methods that support per-batch statistic accumulation and final computation during training. All metrics also provide key-value variants of update and compute for convenient use during a run: they let you store any number of metrics or aggregations and pass them to the loggers through a simple protocol.
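
The update/compute protocol described above can be sketched as follows. `AccuracyMetric` is a hypothetical class written for this illustration, and its method names are assumptions chosen to match the description; it is not copied from the Catalyst codebase.

```python
class AccuracyMetric:
    """Hypothetical metric following a reset / update / compute protocol."""

    def __init__(self, prefix=""):
        self.prefix = prefix
        self.reset()

    def reset(self):
        """Clear accumulated statistics, e.g. at the start of a loader."""
        self.num_correct = 0
        self.num_samples = 0

    def update(self, predictions, targets):
        """Accumulate statistics for one batch and return the batch accuracy."""
        correct = sum(int(p == t) for p, t in zip(predictions, targets))
        self.num_correct += correct
        self.num_samples += len(targets)
        return correct / len(targets)

    def compute(self):
        """Return the accuracy over everything seen since the last reset."""
        return self.num_correct / max(self.num_samples, 1)

    def update_key_value(self, predictions, targets):
        """Same as update, but returns a named value ready for logging."""
        return {f"{self.prefix}accuracy": self.update(predictions, targets)}

    def compute_key_value(self):
        """Same as compute, but returns a named value ready for logging."""
        return {f"{self.prefix}accuracy": self.compute()}


metric = AccuracyMetric(prefix="valid_")
batch_log = metric.update_key_value([1, 0, 1, 1], [1, 0, 0, 1])  # per-batch value
metric.update([0, 0], [0, 1])
loader_log = metric.compute_key_value()  # value aggregated over both batches
print(batch_log, loader_log)
```

Accumulating counts rather than averaging per-batch values keeps the final number correct even when the last batch is smaller than the others.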


Finally, speaking of logging: with the latest Catalyst release, 21.xx, we have unified support for monitoring systems under a single abstraction:

Logger abstract code

With such a simple API, we already provide integrations for the TensorBoard and MLflow monitoring systems. More advanced loggers for Neptune and Weights & Biases, with artifact and hyperparameter storage, are in development thanks to collaboration between our teams.
All currently supported loggers can be found under the Logger API section.
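
The idea of a single logging abstraction with several backends can be sketched like this. The `Logger` interface, its method names, and both backends are hypothetical and exist only for this illustration; they are not the Catalyst Logger API.

```python
class Logger:
    """Hypothetical logger interface: one method per kind of logged object."""

    def log_hparams(self, hparams): pass
    def log_metrics(self, metrics, scope, step): pass


class InMemoryLogger(Logger):
    """Keeps everything in Python lists; handy for tests and notebooks."""

    def __init__(self):
        self.hparams = {}
        self.records = []

    def log_hparams(self, hparams):
        self.hparams.update(hparams)

    def log_metrics(self, metrics, scope, step):
        self.records.append({"scope": scope, "step": step, **metrics})


class ConsoleLogger(Logger):
    """Prints metrics to stdout and ignores hyperparameters."""

    def log_metrics(self, metrics, scope, step):
        formatted = " | ".join(f"{k}={v:.4f}" for k, v in sorted(metrics.items()))
        print(f"[{scope} {step}] {formatted}")


def log_everywhere(loggers, metrics, scope, step):
    """The training loop talks to every backend through the same call."""
    for logger in loggers:
        logger.log_metrics(metrics, scope=scope, step=step)


memory = InMemoryLogger()
loggers = [ConsoleLogger(), memory]
for logger in loggers:
    logger.log_hparams({"lr": 0.02, "batch_size": 32})
for epoch, loss in enumerate([0.9, 0.5, 0.3]):
    log_everywhere(loggers, {"loss": loss}, scope="epoch", step=epoch)
```

With this shape, adding a new monitoring system means writing one more subclass, and the training code does not change.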


By combining all these abstractions, it’s straightforward to write complex deep learning pipelines in a compact but user-friendly way.

PyTorch way — for-loop decomposition with Catalyst

Before Python API examples, I would like to mention that all Catalyst abstractions are fully compatible with native PyTorch and could be used as a simple for-loop wrapper to structure your code better.

CustomRunner — PyTorch for-loop decomposition

Python API — user-friendly Deep Learning R&D

Linear Regression
Hyperparameters optimization with Optuna

All of the examples above are fully PyTorch-compatible code without any external mixins. No custom modules or datasets are required: everything works natively with a PyTorch codebase, while Catalyst ties it together in a more readable and reproducible way.

For more advanced examples, like GANs, VAE, or multistage runs (another unique feature of Catalyst), please see our minimal examples section.

The Catalyst Python API supports various user-friendly options, like overfit, fp16, ddp, and more, to make it easy for you to debug and speed up your R&D. To read more about all these features, please see our .train documentation. Here is a small example:

full-featured MNIST example in only 60 lines of code

Config API — from research to production

Last but not least, Catalyst supports two advanced APIs for convenient, production-friendly deep learning R&D. With the Config API and the Hydra API, Deep Learning R&D becomes fully reproducible because hyperparameters are stored in YAML.

Config API

Config API examples can be found here. As you can see, the Config API mirrors the Runner specification in YAML, allowing you to change any part of your experiment without any code changes at all.
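
The mechanism behind this kind of config-driven setup can be sketched with a small registry that builds objects from a declarative spec. Everything here is hypothetical: the `REGISTRY`, the `_target_` key, and the `LinearModel` and `SGD` classes are placeholders for this illustration, and JSON is used instead of YAML only because the Python standard library can parse it.

```python
import json

REGISTRY = {}


def register(cls):
    """Make a class available to configs under its own name."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class LinearModel:
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


@register
class SGD:
    def __init__(self, lr, momentum=0.0):
        self.lr = lr
        self.momentum = momentum


def build(spec):
    """Instantiate the class named by the hypothetical "_target_" key."""
    params = dict(spec)  # copy, so the original config stays untouched
    cls = REGISTRY[params.pop("_target_")]
    return cls(**params)


CONFIG = """
{
  "model": {"_target_": "LinearModel", "in_features": 10, "out_features": 1},
  "optimizer": {"_target_": "SGD", "lr": 0.02}
}
"""

config = json.loads(CONFIG)
experiment = {name: build(spec) for name, spec in config.items()}
print(type(experiment["model"]).__name__, experiment["optimizer"].lr)
```

Changing the learning rate or swapping the optimizer then means editing the config text, while the Python code stays the same.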

With hyperparameters stored this way, it’s also very easy to run hyperparameter optimization with catalyst-dl tune. You can find an example for catalyst-dl tune under the Config API minimal example section. Once again, you can tune any part of your experiment by changing only a few lines in your YAML file. It is that simple.

Over the last three years, we have done an enormous amount of work to accelerate Deep Learning R&D in a fully open-source ecosystem, thanks to our team and contributors. In this post, we have covered the current framework design principles and a few minimal examples, so you can speed up your Deep Learning work with Catalyst and make it fully reproducible.

If you are motivated by our vision of an open-source Deep Learning R&D ecosystem around Catalyst, you can support our initiative or write to us directly about collaboration.