Catalyst dev blog - 20.07 release


Hi, I am Sergey, the author of Catalyst, a PyTorch library for deep learning research and development. In our previous blog posts, we covered an introduction to Catalyst and our advanced NLP pipeline for BERT distillation. In this post, I would like to share our development progress over the last month. Let's check what features we have added to the framework in such a short time.


  • Training Flow improvements: BatchOverfitCallback, PeriodicLoaderCallback, ControlFlowCallback
  • Metric Learning features: InBatchTripletsSampler, AllTripletsSampler, HardTripletsSampler, tutorial
  • Fixes and acknowledgments
  • New integrations: MONAI & Catalyst
  • Ecosystem update — Alchemy

You can find all examples from this blog post in this Google Colab.

Training Flow improvements


For a better user experience in deep learning, you need to think not only about cool engineering features like distributed support, half-precision training, and metrics (we already have them). You also have to think about common difficulties that occur during experimentation.

Imagine a typical research situation: you wrote your fancy pipeline, got the dataset, and tried to fit this data into your model. But something goes wrong, and you can't get the desired results.

One potential cause is a problem with pipeline convergence. You could subsample your data and check that the model easily overfits on this subset alone. But should you do this again and again across all your projects? It looks like we need a general solution to this problem. And here comes our BatchOverfitCallback (contributed by Scitator). The idea behind it is straightforward: take only a requested number of batches from your dataset and use only them for training.
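The core idea can be sketched in a few lines of plain Python (a toy illustration with made-up names, not Catalyst's actual implementation):

```python
from itertools import islice

def take_batches(loader, num_batches):
    """Keep only the first `num_batches` batches from a loader --
    the core idea behind BatchOverfitCallback."""
    return list(islice(loader, num_batches))

# a toy "loader" with 10 batches of 4 samples each
loader = [[i] * 4 for i in range(10)]

# train every epoch on these 3 batches only; a healthy pipeline
# should drive the loss on them close to zero
overfit_subset = take_batches(loader, 3)
```

If the model cannot memorize even this tiny subset, the problem is in the pipeline, not in the data.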

So, let's look at a typical deep learning pipeline,

Catalyst pipeline setup

You can run it with

Catalyst experiment run

Thanks to this update, you can check your pipeline's convergence with only one extra line:

Run with `overfit` flag

This way, you can easily debug your experiment without extra code refactoring. You can also redefine the required number of batches per loader.

Run with `BatchOverfitCallback`
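As described above, each BatchOverfitCallback keyword argument names a loader, where an integer is treated as an absolute number of batches and a float as a portion of the loader. A plain-Python sketch of that resolution logic (a hypothetical helper, not the actual implementation):

```python
def resolve_num_batches(spec, loader_len):
    """Map a BatchOverfitCallback-style argument to a batch count:
    a float is a portion of the loader, an int an absolute count."""
    if isinstance(spec, float):
        return max(1, int(loader_len * spec))
    return min(spec, loader_len)

# something like BatchOverfitCallback(train=10, valid=0.5) on a
# 100-batch train loader and a 40-batch valid loader would keep:
print(resolve_num_batches(10, 100))   # 10 train batches
print(resolve_num_batches(0.5, 40))   # 20 valid batches
```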

What is even cooler, we have integrated this feature into our Config API. You can use it with

catalyst-dl run --config=/path/to/config --overfit


During your research, you may find yourself in a situation where you have only a few train samples and a huge test set to check your model's performance. Alternatively, you could have computationally heavy validation (for example, the NMS stage in anchor-based object detection) that takes up too much of your training pipeline's time. You can enlarge the train set for each epoch with BalanceClassSampler, but what if you want to keep your training data unchanged? Try our new PeriodicLoaderCallback (contributed by Ditwoo).

For the example above, you can run validation every 2 epochs:

Run with `PeriodicLoaderCallback`
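The scheduling idea behind it can be sketched in plain Python (toy code with hypothetical names, not Catalyst's actual implementation):

```python
def loaders_for_epoch(epoch, periods):
    """Which loaders run at a given (1-based) epoch, in the spirit of
    PeriodicLoaderCallback: a loader with period p runs every p-th epoch."""
    return [name for name, period in periods.items() if epoch % period == 0]

# the analogue of "validation every 2 epochs":
schedule = {"train": 1, "valid": 2}
for epoch in range(1, 5):
    print(epoch, loaders_for_epoch(epoch, schedule))
```

Epochs 1 and 3 run only the train loader; epochs 2 and 4 also run validation.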

Thanks to Catalyst's design, we can extend it to any number of data sources:

Run with `PeriodicLoaderCallback` and multiple loaders


After PeriodicLoaderCallback, we asked ourselves: "If you can enable/disable data sources, why can't you do the same with metrics and entire Callbacks?" For example, suppose you have a metric you don't want to compute during the training or validation stage. With ControlFlowCallback (contributed by Ditwoo), this can be done easily:

Catalyst pipeline with ControlFlowCallback example

Now you can define for which loaders and epochs a Callback should be used, or ignored.
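The wrapper pattern behind ControlFlowCallback can be illustrated with a toy class (hypothetical names and a simplified interface, not Catalyst's API):

```python
class ControlFlow:
    """Delegate to the wrapped callback only for allowed loaders/epochs,
    in the spirit of ControlFlowCallback."""
    def __init__(self, callback, loaders=None, ignore_loaders=(), epochs=None):
        self.callback = callback
        self.loaders = loaders              # None means "all loaders"
        self.ignore_loaders = set(ignore_loaders)
        self.epochs = epochs                # None means "all epochs"

    def __call__(self, loader, epoch):
        allowed_loader = self.loaders is None or loader in self.loaders
        allowed_epoch = self.epochs is None or epoch in self.epochs
        if allowed_loader and allowed_epoch and loader not in self.ignore_loaders:
            self.callback()

calls = []
# a "metric" we want to skip during training
metric = ControlFlow(lambda: calls.append("computed"), ignore_loaders=["train"])
metric("train", 1)  # skipped
metric("valid", 1)  # runs
```

The wrapped callback itself stays unchanged; only the wrapper decides when it fires.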

Metric Learning features

I also want to give a preview of extra updates in this release. Over the last month, we have been working hard on a foundation for Metric Learning research. We have prepared several InBatchTripletsSamplers (contributed by AlekseySh): helper modules for online triplet mining during training.
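As a rough illustration of in-batch triplet mining (toy code, not the Catalyst API), here is the exhaustive variant, where every valid (anchor, positive, negative) combination inside a batch is enumerated:

```python
from itertools import combinations

def all_triplets(labels):
    """Every (anchor, positive, negative) index triplet within a batch,
    in the spirit of AllTripletsSampler: anchor and positive share a
    label, the negative has a different one."""
    by_label = {}
    for i, label in enumerate(labels):
        by_label.setdefault(label, []).append(i)
    triplets = []
    for label, idxs in by_label.items():
        negatives = [i for i, l in enumerate(labels) if l != label]
        for anchor, positive in combinations(idxs, 2):
            for negative in negatives:
                triplets.append((anchor, positive, negative))
    return triplets

print(all_triplets([0, 0, 1, 1]))
# a hard-mining sampler would instead keep, for each anchor, only the
# farthest positive and the closest negative by embedding distance
```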

We hope these abstractions will help in your research. We are now working on a minimal Metric Learning example to create a starting benchmark for this case. Stay tuned for the upcoming tutorial.


Last but not least, as with every release, this one came with a few fixes.

Integrations — MONAI segmentation example

In collaboration with the MONAI team, we have prepared an introductory tutorial on 3D image segmentation with the MONAI and Catalyst frameworks.


We still have a lot of plans:

  • TPU support: with CPU, GPU, and Slurm already supported, we want to push the frontiers and bring Catalyst to TPUs
  • kornia integration: we already have a native integration with the famous albumentations library, but why shouldn't we make a fair comparison between alternatives and take the best for our users? Stay tuned for an upcoming benchmark of image augmentation libraries by Catalyst-Team
  • model auto-pruning: since Catalyst is a framework for deep learning research and development, and we already support model auto-tracing, we want to introduce framework support for model auto-pruning.

Ecosystem release — Alchemy

During this Catalyst release, we also have another great piece of news: we are moving our ecosystem's monitoring tools to a global MVP release. Feel free to use them and share your feedback with us.

We help researchers accelerate pipelines with Catalyst and find insights with Alchemy across the whole R&D process: these ecosystem tools are available for you to train, share, and collaborate more effectively.



Our goal is to build a foundation for fundamental breakthroughs in deep learning and reinforcement learning. Nevertheless, it is really hard to build an open-source ecosystem with only a few motivated people. If you are a company that is deeply committed to using open source technologies in deep learning and wants to support our initiative, feel free to write to us. For details about the Ecosystem, check our vision and manifesto.



