Catalyst dev blog - 20.07 release

Sergey Kolesnikov
Jul 16, 2020 · 5 min read
Falcon 9 SpaceX launch

Hi, I am Sergey, the author of Catalyst, a PyTorch library for deep learning research and development. In our previous blog posts, we covered an introduction to Catalyst and our advanced NLP pipeline for BERT distillation. In this post, I would like to share our development progress over the last month. Let's check what features we have added to the framework in such a short time.

tl;dr

You can find all examples from this blog post in this Google Colab notebook.

Training Flow improvements

BatchOverfitCallback

For a better user experience in deep learning, you need to think not only about cool engineering features like distributed support, half-precision training, and metrics (we already have them). You also have to think about the common difficulties that occur during experimentation.

Imagine a typical research situation: you write your fancy pipeline, get the dataset, and try to fit this data into your model. But something goes wrong, and you can't get the desired results.

One potential cause is a problem with pipeline convergence. You could subsample your data and check that the model easily overfits this small subset. But should you do this again and again across all your projects? It looks like we need a general solution for this problem. Here comes our BatchOverfitCallback (contributed by Scitator). The idea behind it is straightforward: take only a requested number of batches from your dataset and use only them for training.

So, let's look at a typical deep learning pipeline:

Catalyst pipeline setup
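The original gist is not embedded here, so below is a minimal sketch of such a pipeline: a toy regression task on random tensors (the data, model, and hyperparameters are purely illustrative).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst.dl import SupervisedRunner

# toy data: a random regression problem (illustrative only)
num_samples, num_features = int(1e4), int(1e1)
X = torch.rand(num_samples, num_features)
y = torch.rand(num_samples, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer
model = torch.nn.Linear(num_features, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
```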

You can run it with:

Catalyst experiment run
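A sketch of the training call, continuing the setup above:

```python
# train the model with the standard Catalyst Notebook API
runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
)
```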

Thanks to this update, you can check your pipeline's convergence with only one extra line:

Run with `overfit` flag
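A sketch of the same call with the overfit flag switched on (all other arguments unchanged):

```python
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
    overfit=True,  # train and validate on a small portion of the data
)
```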

This way you can easily debug your experiment without any extra code refactoring. You can also redefine the required number of batches per loader:

Run with `BatchOverfitCallback`
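A sketch with the callback passed explicitly; here an int requests a number of batches and a float a fraction of the loader, following the callback's per-loader keyword convention:

```python
from catalyst.dl import BatchOverfitCallback

runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
    # use 10 batches from "train" and 50% of the "valid" loader
    callbacks=[BatchOverfitCallback(train=10, valid=0.5)],
)
```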

What is even cooler, we have integrated this feature into our Config API. You can use it with:

```bash
catalyst-dl run --config=/path/to/config --overfit
```

PeriodicLoaderCallback

During your research, you may find yourself in a situation where you have only a few train samples and a huge test set for checking your model's performance. Alternatively, you could have computationally heavy validation (for example, the NMS stage in anchor-based object detection) that takes up too much of your training pipeline's time. You can enlarge the train set for each epoch with BalanceClassSampler, but what if you want to keep your training data unchanged? Try our new PeriodicLoaderCallback (contributed by Ditwoo).

For the example above, you can schedule a validation run every 2 epochs:

Run with `PeriodicLoaderCallback`
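A sketch, assuming the callback takes loader names and their epoch periods as keyword arguments:

```python
from catalyst.dl import PeriodicLoaderCallback

runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
    callbacks=[PeriodicLoaderCallback(valid=2)],  # run "valid" every 2 epochs
)
```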

Thanks to the Catalyst design, we can extend it to any number of data sources:

Run with `PeriodicLoaderCallback` and multiple loaders
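A sketch with an extra, hypothetical validation source named "valid_2"; each loader gets its own period:

```python
loaders = {
    "train": loader,
    "valid": loader,
    "valid_2": loader,  # a second, hypothetical validation source
}

runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
    # "valid" runs every 2 epochs, "valid_2" every 3 epochs
    callbacks=[PeriodicLoaderCallback(valid=2, valid_2=3)],
)
```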

ControlFlowCallback

After PeriodicLoaderCallback, we asked ourselves: "If you can enable/disable data sources, why can't you do the same with metrics and entire Callbacks?" For example, suppose you have a metric that you don't want to compute during the training or validation stage. With ControlFlowCallback (contributed by Ditwoo), this can be done easily:

Catalyst pipeline with ControlFlowCallback example
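A sketch, assuming a classification pipeline where accuracy should only be computed on validation; ControlFlowCallback wraps an ordinary callback and restricts where it runs:

```python
from catalyst.dl import AccuracyCallback, ControlFlowCallback

runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
    callbacks=[
        # compute accuracy on the "valid" loader only;
        # `epochs=...` and `ignore_loaders=...` work the same way
        ControlFlowCallback(base_callback=AccuracyCallback(), loaders=["valid"]),
    ],
)
```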

Now you can define on which loaders and epochs you would like to run a Callback, or ignore it.

Metric Learning features

I also want to preview some extra updates from this release. For the last month, we have been working hard on a foundation for Metric Learning research. We have prepared several InBatchTripletsSamplers (contributed by AlekseySh): helper modules for online triplet mining during training.
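The gist is not embedded here; below is a sketch of online mining with one of these samplers, assuming HardTripletsSampler is exported from catalyst.data and its sample(features, labels) method returns (anchor, positive, negative) feature tensors:

```python
import torch
from catalyst.data import HardTripletsSampler

# a batch of embeddings with their class labels (illustrative shapes)
features = torch.rand(32, 128)
labels = torch.randint(0, 4, (32,)).tolist()

# mine the hardest triplets inside the batch
sampler = HardTripletsSampler(norm_required=True)
anchors, positives, negatives = sampler.sample(features=features, labels=labels)

# standard triplet loss on the mined triplets
loss = torch.nn.functional.triplet_margin_loss(
    anchors, positives, negatives, margin=0.5,
)
```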

We hope these abstractions will help in your research. We are now working on a minimal Metric Learning example to create a starting benchmark for this case. Stay tuned for the upcoming tutorial.

Fixes

Last but not least, as with every release, this one comes with a few fixes.

Integrations — MONAI segmentation example

In collaboration with the MONAI team, we have prepared an introductory tutorial on 3D image segmentation with the MONAI and Catalyst frameworks.

Plans

We still have a lot of plans:

Ecosystem release — Alchemy

During this Catalyst release, we also have another great piece of news: we are moving our ecosystem-powered monitoring tools to the global MVP release. Feel free to use them and share your feedback with us.

We help researchers accelerate pipelines with Catalyst and find insights with Alchemy throughout the whole R&D process: these ecosystem tools are available for you to train, share, and collaborate more effectively.

Photograph by Robert Ormerod

Afterword

Our goal is to build a foundation for fundamental breakthroughs in the deep learning and reinforcement learning areas. Nevertheless, it is really hard to build an open source ecosystem with only a few motivated people. If you are a company that is deeply committed to using open source technologies in deep learning and want to support our initiative, feel free to write to us at catalyst.team.core@gmail.com. For details about the Ecosystem, check our vision and manifesto.
