PyTorch Lightning 1.3: Lightning CLI, PyTorch Profiler, Improved Early Stopping

PyTorch profiler integration, predict and validate trainer steps, and more

PyTorch Lightning team
May 7, 2021

Today we are excited to announce Lightning 1.3, containing highly anticipated new features including a new Lightning CLI, improved TPU support, integrations such as PyTorch profiler, new early stopping strategies, predict and validate trainer routines, and more.

In addition, we are standardizing our release schedule. We will launch a new minor release (1.X.0) every quarter: we will build new features for 8–10 weeks, then freeze new additions (except bug fixes) for 2 weeks prior to each minor release. Between these launches, we will continue to publish weekly bug-fix releases, as we do now.

Overview of New PyTorch Lightning 1.3 Features

New Early Stopping Strategies

Early Termination Point [1]

The EarlyStopping callback in Lightning allows the Trainer to stop automatically when a given metric (e.g. the validation loss) stops improving. It is perfect for hyperparameter searches and grid runs because it limits the time spent on sets of parameters that lead to poor convergence or strong overfitting.

In this release, we added three new thresholds for early stopping:

  • stopping_threshold: Stops training immediately once the monitored quantity reaches this threshold. It is useful when we know that going beyond a certain optimal value does not further benefit us.
  • divergence_threshold: Stops training as soon as the monitored quantity becomes worse than this threshold. When reaching a value this bad, we believe the model cannot recover anymore and it is better to stop early and run with different initial conditions.
  • check_finite: When turned on, we stop training if the monitored metric becomes NaN or infinite.
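
Here is how you can use these new thresholds with just a few extra lines of code. Below is a minimal sketch; the monitored metric name and the threshold values are placeholders you would adapt to your task:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# Stop when the monitored metric stops improving, reaches a target,
# diverges, or becomes non-finite.
early_stopping = EarlyStopping(
    monitor="val_loss",          # a metric you log in your LightningModule
    stopping_threshold=0.02,     # stop immediately once val_loss reaches this value
    divergence_threshold=5.0,    # stop as soon as val_loss becomes worse than this
    check_finite=True,           # stop if val_loss becomes NaN or infinite
)

trainer = Trainer(callbacks=[early_stopping])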

You should use these stopping criteria to save money when training on expensive resources and to accelerate hyperparameter search.

PyTorch Profiler (requires PyTorch 1.8.1)

We have integrated the new PyTorch 1.8.1 profiler! The PyTorch Profiler is an open-source tool that enables accurate and efficient performance analysis and troubleshooting for large-scale deep learning models. The profiled traces can be visualized directly inside chrome://tracing or within TensorBoard with the PyTorch Profiler plugin. Just launch your training runs with the profiler flag set to pytorch. For more details, check out the new Profiler launch blog.
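
Enabling it is a one-liner. A minimal sketch (model stands in for your own LightningModule):

from pytorch_lightning import Trainer

# Select the new PyTorch profiler by name; the recorded traces can be
# inspected in chrome://tracing or in TensorBoard.
trainer = Trainer(profiler="pytorch")
trainer.fit(model)  # `model` is your LightningModule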

Profiler Chrome Tracing

Improved TPU Support

Several improvements have been made for TPU training with Lightning in the current release. Currently, TPUs are available and tested on Google Cloud (GCP), Google Colab, and Kaggle environments. Read more about TPUs in Lightning in this post from the Google Cloud blog.
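
Switching a run onto TPUs remains a single Trainer flag. A minimal sketch (the core count is a placeholder):

from pytorch_lightning import Trainer

# Train on 8 TPU cores (e.g. on Colab or GCP); Lightning handles XLA
# device placement and distributed sampling for you.
trainer = Trainer(tpu_cores=8)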

Automatic Seeding of DataLoader Workers

In order to achieve deterministic, reproducible experiments, PyTorch users are advised to use the built-in torch.manual_seed function.

However, it is often forgotten that DataLoaders with multiple workers also need to be initialized properly with a worker_init_fn function. For many simple use cases this is not necessary, but when third-party libraries like numpy are involved (e.g. for randomized data augmentations), it is important to derive a new seed in each worker process. This avoids duplicated random state, which could otherwise cause the DataLoader to return duplicated samples.
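
To make this concrete, here is a sketch of the manual fix that Lightning now automates (dataset stands in for your own Dataset):

import numpy as np
import torch
from torch.utils.data import DataLoader

def worker_init_fn(worker_id):
    # PyTorch gives every worker a distinct base seed; derive numpy's seed
    # from it so numpy-based augmentations also differ across workers.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)

loader = DataLoader(dataset, num_workers=4, worker_init_fn=worker_init_fn)  # `dataset` is your Dataset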

This point was recently highlighted by /tanela on Reddit as a bug that plagues thousands of open-source ML projects.

Lightning can now automatically take care of setting the correct seed in all DataLoader workers for you, even for multi-GPU/multi-node distributed training. To make your code deterministic and fully reproducible, you only need to add one line to your code:
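
A minimal sketch (the seed value is a placeholder):

from pytorch_lightning import Trainer, seed_everything

# Seeds Python, numpy, and torch RNGs; workers=True also derives a unique
# seed for every DataLoader worker process, even across GPUs and nodes.
seed_everything(42, workers=True)

trainer = Trainer(deterministic=True)  # optional: force deterministic algorithms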

Read more about reproducibility in Lightning here.

Lightning CLI [BETA]

A large source of boilerplate that Lightning can help reduce is the implementation of command-line tools. For this reason, we have created the LightningCLI.

Comparison of using our argparse integration vs the LightningCLI

The LightningCLI provides an interface to quickly parse input arguments, read configuration files and get to training your models as soon as possible. Furthermore, it provides a standardized way to configure training runs using a single file (.yaml) that includes settings for Trainer, LightningModule, and LightningDataModule classes. This has the benefit of greatly simplifying the reproducibility of experiments.

train.py file
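
As a minimal sketch of such a file (MyModel and MyDataModule are hypothetical classes of your own):

# train.py
from pytorch_lightning.utilities.cli import LightningCLI

from my_project import MyModel, MyDataModule  # hypothetical imports

# Builds a parser from the Trainer, model, and datamodule signatures and
# type hints, parses the command line (or a config file), and runs fit.
cli = LightningCLI(MyModel, MyDataModule)

You can then inspect all available options with python train.py --help, dump them with --print_config, and reproduce a run with python train.py --config config.yaml.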

The LightningCLI relies on Python type hints and docstrings to automatically generate type checking and help messages for your code! No external annotations or code changes required. Just good Python practices.

LightningCLI is currently only supported for training.

Using Lightning CLI

Trainer routines .predict() [BETA] and .validate()

Adding trainer.predict() and trainer.validate() functions has been a long-requested Lightning feature, and they are finally supported.

You can easily get your predictions from your model even when running in distributed settings.

  • trainer.predict() relies on predict_step to return the predictions
  • trainer.validate() works the same as trainer.predict() but runs on your validation data and has no predict_step requirement. You can use it to run a validation epoch before training starts, or however you like! A sketch of trainer.predict() follows this list.
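
Here is a minimal sketch (LitModel is a hypothetical module defined only for illustration):

import torch
from pytorch_lightning import LightningModule, Trainer
from torch.utils.data import DataLoader, TensorDataset

class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 2)

    def forward(self, x):
        return self.layer(x)

    def predict_step(self, batch, batch_idx, dataloader_idx=None):
        x = batch[0]
        return self(x)  # returned values are collected by trainer.predict

model = LitModel()
loader = DataLoader(TensorDataset(torch.randn(64, 8)), batch_size=16)

trainer = Trainer()
predictions = trainer.predict(model, dataloaders=loader)  # list of per-batch outputs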

Other Improvements

Nightly Installs

We’ve received feedback from the community that you want to be able to try out the cutting-edge version of the repository. We appreciate our early adopters and are always happy to get constructive feedback or bug reports for any new features before they are officially released. There are several ways to install the latest version:

  • Install from source

pip install https://github.com/PyTorchLightning/pytorch-lightning/archive/refs/heads/master.zip

  • Install from our nightly builds

pip install --index-url https://test.pypi.org/simple/ pytorch-lightning

  • Install the release candidate, available a few weeks before each minor release (these generally do not include new features and are meant for bug fixes)

pip install --pre -U pytorch-lightning

Next Steps

If you enjoy Lightning, check out our other ecosystem projects:

TorchMetrics

Flash

Transformers

Bolts

Thank you!

As always, we would like to shout out to our incredible community of contributors who never cease to amaze us. Join us!

Big kudos to all the community members for their contributions and feedback. We now have over 450 Lightning contributors!

Want to give open source a try and get free Lightning swag? We have a #new_contributors channel on Slack. Check it out!
