PyTorch Lightning 1.3: Lightning CLI, PyTorch Profiler, Improved Early Stopping

PyTorch profiler integration, predict and validate trainer steps, and more

PyTorch Lightning team
May 7, 2021

Today we are excited to announce Lightning 1.3, containing highly anticipated new features including a new Lightning CLI, improved TPU support, integrations such as PyTorch profiler, new early stopping strategies, predict and validate trainer routines, and more.

In addition, we are standardizing our release schedule. We will launch a new minor release (1.X.0) every quarter: we will build new features for 8–10 weeks, then freeze new additions (except bug fixes) for 2 weeks prior to each minor release. Between these launches, we will continue to publish weekly bug-fix releases, as we do now.

Overview of New PyTorch Lightning 1.3 Features

New Early Stopping Strategies

Early Termination Point [1]

The EarlyStopping callback in Lightning allows the Trainer to stop automatically when a given metric (e.g. the validation loss) stops improving. It is perfect for hyperparameter searches and grid runs because it limits the time spent on sets of parameters that lead to poor convergence or strong overfitting.

In this release, we added three new thresholds for early stopping:

  • stopping_threshold: Stops training immediately once the monitored quantity reaches this threshold. It is useful when we know that going beyond a certain optimal value does not further benefit us.
  • divergence_threshold: Stops training as soon as the monitored quantity becomes worse than this threshold. When reaching a value this bad, we believe the model cannot recover anymore and it is better to stop early and run with different initial conditions.
  • check_finite: When turned on, we stop training if the monitored metric becomes NaN or infinite.
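
Here is how you can use these new thresholds with just a few extra lines of code. Below is a minimal sketch; the monitored metric name and the threshold values are placeholders you would adapt to your task:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# Stop when the monitored metric stops improving, reaches a target,
# diverges, or becomes non-finite.
early_stopping = EarlyStopping(
    monitor="val_loss",          # a metric you log in your LightningModule
    stopping_threshold=0.02,     # stop immediately once val_loss reaches this value
    divergence_threshold=5.0,    # stop as soon as val_loss becomes worse than this
    check_finite=True,           # stop if val_loss becomes NaN or infinite
)

trainer = Trainer(callbacks=[early_stopping])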

You should use these stopping criteria to save money when training on expensive resources and to accelerate hyperparameter search.

PyTorch Profiler (requires PyTorch 1.8.1)

We have integrated the new PyTorch 1.8.1 profiler! The PyTorch Profiler is an open-source tool that enables accurate and efficient performance analysis and troubleshooting for large-scale deep learning models. The profiled traces can be visualized directly inside chrome://tracing or within TensorBoard with the PyTorch Profiler plugin. Just launch your training runs with the profiler flag set to pytorch. For more details, check out the new Profiler launch blog.
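
Enabling it is a one-liner. A minimal sketch (model stands in for your own LightningModule):

from pytorch_lightning import Trainer

# Select the new PyTorch profiler by name; the recorded traces can be
# inspected in chrome://tracing or in TensorBoard.
trainer = Trainer(profiler="pytorch")
trainer.fit(model)  # `model` is your LightningModule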

Profiler Chrome Tracing

Improved TPU Support

Several improvements have been made for TPU training with Lightning in the current release. Currently, TPUs are available and tested on Google Cloud (GCP), Google Colab, and Kaggle environments. Read more about TPUs in Lightning in this post from the Google Cloud blog.
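
Switching a run onto TPUs remains a single Trainer flag. A minimal sketch (the core count is a placeholder):

from pytorch_lightning import Trainer

# Train on 8 TPU cores (e.g. on Colab or GCP); Lightning handles XLA
# device placement and distributed sampling for you.
trainer = Trainer(tpu_cores=8)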

Automatic Seeding of DataLoader Workers

In order to achieve deterministic, reproducible experiments, PyTorch users are advised to use the built-in torch.manual_seed function.

However, it is often forgotten that DataLoaders with multiple workers also need to be initialized properly with a worker_init_fn function. For many simple use cases this is not necessary, but when third-party libraries like numpy are involved (e.g. for randomized data augmentations), it is important to derive a new seed in each worker process. This avoids duplicated random state, which could otherwise cause the DataLoader to return duplicated samples.
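
To make this concrete, here is a sketch of the manual fix that Lightning now automates (dataset stands in for your own Dataset):

import numpy as np
import torch
from torch.utils.data import DataLoader

def worker_init_fn(worker_id):
    # PyTorch gives every worker a distinct base seed; derive numpy's seed
    # from it so numpy-based augmentations also differ across workers.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)

loader = DataLoader(dataset, num_workers=4, worker_init_fn=worker_init_fn)  # `dataset` is your Dataset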

This point was recently highlighted by /tanela on Reddit as a bug that plagues thousands of open-source ML projects.

Lightning can now automatically take care of setting the correct seed in all DataLoader workers for you, even for multi-GPU/multi-node distributed training. To make your code deterministic and fully reproducible, you only need to add one line to your code:
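
A minimal sketch (the seed value is a placeholder):

from pytorch_lightning import Trainer, seed_everything

# Seeds Python, numpy, and torch RNGs; workers=True also derives a unique
# seed for every DataLoader worker process, even across GPUs and nodes.
seed_everything(42, workers=True)

trainer = Trainer(deterministic=True)  # optional: force deterministic algorithms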

Read more about reproducibility in Lightning here.

Lightning CLI [BETA]

A large source of boilerplate that Lightning can help reduce is the implementation of command-line tools. For this reason, we have created the LightningCLI.

Comparison of using our argparse integration vs the LightningCLI

The LightningCLI provides an interface to quickly parse input arguments, read configuration files and get to training your models as soon as possible. Furthermore, it provides a standardized way to configure training runs using a single file (.yaml) that includes settings for Trainer, LightningModule, and LightningDataModule classes. This has the benefit of greatly simplifying the reproducibility of experiments.

train.py file
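
As a minimal sketch of such a file (MyModel and MyDataModule are hypothetical classes of your own):

# train.py
from pytorch_lightning.utilities.cli import LightningCLI

from my_project import MyModel, MyDataModule  # hypothetical imports

# Builds a parser from the Trainer, model, and datamodule signatures and
# type hints, parses the command line (or a config file), and runs fit.
cli = LightningCLI(MyModel, MyDataModule)

You can then inspect all available options with python train.py --help, dump them with --print_config, and reproduce a run with python train.py --config config.yaml.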

The LightningCLI relies on Python type hints and docstrings to automatically generate type checking and help messages for your code! No external annotations or code changes required. Just good Python practices.

LightningCLI is currently only supported for training.

Using Lightning CLI

Trainer routines .predict() [BETA] and .validate()

Adding trainer.predict() and trainer.validate() functions has been a long-requested Lightning feature, and they are finally supported.

You can easily get your predictions from your model even when running in distributed settings.

  • trainer.predict() relies on predict_step to return the predictions
  • trainer.validate() works the same as trainer.predict() but runs on your validation data and has no predict_step requirement. You can use it to run a validation epoch before training starts, or however you like! A sketch of trainer.predict() follows this list.
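
Here is a minimal sketch (LitModel is a hypothetical module defined only for illustration):

import torch
from pytorch_lightning import LightningModule, Trainer
from torch.utils.data import DataLoader, TensorDataset

class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 2)

    def forward(self, x):
        return self.layer(x)

    def predict_step(self, batch, batch_idx, dataloader_idx=None):
        x = batch[0]
        return self(x)  # returned values are collected by trainer.predict

model = LitModel()
loader = DataLoader(TensorDataset(torch.randn(64, 8)), batch_size=16)

trainer = Trainer()
predictions = trainer.predict(model, dataloaders=loader)  # list of per-batch outputs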

Other Improvements

Nightly Installs

We’ve received feedback from the community that you want to be able to try out the cutting-edge version of the repository. We appreciate our early adopters and are always happy to get constructive feedback or bug reports for any new features before they are officially released. There are several ways to install the latest version:

  • Install from source

pip install https://github.com/PyTorchLightning/pytorch-lightning/archive/refs/heads/master.zip

  • Install from our nightly builds

pip install --index-url https://test.pypi.org/simple/ pytorch-lightning

  • Install the release candidate, available a few weeks before each minor release (these generally do not include new features and are meant for bug fixes)

pip install --pre -U pytorch-lightning

Next Steps

If you enjoy Lightning, check out our other ecosystem projects:

TorchMetrics

Flash

Transformers

Bolts

Thank you!

As always, we would like to shout out to our incredible community of contributors who never cease to amaze us. Join us!

Big kudos to all the community members for their contributions and feedback. We now have over 450 Lightning contributors!

Want to give open source a try and get free Lightning swag? We have a #new_contributors channel on Slack. Check it out!
