PyTorch Lightning 1.3: Lightning CLI, PyTorch Profiler, Improved Early Stopping
PyTorch profiler integration, predict and validate trainer steps, and more
Today we are excited to announce Lightning 1.3, containing highly anticipated new features including a new Lightning CLI, improved TPU support, integrations such as PyTorch profiler, new early stopping strategies, predict and validate trainer routines, and more.
In addition, we are standardizing our release schedule. We will be launching a new minor release (1.X.0) every quarter, where we will build new features for 8–10 weeks, and then freeze new additions (except bug fixes) for 2 weeks prior to each minor release. Between these launches, we will continue to maintain weekly bug-fix releases, as we do now.
Overview of New PyTorch Lightning 1.3 Features
New Early Stopping Strategies
The EarlyStopping callback in Lightning allows the Trainer to automatically stop when a given metric (e.g. the validation loss) stops improving. It is perfect for hyperparameter searches and grid runs because it limits the time spent on sets of parameters that lead to poor convergence or strong overfitting.
In this release, we added three new thresholds for early stopping:
- stopping_threshold: Stops training immediately once the monitored quantity reaches this threshold. It is useful when we know that going beyond a certain optimal value does not benefit us further.
- divergence_threshold: Stops training as soon as the monitored quantity becomes worse than this threshold. At a value this bad, we believe the model cannot recover, and it is better to stop early and retry with different initial conditions.
- check_finite: When turned on, stops training if the monitored metric becomes NaN or infinite.
You should use these stopping criteria to save money when training on expensive resources and to accelerate hyperparameter search.
PyTorch Profiler (requires PyTorch 1.8.1)
We have integrated the new 1.8.1 PyTorch profiler! The PyTorch Profiler is an open-source tool that enables accurate and efficient performance analysis and troubleshooting for large-scale deep learning models. The profiled traces can be visualized directly inside chrome://tracing or within TensorBoard with the PyTorch Profiler plugin. Just launch your training runs with the profiler flag set to pytorch. For more details, check out the New Profiler launch blog.
Improved TPU Support
Several improvements have been made for TPU training with Lightning in the current release. Currently, TPUs are available and tested on Google Cloud (GCP), Google Colab, and Kaggle Environments. Read more about TPUs in Lightning in this post from the Google Cloud blog.
Automatic Seeding of DataLoader Workers
In order to achieve deterministic, reproducible experiments, PyTorch users are advised to use the built-in torch.manual_seed function.
However, what often gets forgotten is that DataLoaders with multiple workers also need to be initialized properly with a worker_init_fn function. For many simple use cases this is not necessary, but when third-party libraries like NumPy are involved (e.g. for randomized data augmentations), it is important to derive a new seed in each worker process. This avoids duplicated random state, which could otherwise lead to duplicated samples being returned by the DataLoader.
This point was recently highlighted on Reddit by /tanela as a bug that plagues thousands of open-source ML projects.
Lightning can now automatically take care of setting the correct seed in all DataLoader workers for you, even for multi-GPU/multi-node distributed training. To make your code deterministic and fully reproducible, you only need to add one line in your code:
Read more about reproducibility in Lightning here.
Lightning CLI [BETA]
A large source of boilerplate code that Lightning can help to reduce is in the implementation of command-line tools. For this reason, we have created the LightningCLI.
The LightningCLI provides an interface to quickly parse input arguments, read configuration files, and get to training your models as soon as possible. Furthermore, it provides a standardized way to configure training runs using a single file (.yaml) that includes settings for the Trainer, LightningModule, and LightningDataModule classes. This greatly simplifies the reproducibility of experiments.
The LightningCLI relies on Python type hints and docstrings to automatically generate type checking and help messages for your code! No external annotations or code changes required, just good Python practices.
LightningCLI is currently only supported for training.
Trainer routines .predict() [BETA] and .validate()
Adding trainer.predict() and trainer.validate() has been a long-requested Lightning feature, and both are finally supported. You can easily get predictions from your model even when running in distributed settings.
- trainer.predict() relies on predict_step to return the predictions.
- trainer.validate() works the same as trainer.predict(), but runs on your validation data and has no predict_step requirement. You can use it to run a validation epoch before training starts, or whenever you like!
Other Improvements
- Added gradient_clip_algorithm argument to Trainer for gradient clipping by value.
- Added support for precision=64, enabling training with double precision.
- Added ignore parameter to the LightningModule.save_hyperparameters method.
Nightly Installs
We’ve received feedback from the community that you want to be able to try out the cutting-edge version of the repository. We appreciate our early adopters and are always happy to get constructive feedback or bug reports for any new feature before they are officially released. There are several ways to install the latest version:
- Installing from source
pip install https://github.com/PyTorchLightning/pytorch-lightning/archive/refs/heads/master.zip
- Install from our nightly builds
pip install --index-url https://test.pypi.org/simple/ pytorch-lightning
- Install the release candidate, available a few weeks before each minor release (these generally do not include new features and are meant for bug fixes)
pip install --pre -U pytorch-lightning
Next Steps
If you enjoy Lightning, check out our other ecosystem projects:
TorchMetrics
Flash
Transformers
Bolts
Thank you!
As always, we would like to shout out to our incredible community of contributors who never cease to amaze us. Join us!
Big kudos to all the community members for their contributions and feedback. We now have over 450 Lightning contributors!
Want to give open source a try and get free Lightning swag? We have a #new_contributors channel on Slack. Check it out!