PyTorch Lightning 0.9 — synced BatchNorm, DataModules and final API!
Newest PyTorch Lightning release includes the final API with better data decoupling, shorter logging syntax and tons of bug fixes
We’re happy to release PyTorch Lightning 0.9.0 today. It contains many great new features and more bug fixes than any previous release, but most importantly it introduces our mostly final API changes!
Lightning is being adopted by top researchers and AI labs around the world, and we are working hard to make sure we provide a smooth experience and support for all the latest best practices.
In this release, we are introducing our last two major API changes: LightningDataModules and the TrainResult / EvalResult objects.
Lightning is all about making your code more readable and structured.
We decouple the model architecture from engineering, and we continue to do the same with data. To make it easier to share and reuse data splits and transforms across projects, we created LightningDataModules.
A LightningDataModule is a shareable, reusable class that encapsulates all the data-processing steps needed for training:
- Download / tokenize / process.
- Clean and save to disk for reuse.
- Load inside Dataset in memory or just-in-time.
- Apply transforms (rotate, tokenize, etc…).
- Wrap inside a DataLoader.
LightningDataModules can be shared and used anywhere:
In this video, Nate Raw, DL research engineer at PyTorch Lightning, walks you through DataModules step by step:
You can check out the docs on the new DataModules here.
We added two new result objects to Lightning: TrainResult and EvalResult. They are fancy dictionary objects that hold the outputs of train/eval/test steps, and they control where and when to log and how synchronization is done across accelerators.
Use TrainResult to auto log from training_step:
The ‘train_loss’ value logged on TrainResult automatically shows up in TensorBoard (you can also use any of the other loggers we support).
By default, TrainResult logs at every training step.
Use EvalResult to auto log from validation_step or test_step:
By default, EvalResult logs at the end of every epoch.
Sync across devices
When training on multiple GPUs/CPUs/TPU cores, you can calculate the global mean of a logged metric as follows:
result.log('train_loss', loss, sync_dist=True)
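Conceptually, a global mean is an all-reduce sum divided by the number of processes. The sketch below shows that idea with plain `torch.distributed`. It is not Lightning's implementation, and `global_mean` is a hypothetical helper introduced only for illustration.

```python
import torch
import torch.distributed as dist


def global_mean(value):
    """Average a tensor across all processes; a no-op when not running distributed."""
    if dist.is_available() and dist.is_initialized():
        # Work on a copy so the caller's tensor is not modified in place.
        value = value.clone()
        # Sum the value from every process, then divide by the process count.
        dist.all_reduce(value, op=dist.ReduceOp.SUM)
        value = value / dist.get_world_size()
    return value
```

In a single-process run no process group is initialized, so the helper returns the value unchanged.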
For more logging options, check out our docs.
A few other highlights of 0.9 include:
- PyTorch 1.6 support
- Added saving test predictions on multiple GPUs
- Added support to export a model to ONNX format
- More sklearn metrics, SSIM, BLEU
- Added SyncBN for DDP
- Support for remote directories via gfile
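To make the SyncBN item above concrete: synchronized BatchNorm replaces each BatchNorm layer with a variant that shares batch statistics across processes. The sketch below shows the underlying conversion using PyTorch's own `nn.SyncBatchNorm.convert_sync_batchnorm`; the tiny model is a made-up example, and how Lightning applies the conversion internally is not shown here.

```python
import torch
from torch import nn

# Hypothetical model containing a regular BatchNorm layer.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Swap every BatchNorm layer for its synchronized counterpart.
# The conversion itself does not need a distributed process group;
# synchronization only takes effect during distributed training.
synced = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```

In Lightning 0.9 the conversion should not need to be called by hand: to our understanding it is enabled through the Trainer's `sync_batchnorm=True` flag when training with DDP.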
Read the full release notes here.
We also upgraded our docs with some videos that illustrate core Lightning features in seconds! Check them out, and let us know what you’d like to see next!
We want to thank all our devoted contributors for their hard work, and the community for all your help. We definitely wouldn’t have gotten here without you. Try it out, share your projects on our #slack, and stay tuned for our next release: 1.0!