Published in The Techlife
Microsoft ZeRO-Offload: Democratizing Billion-Scale Model Training

Microsoft’s competing approach to training huge models (alongside Google’s Switch Transformers and OpenAI’s GPT-3)

Photo by Joshua Fuller on Unsplash

ZeRO-Offload can train models with over 13 billion parameters on a single GPU, a 10x increase in size compared to popular frameworks such as PyTorch, and it does so without requiring any model changes from data scientists or sacrificing computational efficiency.
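In practice, this offloading is enabled through DeepSpeed's JSON configuration. The sketch below shows a minimal config using ZeRO stage 2 with optimizer-state offload to CPU; the batch size and fp16 settings are illustrative assumptions, not values from this article:

```python
import json

# Minimal DeepSpeed-style config sketch enabling ZeRO-Offload.
# ZeRO stage 2 partitions optimizer state and gradients across ranks,
# and "offload_optimizer" moves optimizer state to host (CPU) memory,
# freeing GPU memory so larger models fit on a single device.
ds_config = {
    "train_batch_size": 16,        # illustrative value, not from the article
    "fp16": {"enabled": True},     # mixed precision shrinks GPU-resident tensors
    "zero_optimization": {
        "stage": 2,                # partition optimizer state + gradients
        "offload_optimizer": {
            "device": "cpu",       # hold optimizer state in CPU memory
            "pin_memory": True,    # pinned host memory speeds CPU<->GPU copies
        },
    },
}

# DeepSpeed consumes this as a JSON file passed via --deepspeed_config.
print(json.dumps(ds_config, indent=2))
```

The key idea is that only the small, actively used working set (fp16 parameters and activations) stays on the GPU, while the large Adam optimizer state lives in CPU memory.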
