Inside Microsoft’s New Frameworks to Enable Large-Scale AI

ZeRO-2, a new ONNX Runtime and a supercomputer built in partnership with OpenAI are Microsoft’s latest efforts to enable large-scale AI workloads.

Jesus Rodriguez
The Startup


Credits: https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/

Training models with massive datasets is becoming the norm in modern deep learning applications. Some of the latest innovations, such as Transformers and other pre-trained models, require massively large training infrastructures. These pre-trained models are often seen as the latest breakthrough in artificial intelligence (AI), and they have triggered a race among some of the top technology companies in the world to dominate that space. At its recent Build conference, Microsoft unveiled several key efforts to enable the construction and use of pre-trained models at scale.

Pre-trained models and Transformers have been at the center of the latest breakthroughs in deep learning. Google opened the doors to this movement with its famous BERT model, quickly followed by other massively large architectures such as OpenAI’s GPT-2 or Microsoft’s Turing-NLG. Creating these types of models requires more than just large computational resources; it often requires architectures optimized for this kind of large-scale training. Given their novelty, those architectures still haven’t…
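ZeRO-2 is exposed through Microsoft’s open-source DeepSpeed library. As a rough illustration, and not a snippet from the article, the sketch below shows how a training script might request ZeRO stage 2 through a DeepSpeed configuration; the tiny model and hyperparameters are hypothetical placeholders, and parameter names may vary across DeepSpeed versions.

```python
# Minimal sketch (assumptions noted below): enabling ZeRO stage 2 with DeepSpeed.
import torch
import deepspeed

ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        # Stage 2 partitions optimizer states and gradients across workers,
        # cutting per-GPU memory so much larger models fit in training.
        "stage": 2
    },
}

# Hypothetical stand-in for a large Transformer model.
model = torch.nn.Linear(1024, 1024)

# DeepSpeed wraps the model in an engine that transparently handles the
# ZeRO partitioning, mixed precision, and distributed communication.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

In this setup the training loop calls `model_engine.backward(loss)` and `model_engine.step()` instead of the usual PyTorch equivalents, and DeepSpeed takes care of the memory partitioning behind the scenes.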
