Inside Microsoft’s New Frameworks to Enable Large-Scale AI
ZeRO-2, a new ONNX Runtime and a supercomputer built in partnership with OpenAI are Microsoft’s latest efforts to enable large scale AI workloads.
Training models with massive datasets is becoming the norm in modern deep learning applications. Some of the latest innovations, such as Transformers and other pre-trained models, require massively large training infrastructures. These pre-trained models are often seen as the latest breakthrough in artificial intelligence (AI), and they have triggered a race among some of the top technology companies in the world to dominate that space. At its recent Build conference, Microsoft unveiled several key efforts to enable the construction and use of pre-trained models at scale.
Pre-trained models and Transformers have been at the center of the latest breakthroughs in deep learning. Google opened the doors to this movement with its famous BERT model, only to be quickly followed by other massively large architectures such as OpenAI’s GPT-2 and Microsoft’s Turing-NLG. Creating these models requires more than just large computational resources; it often requires architectures optimized for training exercises of this scale. Given their novelty, those architectures still haven’t…