Faster training and inference using the Azure Container for PyTorch in Azure ML

Beatriz Stollnitz
Published in PyTorch
4 min read · Dec 9, 2022


If you’ve ever wished that you could speed up the training of a large PyTorch model, then this post is for you! The Azure ML team has recently released the public preview of a new curated environment that enables PyTorch users to optimize training and inference for large models. In this post, I’ll cover the basics of this new environment, and I’ll show you how you can use it within your Azure ML project.

Azure Container for PyTorch (ACPT)

The new curated environment, called the Azure Container for PyTorch (ACPT), consists of a Docker image containing the latest compatible versions of Ubuntu, CUDA, Python, and PyTorch, as well as various state-of-the-art technologies that optimize training and inference of large models. Among other technologies, it includes ONNX Runtime to accelerate model execution, DeepSpeed to improve large-scale training, and FairScale to optimize distributed training.
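To give a sense of how this fits into an Azure ML project, a training job can point at a curated environment directly in its Azure ML CLI (v2) YAML spec. The sketch below is illustrative only: the environment name and version, the script path, and the compute target are placeholders, not the exact ACPT identifiers — check the curated environments list in your workspace for the real name.

```yaml
# Hypothetical command-job spec; replace the placeholder names with the
# ACPT curated environment and compute target available in your workspace.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py
code: ./src
# Curated ACPT environment from the azureml registry (name/version illustrative).
environment: azureml://registries/azureml/environments/AzureML-ACPT-pytorch-2.0-cuda11.7/versions/1
compute: azureml:gpu-cluster
```

A spec like this would then be submitted with `az ml job create --file job.yml`.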

Benefits of the ACPT

If you’re working with a large PyTorch model, you’ll experience significantly faster training and inference when using the ACPT. The graph below compares the time it takes to train several HuggingFace PyTorch models, using three different methods: PyTorch on its own (white), PyTorch…
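Since ACPT bundles DeepSpeed, the DeepSpeed-accelerated runs typically rely on a small JSON configuration file. A minimal sketch follows — the batch size, precision, and ZeRO stage are illustrative assumptions, not the settings used in the benchmark above:

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 2 }
}
```

Such a file is typically passed to `deepspeed.initialize` in your training script, or supplied via the `deepspeed` argument when using the Hugging Face `Trainer`.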


Bea Stollnitz is a principal developer advocate at Microsoft, focusing on Azure OpenAI, Azure ML, and other AI/ML technologies.