Faster training and inference using the Azure Container for PyTorch in Azure ML

Beatriz Stollnitz
Published in PyTorch
4 min read · Dec 9, 2022


If you’ve ever wished that you could speed up the training of a large PyTorch model, then this post is for you! The Azure ML team has recently released the public preview of a new curated environment that enables PyTorch users to optimize training and inference for large models. In this post, I’ll cover the basics of this new environment, and I’ll show you how you can use it within your Azure ML project.

Azure Container for PyTorch (ACPT)

The new curated environment, called the Azure Container for PyTorch (ACPT), consists of a Docker image containing the latest compatible versions of Ubuntu, CUDA, Python, and PyTorch, as well as various state-of-the-art technologies that optimize training and inference of large models. Among other technologies, it includes ONNX Runtime to accelerate model execution, DeepSpeed to improve large-scale training, and FairScale to optimize distributed training.
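To give a sense of how this fits into an Azure ML project, a training job can point at a curated environment directly in its Azure ML CLI (v2) YAML spec. The sketch below is illustrative only: the environment name and version, the script path, and the compute target are placeholders, not the exact ACPT identifiers — check the curated environments list in your workspace for the real name.

```yaml
# Hypothetical command-job spec; replace the placeholder names with the
# ACPT curated environment and compute target available in your workspace.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py
code: ./src
# Curated ACPT environment from the azureml registry (name/version illustrative).
environment: azureml://registries/azureml/environments/AzureML-ACPT-pytorch-2.0-cuda11.7/versions/1
compute: azureml:gpu-cluster
```

A spec like this would then be submitted with `az ml job create --file job.yml`.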

Benefits of the ACPT

If you’re working with a large PyTorch model, you’ll experience significantly faster training and inference when using the ACPT. The graph below compares the time it takes to train several HuggingFace PyTorch models, using three different methods: PyTorch on its own (white), PyTorch…
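Since ACPT bundles DeepSpeed, the DeepSpeed-accelerated runs typically rely on a small JSON configuration file. A minimal sketch follows — the batch size, precision, and ZeRO stage are illustrative assumptions, not the settings used in the benchmark above:

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 2 }
}
```

Such a file is typically passed to `deepspeed.initialize` in your training script, or supplied via the `deepspeed` argument when using the Hugging Face `Trainer`.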


Bea Stollnitz is a principal developer advocate at Microsoft, focusing on Azure OpenAI, Azure ML, and other AI/ML technologies.