Latest Colossal-AI boasts novel automatic parallelism and offers savings of up to 46x for Stable Diffusion 2

Yang You
Published in PyTorch · 2 min read · Jan 31, 2023

This post is authored by Prof. Yang You, founder of HPC-AI Tech, the company developing Colossal-AI. Yang received his Ph.D. in Computer Science from UC Berkeley and is a Presidential Young Professor at the National University of Singapore.

As a new PyTorch Ecosystem Partner, we at HPC-AI Tech are excited to join forces with the PyTorch community to advance AI technologies through our open source project, Colossal-AI.

With users worldwide including AWS, Meta, and BioMap, and over 8,000 GitHub stars, Colossal-AI is a trusted resource for those looking to leverage the power of large-scale AI.

As with any good reciprocal partnership, we are eager to bring the latest Colossal-AI release, version 0.2.0, to the table for PyTorch users. This release includes several key features that significantly reduce the cost and accelerate the time-to-market of large-model implementations, while enhancing ease of use for developers.

Automatic parallelism

Colossal-AI’s new one-line auto-parallelism system simplifies the process of deploying large-scale machine learning models for AI developers. Compared to other solutions that require manual configuration of complex parallel policies and model modification, Colossal-AI only requires one line of code from the user, along with cluster information and model configurations, to enable distributed training. It seamlessly integrates with popular AI model frameworks like Hugging Face and Timm.

# wrap the model using auto_engine
model, optimizer = auto_engine(model, optimizer, cluster_info)
# normal training loop

Stable Diffusion 2.0 optimization recipe

Out-of-the-box configuration for Stable Diffusion 2.0 in Colossal-AI enables low-cost training, fine-tuning, and inference, reducing GPU memory consumption by up to 5.6 times and hardware costs by up to 46 times, all with just one line of code in PyTorch Lightning.

from lightning.pytorch import Trainer, LightningModule
from lightning.pytorch.strategies import ColossalAIStrategy

# configure Colossal-AI's memory-efficient training strategy
my_strategy = ColossalAIStrategy(use_chunk=True, enable_distributed_storage=True, placement_policy="auto")
trainer = Trainer(accelerator="gpu", devices=4, precision=16, strategy=my_strategy)
trainer.fit(model)

More information

The new features are especially helpful for AIGC and ChatGPT-like applications. In fact, we are already working with a Fortune 500 company on a conversational robot solution utilizing a large model similar to ChatGPT, enhanced by knowledge from web search results. The new Colossal-AI release also includes a BLOOM model recipe for stand-alone inference with a 4-fold reduction in GPU memory consumption and an over 10-fold reduction in hardware costs.

If you’d like to learn more about the latest release of Colossal-AI, check out our blog at https://www.hpc-ai.tech/blog/colossal-ai-0-2-0.

