Sparsity and Sustainability: Estimating the Carbon Footprint of ThirdAI’s UDT Framework

ThirdAI’s Universal Deep Transformers AutoML Interface, powered by our proprietary BOLT deep learning framework, provides substantial reductions in energy consumption compared to popular pre-trained NLP models without reducing model quality

Vihan Lakshman
ThirdAI Blog
4 min read · Jan 17, 2023


**This post was written by Benjamin Meisburger, former intern at ThirdAI**

As sustainability becomes an increasingly critical requirement for organizations across all business sectors, reducing the cost and energy consumption of training and deploying large-scale AI models has emerged as an essential task. In the case of GPT-3, for instance, the electricity and compute cost of training alone was reported to be $12 million. This concern has only intensified in recent months as model sizes continue to balloon.

In this post, we study how ThirdAI’s BOLT engine, the framework behind our Universal Deep Transformers AutoML library, translates into carbon savings. In short, we find that BOLT’s ability to train sparse neural networks on everyday CPU hardware yields significant energy savings, producing only 2.5% of the carbon emissions associated with fine-tuning a RoBERTa model on a sentiment analysis task.

Inspiration

After reading Etsy’s Cloud Jewels blog post, I was curious to see if we could replicate their methodology to ballpark the carbon footprint of training a model with BOLT at different levels of sparsity. My goals were twofold: (1) to determine whether our BOLT engine provides a meaningful reduction in carbon footprint compared to a state-of-the-art equivalent, and (2) to determine to what extent sparsity impacts BOLT’s net carbon footprint.

Methodology

For my tests, I chose a straightforward and practical task: sentiment classification on Yelp reviews. For this benchmark, there is a well-defined state-of-the-art model: RoBERTa fine-tuned for sentiment.

To standardize our testing as much as possible, all experiments were run on AWS instances in the us-west-1 availability zone, providing a replicable and consistent framework. First, we fine-tuned an off-the-shelf pre-trained RoBERTa model on a p4d.24xlarge instance, which reached ~93% test accuracy in 40 minutes on a single A100 GPU (a p4d.24xlarge instance includes eight A100s). We used 93% accuracy as our threshold for all subsequent models. Next, we trained a BOLT model from scratch on an r6g.xlarge CPU instance with 20% sparsity (meaning the neural network performed only 20% of the full dense matrix computations); this model reached 93% accuracy in 42 minutes. We repeated this experiment with 10% sparsity, resulting in a training time of just over 20 minutes. Finally, we trained BOLT with 5% sparsity; however, this model only reached 90% accuracy, revealing a more pronounced tradeoff between model quality and energy consumption.
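As a rough illustration of the RoBERTa baseline, a minimal fine-tuning setup using the Hugging Face transformers and datasets libraries might look like the sketch below; the checkpoint name, sequence length, batch size, and epoch count are illustrative assumptions rather than the exact configuration behind the numbers above.

```python
# Sketch of a RoBERTa sentiment baseline on Yelp reviews (illustrative settings only).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("yelp_polarity")  # binary sentiment labels on Yelp reviews
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
args = TrainingArguments(
    output_dir="roberta-yelp",
    per_device_train_batch_size=32,
    num_train_epochs=1,
    evaluation_strategy="epoch",
)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"], eval_dataset=tokenized["test"])
trainer.train()
```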

To estimate the carbon footprint of a given AWS instance, we used a formula that combines the instance’s operational power consumption with its manufacturing carbon footprint. We obtained power consumption and manufacturing footprint estimates from this dataset, which aggregates information via the methodology described in this blog post.
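A minimal sketch of the structure this kind of estimate takes, following the Cloud Jewels approach of combining operational energy with an amortized share of manufacturing emissions, is shown below; the parameter names and the linear amortization over hardware lifetime are illustrative assumptions rather than the exact formula behind our numbers.

```python
def estimate_carbon_footprint_kg(power_draw_watts, runtime_hours,
                                 grid_intensity_kg_per_kwh,
                                 manufacturing_kg, lifetime_hours):
    """Operational emissions plus an amortized share of manufacturing emissions."""
    energy_kwh = (power_draw_watts / 1000.0) * runtime_hours
    operational_kg = energy_kwh * grid_intensity_kg_per_kwh
    embodied_kg = manufacturing_kg * (runtime_hours / lifetime_hours)
    return operational_kg + embodied_kg
```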

For example, when estimating the emissions from fine-tuning RoBERTa on a single A100 within a p4d.24xlarge instance:
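Purely for illustration, applying the sketch above to this scenario might look like the following; aside from the 40-minute training time, the constants here (the A100’s average power draw, the grid carbon intensity for us-west-1, the manufacturing footprint, and the hardware lifetime) are placeholder assumptions rather than the exact values behind our reported results.

```python
# Illustrative call only: placeholder constants, not the study's actual inputs.
roberta_estimate_kg = estimate_carbon_footprint_kg(
    power_draw_watts=400,            # assumed average draw of a single A100
    runtime_hours=40 / 60,           # 40-minute fine-tuning run from the experiment above
    grid_intensity_kg_per_kwh=0.25,  # assumed grid carbon intensity for us-west-1
    manufacturing_kg=150,            # assumed embodied footprint of the GPU
    lifetime_hours=4 * 365 * 24,     # assumed four-year hardware lifetime
)
print(f"Estimated footprint: {roberta_estimate_kg:.3f} kg CO2e")
```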

This process was repeated for each level of sparsity, with the results compiled in the figure below. All BOLT networks were trained from scratch on a single r6g.xlarge instance, while RoBERTa was fine-tuned on a single A100 GPU.

Analysis

As shown above, the BOLT engine provides a significant improvement over a state-of-the-art NLP model with respect to carbon footprint, producing on average just 2.5% of the carbon emissions of RoBERTa fine-tuned on a GPU. We also note that this comparison does not account for the pre-training cost of RoBERTa, which is also substantial. For BOLT, increasing sparsity yields significant, though not quite linear, carbon savings; the savings begin to plateau as the total carbon footprint approaches the (unavoidable) cost of manufacturing a CPU. These figures are estimates only, but we are confident they are conservative enough to absorb procedural error and thus representative of the true footprint.

Use UDT and BOLT to Drive Sustainability and Lower Costs in Your Organization

Climate change and environmental sustainability are perhaps the defining challenges of our time. If you are concerned by the rising carbon footprint and exorbitant financial costs of training and deploying state-of-the-art AI models, we have an answer. By rejecting the conventional wisdom that large-scale neural networks require specialized, power-hungry hardware and dense computations, we have built a hands-off-the-wheel AutoML product, called UDT, that performs deep learning on ordinary CPU hardware with quality comparable to existing state-of-the-art techniques. As we have seen in work with our customers, UDT can substantially reduce training costs, accelerate real-time prediction latencies, and even improve model accuracy.

To try out UDT for your business needs, please reach out to us by requesting a trial license for our software. We also invite you to explore our Google Colab demo notebooks that showcase UDT in action.
