The Green Algorithm: Measuring Sustainability in AI

Rosemary J Thomas, PhD
Version 1
Published Oct 9, 2023 · 7 min read
Created using Microsoft Bing Image Creator

In the realm of AI, assessing carbon emissions using CO2-equivalents (CO2eq) during model training while considering regional variations is pivotal for sustainability. The significant variability in energy efficiency among AI algorithms underscores the need to incorporate energy metrics alongside conventional performance measures. Notably, large language models like ChatGPT and Llama 2 exhibit substantial carbon footprints and energy consumption, prompting a critical examination of sustainability within their context. Policy recommendations underscore the importance of selecting eco-conscious cloud providers and strategic data centre locations. The overarching goal is to encourage the development of energy-efficient and environmentally responsible AI solutions, with a strong emphasis on standardized reporting of energy and carbon data to promote social responsibility within AI research and development.

Sustainability metrics

Let’s investigate some of the metrics: CO2-equivalents, electricity consumption, Carburacy, and Universal Sustainability Metrics. You’ll find a summary table at the end for reference.

CO2-equivalents

The environmental impact of Machine Learning (ML) model training can be measured and addressed through key metrics. CO2eq provides a standardized measure to quantify carbon emissions associated with ML model training, simplifying the assessment of greenhouse gas (GHG) impact. It’s important to note significant regional differences in carbon emissions, with North American servers emitting anywhere from 0.00002 tCO2eq/kWh (20 g/kWh) in Quebec, Canada, to 0.0008 tCO2eq/kWh in Iowa, USA. These variations are due to differences in energy sources and grids. While AI currently contributes a relatively small 1.4% to global GHG emissions within Information and Communication Technologies, concerns arise about potential rapid increases if current AI research trends continue.

There is a focus on sustainability metrics for evaluating the environmental impact of Automated Machine Learning (AutoML) research. Key metrics include runtime, which relates to energy consumption and can be used to estimate environmental impact, especially when combined with data on energy consumption and the energy mix. CPU/GPU (Central/Graphical Processing Units) hours are suggested as a quantifiable measure that can be converted into CO2eq emissions with knowledge of the energy mix. Energy consumption, while dependent on hardware, is considered a direct measure of the environmental footprint. It is important to consider CO2eq as a direct measure of environmental impact, even if it’s challenging to measure directly, as it can be computed based on energy consumption and energy mix data.
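The conversion described above, from device-hours and power draw to CO2eq via the energy mix, can be sketched as follows. All figures in the example (average power draw, grid carbon intensity, data-centre overhead) are illustrative assumptions, not values from any of the cited studies:

```python
# Rough CO2eq estimate from training compute, assuming average device
# power draw and the grid's carbon intensity are known.

def co2eq_tonnes(gpu_hours: float, avg_power_watts: float,
                 grid_intensity_kg_per_kwh: float, pue: float = 1.1) -> float:
    """Estimate emissions in tonnes of CO2-equivalent.

    gpu_hours: total device-hours of the training run
    avg_power_watts: average power draw per device
    grid_intensity_kg_per_kwh: carbon intensity of the local energy mix
    pue: data-centre power usage effectiveness (overhead multiplier)
    """
    energy_kwh = gpu_hours * (avg_power_watts / 1000.0) * pue
    return energy_kwh * grid_intensity_kg_per_kwh / 1000.0

# Example: 3.3 million GPU hours at 400 W on a hypothetical grid
# emitting 0.4 kg CO2eq/kWh.
print(round(co2eq_tonnes(3_300_000, 400, 0.4), 1))
```

The same run lands on very different totals depending on the grid: swap in Quebec’s roughly 0.02 kg/kWh and the estimate drops by more than an order of magnitude, which is exactly why the regional variation above matters.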

Figure 1. Carbon emission reduction options

In addition, sustainability metrics are crucial when assessing the environmental impact of large language models (LLMs) like ChatGPT and Llama 2. This involves considering the carbon footprint associated with LLMs at various stages, including training, inference, and real-time requests, along with factors like energy source, training location, and GPU energy consumption. For example, training GPT-3 with 175 billion parameters resulted in 608 tCO2eq emissions. Llama 2’s developers disclosed its carbon footprint, which was estimated at 539 tCO2eq due to 3.3 million GPU hours of computation. Importantly, Meta’s sustainability program has offset 100% of Llama 2’s emissions, making the model carbon neutral.

Research at the intersection of AI and the climate emergency offers further insight. It delves into the carbon footprint of AI training processes, highlighting the exponential increase in GHG emissions tied to ML model development due to rising computing demands. A striking estimate for a single training run of the GPT-3 model, amounting to 247 tCO2eq, serves as a sobering comparison to emissions from typical passenger cars in the United States.
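To make the passenger-car comparison concrete, the following back-of-the-envelope calculation uses the EPA’s commonly cited figure of roughly 4.6 tCO2eq per typical US passenger car per year alongside the 247 tCO2eq training-run estimate from the text:

```python
# Putting a training-run estimate in everyday terms.
GPT3_RUN_TCO2EQ = 247       # single GPT-3 training run (estimate from text)
CAR_TCO2EQ_PER_YEAR = 4.6   # approximate EPA figure for a US passenger car

car_years = GPT3_RUN_TCO2EQ / CAR_TCO2EQ_PER_YEAR
print(f"~{car_years:.0f} car-years of emissions")  # ~54 car-years
```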

Electricity consumption

The FLOPS/W (Floating Point Operations per Second per Watt) metric is crucial for evaluating the energy efficiency of hardware devices used in ML. It reveals substantial differences: CPUs can be up to ten times less efficient than GPUs, while TPUs (Tensor Processing Units) are 4 to 8 times more efficient than GPUs. These metrics provide a solid basis for assessing and reducing carbon emissions in the field of ML, promoting environmentally responsible practices.
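The relative efficiencies above can be illustrated with a small comparison. The absolute FLOPS/W figures below are hypothetical placeholders; only the ratios mirror the text (CPU roughly 10x less efficient than GPU, TPU 4 to 8x more efficient than GPU):

```python
# Comparing hardware energy efficiency via FLOPS/W (illustrative values).
flops_per_watt = {
    "CPU": 2e9,     # assumed baseline
    "GPU": 2e10,    # ~10x the CPU
    "TPU": 1.2e11,  # ~6x the GPU (within the 4-8x range)
}

baseline = flops_per_watt["CPU"]
for device, eff in flops_per_watt.items():
    print(f"{device}: {eff / baseline:.0f}x CPU efficiency")
```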

An independent study reveals significant differences in energy efficiency among AI algorithms. K-Nearest Neighbour (KNN) is the most energy-efficient, using only 0.01 Joules on average, while Random Forest is the least energy-efficient, with an average consumption of about 1.98 Joules per run. The energy consumption disparities range widely, from a 20% reduction (Bagging Classifier vs. Support-Vector Machine) to a substantial 99.49% reduction (Random Forest vs. KNN). These findings stress the importance of considering energy efficiency as a critical factor when choosing ML models, in addition to traditional performance metrics.
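The reduction percentages quoted above follow directly from the reported per-run consumption figures; for instance, the Random Forest vs. KNN comparison works out as:

```python
# Reproducing the text's energy-reduction percentage from the reported
# average per-run consumption figures (in Joules).

def reduction_pct(higher: float, lower: float) -> float:
    """Percentage reduction going from the higher to the lower consumer."""
    return (higher - lower) / higher * 100

# Random Forest (~1.98 J) vs. KNN (~0.01 J)
print(f"{reduction_pct(1.98, 0.01):.2f}%")  # 99.49%
```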

Figure 2. Energy efficiency among AI algorithms

Another study highlights the importance of evaluating AI technologies with a dual focus on accuracy and energy consumption. While accuracy metrics are well-established, measuring energy usage is more challenging. The study introduces various metrics to assess the environmental impact of AI, including model size, number of parameters, elapsed time, FLOPs, and FLOPs per watt (efficiency ratio). It stresses the need to understand the trade-offs between accuracy and energy, especially in AI training, where larger datasets and complex models can yield diminishing returns in accuracy improvement.

Llama 2 discloses its power consumption during training, with GPU usage ranging from 350 to 400W. While this power usage is significant due to the model’s size, efforts have been made to enhance efficiency. Meta has also initiated a sustainability program to offset the carbon emissions from the model’s training, showcasing their commitment to reducing the environmental impact of AI. They have further shared their research and a Responsible Use Guide, advocating for ethical and responsible model usage. It’s crucial to recognize that AI models like Llama 2 can have a substantial environmental impact due to their computational demands for training. Nevertheless, these measures to mitigate emissions, provide transparency on energy usage, and promote responsible use are steps toward more sustainable AI development.

Carburacy

Carburacy is introduced as a novel carbon-aware accuracy measure for evaluating the eco-sustainability of Natural Language Processing (NLP) models. It combines two essential factors: ‘effectiveness’ (as ‘R’), which measures model accuracy using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores, and ‘cost’ (as ‘C’), encompassing resource requirements like energy consumption, hardware costs, and carbon emissions. Efficiency amalgamates effectiveness and cost. Carburacy aims to strike a balance between achieving high accuracy and minimising environmental impact.
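A minimal sketch of this idea, rewarding effectiveness R while penalising cost C, might look like the following. This is not the paper’s exact formula; the exponential penalty and the `beta` weight are illustrative assumptions:

```python
import math

# Illustrative carbon-aware score in the spirit of Carburacy: reward
# effectiveness R (e.g. an averaged ROUGE score in [0, 1]) while
# penalising cost C (e.g. kg CO2eq per processed example).

def carbon_aware_score(r: float, c: float, beta: float = 1.0) -> float:
    """Higher is better: grows with effectiveness, shrinks with cost."""
    return r / math.exp(beta * c)

# Two hypothetical summarizers: slightly better ROUGE but much higher cost.
print(round(carbon_aware_score(0.45, 0.10), 4))  # efficient model
print(round(carbon_aware_score(0.47, 0.80), 4))  # costly model
```

Under this sketch the slightly less accurate but far cheaper model scores higher, which is precisely the trade-off a carbon-aware measure is meant to surface.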

It explores the impact of input size on sustainability, revealing larger inputs can improve effectiveness and increase energy consumption, affecting Carburacy scores. Training batch size is another consideration, with larger batches reducing backpropagation calls but potentially hindering Carburacy by decreasing effectiveness. Decoding strategy choice, such as beam search, presents a trade-off between effectiveness and energy consumption. Few-shot learning is highlighted as an eco-sustainable approach, achieving good effectiveness with minimal environmental impact. Model type also matters, with linear models proving more eco-sustainable in long document summarization tasks compared to quadratic models, achieving higher Carburacy scores with larger input lengths.

Universal Sustainability Metrics

In the context of edge computing and AI, the emphasis is on developing “Green AI” solutions by introducing crucial sustainability metrics. These metrics include Recognition Efficiency (RE), which balances accuracy, complexity, and energy consumption during inference, and Training Efficiency (TE), which prioritises energy efficiency during AI model training. Deep Learning Lifecycle Efficiency (DLLCE) is introduced to assess model efficiency throughout their lifecycle, considering factors beyond inference energy. Life Cycle Recognition Efficiency considers efficiency across the complete model lifecycle, including re-training and deployment phases. A sustainability classification system (Class A, B, C, D) based on RE values categorises models, libraries, and platforms into different sustainability classes with specific threshold ranges.
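The class-based scheme can be sketched as a simple threshold lookup. The threshold values below are hypothetical placeholders, not the specific ranges defined in the cited work:

```python
# Sketch of the sustainability classification idea: map a Recognition
# Efficiency (RE) value to a class A-D using assumed threshold ranges.

def sustainability_class(re_value: float) -> str:
    thresholds = [(0.75, "A"), (0.50, "B"), (0.25, "C")]
    for lower_bound, label in thresholds:
        if re_value >= lower_bound:
            return label
    return "D"

print(sustainability_class(0.82))  # A
print(sustainability_class(0.30))  # C
```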

There are valuable insights into reducing the environmental impact of ML model training and addressing carbon emissions. Research highlights significant energy consumption, particularly when fossil fuels power the grid, and recommends quantifying emissions with CO2eq while considering server location. Efficiency improvements in infrastructure and training practices, such as optimising GPUs and hyperparameter search, are suggested, as is choosing sustainable cloud providers and data centre locations. The importance of minimising resource waste and selecting energy-efficient hardware is also stressed, while acknowledging practical constraints.

Table 1. Summary of sustainability metrics

Conclusion

This overview shows the need for standardization in measuring and reporting power, energy, and carbon consumption data in AI, emphasizing the lack of comprehensive sustainability metrics in academic and industry domains. It advocates for a holistic approach that integrates human, social, and environmental values in AI solutions. It introduces sustainability metrics to encourage the development of energy-efficient AI systems, offering frameworks to evaluate and compare efficiency and sustainability across platforms and use cases. It particularly emphasises the importance of considering sustainability metrics in models like ChatGPT and Llama 2, using estimates and comparisons to illustrate the significance of this issue. The call for systematic reporting of carbon emissions and model details by researchers and organizations is made to promote sustainability and social responsibility in AI, encouraging further community engagement on this vital topic.

This article was written with the help of ChatGPT and Bing Chat.

About the author:

Rosemary J Thomas, PhD, is a Senior Technical Researcher at the Version 1 AI Labs.
