How green is your machine learning?

Duncan Anderson
Barnacle Labs
Jun 25, 2023

The IPCC, the UN body charged with assessing the science related to climate change, has concluded that “Human activities, principally through emissions of greenhouse gases, have unequivocally caused global warming”, a conclusion supported by 97% of publishing climate scientists. Even the few who dissent acknowledge that this consensus is creating strong pressure for change. It’s perhaps not surprising, then, that people are asking questions about the energy and CO2 impact of machine learning (ML).

Is the ML trend taking us down an unsustainable climate path?

I know from my many conversations with enterprise customers that this is a key concern — almost everyone wants to do the right thing.

One thing is for sure: the headlines are numerous.

But how big is the problem, what are the trends and how might we do better?

Estimating ML energy impact

According to the International Energy Agency (IEA), global data centre electricity use, excluding crypto, in 2021 was 220–320 TWh. Or, put another way, around 0.9–1.3% of global final electricity demand. Given their relative newness, ML workloads will be a small subset of that usage.

Further, since 2010, data centre energy use (excluding crypto) has grown only moderately despite the strong growth in demand for data centre services. This is in part due to efficiency improvements in IT hardware and cooling, together with a shift away from small, inefficient enterprise data centres towards more efficient cloud and hyperscale data centres.

These figures, however, exclude energy used for cryptocurrency mining, which alone was 100–140 TWh in 2021. Yes, you read that right: crypto consumes nearly half as much energy as all other data centre activity combined. That’s what happens when a tech trend gets out of hand.

Google, a tech giant with one of the larger ML usage demands, reports that ML energy usage has held steady at <15% of the company’s total energy use. Given the special nature of Google, it’s likely the aggregate proportion, including more conservative firms, is considerably less than 15%.

It’s difficult to estimate total global ML energy usage, but it’s fair to assume it’s a relatively small proportion of total data centre demand and considerably less than that imposed by cryptocurrency mining. Nevertheless, AI is the big thing right now, so its share may well grow, and we would do well to think hard about it.

The pressure for change

The standard workhorse for AI model training and inferencing is the Nvidia A100, a GPU that costs in the order of $1,000/month to rent in the cloud. An A100 uses 300W of power.

The H100 is the A100’s successor and is only just beginning to be used. It draws 350W but delivers 2–3x the performance of an A100, so it roughly halves the energy needed for the same workload.
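
A quick back-of-envelope check of that claim, taking the midpoint of the quoted 2–3x speed-up (the power figures are the ones above; this is an illustration, not a benchmark):

```python
# Energy per unit of training work: A100 vs H100.
# Power figures as quoted above; 2.5x is the midpoint of the 2-3x range.
a100_watts = 300
h100_watts = 350
h100_speedup = 2.5  # the H100 finishes the same work in 1/2.5 of the time

# Energy for a workload that takes 1 hour on an A100:
a100_energy_wh = a100_watts * 1.0
h100_energy_wh = h100_watts * (1.0 / h100_speedup)

print(f"A100: {a100_energy_wh:.0f} Wh")  # 300 Wh
print(f"H100: {h100_energy_wh:.0f} Wh")  # 140 Wh
print(f"Ratio: {h100_energy_wh / a100_energy_wh:.2f}")  # 0.47, roughly half
```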

However, A100s and H100s are expensive, and training an advanced model requires many of them, making this an expensive game. These real-world costs are encouraging ML researchers to explore how resource requirements can be significantly reduced.

LoRA reduces energy demands

There’s currently a lot of excitement about Low-Rank Adaptation of Large Language Models (LoRA), a technique that achieves dramatic reductions in memory, resource and energy utilisation. QLoRA, a follow-on technique that combines LoRA with 4-bit quantisation, pushes things further. Its authors report:

“QLORA reduces the average memory requirements of finetuning a 65B parameter model from >780GB of GPU memory to <48GB without degrading the runtime or predictive performance… Using QLORA, we train the Guanaco family of models, with… 97.8% of the performance level of ChatGPT… while being trainable in less than 12 hours on a single consumer GPU.”

This greater memory efficiency makes it possible to fine-tune smaller models on GPUs like the Tesla T4, which are freely available in Kaggle and Google Colab notebooks. As Bob Dylan would say, the times they are a-changin’: anyone can fine-tune a model now, even if it’s just a small one.
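
To give a flavour of what this looks like in practice, below is a minimal sketch using the Hugging Face transformers and peft libraries: the base model is loaded with 4-bit quantised weights, QLoRA-style, and small trainable LoRA adapters are attached. The model name and hyperparameters are illustrative assumptions, not recommendations from the paper quoted above:

```python
# Minimal QLoRA-style setup: frozen 4-bit base model + trainable LoRA adapters.
# Requires: pip install transformers peft bitsandbytes accelerate
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "facebook/opt-1.3b"  # illustrative; any causal LM that fits your GPU

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantise the frozen base weights
    bnb_4bit_quant_type="nf4",             # the NF4 data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute suits a T4
)

model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections; model-dependent
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # typically well under 1% of all weights
```

Only the adapter weights are trained; the quantised base model stays frozen, which is where the memory (and hence energy) savings come from.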

It seems likely that other optimisations will be discovered. In fact, a research paper published by Google reported that adoption of best practices can reduce energy by up to 100x and carbon emissions by up to 1000x.

Industry consensus is that training and inference costs have hit a high-water mark, and the innovation focus of researchers is now very much on reducing resource demands.

Openness drives efficiency

Openness inspires competitiveness between researchers. Nearly all ML papers quote benchmarks to demonstrate the effectiveness of a particular piece of ML research. Mostly this tends to focus on performance — inching models towards human levels of capability.

In some limited cases we’re beginning to see statistics about compute and energy usage show up in that research. This is a very good thing — the very existence of a metric in one paper encourages other researchers to report better figures in their papers.

Making this data open also empowers customers to make greener choices — if you know that model-a is more efficient than model-b, you’re more likely to choose model-a. This further encourages the development of efficiency gains, as providers compete with each other and customers vote with their feet by choosing more efficient solutions.

However, this data is currently reported only on a very patchy basis. Most research still fails to disclose energy statistics, and when it does, the figures are often too vague, or measure different things, making comparisons challenging. Further, the energy usage of commercial entities such as Google and OpenAI is very opaque. As a result, assessing and comparing the green impact of ML choices is currently extremely tough.

Measuring energy use

As ever with ML, a search of arXiv turns up some interesting and highly relevant material.

How to estimate carbon footprint when training deep learning models? A guide and review

However, it’s immediately clear that the field is highly complex, with a wide variety of different tools and techniques. There’s clearly a need for some level of standardisation, otherwise everyone will just report different statistics, gathered in different ways — making comparisons impossible.
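
To make that concrete, one such tool is CodeCarbon, an open-source Python library that samples hardware power draw and multiplies it by an estimate of the local grid’s carbon intensity. A minimal sketch, with a placeholder standing in for a real training loop:

```python
# Estimating the CO2 footprint of a training run with CodeCarbon.
# Requires: pip install codecarbon
import time
from codecarbon import EmissionsTracker

def train():
    time.sleep(5)  # placeholder for a real training loop

tracker = EmissionsTracker(project_name="my-finetune")
tracker.start()
try:
    train()
finally:
    emissions_kg = tracker.stop()  # estimated kg of CO2-equivalent

print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```

Even rough numbers like these, reported consistently, would make research far easier to compare.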

Nevertheless, the existence of detailed information about measuring ML energy use is a positive development. That this paper was published just this month is an indicator of the growing interest in the topic.

Regulation?

There’s been a lot of talk about regulation in AI recently. Speaking personally, I am in favour of regulating for more openness — enforcing the reporting of energy statistics would be very helpful.

We may also need a standardised testing suite and ways to measure energy usage. Ultimately, there might need to be a standards body that performs or oversees such tests. After all, we’ve had things like the CE mark and FCC compliance in the physical world for a very long time. It may be that something is needed in the software world.

The importance of CPU/GPU utilisation

I know from too many years in enterprise IT that a big driver of energy efficiency in the data centre is server utilisation. Servers burn energy whether you’re using them or not, so a machine that’s highly utilised can be orders of magnitude more energy efficient than one that’s sat idling. It remains the case that operating compute infrastructure at high levels of utilisation requires a large degree of skill, skill that most corporate IT departments do not possess. Trust me, I’ve been there ;-)

I could rent myself a dedicated A100 on Hugging Face, load it with the most efficient model imaginable and feel pretty smug with myself: after all, I’d now be operating a highly efficient ML model. Unfortunately, whether that is actually efficient or not depends on how far I can stretch its utilisation.

If my A100 is just ticking along and not doing much for most of the day, I’d have to apportion the energy costs of running that A100 constantly across a small number of transactions. In other words, despite my efficient model, it would actually represent a very inefficient solution.
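
The arithmetic behind that is simple but sobering. A sketch, using the 300W figure from earlier and invented request volumes:

```python
# Energy per request for a dedicated 300W A100 running 24/7.
# The GPU draws power whether or not it is serving traffic, so the fixed
# daily energy is amortised over however many requests actually arrive.
gpu_watts = 300
daily_energy_wh = gpu_watts * 24  # 7,200 Wh/day, busy or idle

for requests_per_day in (1_000, 100_000):  # invented volumes, for illustration
    wh_per_request = daily_energy_wh / requests_per_day
    print(f"{requests_per_day:>7,} requests/day -> {wh_per_request:.3f} Wh/request")

# 1,000/day gives 7.2 Wh/request; 100,000/day gives 0.072 Wh/request.
# A 100x difference from utilisation alone, before any model-level gains.
```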

An over-focus on benchmark comparisons between models, to the exclusion of real-world infrastructure utilisation, can lead to erroneous conclusions. It’s likely that specialists like Google, Microsoft and OpenAI are driving their GPU clusters at very high utilisation levels, certainly higher than I’m likely to achieve on my dedicated Hugging Face A100. Unless I have a lot of transactions running through my A100, one of the big commercial models might be the greener solution. I’ll have to put my smugness to one side 😔

Summary

We are still in the early stages of the evolution of ML and can expect significant change in how these systems are constructed as experience grows.

Google has estimated that adopting ML best practices can reduce energy usage by up to 100x and carbon emissions by up to 1000x. Software optimisations, such as LoRA, are already proving the potential for large efficiency gains, and we can expect this trend to continue. It’s reasonable to assume that ML energy costs for both training and inferencing are likely to fall, not rise.

However, it’s not only software design efficiency that matters: the operational skill to run large GPU clusters at high utilisation levels also has a significant impact on the overall energy intensity of a solution. We should be careful not to compare benchmark data alone, without taking real-world infrastructure utilisation into account, when calculating how energy-intensive a solution is.

Customers can only make informed choices and exert influence if they are given the data to support those choices. Requiring model creators and operators to publish open and accurate compute and energy data would empower customers to make greener choices, while also encouraging and accelerating a competitive spirit of ML efficiency gains.

There is much work to do in greening ML workloads, but many reasons to be optimistic about the ability to dramatically increase efficiency. Today, it’s a complex and messy landscape of data with many missing pieces of the jigsaw. Nevertheless, attempts to make sense of what evidence there is and ask questions of suppliers can only send a signal that this is important and encourage improvements in both data openness and model efficiency 🌿🤖👍🏻

Postscript

After writing this I realised I’d missed possibly one of the most important factors: the source of the energy used to power our GPUs. If that source is coal, it’s bad; if it’s hydro, wind or solar, it’s close to carbon-free. The source makes a massive difference to the carbon footprint of any solution. Hence, the location of the data centre in which our solution is deployed might be the most important influence on our CO2 emissions.
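
The effect is easy to quantify. The grid intensities below are indicative, order-of-magnitude figures (real values vary by country and by hour), but they show the scale of the difference:

```python
# Same workload, different grids: CO2 depends on where you run it.
workload_kwh = 1_000  # e.g. a multi-day fine-tuning run (illustrative)

grid_intensity_g_per_kwh = {  # indicative figures, not authoritative values
    "coal-heavy grid": 800,
    "average mixed grid": 400,
    "hydro/nuclear-heavy grid": 30,
}

for grid, g_per_kwh in grid_intensity_g_per_kwh.items():
    kg_co2 = workload_kwh * g_per_kwh / 1_000
    print(f"{grid:<26} {kg_co2:>5.0f} kg CO2")

# The identical workload emits roughly 25x more CO2 on the coal-heavy
# grid than on the low-carbon one: location can dwarf model-level tweaks.
```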

What we know is that the global tech companies — Amazon, Google, Meta, Apple, Microsoft — have put a lot of effort into sourcing green electricity to power their data centres.

According to the International Energy Agency (IEA):

Apple (2.8 TWh), Google (18.3 TWh), Meta (9.4 TWh) and Microsoft (13 TWh) purchased or generated enough renewable electricity to match 100% of their operational electricity consumption in 2021 (primarily in data centres). Amazon consumed 30.9 TWh (85% renewable) across their operations in 2021, with a goal of achieving 100% renewables by 2025.

However, matching 100% of annual demand with renewable energy purchases or certificates does not mean that data centres and data transmission networks are powered exclusively by renewable sources. The variability of wind and solar sources may not match a data centre’s demand profile, and the renewable energy may be purchased from projects in a different grid or region from where demand is located. Renewable energy certificates, in particular, are unlikely to lead to additional renewable energy production, resulting in uncertainty of real-world emissions mitigation.

Google and Microsoft have announced 2030 targets, and Iron Mountain a 2040 target, to source and match zero-carbon electricity on a 24/7 basis within each grid where demand is located. A growing number of organisations are working towards 24/7 carbon-free energy to match their electricity demand on an hourly basis, which could stimulate deployment of a wider portfolio of flexible technologies needed for net zero transitions in the power sector.

In contrast, many corporately owned data centres (at banks, insurance companies, retailers and the like) are still powered from the local grid, with the energy mix being whatever that grid happens to supply.

Due to their high public visibility, there’s a strong incentive for global tech companies to be seen to be carbon neutral. The large scale and profits involved also make it easier for such companies to embark on ambitious renewable power initiatives.

There is no single answer to “what’s the best solution”, but clearly the nature of the energy that powers the data centre in which you deploy your ML solution will have a major impact on its CO2 emissions. It’s a topic worth exploring if you’d like to minimise your climate impact.

👉🏻 Please follow me on LinkedIn for updates on Generative AI 👈🏻
