A Path to Responsible AI: Making LLMs More Environmentally Sustainable

Roxanne Boehlé
Sopra Steria NL Data & AI
6 min read · Sep 12, 2024
Photo by Bozhin Karaivanov on Unsplash

The rise of Large Language Models (LLMs) like GPT has proven that this technology is here to stay. While much of the conversation around responsible AI usage rightly focuses on human rights, there is another crucial aspect to consider: its environmental impact. In this blog post, I would like to explore what can be done to make LLM implementations more environmentally friendly.

What environmental impact are we talking about?

To understand the environmental impact of software development, we must consider the growing energy demands of data centers. These facilities are essential for hosting digital services and have had to keep pace with rapidly growing internet traffic in recent years. In 2022, global data center electricity consumption reached approximately 240–340 TWh, accounting for 1–1.3% of global electricity use and contributing around 0.3% of global carbon emissions. For comparison, the Netherlands consumes approximately 120 TWh of electricity per year.

The story doesn’t end with energy. Data centers also require significant amounts of water to cool their servers, especially in large, energy-intensive facilities. Google’s data centers alone, for example, consumed nearly 20 billion liters of water for onsite cooling in 2022, most of it potable water that cannot be used for other purposes afterwards.

Given the significant role data centers play in energy and water consumption, a pressing question arises: how much of these resources is consumed specifically by LLMs? Within the AI landscape, LLMs like GPT are particularly resource-intensive. Training a single large model can require thousands of GPUs or TPUs running continuously for weeks or even months. Microsoft, for example, had set out to reduce its energy consumption, but since the LLM hype reached the public, the company’s energy use has only increased.

The conversation about responsible AI use should include environmental impact alongside the debates about, and responsibility for, ethics and human rights. While AI and LLMs offer undeniable benefits, we must strike a balance to ensure these technologies are deployed sustainably. This could mean investing in greener data centers and raising awareness about the environmental footprint of our digital innovations. In this blog post, I would like to consider sustainability from the point of view of a developer: can we make implementations of LLMs more energy-efficient?

What options for sustainable implementations are there to consider?

First, you should explore whether the solution to your problem should involve LLMs at all. A good post for deciding whether you should use LLMs can be found here.

LLMs can generate, summarize, and rewrite text, but they are also commonly used to search for information in large amounts of data. Let’s take this search use case to consider a sustainable implementation.

Every time we perform a query on a search engine, there is the question of whether a simple search will suffice or whether a language model like an LLM can better support the task. Placing this decision on the user, however, raises questions about responsibility and awareness: not every user has the knowledge, motivation, or interest to make this judgment every time. It is also debatable whether the user should be responsible for this decision at all, rather than the organization that provides the search engine. This highlights the need for technical solutions that ease the burden on the user. A resource-efficient algorithm, or a hybrid solution, that automatically determines whether an LLM is necessary can ensure responsible use of these powerful yet energy-intensive technologies. Such an approach could not only reduce the load on data centers but also enhance the user experience by helping users make more sustainable choices without even realizing it.

Hybrid solutions ensure that only complex tasks are sent to LLMs, while simpler queries are handled by less resource-intensive methods. In this research, the authors propose a hybrid inference approach that combines the respective strengths of a small and a large model to save cost while maintaining quality, leading to 40% fewer calls to the large model with no drop in response quality. They train a router that, given a large model and a small model, learns to identify easy queries as a function of the desired level of response quality, while taking into account the generative nature of the tasks.
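To make the idea concrete, here is a minimal sketch of such a router in Python. It is not the paper’s trained router: the Model stub, the score_difficulty heuristic, and the threshold are all illustrative placeholders for the learned components described above.

```python
# A minimal sketch of hybrid routing. The Model stub and score_difficulty
# heuristic are illustrative placeholders, not the paper's trained router.
from dataclasses import dataclass

@dataclass
class Model:
    name: str

    def generate(self, query: str) -> str:
        return f"[{self.name}] answer to: {query}"

small_model = Model("small-llm")  # cheap path: a distilled model or classic search
large_model = Model("large-llm")  # expensive, high-quality model

def score_difficulty(query: str) -> float:
    # Placeholder heuristic; in the paper this is a learned router that
    # predicts whether the small model's answer would be good enough.
    return min(len(query.split()) / 50, 1.0)

def answer(query: str, threshold: float = 0.5) -> str:
    # Easy queries take the cheap path; only hard ones hit the large model.
    if score_difficulty(query) < threshold:
        return small_model.generate(query)
    return large_model.generate(query)

print(answer("What is the capital of France?"))  # short query -> small model
```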

Such hybrid models can also include a more environmentally friendly LLM. Research into resource-efficient methods for training and using LLMs has focused on several areas. A key approach is optimizing the training process through techniques like model pruning and quantization, which reduce model size without significantly compromising performance. This lowers both energy and water requirements.

Model pruning is an approach that drops a subset of network weights while striving to preserve performance. In this repo, researchers show how large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one shot, without any retraining, at minimal loss of accuracy. Besides pruning, they also apply quantization: a compression technique that maps high-precision values to lower-precision ones, which for LLMs means representing the weights and activations with fewer bits. Quantization reduces the memory footprint and improves inference speed, making the model less memory-intensive. Doing quantization and pruning jointly means that later pruning decisions are influenced by earlier quantization rounding, and vice versa.
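As a rough illustration of the two techniques, rather than the one-shot method from the repo, the sketch below applies 50% magnitude pruning and int8 dynamic quantization to a toy network using standard PyTorch utilities.

```python
# A rough illustration of pruning and quantization with standard PyTorch
# utilities; this is not the one-shot method described in the repo.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(  # toy stand-in for a transformer's linear layers
    nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512)
)

# Pruning: zero out the 50% of weights with the smallest magnitude per layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the sparsity into the weights

# Quantization: store the remaining weights as int8 instead of float32.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(quantized(torch.randn(1, 512)).shape)  # the compressed model still runs
```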

There’s also an emphasis on more efficient management of inference spaces, the environments where models run. Researchers are exploring model distillation, where a smaller “student” model is trained to mimic the output of a larger “teacher” model, leading to faster and more energy-efficient inference. In this repo, you can find multiple ways to implement model distillation and the factors to consider; the authors highlight the importance of clearly defined criteria, aligned with your specific application’s needs, for the student model to work well.
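A common way to implement distillation, shown below as a minimal sketch (the temperature value and loss weighting are my own illustrative choices, not recommendations from the repo), is to train the student on the KL divergence between its softened output distribution and the teacher’s.

```python
# A minimal sketch of a knowledge-distillation loss; the temperature and
# weighting are illustrative choices, not recommendations from the repo.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature**2

# In the training loop, this is typically mixed with the ordinary task loss:
# loss = 0.5 * distillation_loss(student(x), teacher(x).detach()) + 0.5 * F.cross_entropy(student(x), y)
```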

What other sustainable approaches cannot be programmed but are still worth considering?

Additionally, you might explore how organizations can collaborate more effectively to use shared AI models, leading to cost and resource savings. The concept of model sharing or “model as a service” involves multiple organizations utilizing a centrally trained model rather than each training their own. This idea not only has economic benefits but also ecological ones, as repeatedly training similar models consumes a lot of energy. The scale benefits of jointly using a single model arise because the energy-intensive training process needs to be done only once, rather than by every individual organization.

Jointly managing inference spaces and employing federated learning — where different organizations collaborate to improve a model without sharing data — can also reduce the need for additional training.
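As a rough sketch of the core idea, assuming each organization trains a local copy of the same architecture, federated averaging combines their weight updates without any raw data changing hands. The function below is illustrative, not a complete federated learning protocol.

```python
# A minimal sketch of federated averaging (FedAvg): each organization trains
# its own copy locally and shares only weights, never data. Illustrative only.
import torch

def federated_average(state_dicts):
    """Average the parameters of several locally trained copies of one model."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Each round: participants train locally, then a coordinator averages and
# redistributes the result as the new shared model, for example:
# global_model.load_state_dict(federated_average([m.state_dict() for m in local_models]))
```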

Finally, innovative infrastructure approaches, such as dynamically scaling computational power and using energy-efficient hardware like specialized chips, are promising ways to reduce AI’s ecological footprint without sacrificing performance.

I hope this blog post has inspired you to consider sustainable LLMs and to make environmentally responsible choices when working with these models.
