Green AI

Noa Lubin
Published in Voice Tech Podcast
3 min read · Feb 2, 2020

Green AI — How the financial and environmental costs of NLP models could change the future of the NLP community.

Introduction

Recent NLP models have obtained notable improvements in accuracy. However, these improvements are not “free”: they require heavy computational resources, which are costly and consume a significant amount of energy. The paper “Energy and Policy Considerations for Deep Learning in NLP” by Emma Strubell et al. [1] attempts to quantify the financial and environmental costs of such models. (I previously mentioned this paper in my SpacyIRL post.)

picture from freepick.com

To measure the energy, Strubell et al. [1] trained state-of-the-art NLP models using the default settings provided and sampled GPU and CPU power consumption during training. They trained each model for one day and multiplied that daily consumption by the total training time reported in each model’s original paper. The total consumption was then multiplied by the Power Usage Effectiveness (PUE) coefficient, which accounts for the additional energy a data center needs for cooling and other overhead.
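As a rough sketch of that bookkeeping (not the authors’ code; the PUE of 1.58 and the US average emission factor of 0.954 lbs of CO2 per kWh are the values reported in [1], while the power draws in the example are made up):

```python
# Rough sketch of the energy accounting in [1]; not the authors' code.
# Average power draws are sampled during training, in watts.

PUE = 1.58               # Power Usage Effectiveness coefficient used in [1]
LBS_CO2_PER_KWH = 0.954  # US average emission factor used in [1]

def estimate_co2_lbs(cpu_watts, ram_watts, gpu_watts, num_gpus, train_hours):
    """Estimate the total CO2 emitted (in lbs) by one training run."""
    draw_watts = cpu_watts + ram_watts + num_gpus * gpu_watts
    total_kwh = PUE * draw_watts * train_hours / 1000.0
    return LBS_CO2_PER_KWH * total_kwh

# Hypothetical run: 4 GPUs at 250 W each for 96 hours,
# plus roughly 100 W of CPU and 25 W of RAM draw.
print(f"{estimate_co2_lbs(100, 25, 250, 4, 96):.0f} lbs CO2")
```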

To give you a feeling for these energy numbers, the paper lists some familiar points of comparison:

familiar CO2 consumption [1]

So yes, training a BERT model emits roughly as much CO2 as a passenger flying from NY to SF.

Estimated CO2 emission and cloud compute cost [1]

Aside from the disturbing numbers, I think these findings raise interesting questions for the NLP community.


For NLP researchers, this has an impact on the direction of research. Each year we increase the number of parameters, build more complex architectures, and require more and more data to train. I believe future NLP research will focus on training with much smaller datasets and on simpler solutions that obtain similar results.

An interesting paper in this direction is “Distilling the Knowledge in a Neural Network” by Hinton et al. [2] The idea is simple: take the full output distribution of a large network such as BERT, and train a small network to predict that distribution. The small network learns to mimic the large one, and the results are surprisingly good, coming pretty close to the large network’s accuracy.
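Here is a minimal sketch of that training objective in PyTorch, following the recipe from [2]. The temperature of 2.0 and the mixing weight are illustrative placeholders, and `teacher` and `student` stand in for any large/small model pair:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton et al. [2]: match the teacher's softened output distribution,
    plus the usual cross-entropy on the hard labels."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 to keep gradient magnitudes comparable (per [2]).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Training-step sketch: the frozen teacher labels each batch,
# and the small student learns to mimic it.
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits, labels)
```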

For applied NLP scientists, this emphasizes that we should always think about these trade-offs while designing solutions. If I am building a product on top of an English-to-German machine translation model, and neural architecture search (NAS) achieves a new state-of-the-art BLEU score of 29.7 for that task, an increase of just 0.1, is it worth switching? The answer is obviously no.
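To make the trade-off concrete, here is a back-of-the-envelope calculation using the approximate cloud-compute figures that [1] reports for that NAS run (treat the numbers as illustrative; they depend heavily on hardware and pricing):

```python
# Back-of-the-envelope: what did the last 0.1 BLEU cost?
# Estimated cloud-compute range for the NAS run, as reported in [1].
nas_cost_low, nas_cost_high = 942_973, 3_201_722  # USD
bleu_gain = 0.1

print(f"~${nas_cost_low:,} to ${nas_cost_high:,} for +{bleu_gain} BLEU")
print(f"~${nas_cost_low / bleu_gain:,.0f}+ per full BLEU point at that margin")
```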

To sum up, the current financial and environmental costs of NLP models are very high. I hope that, as a community, our focus on finding and using simple, elegant, small-data, ecological solutions will become just as important as creating and using the “next BERT”.

Some food for thought!

Until next time,
Noa Lubin.

References

[1] Emma Strubell, Ananya Ganesh, Andrew McCallum. Energy and Policy Considerations for Deep Learning in NLP. ACL 2019.

[2] Geoffrey Hinton, Oriol Vinyals, Jeff Dean. Distilling the Knowledge in a Neural Network. NIPS Deep Learning Workshop, 2015.
