Training a state-of-the-art artificial intelligence (AI) model can emit five times as much CO2 as a car does over its entire lifetime [1]. Since achieving state-of-the-art results often dictates whether a paper is successful, researchers keep developing ever more computationally hungry programs to reap higher scores. Consider the consumer headlines about AI: beating humans at Go, at StarCraft II, or at Dota 2. However polluting a car may be, it is at least useful. Can the same be said of the algorithms developed to play Go or strategy video games?
Against all odds, computers playing games might just be a great ally in the fight against climate change: games provide toy versions of complex real-life problems [2], and reinforcement learning (RL), the area of AI behind these examples, is a powerful tool for solving them.
Powerful indeed, but, as we said, consuming a mastodontic amount of energy [3]. In this blog post we explore a way to make RL algorithms more lightweight and thriftier, taking a closer look at one algorithm in particular: model-free episodic control [4].
Hippocampal learning with ones and zeros
A key challenge in RL is balancing exploration and exploitation. Exploration is the strategy of blindly trying an action to see whether it might lead to better scores in the long run. Exploitation, instead, consists of greedily reusing actions that are known to yield high rewards.
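This trade-off is commonly implemented as an epsilon-greedy policy. A minimal sketch (the function and parameter names are ours, not from the paper):

```python
import random

def choose_action(q_values, actions, epsilon=0.1):
    """Epsilon-greedy policy: explore with probability epsilon,
    otherwise exploit the action with the highest known value."""
    if random.random() < epsilon:
        return random.choice(actions)                        # explore
    return max(actions, key=lambda a: q_values.get(a, 0.0))  # exploit
```

With `epsilon=0` the agent always exploits; with `epsilon=1` it acts entirely at random.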
In model-free episodic control, the idea during exploitation is to match the currently observed state against the record of previous attempts and choose the action that served us best in the past. We keep track of a value for each encountered state-action pair that captures its potential to generate high rewards. To pick the right move, we look in our record for the nearest neighbors of the observed state and check which associated action proved most rewarding.
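The lookup described above can be sketched as follows. This is a rough, simplified sketch, not the authors' implementation: class and method names are ours, and the original algorithm additionally bounds the buffer size and stores only the maximum return ever observed for each state-action pair.

```python
import numpy as np

class EpisodicController:
    """One buffer per action stores (state, return) pairs; the value of
    an action in an unseen state is estimated by averaging the returns
    of the k nearest stored states."""

    def __init__(self, actions, k=3):
        self.k = k
        self.memory = {a: ([], []) for a in actions}  # states, returns

    def estimate(self, state, action):
        states, returns = self.memory[action]
        if not states:
            return float("inf")                # never tried: worth exploring
        dists = np.linalg.norm(np.stack(states) - state, axis=1)
        nearest = np.argsort(dists)[: self.k]  # indices of the closest states
        return float(np.mean([returns[i] for i in nearest]))

    def act(self, state):
        state = np.asarray(state, dtype=float)
        return max(self.memory, key=lambda a: self.estimate(state, a))

    def update(self, state, action, ret):
        states, returns = self.memory[action]
        states.append(np.asarray(state, dtype=float))
        returns.append(ret)
```

The nearest-neighbor search is where the dimensionality of the states bites: every `estimate` call computes distances against the whole buffer.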
The cumbersome essence of high dimensions, evaporated…
Weighing the differences between these observations is not the easiest thing to do. Let us forget the toy problem for a moment and think of Elon Musk’s promise. Imagine we would like to use that same algorithm with the data provided by the 7 cameras, 12 ultrasonic sensors and the radar on a Tesla. Two issues arise: memory and computation requirements go through the roof!
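To put a number on the memory issue, a back-of-envelope calculation (the resolution and buffer size here are hypothetical, purely for illustration):

```python
# One 1280x960 RGB frame per stored state, a buffer of one million
# states, each value stored as float32 (4 bytes).
frame_values = 1280 * 960 * 3                # ~3.7 million values per frame
buffer_entries = 1_000_000
terabytes = frame_values * buffer_entries * 4 / 1e12
print(terabytes)  # ~14.7 TB of raw observations
```

And that is before counting the cost of nearest-neighbor distance computations over vectors of millions of entries.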
There is a trick that solves them both: we can embed the states into a low-dimensional space without altering their geometric properties, using a random map [5].
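A minimal illustration with a Gaussian random matrix (the dimensions below are arbitrary): by the Johnson-Lindenstrauss lemma, pairwise distances are approximately preserved after projection.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 10_000, 256            # ambient and embedded dimensions

# Random Gaussian map; the 1/sqrt(m) scaling preserves norms in expectation
R = rng.normal(size=(m, d)) / np.sqrt(m)

x, y = rng.normal(size=d), rng.normal(size=d)
original = np.linalg.norm(x - y)
projected = np.linalg.norm(R @ x - R @ y)
print(original, projected)    # the two distances typically agree closely
```

Note that `R` is never trained: it is drawn once at random and reused, which is exactly what makes the approach cheap.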
Moreover, random projections make the algorithm independent of fallible human intuition, leaving us with a more universal method.
…under a beam of light
A LightOn Optical Processing Unit (OPU) can extract a custom number of features from very high-dimensional vectors with a constant, low power consumption. It thus fits well into the model-free episodic control algorithm as a surrogate for the traditional GPU-based linear random projection, whose cost soars with the size of the observations.
Relationship with convolutions
A typical way of extracting features from images is with convolutional neural networks (CNNs). Suppose we use only the convolutional part of the CNN, so as not to spend time training the classification layers; a simple architecture yields similar results. However, processing the observations with CNNs consumes far more energy per input image than the OPU [6].
We can improve the algorithm by combining both approaches: a rough downscaling of the input images to even out the useless details, followed by a random projection, yields the best results with little to no overhead. Conversely, we can make case-specific enhancements to the preprocessing, building a finer feature distiller on top of (actually, before) the generic framework that is the random projection.
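A sketch of the first combination: block-average downscaling followed by a fixed random projection. The frame size, downscaling factor, and output dimension below are illustrative assumptions, and the projection matrix must be drawn once and reused across frames.

```python
import numpy as np

rng = np.random.default_rng(42)
R = rng.normal(size=(64, 21 * 21)) / 8.0   # fixed projection to 64 features

def preprocess(frame, factor=4):
    """Downscale a grayscale frame by block-averaging (dimensions must be
    divisible by factor), then randomly project the flattened result."""
    h, w = frame.shape
    blocks = frame.reshape(h // factor, factor, w // factor, factor)
    small = blocks.mean(axis=(1, 3))       # rough resize: evens out details
    return R @ small.ravel()

state = preprocess(rng.random((84, 84)))
print(state.shape)  # (64,)
```

The resulting 64-dimensional states are what the episodic memory stores and searches, instead of the raw frames.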
Model-free episodic control is no panacea for RL. It is a rather modest algorithm, but a good starting point for understanding the success of neuro-inspired episodic control methods [7]. These have been shown to outperform deep RL algorithms at least in the first stage of learning (see Figure 3). We therefore have the opportunity to devise robust AIs (possibly with the aid of imitation learning to combine episodic control with another agent), leveraging the properties of random projections and light-based computing to address one of deep learning’s major flaws: using a sample-efficient algorithm early on, such as episodic control, reduces the data hunger often associated with pure-RL techniques, which in turn brings down the electricity bill.
Have a look at the GitHub repository to see the implementation details and reproduce our results. LightOn supports research through the LightOn Cloud for Research program, which gives you free credits to speed up your computations. Apply here!
LightOn is a hardware company that develops new optical processors that considerably speed up Machine Learning computation. LightOn’s processors open new horizons in computing and engineering fields that are facing computational limits. Interested in speeding your computations up? Try out our solution on LightOn Cloud! 🌈
[1] Strubell, Emma, Ananya Ganesh, and Andrew McCallum. “Energy and policy considerations for deep learning in NLP.” arXiv preprint arXiv:1906.02243 (2019).
[2] Risi, Sebastian, and Mike Preuss. “From Chess and Atari to StarCraft and Beyond: How Game AI is Driving the World of AI.” KI-Künstliche Intelligenz 34.1 (2020): 7–17.
[3] Schwartz, Roy, Jesse Dodge, and Noah A. Smith. “Green AI.” arXiv preprint arXiv:1907.10597 (2019).
[4] Blundell, Charles, et al. “Model-free episodic control.” arXiv preprint arXiv:1606.04460 (2016).
[5] Johnson, William B., and Joram Lindenstrauss. “Extensions of Lipschitz mappings into a Hilbert space.” Contemporary Mathematics 26 (1984): 189–206.
[6] Lacoste, Alexandre, et al. “Quantifying the Carbon Emissions of Machine Learning.” arXiv preprint arXiv:1910.09700 (2019).
[7] Pritzel, Alexander, et al. “Neural episodic control.” Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.