Photo by Erik Stein

What if Quantization Were Applied to Reinforcement Learning?

Christopher Dossman
AI³ | Theory, Practice, Business
2 min read · Oct 10, 2019


Reinforcement learning (RL) has matured to the point where it can be applied to many real-world domains, such as robotics, game playing, healthcare, transportation, and finance.

But training and deploying reinforcement learning models remains challenging. Training is costly because it repeatedly runs forward and backward propagation through neural networks. According to OpenAI, reaching state-of-the-art results in the game Dota 2 with RL required around 128,000 CPU cores and 256 P100 GPUs, with the total infrastructure costing tens of millions of US dollars.

In deep learning more broadly, quantization is used to deal with costs like these: it substantially reduces the memory, compute, and energy usage of models, often with little or no loss in quality. So, can quantization approaches be carried over to reinforcement learning? Researchers set out to find out.
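To make the idea concrete, here is a minimal sketch of uniform (affine) quantization in NumPy. It is not taken from the paper: the function names `quantize_uniform` and `dequantize_uniform` and the random 64×64 weight matrix are assumptions made for illustration. It stores each 32-bit float as an 8-bit integer code plus a shared scale and offset.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Map a float array onto integers in [0, 2**num_bits - 1] (num_bits <= 8)."""
    qmax = 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Fall back to a unit scale if every value is identical.
    scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
    q = np.clip(np.round((x - x_min) / scale), 0, qmax).astype(np.uint8)
    return q, scale, x_min

def dequantize_uniform(q, scale, x_min):
    """Recover approximate float values from the integer codes."""
    return q.astype(np.float32) * scale + x_min

# Hypothetical stand-in for one layer of a policy network.
rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)

q, scale, x_min = quantize_uniform(weights)
restored = dequantize_uniform(q, scale, x_min)

print("float32 bytes:", weights.nbytes)   # 64 * 64 * 4
print("uint8 bytes:  ", q.nbytes)         # 64 * 64 * 1
print("max abs error:", float(np.abs(weights - restored).max()))
```

The integer codes take a quarter of the memory of the float32 original, and each restored value differs from its original by at most about half a quantization step.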

Quantized Reinforcement Learning (QUARL)

Researchers performed the first study of quantization effects on deep reinforcement learning using Quantized Reinforcement Learning (QUARL). QUARL is a new framework for benchmarking and analyzing the effects of quantization on various reinforcement learning tasks and algorithms.

They applied post-training quantization and quantization-aware training techniques to a spectrum of reinforcement learning tasks and training algorithms. Their work is a comprehensive experimental study that quantifies the effects of quantization on various deep reinforcement learning policies, with the goal of reducing their computational resource demands.
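The difference between the two techniques can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the QUARL implementation: it fits a small linear model rather than an RL policy, and `fake_quantize`, `train_linear`, and the 4-bit setting are all hypothetical choices. Post-training quantization rounds the weights after ordinary training. Quantization-aware training rounds them inside every forward pass and applies the resulting gradient to the full-precision weights (a straight-through estimator).

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Quantize then dequantize: values stay float but lie on a num_bits grid."""
    qmax = 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    if w_max == w_min:
        return w.copy()
    scale = (w_max - w_min) / qmax
    return np.round((w - w_min) / scale) * scale + w_min

def train_linear(x, y, num_bits=None, steps=500, lr=0.1, seed=0):
    """Fit y ~ x @ w by gradient descent on the mean squared error.

    If num_bits is set, the forward pass uses fake-quantized weights and the
    gradient is applied to the full-precision weights (straight-through).
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=x.shape[1])
    for _ in range(steps):
        w_fwd = fake_quantize(w, num_bits) if num_bits else w
        grad = 2 * x.T @ (x @ w_fwd - y) / len(y)
        w -= lr * grad
    return w

# Synthetic regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
y = x @ rng.normal(size=8)

def mse(w):
    return float(np.mean((x @ w - y) ** 2))

# Post-training quantization: train in full precision, then round once.
w_fp = train_linear(x, y)
ptq_loss = mse(fake_quantize(w_fp, num_bits=4))

# Quantization-aware training: round during training, then round for deployment.
w_qat = train_linear(x, y, num_bits=4)
qat_loss = mse(fake_quantize(w_qat, num_bits=4))

print("full precision:", mse(w_fp))
print("post-training: ", ptq_loss)
print("quant-aware:   ", qat_loss)
```

Which of the two quantized losses comes out lower on this toy problem depends on the data and hyperparameters. The sketch is meant to show where the rounding happens in each approach, not to reproduce the paper's results.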

Why it Matters

Their work demonstrates that policies can be quantized to 6–8 bits of precision without loss of accuracy. It also finds that certain tasks and RL algorithms yield policies that are harder to quantize, because they widen the distribution of the models’ weights. Finally, quantization-aware training consistently improves results over post-training quantization, and often even over the full-precision baseline.
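One way to see why a wider weight distribution is harder to quantize: with a fixed number of bits, a uniform quantizer has to stretch the same number of levels over a larger range, so each step (and each rounding error) gets bigger. The sketch below assumes simple min-max uniform quantization and two synthetic weight sets. Both the helper `quantization_error` and the data are hypothetical, not taken from the paper.

```python
import numpy as np

def quantization_error(w, num_bits):
    """Mean absolute error after rounding w onto a uniform grid over [min, max]."""
    levels = 2 ** num_bits - 1
    w_min, w_max = w.min(), w.max()
    step = (w_max - w_min) / levels
    w_hat = np.round((w - w_min) / step) * step + w_min
    return float(np.mean(np.abs(w - w_hat)))

rng = np.random.default_rng(0)
narrow = rng.normal(0.0, 0.1, size=10_000)  # tightly clustered weights
wide = rng.normal(0.0, 1.0, size=10_000)    # ten times the spread

for bits in (4, 6, 8):
    print(bits, quantization_error(narrow, bits), quantization_error(wide, bits))
```

At every bit width the wide set incurs the larger error, and for both sets the error shrinks as bits are added.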

The work also shows that, in real-world applications, a quantized reinforcement learning policy achieves a speedup and a reduction in memory usage over an unquantized one.

Read the paper: Quantized Reinforcement Learning (QUARL)

Thanks for reading. Please comment and share. For updates on the most recent and interesting research papers, subscribe to our weekly newsletter. You can also connect with me on Twitter, LinkedIn, and Facebook. Remember to 👏 if you enjoyed this article. Cheers!