Weekly review of Reinforcement Learning papers #3

Every Monday, I present 4 publications from my research area. Let’s discuss them!

Quentin Gallouédec
Apr 5, 2021 · 7 min read

[← Previous review][Next review →]

Paper 1: Synthetic Returns for Long-Term Credit Assignment

Raposo, D., Ritter, S., Santoro, A., Wayne, G., Weber, T., Botvinick, M., van Hasselt H. & Song, F. (2021). Synthetic Returns for Long-Term Credit Assignment. arXiv preprint arXiv:2102.12425.

Good actions produce high rewards. Moreover, the principle of causality tells us that the cause always precedes the effect. Put the two together: a good action is associated with a high reward somewhere in the future. A logician would ask: is the converse true? Does a high reward imply that all the preceding actions were good? No. Yet it is on this shaky assumption that most RL algorithms rest. Take policy-based RL as an example: the agent increases the probability of actions that precede (not cause!) high rewards.

The approach presented in this paper is quite different from what you usually see: state-associative learning (SA). The agent must learn an association between states and arbitrarily distant future rewards; the goal is to model the contribution of past states to the current reward. To go into a bit of algorithmic detail, the agent is augmented with a neural network that is trained to predict the reward at each time step. This network takes as input the set of states visited since the beginning of the episode. Its output is called the “synthetic return”. Intuitively, it measures the reward attributable to the agent being in the current state, regardless of how long it takes to actually collect that reward. It is this quantity that the agent maximizes during reinforcement learning.
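To make the idea concrete, here is a minimal sketch of a synthetic-return head in PyTorch. This is not the authors’ implementation (their architecture includes gating and bias terms); the class name, network sizes and variable names are my own. The reward at time t is regressed as a sum of per-state contributions, and the contribution of the current state is used as the synthetic return fed to the RL update.

```python
# Minimal sketch of the state-associative idea (illustrative, not the paper's code):
# predict the reward at time t as a sum of per-state contributions c(s_k) over the
# episode prefix; c(s_t) is the "synthetic return" credited to the current state.
import torch
import torch.nn as nn

class SyntheticReturnHead(nn.Module):
    def __init__(self, state_dim, hidden_dim=64):
        super().__init__()
        self.contribution = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, past_states):
        # past_states: (T, state_dim), all states visited so far in the episode
        contributions = self.contribution(past_states)  # (T, 1), one c(s_k) per state
        predicted_reward = contributions.sum()          # regression target: observed r_t
        synthetic_return = contributions[-1]            # credit assigned to the current state
        return predicted_reward, synthetic_return

# Toy usage: regress the predicted reward onto the observed one, then use the
# synthetic return (instead of, or mixed with, r_t) in the policy/value update.
states = torch.randn(10, 8)                 # episode prefix of 10 states, state_dim=8
head = SyntheticReturnHead(state_dim=8)
pred_r, sr = head(states)
loss = (pred_r - torch.tensor(1.0)) ** 2    # observed reward r_t = 1.0 here
loss.backward()
```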

Does it work? Yes. Here is one of the last Atari games to be solved by Deep-RL, precisely because of its delayed reward structure: Atari Skiing.

Atari Skiing.

It is on this environment that their algorithm gives the best results: they were able to solve it with 25 times fewer interactions than the best agent in the literature, Agent57.

Figure from the paper: Agent (IMPALA+SR) performance on Atari Skiing. The agent is IMPALA-based and augmented with synthetic returns (SR).

I am curious to see the work that will use this agent in other environments. It is this kind of paradigm shift that leads to great advances.

Paper 2: Deep reinforcement learning in medical imaging: A literature review

Zhou, S. K., Le, H. N., Luu, K., Nguyen, H. V., & Ayache, N. (2021). Deep reinforcement learning in medical imaging: A literature review. arXiv preprint arXiv:2103.05115.

If we look up from our own field for a moment, we can see that its recent progress has benefited our neighbors in medicine. It is still marginal, but papers linking medicine and Deep-RL are multiplying. The authors of this recent publication survey them.

Within medicine, the main application of Deep-RL is (curiously) medical imaging. Three categories are distinguished:
(i) parametric analysis of medical images
(ii) resolution of optimization tasks
(iii) others.
Let us give a brief representative example for the first category.

Maicas et al. (2017) present a DRL-based approach for breast lesion detection from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). It involves training an agent that successively chooses how to change the size and position of its bounding box. Conceptually, an action could be “move 20 pixels to the right” or “reduce the size of the bounding box by 7 pixels”. The observation is the content of the bounding box, and the reward depends on whether the lesion is inside the box.

Using a ResNet-based DQN, the results show similar accuracy to the state-of-the-art approaches, but with significantly reduced detection time.
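To give a feel for this setup, here is a hypothetical Gym-style sketch of such a bounding-box environment. It is a toy 2D version with made-up action names, step sizes and a binary reward; Maicas et al. work on 3D volumes and use different details.

```python
# Toy bounding-box environment (illustrative only, not the authors' setup).
import numpy as np

class LesionBoxEnv:
    ACTIONS = ["left", "right", "up", "down", "grow", "shrink"]

    def __init__(self, image, lesion_xy, step=20, resize=7):
        self.image = image                     # 2D array standing in for a DCE-MRI slice
        self.lesion = np.asarray(lesion_xy)    # ground-truth lesion position (x, y)
        self.step, self.resize = step, resize
        self.box = np.array([0, 0, 100, 100])  # x, y, width, height

    def _observation(self):
        x, y, w, h = self.box
        return self.image[y:y + h, x:x + w]    # the agent only observes the box content

    def step(self, action):
        x, y, w, h = self.box
        if action == "left":     x -= self.step
        elif action == "right":  x += self.step
        elif action == "up":     y -= self.step
        elif action == "down":   y += self.step
        elif action == "grow":   w += self.resize; h += self.resize
        elif action == "shrink": w -= self.resize; h -= self.resize
        x, y, w, h = max(x, 0), max(y, 0), max(w, 1), max(h, 1)
        self.box = np.array([x, y, w, h])
        inside = x <= self.lesion[0] <= x + w and y <= self.lesion[1] <= y + h
        reward = 1.0 if inside else 0.0        # reward: is the lesion inside the box?
        return self._observation(), reward

env = LesionBoxEnv(image=np.random.rand(512, 512), lesion_xy=(300, 250))
obs, reward = env.step("right")
```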

There are a lot of other interesting examples in this paper. I am glad to see the result of such an interdisciplinary approach.

Paper 3: Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings

Chen, L., Lee, K., Srinivas, A., & Abbeel, P. (2021). Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings. arXiv preprint arXiv:2103.02886.

We often start learning reinforcement learning with simple examples in which the observation is the state: every coordinate of the observation vector is useful, and everything useful is in the observation vector. But real problems are not that simple. When we learn from images, not every pixel is useful, and not everything useful is in the pixels. This is the challenge of high-dimensional observation spaces.

Off-policy reinforcement learning algorithms often use a memory to replay episodes. This trick works very well in practice. However, replaying episodes when the observations are images is very expensive in both storage and computation.

In this paper, the authors present a very attractive idea: instead of storing raw images in the replay memory, they store their latent representations. The dimensionality of the stored data is much lower, so the amount of computation is reduced. One question remains: how do we obtain this latent representation? The solution is to start training by learning the policy and the representation together. Once the representation is good enough, it is frozen, and training continues on the latent representation only.

Figure from the paper: (a) All forward and backward passes are active through the network. Replay buffer stores images. (b) Encoder is frozen, replay buffer stores latent vectors. (Published with the author’s permission.)
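Here is a minimal sketch of that buffer swap, assuming a simple convolutional encoder and a hypothetical FREEZE_STEP threshold (the paper’s actual architecture and freezing criterion differ):

```python
# Illustrative sketch of the SEER idea: after freezing the encoder, store latents
# instead of raw images in the replay buffer.
import torch
import torch.nn as nn

# Toy convolutional encoder standing in for the agent's image encoder.
encoder = nn.Sequential(nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(), nn.Flatten())

raw_buffer, latent_buffer = [], []
FREEZE_STEP = 100_000  # hypothetical step at which the representation is considered learned

def freeze_encoder():
    # From now on the encoder produces fixed features; no more encoder gradients.
    for p in encoder.parameters():
        p.requires_grad_(False)

def store(obs, step):
    # obs: image tensor of shape (1, 3, 84, 84)
    if step < FREEZE_STEP:
        raw_buffer.append(obs)                  # early phase: keep raw pixels, keep training the encoder
    else:
        with torch.no_grad():
            latent_buffer.append(encoder(obs))  # later phase: store small latent vectors only

# Usage: at step == FREEZE_STEP, call freeze_encoder() and switch to latent storage.
store(torch.randn(1, 3, 84, 84), step=0)
freeze_encoder()
store(torch.randn(1, 3, 84, 84), step=FREEZE_STEP)
```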

SEER stands for Stored Embeddings for Efficient Reinforcement Learning. Here are the results.

Figure from the article: Learning curves for Rainbow with and without SEER, where the x-axis shows estimated cumulative FLOPs. (Published with the author’s permission.)

You will notice that the x-axis gives the estimated number of FLOPs. This is quite unusual, but remember, the goal here is to reduce the computational requirements of the algorithm. Once the encoder is frozen, the model learns more efficiently (in terms of number of operations). Bet won!

I cannot cover here the part dedicated to transfer learning and the generalization of the lower layers of the CNN. I invite you to read the paper in its entirety.

Paper 4: A Survey on Deep Reinforcement Learning for Audio-Based Applications

Latif, S., Cuayáhuitl, H., Pervez, F., Shamshad, F., Ali, H. S., & Cambria, E. (2021). A Survey on Deep Reinforcement Learning for Audio-Based Applications. arXiv preprint arXiv:2101.00240.

In one of its many applications, Deep-RL is used for audio systems: learning is done from sound, speech, music… any sound signal carrying information. But audio is a very particular type of input: it takes its meaning from the evolution of the signal, in both its temporal and spectral dimensions. We are used to CNNs capturing the spatial continuity of images; audio applications remain more marginal.

This paper reviews the full range of Deep-RL papers for audio applications. The review is very detailed. Several large tables give a very synthetic view of all the approaches in the literature and their specificities. It is very easy to compare the application domains, the algorithms used, the results, the observation spaces, the form of the reward function…

Figure from the paper: summary of audio-based DRL connecting the application areas and algorithms (red: value-based, green: policy-based, blue: model-based)

The authors identify six application areas: automatic speech recognition (ASR), speech emotion recognition (SER), spoken dialogue systems (SDSs), audio enhancement, audio-driven robotic control, and music generation. For each of these categories, the number of studies is growing and the power of Deep-RL is demonstrated each time. Nevertheless, the authors identify a pitfall: the results are often obtained in highly controlled or simulated environments. These do not reflect the complexity of the real world, which is noisy, non-stationary, and surely full of other biases that make learning harder. Future research will have to address this problem, already well known in robotics: the gap between simulation and reality.

There is no doubt that we will continue to observe the amazing applications of Deep-RL applied to audio.

Bonus Paper: Reinforcement learning: An introduction

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.

A small exception this week, since I present not a paper but a book: one that everyone interested in (Deep-)RL should own.

In its second edition, published in 2018 by Richard S. Sutton and Andrew G. Barto, this book explains step by step the foundations of reinforcement learning. It is widely cited because it captures everything that was known about reinforcement learning up to 2018. It is a high-quality book, well researched, well illustrated, covering the whole spectrum of difficulty, for beginners and experts alike.

The book starts with a part dedicated to tabular methods (the action and observation spaces are discrete and small enough to enumerate). The next part focuses on approximate solution methods, where value functions are represented by function approximators such as neural networks (this is where Deep-RL begins). It ends with chapters dedicated to the frontiers of the discipline, discussing the parallels with neuroscience and psychology.

I decided to buy the paper version, but a free version is available online.

It was a great pleasure to present my readings of the week. Feel free to send me your feedback.
