PyIDF: Diversity of experiences in Reinforcement Learning

Kostya Kanishev
Imandra
Sep 13, 2019

In Reinforcement Learning, an agent learns from interaction with its environment. The agent’s goal is to arrive at a behavior policy that maximizes the expected reward. Intuitively, an agent should learn from a diverse set of experiences, but how can one quantify and ensure this “diversity”? At Imandra, we’ve developed a novel technique for exploring algorithm state-spaces called Region Decomposition. Our previous post focused on the details of Region Decompositions. Here, we will describe our PyIDF framework for decomposing state-transition algorithms written in Python and demonstrate, with a simple example, how to use the resulting regions to improve the convergence and stability of Reinforcement Learning.

The Intuition

When training a Machine Learning classifier, one must take care to avoid class-imbalanced training datasets. Let’s illustrate this with a classic MNIST learning example (you can follow it in this Google Colab notebook). We’ll train two neural network classifiers with the same layer architecture. The first classifier will be trained on the original, class-balanced MNIST dataset (left plot below), while the second will be trained on a dataset in which one class, the “9”, is overrepresented (right plot):

At each training epoch, both classifiers are evaluated on the same balanced test set. As one can see, the test performance of the second classifier degrades significantly: the network overfits to the overrepresented class and fails to generalize well to the other digits.
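To make the setup concrete, here is a minimal sketch of the experiment (not the notebook’s exact code: the layer architecture, the duplication factor for the “9” class, and the use of Keras are illustrative assumptions):

```python
# Two identical classifiers: one trained on the original MNIST training set,
# one on a version where the "9" class is heavily overrepresented.
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the imbalanced dataset by duplicating every "9" example several times.
# The duplication factor (5x) is an illustrative choice, not taken from the post.
nines = np.where(y_train == 9)[0]
idx = np.concatenate([np.arange(len(y_train))] + [nines] * 5)
np.random.shuffle(idx)
x_imb, y_imb = x_train[idx], y_train[idx]

def make_model():
    # A simple fully connected classifier; the post only says both models
    # share the same architecture, so this particular one is an assumption.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

balanced_model, imbalanced_model = make_model(), make_model()
for model, (x, y) in [(balanced_model, (x_train, y_train)),
                      (imbalanced_model, (x_imb, y_imb))]:
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # Both models are evaluated on the same balanced test set after each epoch.
    model.fit(x, y, epochs=10, validation_data=(x_test, y_test), verbose=0)
```

Comparing the per-epoch validation accuracies of the two models reproduces the effect described above: the model trained on the imbalanced data plateaus at a noticeably lower test accuracy.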

Turning to Reinforcement Learning (RL), one might ask a similar question: how can we make sure that the experiences an agent undergoes are “diverse” enough? What does it mean for a set of experiences to be “balanced” in the context of RL? How can we quantify and measure the “diversity” of experiences?

Iterative Decomposition Framework (IDF)