Efficient Exploration in Model-free Reinforcement Learning

I describe here our recent NeurIPS paper [1] [code], which introduces Successor Uncertainties (SU), a state-of-the-art method for efficient exploration in model-free reinforcement learning.

The lead authors are David Janz and Jiri Hron, two PhD students from the Cambridge Machine Learning group. The work originated during David Janz's internship at Microsoft Research Cambridge.

The main insight behind SU is to describe Q-functions using probabilistic models that directly take into account correlations between Q-function values, as given by the Bellman equation. These correlations had been ignored in previous work.
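To make the source of these correlations concrete, here is the standard Bellman equation for the Q-function of a fixed policy. The notation is my own choice for illustration and is not taken from the paper: $\pi$ is the policy, $r$ the reward, $\gamma$ the discount factor, and $p$ the transition distribution.

```latex
Q^{\pi}(s, a) = \mathbb{E}_{s' \sim p(\cdot \mid s, a),\; a' \sim \pi(\cdot \mid s')}
\left[ r(s, a) + \gamma \, Q^{\pi}(s', a') \right]
```

Because the value at $(s, a)$ is defined through the values at its successor pairs $(s', a')$, a probabilistic model that treats Q-values at different state-action pairs as independent cannot be consistent with this equation.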

The result is a method that outperforms competitors

I describe here our recent ICLR paper [1] [code] [talk], which introduces a novel method for model-based reinforcement learning. The main author of this work is Stefan Depeweg, a PhD student at the Technical University of Munich whom I am co-supervising.

The key contribution is in our models: Bayesian neural networks with random inputs, whose input layer contains both the input features and random variables. These random variables are propagated forward through the network and transformed into an arbitrary noise signal at the output layer.
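The idea of feeding a random variable into the input layer can be illustrated with a minimal sketch. This is not the implementation from the paper: the one-hidden-layer architecture, the layer size, the function names (`forward`, `sample_params`, `sample_outputs`) and the Gaussian draw of the weights, which merely stands in for a sample from a posterior, are all assumptions made for illustration.

```python
import math
import random


def forward(x, z, params):
    """One-hidden-layer network whose input is the feature x joined with the random input z."""
    w_in, b_in, w_out, b_out = params
    inputs = [x, z]  # the random variable enters alongside the feature
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(w_in, b_in)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def sample_params(rng, n_hidden=8, scale=1.5):
    """Draw one set of weights from a Gaussian (an assumed stand-in for a posterior sample)."""
    w_in = [[rng.gauss(0, scale) for _ in range(2)] for _ in range(n_hidden)]
    b_in = [rng.gauss(0, scale) for _ in range(n_hidden)]
    w_out = [rng.gauss(0, scale) for _ in range(n_hidden)]
    b_out = rng.gauss(0, scale)
    return w_in, b_in, w_out, b_out


def sample_outputs(x, params, rng, n_samples=1000):
    """For a fixed feature x, resample z ~ N(0, 1) to obtain samples of the output noise."""
    return [forward(x, rng.gauss(0, 1), params) for _ in range(n_samples)]


rng = random.Random(0)
params = sample_params(rng)
ys = sample_outputs(0.5, params, rng)
```

For a fixed `x` and fixed weights, the spread of `ys` comes entirely from `z`. Since `z` passes through the nonlinear hidden layer, the resulting output distribution need not be Gaussian, which is the sense in which the network transforms the random input into a more general noise signal.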

The random inputs enable our models to automatically capture complex noise patterns, improving the quality of our model-based…

Deep neural networks are currently used by companies such as Facebook, Google, and Apple for making predictions from massive amounts of user-generated data. A few examples are the Deep Face and Deep Text systems used by Facebook for face recognition and text understanding, or the speech recognition systems used by Siri and Google Now. In applications of this kind, it is critical to use neural networks whose predictions are both fast and accurate. This means that, when designing these systems, we would like to tune different neural network parameters to jointly minimize two objectives: 1) the prediction…

José Miguel Hernández Lobato

University Lecturer (US Assistant Professor) in Machine Learning at the University of Cambridge, UK.
