n-step Bootstrapping in Reinforcement Learning

Introduction

Shivam Mohan
7 min read · Jan 29, 2023

In this article, we will discuss an approach to reinforcement learning that unifies Monte Carlo (MC) methods and one-step temporal-difference (TD) methods: n-step TD methods. These generalize both, letting us shift smoothly from one to the other as the demands of a particular task require. n-step methods span a spectrum with MC methods at one end and one-step TD methods at the other, and the best methods are often intermediate between the two extremes.

n-step methods, unlike one-step TD, free us from the tyranny of the time step. With one-step TD methods, the same time step determines both how often the action can be changed and the time interval over which bootstrapping is done. In many applications we want to update the action very quickly, to take into account anything that has changed, but bootstrapping works best over a length of time in which a significant and recognizable state change has occurred. With one-step TD methods these two intervals are forced to be the same, so a compromise must be made. n-step methods enable bootstrapping to occur over multiple steps, freeing us from the tyranny of the single time step.
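To make "bootstrapping over multiple steps" concrete, here is the standard definition of the n-step return (this is the textbook formulation from Sutton and Barto, not spelled out in the excerpt above). The target combines n real rewards with a bootstrapped estimate of everything beyond them:

$$
G_{t:t+n} \doteq R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n})
$$

where V is the current value estimate and γ the discount factor. For n = 1 this is exactly the one-step TD target; once n reaches past the end of the episode, the bootstrap term vanishes and the return becomes the full Monte Carlo return, which is precisely the spectrum described above.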

n-step TD Prediction

Prediction refers to the problem of estimating the values of states. The value of a state indicates how good that state is for the agent in the given environment: the higher the value, the better it is to be in that state.
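As a sketch of how n-step TD prediction can be implemented, here is a minimal Python version of the tabular algorithm. It assumes a hypothetical environment object with `reset()` and `step()` methods, integer state indices, a `num_states` attribute, and a fixed `policy` function; all of these names are illustrative assumptions, not part of the original article.

```python
import numpy as np

def n_step_td_prediction(env, policy, n=4, alpha=0.1, gamma=0.99, episodes=500):
    """Tabular n-step TD policy evaluation (a sketch, assuming a
    hypothetical env with reset() -> state and
    step(action) -> (next_state, reward, done), and policy(state) -> action).
    """
    V = np.zeros(env.num_states)  # value estimate per state (assumed attribute)
    for _ in range(episodes):
        states, rewards = [env.reset()], [0.0]  # rewards[i] is the reward R_i
        T = float('inf')  # episode length, unknown until termination
        t = 0
        while True:
            if t < T:
                # Act in the environment and record the trajectory
                s, r, done = env.step(policy(states[t]))
                states.append(s)
                rewards.append(r)
                if done:
                    T = t + 1
            tau = t - n + 1  # the time whose state estimate is updated now
            if tau >= 0:
                # n-step return: up to n discounted rewards ...
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, min(tau + n, T) + 1))
                # ... plus a bootstrapped estimate if the episode continues
                if tau + n < T:
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V
```

The whole-episode lists above keep the indexing transparent; in practice one only needs to store the most recent n + 1 states and rewards (indexing modulo n + 1), since nothing older is ever used in an update.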
