Monte Carlo Methods in Reinforcement Learning — Part 2 off-policy Methods

Sebastian Dittert
Published in Analytics Vidhya
6 min read · Apr 29, 2020

This article continues the previous one, which covered on-policy Monte Carlo methods. Here, the off-policy Monte Carlo methods are presented.

The following topics are covered in the article:

  • off-policy Monte Carlo prediction
  • off-policy Monte Carlo control
  • importance sampling
  • policy coverage

As a small recap …

  • Monte Carlo methods sample returns for each state-action pair and average them.
  • On-policy methods attempt to evaluate or improve the policy that is used to make decisions.
  • Off-policy methods evaluate or improve a policy different from that used to generate the data.
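
To make the first recap point concrete, here is a minimal sketch of how returns could be sampled and averaged per state-action pair. It assumes every-visit averaging, complete episodes stored as lists of (state, action, reward) tuples, and a discount factor `gamma`; the function name `mc_action_values` and the toy episodes are made up for illustration and are not taken from the article.

```python
from collections import defaultdict


def mc_action_values(episodes, gamma=1.0):
    """Every-visit Monte Carlo estimate of Q(s, a) from complete episodes.

    Each episode is a list of (state, action, reward) tuples, where `reward`
    is the reward received after taking `action` in `state`.
    (Hypothetical helper, for illustration only.)
    """
    returns_sum = defaultdict(float)
    visit_count = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk backwards so the return G can be accumulated incrementally.
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            returns_sum[(state, action)] += g
            visit_count[(state, action)] += 1
    # The estimate is the average of all sampled returns per pair.
    return {sa: returns_sum[sa] / visit_count[sa] for sa in returns_sum}


# Two toy episodes in a made-up two-state environment.
episodes = [
    [("s0", "right", 0.0), ("s1", "right", 1.0)],
    [("s0", "right", 0.0), ("s1", "left", 0.0)],
]
q = mc_action_values(episodes)
```

In this toy data the pair ("s0", "right") is followed by a return of 1 in one episode and 0 in the other, so its estimate is their average.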

Off-Policy Monte Carlo Prediction

All learning control methods face one dilemma: they seek to learn action values conditional on subsequent optimal behavior, yet they need to behave non-optimally in order to explore all actions and eventually find the optimal ones.
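
One common way to see this tension in code is epsilon-greedy action selection, sketched below under the assumption that the current action-value estimates are held in a plain dict. The helper `epsilon_greedy`, the values in `q_values`, and the choice of `epsilon=0.1` are illustrative assumptions, not something prescribed by the article.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from a dict {action: estimated value}.

    With probability `epsilon` a uniformly random action is taken
    (non-optimal, exploratory behavior); otherwise the action with
    the highest current estimate is taken.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)           # explore
    return max(actions, key=q_values.get)    # exploit


rng = random.Random(0)  # seeded generator so the run is repeatable
q_values = {"left": 0.2, "right": 0.7}  # made-up estimates
picks = [epsilon_greedy(q_values, epsilon=0.1, rng=rng) for _ in range(1000)]
greedy_share = picks.count("right") / len(picks)
```

With a small `epsilon` most picks are greedy, but the occasional random pick keeps every action reachable, which is exactly the non-optimal behavior that exploration requires.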

On-policy methods, as described in the previous article, are a compromise. They learn action values not for an optimal policy, but a…

Sebastian Dittert
Ph.D. student at UPF Barcelona for Deep Reinforcement Learning