Monte Carlo Methods in Reinforcement Learning — Part 2 off-policy Methods

Published in Analytics Vidhya

6 min read · Apr 29, 2020

This article continues the previous one, which covered on-policy Monte Carlo methods. Here, the off-policy Monte Carlo methods are presented.

The following topics are covered in the article:

off-policy Monte Carlo prediction
off-policy Monte Carlo control
importance sampling
policy coverage

As a small recap …

Monte Carlo Methods sample and average returns for each state-action pair.
On-policy methods attempt to evaluate or improve the policy that is used to make decisions.
Off-policy methods evaluate or improve a policy different from that used to generate the data.
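
To make the first recap point concrete, here is a minimal sketch of averaging sampled returns per state-action pair. It assumes the first-visit variant and an episode format of (state, action, reward) tuples; the function name `mc_action_values` and the toy episodes are made up for illustration and are not taken from the previous article.

```python
from collections import defaultdict


def mc_action_values(episodes, gamma=1.0):
    """First-visit Monte Carlo estimate of Q(s, a) from complete episodes.

    Assumed format: each episode is a list of (state, action, reward)
    tuples, where reward is received after taking `action` in `state`.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Time step at which each (state, action) pair first occurs.
        first_visit = {}
        for t, (state, action, _) in enumerate(episode):
            first_visit.setdefault((state, action), t)
        g = 0.0
        # Walk backwards so the discounted return is built incrementally.
        for t in range(len(episode) - 1, -1, -1):
            state, action, reward = episode[t]
            g = gamma * g + reward
            if first_visit[(state, action)] == t:
                returns_sum[(state, action)] += g
                returns_count[(state, action)] += 1
    # The estimate is the average of the sampled returns per pair.
    return {sa: returns_sum[sa] / returns_count[sa] for sa in returns_sum}


# Hypothetical toy data: two short episodes in a two-state problem.
episodes = [
    [("s0", "right", 0.0), ("s1", "right", 1.0)],
    [("s0", "right", 0.0), ("s1", "left", 0.0),
     ("s0", "right", 0.0), ("s1", "right", 1.0)],
]
q = mc_action_values(episodes, gamma=0.5)
```

In the on-policy setting, the episodes come from the same policy whose values are being estimated. In the off-policy setting they come from a different policy, so a plain average like this one would no longer estimate the right quantity.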

Off-Policy Monte Carlo Prediction

All learning control methods face a dilemma: they seek to learn action values conditional on subsequent optimal behavior, yet they must behave non-optimally in order to explore all actions and eventually find the optimal ones.
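
The dilemma can be illustrated with a small sketch. It assumes an epsilon-greedy rule as the source of non-optimal behavior; the function name `epsilon_greedy` and the action values are hypothetical and chosen only for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: any action
    # exploit: the action with the highest current estimate
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)          # seeded for reproducibility
q_values = [0.1, 0.5, 0.2]      # made-up current estimates for three actions

# A purely greedy agent only ever tries the action it already prefers.
greedy_picks = {epsilon_greedy(q_values, 0.0, rng) for _ in range(1000)}

# An exploring agent sometimes acts non-optimally and so tries every action.
exploring_picks = {epsilon_greedy(q_values, 0.3, rng) for _ in range(1000)}
```

With `epsilon` set to 0 the agent never revisits actions 0 and 2, so wrong estimates for them could never be corrected. With a positive `epsilon` every action keeps being sampled, at the cost of sometimes acting against the current best estimate.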

On-policy methods, as described in the previous article, are a compromise. They learn action values not for an optimal policy, but a…

Written by Sebastian Dittert