Monte Carlo Methods in Reinforcement Learning — Part 1 on-policy Methods

Sebastian Dittert
Published in Analytics Vidhya
8 min read · Apr 29, 2020

This article and the next continue the series on Reinforcement Learning, which moves from initially very theoretical topics toward increasingly practical applications and algorithms. The last two articles introduced the Markov Decision Process and explained policies and value functions.

In the next two articles I would like to explain Monte Carlo (MC) methods and show how to use them to estimate value functions and find optimal policies. This article picks up directly where the previous article on value functions and (optimal) policies left off. The two articles on MC therefore introduce the first learning methods of the series.

In short, this article covers the following areas:

  • What on-policy and off-policy Monte Carlo are
  • On-policy Monte Carlo prediction
  • On-policy Monte Carlo control

All algorithms mentioned in this article are implemented and available to you, the reader. I created a notebook on GitHub so that you can gain more insight into the methods, explore them in more depth, and follow the explanations in this article hands-on.

Sebastian Dittert is a Ph.D. student at UPF Barcelona working on Deep Reinforcement Learning.