Monte Carlo Methods in Reinforcement Learning — Part 1 on-policy Methods

Published in

Analytics Vidhya

8 min read · Apr 29, 2020

This article and the next continue the series on Reinforcement Learning, which is meant to move from initially very theoretical topics toward increasingly practical applications and algorithms. The last two articles introduced the Markov Decision Process and explained policies and value functions.

In the next two articles I would like to explain Monte Carlo (MC) methods and show how to use them to estimate value functions and find optimal policies. This article is therefore a direct continuation of the previous article on value functions and (optimal) policies, and the two articles on MC introduce the first learning methods in this series.

In short, this article covers the following areas:

What on-policy and off-policy Monte Carlo are
On-policy Monte Carlo prediction
On-policy Monte Carlo control

All algorithms mentioned in this article are also implemented and available to you, the reader. I created a notebook on GitHub so that you can gain more insight into the methods, explore them in more depth, and follow the explanations in this article in a more practical way.

Written by Sebastian Dittert