Reinforcement Learning: Temporal Difference Learning — Part 1

Sebastian Dittert
Published in Analytics Vidhya · May 18, 2020 · 8 min read


Over the last few articles we have moved more and more from theory into practice. In the last two articles, Monte Carlo methods were used to solve the prediction problem and the control problem in reinforcement learning.

Following up on Monte Carlo methods, in this article we will look at another method: Temporal Difference (TD) learning.

TD learning is a central and novel idea in reinforcement learning. It can be seen as a combination of the other two core approaches, Monte Carlo (MC) methods and dynamic programming (DP).

Like Monte Carlo methods, TD can learn from raw experience without a model of the environment. And like dynamic programming, TD updates its estimates based in part on other learned estimates, without waiting for the final outcome the way MC methods do.
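
To make the bootstrapping idea concrete, the standard tabular TD(0) update can be written as

$$V(S_t) \leftarrow V(S_t) + \alpha \bigl[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \bigr]$$

whereas a Monte Carlo update moves $V(S_t)$ toward the complete return $G_t$, which only becomes available once the episode has finished. Here $\alpha$ is a step-size parameter and $\gamma$ the discount factor.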

Just like Monte Carlo methods, TD methods are discussed in two articles.

In the first part, I want to cover the TD prediction problem, the TD error, the advantages of TD prediction, and the optimality of TD(0).

So then, let's start with the prediction problem …

TD for the prediction problem

TD learning uses experience to solve the prediction problem, just like Monte Carlo methods. Both learning methods…
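
As a minimal sketch of what tabular TD(0) prediction looks like in code: the snippet below assumes a gym-style environment with reset()/step() and a policy function mapping states to actions; these interfaces, names, and default parameters are illustrative assumptions, not code from the article.

```python
from collections import defaultdict

def td0_prediction(env, policy, num_episodes=1000, alpha=0.1, gamma=1.0):
    """Tabular TD(0) policy evaluation (a sketch, assuming a gym-style interface).

    env.reset() -> state, env.step(action) -> (next_state, reward, done, info),
    policy(state) -> action. Returns a dict mapping state -> estimated value.
    """
    V = defaultdict(float)  # state-value estimates, initialized to 0
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done, _ = env.step(action)
            # Bootstrapped TD target: one observed reward plus the current
            # estimate of the next state's value (no future value if terminal).
            td_target = reward + (0.0 if done else gamma * V[next_state])
            td_error = td_target - V[state]   # the TD error
            V[state] += alpha * td_error      # move estimate toward the target
            state = next_state
    return V
```

Note that each update happens immediately after a single step, using the current estimate V[next_state] as a stand-in for the remainder of the return; this is the bootstrapping that distinguishes TD(0) from a Monte Carlo update.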

