Reinforcement Learning: Temporal Difference Learning — Part 1

Sebastian Dittert
Published in Analytics Vidhya · May 18, 2020 · 8 min read


Over the last few articles we have moved more and more from theory into practice. In the last two articles, Monte Carlo methods were used to solve the prediction problem and the control problem in reinforcement learning.

Following up on Monte Carlo methods, in this article we will look at another method: Temporal Difference (TD) learning.

TD learning is a central and novel idea in reinforcement learning. It can be seen as a combination of the other two core approaches, Monte Carlo (MC) methods and dynamic programming (DP).

Like Monte Carlo methods, TD can learn from raw experience without a model of the environment. And like dynamic programming, TD updates its estimates based in part on other learned estimates, without waiting for the final outcome the way MC methods do.
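
To make the bootstrapping idea concrete, the standard tabular TD(0) update can be written as

$$V(S_t) \leftarrow V(S_t) + \alpha \bigl[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \bigr]$$

whereas a Monte Carlo update moves $V(S_t)$ toward the complete return $G_t$, which only becomes available once the episode has finished. Here $\alpha$ is a step-size parameter and $\gamma$ the discount factor.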

Just like Monte Carlo methods, TD methods are discussed in two articles.

In the first part, I want to cover the TD prediction problem, the TD error, the advantages of TD prediction, and the optimality of TD(0).

So then, let's start with the prediction problem …

TD for the prediction problem

TD learning uses experience to solve the prediction problem, just like Monte Carlo methods. Both learning methods…
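
As a minimal sketch of what tabular TD(0) prediction looks like in code: the snippet below assumes a gym-style environment with reset()/step() and a policy function mapping states to actions; these interfaces, names, and default parameters are illustrative assumptions, not code from the article.

```python
from collections import defaultdict

def td0_prediction(env, policy, num_episodes=1000, alpha=0.1, gamma=1.0):
    """Tabular TD(0) policy evaluation (a sketch, assuming a gym-style interface).

    env.reset() -> state, env.step(action) -> (next_state, reward, done, info),
    policy(state) -> action. Returns a dict mapping state -> estimated value.
    """
    V = defaultdict(float)  # state-value estimates, initialized to 0
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done, _ = env.step(action)
            # Bootstrapped TD target: one observed reward plus the current
            # estimate of the next state's value (no future value if terminal).
            td_target = reward + (0.0 if done else gamma * V[next_state])
            td_error = td_target - V[state]   # the TD error
            V[state] += alpha * td_error      # move estimate toward the target
            state = next_state
    return V
```

Note that each update happens immediately after a single step, using the current estimate V[next_state] as a stand-in for the remainder of the return; this is the bootstrapping that distinguishes TD(0) from a Monte Carlo update.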

