Temporal Differences with Python: First Sample-Based Reinforcement Learning Algorithm
Coding up and understanding the TD(0) algorithm using Python
This article is a continuation of my previous article.
In this article, I want to familiarize the reader with the logic behind sample-based algorithms in Reinforcement Learning (RL). To do this, we will create a grid world with holes (much like the one in the thumbnail) and let our agent freely traverse it.
Hopefully, by the end of his journey, the agent will have learnt which locations in the world are good places to be and which should be avoided. To help our agent in the learning process, we will use the famous TD(0) algorithm.
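As a quick preview of where we are heading, the core of TD(0) is a single update rule: after one step from a state to the next state, the value estimate of the first state is nudged towards the observed reward plus the discounted value of the next state. The snippet below is only a minimal sketch of that rule, not the implementation built later in this article. The function name `td0_update`, the dictionary `V`, and the default values for `alpha` (step size) and `gamma` (discount factor) are assumptions made for illustration.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to the value table V in place.

    V is a mapping from state to its current value estimate.
    alpha and gamma are illustrative defaults, not tuned values.
    Returns the TD error used for the update.
    """
    # TD error: (reward + discounted value of next state) - current estimate
    td_error = reward + gamma * V[next_state] - V[state]
    # Move the current estimate a fraction alpha towards the target
    V[state] += alpha * td_error
    return td_error


# Hypothetical toy example: two states, all values start at zero
V = {0: 0.0, 1: 0.0}
td0_update(V, state=0, reward=1.0, next_state=1)
print(V[0])  # 0.1
```

Note that the update needs only one observed transition (state, reward, next state), which is what makes the method sample-based: no model of the world is required.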
Before diving into the algorithms, let us define the problem that we want to solve.
In this article, we will create a grid world with 5 rows and 7 columns, which means that our agent can be in any one of 35 states. The rules of movement are:
- The agent cannot go outside the…