TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Temporal Differences with Python: First Sample-Based Reinforcement Learning Algorithm

13 min read · Jan 27, 2023


Photo by Kurt Cotoaga on Unsplash

This article is a continuation of my previous article:

In this article, I want to familiarize the reader with the logic of sample-based algorithms in Reinforcement Learning (RL). To do this, we will create a grid world with holes (much like the one in the thumbnail) and let our agent freely traverse it.

Hopefully, by the end of its journey, the agent will have learned which locations in the world are good to be in and which should be avoided. To help the agent learn, we will use the famous TD(0) algorithm.
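To make the idea concrete, here is a minimal sketch of a single TD(0) update in its standard textbook form: the value of the current state is nudged toward the observed reward plus the discounted value of the next state. The function name `td0_update`, the dictionary-based value table, and the default values of `alpha` (step size) and `gamma` (discount factor) are assumptions chosen for illustration, not necessarily the implementation used later in this article.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) update to the value table V in place; return the TD error.

    All names and default hyperparameters here are illustrative assumptions.
    """
    # Bootstrapped target: immediate reward plus the discounted estimate
    # of the next state's value (zero if the episode ended).
    target = reward + (0.0 if terminal else gamma * V[next_state])
    # TD error: how far the current estimate is from the target.
    td_error = target - V[state]
    # Move the estimate a fraction alpha of the way toward the target.
    V[state] += alpha * td_error
    return td_error


# Usage: two states, one observed transition 0 -> 1 with reward 0.5.
V = {0: 0.0, 1: 1.0}
error = td0_update(V, state=0, reward=0.5, next_state=1)
```

Because the update only needs one observed transition (state, reward, next state), the agent can learn from samples of experience without knowing the transition probabilities of the world.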

Before diving into the algorithms, let us define the problem that we want to solve.

In this article, we will create a grid world with 5 rows and 7 columns, meaning that our agent can be in one of 35 states. The rules of movement are:

  • The agent cannot go outside the…
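The 5-by-7 state space described above can be sketched with plain Python lists. The row-major numbering of states (0 in the top-left corner, 34 in the bottom-right) and the helper names `state_to_cell` and `cell_to_state` are assumptions made for illustration; any consistent numbering would work equally well.

```python
N_ROWS, N_COLS = 5, 7  # grid dimensions from the text

# Assumed row-major numbering: state = row * N_COLS + col.
states = [[row * N_COLS + col for col in range(N_COLS)] for row in range(N_ROWS)]

# One value estimate per state, initialised to zero.
values = [0.0] * (N_ROWS * N_COLS)


def state_to_cell(state):
    """Convert a state index into its (row, col) position on the grid."""
    return divmod(state, N_COLS)


def cell_to_state(row, col):
    """Convert a (row, col) position into its state index."""
    return row * N_COLS + col
```

With this layout, `len(values)` is 35, matching the number of states the agent can occupy.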


Published in TDS Archive


Written by Eligijus Bujokas

A person who tries to understand the world through data and equations