Reinforcement Learning made easy

Filip Knyszewski
10 min read · Dec 25, 2018

Reinforcement learning is one of the most exciting branches of AI right now. It has allowed us to make major progress in areas like autonomous vehicles, robotics and video games. Perhaps its most famous achievement has been beating the world-champion Go player, a feat that many considered impossible before. Today we are going to look into two of the most famous reinforcement learning algorithms, SARSA and Q-learning, and how they can be applied to a simple gridworld maze-like problem.

Markov Decision Process

To explain the context behind these algorithms, we need to talk in terms of a Markov Decision Process (MDP).

In simple terms, an MDP is a formal way of modeling the world with a defined set of rules: a set of states the world can be in, a set of actions, a transition function that determines which state each action leads to, and a reward attached to those transitions. In this world we introduce an entity called an agent. The agent can move around and interact with the map in whatever way the transition function allows. The agent’s only goal is to maximize its reward. Think of it like humans trying to maximize their happiness in a particular environment (although usually we are not very good at it!). This is the type of model that lets reinforcement learning thrive, as we will see later on: we can simulate episodes as many times as needed, and our agent can learn how to maximize its reward through experience.

[Figure: Example of an MDP, a simple gridworld]
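To make the pieces concrete, here is a minimal sketch of what a gridworld MDP like the one pictured could look like in Python. This is not code from the article; the class name, the grid layout (a goal, a pit, one wall) and the reward values are illustrative assumptions, and the transitions are kept deterministic for simplicity.

```python
import random

class GridWorld:
    """A tiny gridworld MDP sketch (layout and rewards are illustrative)."""

    def __init__(self, width=4, height=3, goal=(3, 2), pit=(3, 1),
                 walls=frozenset({(1, 1)})):
        self.width, self.height = width, height
        self.goal, self.pit, self.walls = goal, pit, walls
        self.actions = ["up", "down", "left", "right"]  # the action set

    def step(self, state, action):
        """Transition function: maps (state, action) to (next_state, reward)."""
        x, y = state
        moves = {"up": (x, y + 1), "down": (x, y - 1),
                 "left": (x - 1, y), "right": (x + 1, y)}
        nx, ny = moves[action]
        # Bumping into a wall or the edge of the grid leaves the agent in place.
        if not (0 <= nx < self.width and 0 <= ny < self.height) or (nx, ny) in self.walls:
            nx, ny = x, y
        next_state = (nx, ny)
        if next_state == self.goal:
            return next_state, 1.0    # reward for reaching the goal
        if next_state == self.pit:
            return next_state, -1.0   # penalty for falling into the pit
        return next_state, -0.04      # small living cost encourages short paths

# One episode with a random policy: the agent wanders until a terminal state.
env = GridWorld()
state, total_reward = (0, 0), 0.0
while state not in (env.goal, env.pit):
    state, reward = env.step(state, random.choice(env.actions))
    total_reward += reward
print("episode return:", total_reward)
```

The random policy above is exactly what the learning algorithms we look at next will improve upon: by replaying many such episodes, the agent can learn which action to take in each state to maximize its total reward.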
