Reinforcement Learning in Layman’s Terms

Paarvendhan
4 min read · Jan 28, 2018


An attempt to understand Reinforcement Learning, a first step towards general AI, with the help of an Italian plumber.

Reinforcement Learning is a branch of Machine Learning that focuses on making models learn from mistakes. Just like us humans: while playing a game we make mistakes, we learn from them, and we adapt. In fact, that’s how we learn most things. Reinforcement Learning uses the same technique to train its brain (the model).

Say we are put in a new environment. At first we’ll make mistakes and fail at some points, but we’ll learn from them, so that when the same situation arises again we won’t repeat the same mistakes.

Environment -> Try and Fail -> Eventually Learn -> Reach Goal.

Let’s get a basic idea of what’s going on, with hardly any math or code:

Step 1: Define your environment and set your actions and goals.

E.g.
Environment: Super Mario. (The best game ever!)
Actions: Move forward, jump, duck, long jump, etc.
Goal: Retrieving the princess.

[Image: Mario City]

Note: Enemy tortoises and the triangle-shaped things are removed from the scenario for simplicity.
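To make this concrete, here is a minimal sketch of such an environment (my own toy example, not a real Super Mario emulator): the level is a straight line of positions with a couple of pits, and the princess waits at the end.

# A toy, one-dimensional "Mario" level: positions 0..9.
# The princess waits at position 9; pits lurk at positions 3 and 6.
ACTIONS = ["Forward", "Jump"]   # the freedom of movement we allow
GOAL = 9                        # reaching here retrieves the princess
PITS = {3, 6}                   # hypothetical hazards in the level

def step(position, action):
    """Apply a move; return (new_position, reward, done)."""
    new_position = position + (2 if action == "Jump" else 1)
    if new_position in PITS:
        return position, -10, True    # fell into a pit: fail, restart
    if new_position >= GOAL:
        return GOAL, 10, True         # princess retrieved!
    return new_position, 0, False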

Step 2: Initialize the Q table with states and actions.
The Q table, a.k.a. Quality table, holds the quality of each move that can be made in each state.

Higher Value -> Higher Quality Move in that state.

State - the current position or situation in the environment.
E.g. Mario’s current location in the frame.
Action - one of the movements allowed in the environment, as defined above.
E.g. Jump, Forward.

Like this one:

[Image: an all-zero Q table. Of course, NOT TO SCALE!!!]
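As a sketch, the Q table for the toy level above can be built with pandas (one of the prerequisites listed later); every entry starts at zero because Mario knows nothing about the level yet.

import pandas as pd

# One row per state (Mario's position), one column per action.
q_table = pd.DataFrame(
    0.0,
    index=range(GOAL + 1),   # states 0..9 from the toy level above
    columns=ACTIONS,         # "Forward", "Jump"
)
print(q_table)               # all zeros: no quality information yet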

Now our job is to train and adapt the above Q table by interacting with the environment, in the following steps.

Step 3: Let the hero explore the environment.

Our hero takes a random move if the Q table’s values for the current state are all zero or equally distributed; otherwise he chooses the move with the highest value for the present state.

For a given state:
    if Q(state, Jump) > Q(state, Forward):
        Mario chooses to Jump.
    else:
        Mario chooses Forward.
[Image: What kind of sorcery is this?]
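In practice this pick is usually made epsilon-greedy: mostly take the best-known move, but occasionally take a random one so new paths keep getting explored. A sketch, continuing the toy example (the epsilon value of 0.1 is my assumption, not from the post):

import numpy as np

def choose_action(q_table, state, epsilon=0.1):
    """Epsilon-greedy pick: mostly exploit, occasionally explore."""
    row = q_table.loc[state]
    if np.random.rand() < epsilon or (row == row.iloc[0]).all():
        return np.random.choice(q_table.columns)  # random: tie-break or exploration
    return row.idxmax()                           # best-known move for this state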

Step 4: Update the Q table.

Now a reward for each move towards the goal is calculated and written back into the Q table. The update is specific to that state and move at that instant:

Q(S, M) = Q(S, M) + learning_rate * [reward + discount * max Q(S', M') - Q(S, M)]

learning_rate = 0.1 # one step at a time.
S - state, M - move, S' - the next state, M' - any move from it.
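A sketch of that update in code, continuing the toy example (the discount factor is my addition; the post’s simpler formula leaves it out):

LEARNING_RATE = 0.1   # one step at a time
DISCOUNT = 0.9        # assumption: how much future reward matters today

def update_q(q_table, state, move, reward, next_state):
    """Nudge Q(state, move) toward reward + best value of the next state."""
    best_next = q_table.loc[next_state].max()
    target = reward + DISCOUNT * best_next
    q_table.loc[state, move] += LEARNING_RATE * (target - q_table.loc[state, move])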

[Image: an updated Q table after some movements]

Step 5: Handling fail conditions.

If our hero fails to reach the goal, update the Q table with a negative reward. Negatively rewarding a move in a given state reduces the chance of that move being selected in that state in the future.
[Image: He also missed a hidden 1-UP!]
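In code, the fail case is just the same update with a negative reward and no next state to look ahead to (a sketch, reusing LEARNING_RATE from above):

def update_on_fail(q_table, state, move, reward=-10):
    """Punish the move that led to a fail; there is no next state to look ahead to."""
    q_table.loc[state, move] += LEARNING_RATE * (reward - q_table.loc[state, move])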

Step 6: Reaching the goal.

The above process is continued until our hero reaches the goal. Once the goal is reached, our program has completed one generation.

[Image: Victory!]

Step 7: Passing knowledge between generations.

Once a generation is complete, the game is started again, but the same Q table is kept, so the knowledge of previous generations is retained. Steps 3 to 6 are repeated again and again until the values saturate, or until enough experience is gathered for larger problems.
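Putting steps 3 to 7 together for the toy level (the generation count of 200 is arbitrary; any value large enough for the values to saturate will do):

for generation in range(200):
    state, done = 0, False               # every generation restarts the level
    while not done:
        move = choose_action(q_table, state)
        next_state, reward, done = step(state, move)
        if done and reward < 0:
            update_on_fail(q_table, state, move, reward)   # fell into a pit
        else:
            update_q(q_table, state, move, reward, next_state)
        state = next_state

# The same q_table survives every restart, so knowledge accumulates;
# by now Jump should clearly win at the positions just before the pits.
print(q_table)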

Finally, we have an updated Q table with enough knowledge of the environment. This Q table can be used to complete Super Mario with much more ease.

[Image: Savage.]

Nostalgia, no?

Below are some programs written in Python that demonstrate the above steps live. GitHub hyperlinks are provided. The programs themselves are self-explanatory and easy to follow.

Implementation:

Grid Pathfinder & Grid Pathfinder 3D:

Prerequisites:

  • Pandas
  • Numpy
  • Matplotlib

A program to navigate through a grid, even with blocked cells in the grid. Code segments are explained in the comments. A Q-learning table is used. Feel free to experiment with the variables.
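For flavor, here is a minimal stand-in for that idea (my own sketch, not the code from the GitHub repository): a 4x4 grid with one blocked cell and four moves.

GRID = 4
BLOCKED = {(1, 1)}                     # hypothetical obstacle
GOAL_CELL = (3, 3)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def grid_step(cell, move):
    """Apply a move; bumping a wall or a blocked cell costs a small penalty."""
    dr, dc = MOVES[move]
    nxt = (cell[0] + dr, cell[1] + dc)
    if not (0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID) or nxt in BLOCKED:
        return cell, -1, False         # stay put, small negative reward
    if nxt == GOAL_CELL:
        return nxt, 10, True           # reached the target cell
    return nxt, 0, False

The same choose-update-repeat loop from the Mario sketch applies here, with the Q table indexed by (row, column) states instead of positions on a line.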

[Image: actual recorded output]


[Image: A beautiful lie!]

I think he is destined to do this forever, he and the Player.

[Image: Everybody knows…]


Paarvendhan

Computer Vision & Deep Learning Developer, ex-Udacity AI Mentor, graduate of ASU. LinkedIn: https://www.linkedin.com/in/ipaar/, https://perseus784.github.io/.