Practical Reinforcement Learning — 01 Introduction to RL

Get started with hands-on reinforcement learning in your browser, no installations :)

Shreyas Gite
3 min read · Apr 2, 2017
source: wired

Reinforcement learning has been around for decades in computer science and even longer in behavioral psychology (many rats and pigeons will corroborate it). However, the shit got real when DQN and AlphaGo, deep reinforcement learning based algorithms from DeepMind, beat human experts at Atari games (2015) and a world champion at Go (2016). This was a pretty big deal because until then pundits were not expecting such performance from computers for another 10–15 years. It sent massive shockwaves through the tech industry.

Around the same time, in late 2015, Sam Altman and Elon Musk cofounded OpenAI to balance out the concentration of AI resources.

Sam and Elon (source: backchannel.com)

So far, OpenAI has released two platforms, Gym and Universe, for developing and comparing reinforcement learning algorithms.
And that’s not all: MIT Technology Review recently named reinforcement learning one of its ten breakthrough technologies for 2017.

So let’s get to it!

What is Reinforcement Learning?

It’s the art of learning from the feedback you get for performing a certain action.

Skinner Box

Consider this classic example. A rat is put in a box that has a lever and a food dispenser. When the lever is pressed, the dispenser gives the rat some food.
1. At first, the rat presses the lever by accident and gets the food.
2. After this happens several times, the rat learns that pressing the lever brings the food.
Hence the act of pressing the lever is positively reinforced by providing the food.

Reinforcement learning is an interactive form of learning between an agent (the rat) and its environment (the box). The agent is provided with information about its environment (the lever position and the food dispenser). The agent then learns to act without being explicitly told what to do. It discovers the desirable actions (pressing the lever) by itself, from the reward (food) it obtains for trying those actions.
The agent’s only goal is to maximize the reward it gets.
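To make the Skinner box concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two action names, the reward of 1 for food, and the simple "try things at random now and then, otherwise repeat what paid off" rule (often called epsilon-greedy). It is not how a real rat learns, and it is not a specific algorithm from this series, just one simple way to show reward shaping behavior.

```python
import random

# Hypothetical action set for the rat (names invented for this sketch).
ACTIONS = ["press_lever", "wander"]


def skinner_box(action):
    """Environment: food (reward 1.0) only when the lever is pressed."""
    return 1.0 if action == "press_lever" else 0.0


def train_rat(trials=200, explore_prob=0.1, seed=0):
    """Let the rat act for a number of trials and learn from the food it gets."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # estimated reward for each action
    count = {a: 0 for a in ACTIONS}    # how often each action was tried

    for _ in range(trials):
        if rng.random() < explore_prob:
            # Occasionally act at random (the "accidental" lever press).
            action = rng.choice(ACTIONS)
        else:
            # Otherwise pick the action that has paid off best so far,
            # breaking ties at random.
            best = max(value.values())
            action = rng.choice([a for a in ACTIONS if value[a] == best])

        reward = skinner_box(action)

        # Update the running average reward for the chosen action.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]

    return value, count


value, count = train_rat()
print("estimated reward per action:", value)
print("times each action was chosen:", count)
```

Nobody tells the rat which action is correct. The estimate for pressing the lever rises only because food followed it, and the rat ends up pressing the lever far more often than wandering.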

Let’s go over some basic reinforcement learning terminology:

Reinforcement Learning loop (source: nervana systems)

The state gives information about the current situation of the agent’s environment. For a delivery drone, for example, the state would include the drone’s speed and direction, data from its GPS sensor and so on. If you are playing Pac-Man, the state would be the pixel values of the screen.

The action is the agent’s response to the environment. Going back to the drone, an action would be a signal to the drone’s propellers that makes it go forward, sideways and so on. For Pac-Man, the actions are left, right, up and down.

The reward is a feedback signal from the environment. It tells the agent whether it is doing a good job. The agent’s primary objective is to maximize the total reward. For example:
Delivery drones: positive reward for reaching the destination, negative reward for crashing.
Playing Pac-Man: positive reward for scoring, negative reward for losing points.
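Here is how state, action and reward fit together in a loop, as a rough sketch. The "drone" below is a toy I made up for illustration: it flies along a one-dimensional corridor, its state is just its position, and the functions `reset`, `step` and `run_episode`, along with the reward values, are hypothetical. The names mimic the general shape of Gym-style environments, but this sketch does not use the actual Gym library.

```python
# Toy delivery drone on a 1-D corridor (all names and numbers are
# assumptions made for this sketch).
DESTINATION = 4   # position of the delivery point
CRASH = -1        # flying past the left edge counts as a crash
ACTIONS = {"back": -1, "forward": +1}


def reset():
    """Start a new episode and return the initial state (the position)."""
    return 0


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = state + ACTIONS[action]
    if next_state == DESTINATION:
        return next_state, 1.0, True    # positive reward: package delivered
    if next_state == CRASH:
        return next_state, -1.0, True   # negative reward: drone crashed
    return next_state, 0.0, False       # nothing special happened yet


def run_episode(policy, max_steps=50):
    """Run one episode and return the total reward the agent collected."""
    state = reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                     # agent picks an action
        state, reward, done = step(state, action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def always_forward(state):
    return "forward"


def always_back(state):
    return "back"


print(run_episode(always_forward))  # 1.0 (reaches the destination)
print(run_episode(always_back))     # -1.0 (crashes on the first step)
```

The agent only ever sees the state and the reward. A learning algorithm's job is to find a policy, like `always_forward` here, that makes the total reward as large as possible.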

Now we will solidify the concepts of state, action and reward with OpenAI Gym. Play with it in your browser using Azure Notebooks. No installations or downloads necessary!

I hope this tutorial has been helpful to those new to Q-learning and TD-reinforcement learning!

If you’d like to follow my writing on reinforcement learning, follow me on Medium (Shreyas Gite) or on Twitter (@shreyasgite).
Feel free to write to me with any questions or suggestions :)

Now let’s continue on to Q-Learning
