Deep Learning — Reinforcement Learning

Renu Khandelwal
Published in Analytics Vidhya
7 min read · Dec 28, 2018

Interested in understanding the algorithm used by AlphaGo to beat the human world champion? Then this article is for you.

We will discuss what reinforcement learning (RL) is, the elements of reinforcement learning, and related terms such as the value function and the Q-value function. We will also look at what an optimal policy is, how to find it, and the RL trade-off between exploitation and exploration.

As kids, teenagers or grownups, when we need to learn a new skill, we either have someone to help or we learn on our own by trial and error.

  • Remember the first time we learned to ride a bike. We were penalized for every mistake by falling off and getting hurt, and from each mistake we worked out what to do differently. Every time we made a correct move, our confidence went up; that was the reward for the right action. Once we had mastered the skill, we could take corrective actions in whatever states we encountered while biking from point A to point B.

Let’s map this to reinforcement learning.

  • Reinforcement learning is learning a mapping from situations (states) to actions that maximizes the cumulative reward.
  • A learner is not told what action to take. Rather, a learner needs to figure out the action that would yield the
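
The two points above describe a learner that is never told the right action and has to discover it from rewards. That idea can be sketched as a small trial-and-error loop. This is only an illustrative toy built on the biking analogy, not the method used by AlphaGo or any particular library: the state names, action names, reward rule, and the `learn_policy` function are all hypothetical, and the update is a simple running average of observed rewards with occasional random exploration.

```python
import random

# Hypothetical states and actions taken from the biking analogy.
STATES = ["leaning_left", "leaning_right"]
ACTIONS = ["steer_left", "steer_right"]


def reward(state, action):
    """Assumed reward rule: steering into the lean keeps us upright (+1),
    anything else is a fall (-1). The learner never sees this table directly."""
    correct = {"leaning_left": "steer_left", "leaning_right": "steer_right"}
    return 1.0 if correct[state] == action else -1.0


def learn_policy(episodes=500, epsilon=0.1, step_size=0.1, seed=0):
    """Learn a state -> action mapping purely from rewards."""
    rng = random.Random(seed)
    # Estimated reward for every (state, action) pair, initially unknown (0).
    value = {(s, a): 0.0 for s in STATES for a in ACTIONS}

    for _ in range(episodes):
        state = rng.choice(STATES)
        if rng.random() < epsilon:
            # Explore: try a random action to learn something new.
            action = rng.choice(ACTIONS)
        else:
            # Exploit: take the action that currently looks best.
            action = max(ACTIONS, key=lambda a: value[(state, a)])
        r = reward(state, action)
        # Move the estimate a small step toward the observed reward.
        value[(state, action)] += step_size * (r - value[(state, action)])

    # The learned policy picks the best-looking action in each state.
    return {s: max(ACTIONS, key=lambda a: value[(s, a)]) for s in STATES}


policy = learn_policy()
print(policy)
```

After enough episodes the learned policy steers into the lean in both states, even though the loop only ever saw rewards and penalties. The `epsilon` parameter is a small taste of the exploration versus exploitation trade-off mentioned earlier.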
