Reinforcement Learning: Its Necessity and Challenges

lemme learn ..


This is my first blog post on reinforcement learning (RL). I’ve been learning and applying RL for the last few months, and in this post I’ll try to give a high-level understanding of RL and its implications.

Before diving into the discussion, let’s first understand what reinforcement learning is and how it differs from other machine learning techniques.

Reinforcement Learning (RL) is a subfield of machine learning where an agent learns by interacting with its environment, observing the results of these interactions and receiving a reward (positive or negative) accordingly. This way of learning mimics the fundamental way in which we humans learn.

Reinforcement learning system
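To make the interaction loop concrete, here is a minimal sketch in Python. The `CorridorEnv` class, its reward values and the two policies are all made up for illustration (they are not from any library): the agent walks along a short corridor and is rewarded for reaching the right end.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end and return the first state.
        self.position = 0
        return self.position

    def step(self, action):
        # action 1 moves right, any other action moves left (clamped to the corridor).
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        # Assumed reward scheme: +1 at the goal, a small penalty for every other step.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    # Ignores the state and picks left or right at random.
    return random.choice([0, 1])


def always_right(state):
    # Hand-written policy that always moves towards the goal.
    return 1
```

Calling `run_episode(CorridorEnv(), random_policy)` a few times and comparing it with `always_right` shows the point of the reward signal: the return tells the agent which behavior is better, without anyone labelling the correct action.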

Necessity and Challenges:

As we move towards Artificial General Intelligence (AGI), designing a system that can solve multiple tasks (e.g. classifying an image, playing a game) is really challenging. Current machine learning techniques, whether supervised or unsupervised, are good at dealing with one task at a time. This limits the generality AI can achieve.

To achieve AGI, the goal of RL is to make the agent perform many different types of tasks rather than specializing in just one. This can be achieved through multi-task learning and by retaining what has been learned.

We’ve seen recent work from Google DeepMind on multi-task learning, where the agent learns to recognize digits and play Atari. However, this becomes very challenging when you scale the process. It requires a lot of training time and a huge number of iterations to learn the tasks.

Another challenge comes from the way the agent perceives the environment. In many real-world tasks the agent cannot observe the complete environment. With only partial observations, the agent has to choose the best action based not just on the current observation but also on past observations. So remembering past states and taking the best action with respect to the current observation is key for RL to succeed in solving real-world problems.
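One simple way to give an agent some memory is to feed it the last few observations instead of only the current one. The sketch below is just an illustration of that idea; the `ObservationHistory` class and its `history_length` and `padding` parameters are hypothetical names, and recurrent networks are another common way to do the same job.

```python
from collections import deque


class ObservationHistory:
    """Keeps the most recent observations so the agent can act on a short history."""

    def __init__(self, history_length, padding=None):
        # Pre-fill with a padding value so the history has a fixed size from step one.
        self.buffer = deque([padding] * history_length, maxlen=history_length)

    def add(self, observation):
        # Appending to a full deque drops the oldest observation automatically.
        self.buffer.append(observation)
        return tuple(self.buffer)


history = ObservationHistory(history_length=3, padding=0)
for observation in [1, 2, 3, 4]:
    agent_input = history.add(observation)
    # agent_input now holds the current observation plus the two before it.
```

The agent's policy would then take `agent_input` rather than the raw observation, so two situations that look identical right now can still be told apart by what came before.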

RL agents always learn through exploration and exploitation. RL is continuous trial-and-error learning, where the agent tries different combinations of actions in a state to find the highest cumulative reward. Exploration becomes nearly impossible in the real world. Let us consider an example where you want a robot to learn to navigate a complex environment while avoiding collisions. As the robot moves around the environment to learn, it explores new states and takes different actions to navigate. However, it is not feasible to find the best actions this way in the real world, where the dynamics of the environment change very frequently and learning becomes very expensive for the robot.
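The exploration and exploitation trade-off is easiest to see on a multi-armed bandit, which is a simplification of the robot example above (there are no states, only a choice between actions). Below is a minimal sketch of epsilon-greedy action selection; the function name, the Gaussian reward noise and the default values are my own assumptions for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate the value of each arm while mostly pulling the best-looking one."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward per arm
    counts = [0] * n_arms       # how often each arm was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm, even if it currently looks bad.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm with the highest estimated reward so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: the arm's true mean plus unit Gaussian noise.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the running average for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.0, 1.0, 5.0])
```

Every exploratory pull costs reward in the short term, which is cheap in a simulation like this one but expensive on a real robot, where a bad action can mean a collision.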

To avoid this problem, other mechanisms have been applied to help RL agents learn. Approaches such as learning by mimicking the desired behavior and learning from demonstrations are being tried so that robots can learn the environment in simulation. However, learning this way becomes very specific to the environment and loses the actual goal of generalized learning.

There have been a few positive developments in the last few months from OpenAI and DeepMind towards Artificial General Intelligence. One recent development from OpenAI is Evolution Strategies (ES), an optimization technique intended to overcome many RL shortcomings. Another such development from DeepMind is PathNet, a new modular deep learning (DL) architecture.
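To give a feel for the ES idea, here is a heavily simplified sketch: perturb the parameters with Gaussian noise, score each perturbed copy, and move the parameters towards the perturbations that scored well. This is not OpenAI's implementation; the function names, the toy `sphere` fitness function and all hyperparameter values are assumptions chosen for illustration.

```python
import random


def evolution_strategies(fitness, theta, iterations=200, population=50,
                         sigma=0.1, learning_rate=0.02, seed=0):
    """Maximize `fitness` by following a noise-based estimate of its gradient."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)

    for _ in range(iterations):
        # Sample one Gaussian noise vector per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population)]
        # Score each perturbed copy of the parameters.
        scores = [fitness([t + sigma * e for t, e in zip(theta, noise)])
                  for noise in noises]

        # Normalize the scores so the step size does not depend on their scale.
        mean = sum(scores) / population
        std = (sum((s - mean) ** 2 for s in scores) / population) ** 0.5 or 1.0
        advantages = [(s - mean) / std for s in scores]

        # Move theta along the score-weighted average of the noise vectors.
        for i in range(dim):
            grad = sum(a * noise[i] for a, noise in zip(advantages, noises))
            theta[i] += learning_rate * grad / (population * sigma)

    return theta


def sphere(params):
    # Toy fitness with a single maximum at (3, -2); higher is better.
    return -((params[0] - 3.0) ** 2 + (params[1] + 2.0) ** 2)


solution = evolution_strategies(sphere, [0.0, 0.0])
```

Note that the loop only ever calls `fitness` on whole parameter vectors and never backpropagates through it, which is part of why this family of methods is attractive when rewards are sparse or the environment is hard to differentiate through.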

Any product that needs a manual to work is broken. Elon Musk


I would like to thank David Silver (Google DeepMind) for his tutorials on YouTube, Denny Britz (Google Brain), and Arthur Juliani.