Reinforcement Learning — What, Why, and How.

Vishal Garg
Published in Analytics Vidhya
7 min read · Jun 11, 2020

Among the types and methods of machine learning, Reinforcement Learning holds a unique place. It is the third type of machine learning, alongside supervised and unsupervised learning, and in general terms it can be described as “learning from experience”.

In this article, I will try to answer three basic but important questions pertaining to Reinforcement Learning.

What is Reinforcement Learning?

Let’s look at this in a bit of detail. I would like to cover three aspects: definitions, a piece of intuition, and then, in brief, the basic elements and characteristics of an RL agent.

Definitions

Well, there are various definitions of reinforcement learning around. One, for instance, says that reinforcement learning is learning what to do, that is, how to map situations to actions, so as to maximize a reward signal.

Another, slightly more elaborate version says reinforcement learning is an ML technique in which an agent acts in an environment by choosing from predefined actions, with the goal of maximizing a numerical reward.

Across all these definitions, a few keywords recur that more or less define Reinforcement Learning: it is a type of ML; it involves an agent interacting with an environment, sensing states, taking actions, and receiving rewards from the environment for the actions it takes; and its goal is to maximize the reward. So where is the learning in all of this?

Well, what is learned is the policy. Let’s see this definition in action with an example.

In this example, there is an agent (a robot) that observes its surroundings using its sensors and then, based on its policy, chooses an action: it touches fire. For this action it gets a negative reward from the environment, which the agent learns from, and hence it updates its policy.

The agent now knows that in this particular state it needs to avoid touching the fire, and hence it selects another action. This process of reinforcing its learning through experience continues over a number of iterations, called episodes, until the agent learns an optimal policy.
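The robot-and-fire loop can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the action names, the reward values (-1 for fire, 0 otherwise), and the learning rate are hypothetical and not taken from the example above.

```python
import random

ACTIONS = ["touch_fire", "move_away"]  # hypothetical action names

def environment_reward(action):
    """Hypothetical environment: touching fire is penalised, moving away is neutral."""
    return -1.0 if action == "touch_fire" else 0.0

def choose_action(estimates, rng):
    """Policy: pick the action with the highest estimated reward, ties broken at random."""
    best = max(estimates.values())
    return rng.choice([a for a in ACTIONS if estimates[a] == best])

def run(steps=20, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    estimates = {a: 0.0 for a in ACTIONS}  # the agent starts with no preference
    history = []
    for _ in range(steps):
        action = choose_action(estimates, rng)   # act according to the current policy
        reward = environment_reward(action)      # feedback from the environment
        # Move the estimate for the chosen action toward the observed reward.
        estimates[action] += learning_rate * (reward - estimates[action])
        history.append(action)
    return estimates, history

estimates, history = run()
print(estimates)
print("times the agent touched fire:", history.count("touch_fire"))
```

After the first penalty, the estimate for touching fire drops below the alternative, so this greedy policy never selects it again. A real agent would also have to distinguish between states, which this single-state sketch leaves out.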

Now, let’s talk about the intuition behind the use of Reinforcement Learning.

Well, in general, there are four perspectives that define the relevance of AI for business enterprises. It starts with Perception, which relates to getting data (big, large, streaming, batch, etc.) and running analytics on it, such as reports.

And then comes Inference, where statistical modeling, machine learning, and deep learning come in. At present, most of the industry is focusing on these two, i.e. Perception and Inference, since these are the most mature as of now.

However, the other, forward-looking perspectives are Decision Making and Action Taking, and Reinforcement Learning falls into the Decision Making class.

Just imagine: when enterprises are enabled to make decisions and not just get insights (inferential ones from the data), turnaround time, efficiency, and productivity will improve manifold.

And that’s where the power of Reinforcement Learning lies.

Finally, let’s see the basic elements that define an RL agent. An RL agent may include one or more of these components:

a) A policy (which is mandatory) that determines the agent’s behavior.

b) A value function. If it is present, the policy can be implicit, i.e. derived from the values.

c) A model, which is the agent’s own understanding of the environment. An agent can be model-free or model-based.

The following are typical characteristics of Reinforcement Learning:

First, there is no supervisor, i.e. a Reinforcement Learning agent does not work on instructive feedback but rather on evaluative feedback, which takes the form of a scalar reward signal.

Second, rewards can be delayed, or may have to be estimated. Consider an RL-based chess-playing agent: a particular move or sequence of moves may result in a win or a loss, but not necessarily right after that particular move. (The case where the reward does arrive immediately after each action is a special one, known as immediate RL or the k-armed bandit problem.)

Third, in RL, time really matters, as becomes clear when looking at the different approaches in RL.

Fourth, the agent’s actions affect the subsequent states and hence the subsequent experience it receives. That is where exploration and exploitation methods (also known as trial-and-error search) become important.
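To make the exploration/exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy agent on a k-armed bandit, the immediate-reward special case mentioned above. Epsilon-greedy is only one common way to balance the two and is not prescribed by this article. The arm means, noise level, and parameter values are assumptions chosen for illustration.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=2000, noise=0.1, seed=0):
    """Epsilon-greedy on a k-armed bandit with Gaussian rewards (illustrative values)."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # estimated average reward of each arm
    counts = [0] * k       # how often each arm has been pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore: try a random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit: best arm so far
        reward = rng.gauss(true_means[arm], noise)
        counts[arm] += 1
        # Incremental update of the sample average for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = run_bandit([0.0, 1.0, 0.5])
print("pull counts:", counts)
print("estimated means:", [round(e, 2) for e in estimates])
```

With `epsilon=0` the agent would never explore and could stay stuck on the first arm that happened to pay off. With a small positive `epsilon` it keeps sampling every arm occasionally while mostly pulling the one that currently looks best.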

Now let’s quickly look at examples of the key elements of an RL agent.

A policy determines an agent’s behavior, i.e. it determines what action an agent should take in a particular state. E.g. in a maze, if the agent is in the cell next to the start, it knows it should go left. The agent eventually learns an optimal policy: a map from the states it visits to the best possible (optimal) actions, with the goal of maximizing its rewards.

A value function, on the other hand, predicts future reward and is used to evaluate the goodness/value of a particular state, or of an action taken in a state. In other words, it answers the question “How good is it to be in a particular state, or how good is it to take a given action in a particular state?” E.g. in the same maze example, the cell to the left of the start has a lower value (-16) than the cell to its right (-15), so the agent knows it is supposed to move from the cell with value -16 to the one with value -15, and eventually to the cell with the maximum value, which is the terminal state (near the goal) with a value of -1.
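The maze values can be reproduced with a short sketch. It assumes that every step costs a reward of -1 and that there is no discounting, so a cell’s value is simply minus the number of steps to the goal. The maze layout and all names below are hypothetical, and the values are computed directly with a breadth-first search from the goal rather than learned.

```python
from collections import deque

# Hypothetical maze: '#' is a wall, 'S' the start, 'G' the goal.
MAZE = [
    "#######",
    "#S..#G#",
    "#.#.#.#",
    "#.#...#",
    "#######",
]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def find(symbol):
    """Return the (row, column) of the first cell containing `symbol`."""
    for r, row in enumerate(MAZE):
        for c, cell in enumerate(row):
            if cell == symbol:
                return (r, c)
    raise ValueError(symbol)

def state_values():
    """Value of a cell = minus the number of steps needed to reach the goal from it."""
    goal = find("G")
    values = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES.values():
            nxt = (r + dr, c + dc)
            if MAZE[nxt[0]][nxt[1]] != "#" and nxt not in values:
                values[nxt] = values[(r, c)] - 1
                queue.append(nxt)
    return values

def greedy_action(state, values):
    """Implicit policy: step to the neighbouring cell with the highest value."""
    r, c = state
    options = {name: values[(r + dr, c + dc)]
               for name, (dr, dc) in MOVES.items()
               if (r + dr, c + dc) in values}
    return max(options, key=options.get)

values = state_values()
start = find("S")
print("value of start:", values[start])
print("greedy action at start:", greedy_action(start, values))
```

Note that `greedy_action` needs no separate policy table: the policy is read off the values, which is the sense in which a value function makes the policy implicit.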

And then we have the model. It is something like what we humans have: our own understanding of the world. E.g. the model of the maze is the agent’s own internal view of the full maze, and that’s why not all the tracks appear in it.

As I stated earlier, an agent may have a model, or it may be model-free and learn entirely through exploration and exploitation.

And that makes RL a peculiar and efficient way to learn and adapt to the dynamic nature of the environment.
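The difference between model-free learning and having a model can be sketched as follows. The example uses tabular Q-learning as one common model-free method (the article does not name a specific algorithm) and, alongside it, records a simple transition table as the agent’s model. The corridor environment, its rewards, and every parameter value are assumptions made for illustration.

```python
import random

# Hypothetical corridor: states 0..4, goal at state 4, every step costs -1.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, 1)  # step left, step right

def env_step(state, action):
    """The real environment; the agent only sees what this returns."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, -1.0, next_state == GOAL

def train(episodes=200, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # action values
    model = {}  # learned model: (state, action) -> (next_state, reward)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                        # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = env_step(state, action)
            # Model-free part: Q-learning update from the sampled transition only.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            # Model part: remember how the environment responded.
            model[(state, action)] = (next_state, reward)
            state = next_state
    return q, model

def imagined_return(model, policy, state, max_steps=20):
    """Roll a policy forward inside the learned model, without touching the environment."""
    total = 0.0
    for _ in range(max_steps):
        if state == GOAL:
            break
        state, reward = model[(state, policy[state])]
        total += reward
    return total

q, model = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print("greedy policy:", policy)
print("return predicted by the model from state 0:", imagined_return(model, policy, 0))
```

The Q-table alone is enough to act, which is the model-free case. The `model` dictionary adds the ability to predict what would happen before acting, at the cost of having to learn and store it.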

Now that we have understood the different aspects of Reinforcement Learning, let’s answer this important question: why do we even bother to learn about reinforcement learning?

Why Reinforcement Learning?

There could be a number of reasons behind the ‘Why’. I have the following two:

First, RL agents learn through a continuous process of receiving rewards and penalties, and that makes them robust: they are trained to respond to unforeseen environments.

Reinforcement Learning is all about experiential learning, and that’s why we say: give a man a taste for fish and he will figure out the rest. This is unlike supervised learning, where we need to explicitly teach/supervise via data labels etc.

Second, if we look at market trends, especially Gartner’s Hype Cycle for AI, 2019, Reinforcement Learning has boarded the wave. Even though it is still in the Innovation Trigger phase, that indicates the next turf, which will soon be a reality. Thus it is the best time to learn and get ready for the overwhelming future needs that will be driven by techniques such as RL.

How Reinforcement Learning?

Here are three ways that I found useful while getting started with reinforcement learning.

First, get the book “Reinforcement Learning: An Introduction” by Richard S. Sutton & Andrew G. Barto. This is one of the best reads one can have on this topic.

Second, although there are a number of resources around, to start with you can refer to the RL Course by David Silver, a collection of around 10 videos that covers RL end to end and mostly draws on content from Sutton & Barto’s book. You can watch the videos and then go over the relevant chapters of the book to firm up your understanding of the topics.

Additionally, there are more resources, such as a nice conversational-style RL course by Udacity, and you can always follow my channel (Being Cognitive).

Third, get your hands dirty, because without hands-on practice this subject can become overwhelming, and that’s where openai.com will help. There you can refer to Gym to go over examples and environments, train your RL agents, tweak the existing ones, etc. Believe me, that will be fun.

NOTE: Watch the related video and follow my channel for more updates: https://www.youtube.com/channel/UCt8BeNe9CKSaks6XhgFulNw
