Reinforcement Learning — A Beginner’s Approach, Chapter I

Shashwat Tiwari
Published in Analytics Vidhya
8 min read · May 10, 2020

Hello,

Reinforcement Learning is one of the trending topics in AI research, and its popularity is growing day by day. Deep Neural Networks have emerged as breakthroughs in problems like computer vision, machine translation, and time-series prediction; combined with reinforcement learning, they power awesome technology like AlphaGo, the algorithm that beat the world champion at the board game Go.

Deep Reinforcement Learning basically combines ANNs (Artificial Neural Networks) with a reinforcement learning architecture that enables agents (software programs or applications) to learn the best possible actions in a virtual environment in order to achieve their goals.

There will be a much clearer explanation in the coming sections, but for now, the point to remember is that reinforcement learning is an area of machine learning concerned with taking suitable actions to maximize reward in a particular situation. RL algorithms combined with deep neural networks have beaten human professionals at games like Dota 2 and StarCraft II, as well as champions of the board game Go. Reinforcement learning applications are not limited to games; the field is a vast improvement over its earlier accomplishments, and the state of the art is progressing rapidly.

Roadmap

  • Introduction to Reinforcement Learning
  • Terminologies related to Reinforcement learning
  • Types of Reinforcement Learning
  • Reinforcement Learning Workflow
  • Applications of Reinforcement Learning
  • RL Algorithms Model-Free vs Model-Based
  • References

Reinforcement learning is much more complex than typical machine learning and deep learning algorithms. When I started, it was a nightmare for me, but here I break it down in a much simpler way than you might think.

Introduction to Reinforcement learning


According to Wikipedia

Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

Put more simply, reinforcement learning is a subset of machine learning that enables an agent to learn in an interactive environment using feedback from its own actions and experiences.

Reinforcement learning is basically training a machine learning model to make a sequence of decisions. Agents are trained to achieve their goals in complex environments. The machine employs a trial-and-error technique to find a solution to the problem: to make the machine learn, it is rewarded or penalized for the actions it performs. The point to capture here is that the overall goal of an RL algorithm is to maximize the total reward.

Let me put a simple Illustration-

Imagine a student is given a new coding platform at home (environment). In simple terms, the student (agent) will first observe and construct his/her own representation of the environment (state). Then the curious student will take certain actions, like hitting execute (action), and observe how the coding platform responds (next state). If the platform is boring, the student dislikes it (receiving a negative reward) and will take fewer of the actions that lead to such a result (updating the policy), and vice versa. The student will repeat the process until he/she finds a policy (what to do under different circumstances) that he/she is happy with (maximizing the total (discounted) reward).

Reinforcement learning solves difficult decision-making problems by associating actions with delayed outcomes. The agent has to wait some time to see the consequences of the decisions it has made, so many iterations are performed over time to understand which action leads to which outcome.

Moreover, in the real world we face many complex problems that a typical RL algorithm has no way to tackle on its own. Researchers have invented methods that solve some of these problems by using deep neural networks to model the desired policies, value functions, or even the transition models; this is called Deep Reinforcement Learning. This article makes no distinction between RL and Deep RL.

There are lots of awesome resources about reinforcement learning online, and interested readers can refer to the References section~

Terminologies related to Reinforcement learning

Source: Reinforcement Learning: An Introduction, Sutton & Barto (http://incompleteideas.net/book/bookdraft2017nov5.pdf)

Reinforcement learning is built on the notions of agents, environments, states, actions, and rewards, all of which we’ll explain below-

# Agent- In simple words, the agent is the thing that takes actions. The algorithm is the agent; in the real world, the agent could be you. A simple example is an agent that monitors the charge level of a robot’s battery and sends commands to the robot’s control architecture. This agent’s environment is the rest of the robot together with the robot’s surroundings.

# Action- Actions are the collection of all possible moves the agent can make. In a video game, say Super Mario, Mario’s list of actions includes running right or left, jumping high or low, crouching, or standing still. In the trading markets, the action list might include buying, selling, or holding any one of an array of securities and their derivatives. To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before.

# Environment- The environment is the world the agent interacts with and which responds to the agent. The environment takes the agent’s current state and action as input and returns the agent’s reward and its next state as output. Reinforcement learning involves interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about that environment.

# State- A state is the immediate situation the agent finds itself in: the current situation of the agent as returned by the environment, or any future situation. In the real world, were you ever in the wrong place at the wrong time? That’s a state.

# Reward- A reward is the feedback that quantifies the agent’s action in a given state. For example, in a video game, when Mario touches a coin he wins points. At each time step, the environment sends the reinforcement learning agent a single number called the reward. The agent’s sole objective is to maximize the total reward it receives over the long run. Rewards can be immediate, or they can be delayed, depending on how the agent’s action is evaluated.

# Policy- The policy defines the agent’s behavior at a given time. In simple words, it maps the perceived states of the environment to the actions to be taken in those states. In some cases the policy may be a simple function or lookup table, whereas in others it may involve extensive computation such as a search process. The policy is considered the core of reinforcement learning.

# Value- The value of a state is the expected long-term return from that state under the policy, as opposed to the immediate reward. Future rewards are reduced by a discount factor, so rewards received sooner count for more than rewards received later.

There are other terms related to reinforcement learning as well; the key terms are explained above. For more info, please jump to the References~
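The terms above can be tied together in a few lines of Python. The corridor environment, the random policy, and the discount factor below are hypothetical toy choices for illustration, not a standard benchmark:

```python
import random

random.seed(42)  # make the toy run reproducible

# A toy 1-D corridor environment: the agent starts at position 0 and
# receives a reward of +1 only when it reaches position 3 (the goal).
class CorridorEnv:
    def __init__(self, goal=3):
        self.goal = goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):            # action: -1 (left) or +1 (right)
        self.state = max(0, self.state + action)
        done = self.state == self.goal
        reward = 1.0 if done else 0.0  # the reward quantifies the action
        return self.state, reward, done

# A policy maps states to actions; this toy one ignores the state
# and simply acts at random.
def random_policy(state):
    return random.choice([-1, +1])

env = CorridorEnv()
state = env.reset()
rewards = []
done = False
while not done:
    action = random_policy(state)
    state, reward, done = env.step(action)
    rewards.append(reward)

# Value uses the discounted return: later rewards are shrunk by the
# discount factor gamma, so sooner rewards count for more.
gamma = 0.9
discounted_return = sum(gamma ** t * r for t, r in enumerate(rewards))
print(discounted_return)
```

Note how the episode only ever yields a single reward of 1 at the very end; the discount factor then makes longer episodes worth less, which is exactly the pressure that pushes an agent toward shorter solutions.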

Types of Reinforcement Learning

Generally speaking, there are two types of Reinforcement learning-

  • Positive- Positive reinforcement occurs when an event, triggered by a particular behavior, increases the strength and frequency of that behavior. Simply put, it is a positive effect on the behavior. Positive reinforcement maximizes performance and can preserve a change for a long period of time.
  • Negative- Negative reinforcement strengthens a behavior by stopping or avoiding a negative condition. It helps define a minimum standard of performance, although it may only provide enough motivation to meet that minimum rather than to exceed it.

Reinforcement Learning Workflow

It’s necessary to break everything down before learning, so the reinforcement learning workflow is outlined below-

  • Step I Environment Creation

The first step is to define an environment for reinforcement learning. The environment acts as a playground for the agent. The created environment can be either virtual or physical. A virtual environment is usually a good first step, since it is safe and allows many repeated experiments.
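As a sketch of what a minimal virtual environment might look like, here is a hypothetical 4x4 grid-world written from scratch (not any specific library’s API): the agent starts in the top-left corner and must reach the bottom-right corner.

```python
# A minimal 4x4 grid-world as a virtual RL environment.
class GridWorld:
    ACTIONS = ["up", "down", "left", "right"]

    def __init__(self, size=4):
        self.size = size
        self.pos = (0, 0)

    def reset(self):
        """Start a new episode and return the initial state."""
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        """Apply an action; return (next_state, reward, done)."""
        r, c = self.pos
        if action == "up":
            r = max(r - 1, 0)
        elif action == "down":
            r = min(r + 1, self.size - 1)
        elif action == "left":
            c = max(c - 1, 0)
        elif action == "right":
            c = min(c + 1, self.size - 1)
        self.pos = (r, c)
        done = self.pos == (self.size - 1, self.size - 1)
        # small step penalty, big reward at the goal (illustrative choice)
        return self.pos, (1.0 if done else -0.01), done

env = GridWorld()
state = env.reset()
state, reward, done = env.step("down")
print(state, reward, done)
```

The `reset`/`step` pair is the conventional contract between agent and environment; libraries such as OpenAI Gym follow the same general shape, though their exact signatures differ.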

  • Step II Define Reward

As discussed above, rewards are the feedback that quantifies the agent’s actions in a given state. Multiple iterations are usually required to craft a reward signal that correctly reflects the actions performed in the RL environment.

  • Step III Agent Creation

An agent basically consolidates a training algorithm and a policy. The policy can be represented using neural networks or lookup tables. As for training algorithms, neural networks are often preferred because they are good candidates for large state/action spaces and complex problems.

  • Step IV Training & Performing Validation(Agent)

Just as we train a neural network and define parameters like early stopping or reduce-on-plateau, in RL we set up stopping criteria and train the agent to tune the policy. Training can take anywhere from minutes to days depending on the application. Do check validation performance after training.

  • Step V Policy Deployment

The final step is to deploy the trained policy. The policy is considered a standalone decision-making system, but if your training process does not converge to a good policy within a given interval of time, you may have to update the following parameters and retrain-

  • Training settings
  • Learning algorithm configuration
  • Policy representation
  • Reward signal definition
  • Action and observation signals
  • Environment dynamics

Applications of Reinforcement Learning

Reinforcement learning can be applied to the following areas in the real world-

  • Traffic Light Control
  • Robotics
  • Web System Configuration
  • Chemistry
  • Personalized Recommendation
  • Advertisement Industry
  • Games

RL Algorithms Model-Free vs Model-Based


A model-based RL algorithm uses experience to build an internal model of the environment’s transitions and immediate outcomes. Suitable actions are then selected by planning with this learned model.

On the other hand, a model-free RL algorithm uses experience to learn state values, action values, or policies directly, achieving the same behavior without any model of the environment. These algorithms are statistically less efficient than model-based methods because new information is blended into possibly erroneous estimates of state values rather than being used directly.

Generally, a model-free learning agent depends on a trial-and-error methodology to arrive at an optimal policy, whereas in model-based learning the agent attempts to choose an optimal policy based on its learned model of the environment.
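A minimal sketch of the model-based side, assuming a hypothetical four-state chain where the transition model is fully known, is value iteration: the agent plans directly with the model instead of learning from sampled experience (which is what a model-free method such as Q-learning would have to do).

```python
# Model-based planning on a toy four-state chain.
n_states, actions, gamma = 4, [-1, +1], 0.9
terminal = n_states - 1

def step_model(s, a):
    """The known model: next state and reward for taking a in s."""
    s2 = min(max(s + a, 0), terminal)
    return s2, (1.0 if s2 == terminal else 0.0)

V = [0.0] * n_states                     # terminal state keeps value 0
for _ in range(100):                     # sweep until the values settle
    for s in range(terminal):
        # Bellman optimality backup, computed from the model directly
        V[s] = max(r + gamma * V[s2]
                   for s2, r in (step_model(s, a) for a in actions))

print([round(v, 2) for v in V])  # [0.81, 0.9, 1.0, 0.0]
```

Because the model is available, these values are computed without the agent ever acting in the environment; a model-free agent would need many sampled episodes to estimate the same quantities.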

I know that’s too much information to grasp! Not to worry, we will explore reinforcement learning algorithms and their implementation in the next chapter of this blog series.

References

If you like this post, please follow me. If you notice any mistakes in the reasoning, formulas, animations, or code, please let me know.

Cheers!
