Introduction to reinforcement learning

Published in Analytics Vidhya · 5 min read · Dec 25, 2018

This is the first part of a series of reinforcement learning tutorials. I am a student who is learning reinforcement learning. Compared with other artificial intelligence paradigms such as deep learning, there are few good resources for learning it. This series is largely compiled from other sources and should be a good starting point if you are new to the topic. Enough talking, let's jump into some real stuff.

What is reinforcement learning?

Reinforcement learning (RL) is a branch of machine learning in which learning occurs through interaction with an environment. It is goal-oriented learning: the learner is not told which actions to take and instead learns from the consequences of its actions. Consider a robot that can move in two directions, left or right. There is an object to its left and nothing to its right. In reinforcement learning, we give the agent a reward according to the action it takes. In this case, when the robot hits the object we give it a negative reward, and when it does not we give it a positive reward. Reinforcement learning is basically a trial-and-error learning process.
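The robot example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reward values (+1 and -1), the exploration probability, and the names `reward` and `learn_by_trial_and_error` are made up for this sketch and do not come from any library.

```python
import random

ACTIONS = ["left", "right"]

def reward(action):
    """Assumed rewards: -1 for hitting the object on the left, +1 otherwise."""
    return -1.0 if action == "left" else 1.0

def learn_by_trial_and_error(episodes=200, explore_prob=0.1, seed=0):
    """Estimate the average reward of each action purely from experience."""
    rng = random.Random(seed)
    estimates = {a: 0.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for _ in range(episodes):
        # Mostly pick the action that looks best so far, sometimes explore.
        if rng.random() < explore_prob:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: estimates[a])
        r = reward(action)
        counts[action] += 1
        # Incremental running average of the rewards seen for this action.
        estimates[action] += (r - estimates[action]) / counts[action]
    return estimates

estimates = learn_by_trial_and_error()
best_action = max(estimates, key=estimates.get)
print(estimates, best_action)
```

Nobody tells the robot that moving right is the better choice. It finds out because the rewards it collects for moving right are higher than the ones it collects for moving left.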

How does reinforcement learning differ from other machine learning algorithms?

In supervised learning, the machine (agent) learns from training data made up of labelled pairs of inputs and outputs. An external supervisor with complete knowledge of the environment guides the agent toward completing the task.
In unsupervised learning, we provide the model with training data that contains only inputs, and the model learns to find the hidden patterns in them. A common misunderstanding is that RL is a kind of unsupervised learning, but it is not. In unsupervised learning, the model learns hidden structure, whereas in RL the model learns by maximizing rewards. Say we want to suggest new movies to a user. Unsupervised learning analyses movies similar to the ones the person has viewed and suggests those, whereas RL constantly receives feedback from the user, learns their movie preferences, builds a knowledge base on top of that feedback, and then suggests a new movie.

Reinforcement learning agent

Reinforcement learning agents are software programs that make intelligent decisions; they are the learners in RL. Agents act by interacting with the environment and receive rewards based on their actions. An example is Super Mario navigating a video game.
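The interaction between an agent and its environment can be sketched as a simple loop. The code below is a minimal illustration, not a real game: `CorridorEnv`, `RandomAgent`, and `run_episode` are hypothetical names invented here, and the corridor (five positions with a goal at the right end) is an assumed stand-in for something like a Mario level.

```python
import random

class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, goal at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

class RandomAgent:
    """Placeholder agent that acts at random and does not learn yet."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice([-1, 1])

def run_episode(env, agent, max_steps=1000):
    """Let the agent act until the goal is reached or max_steps is used up."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)               # agent chooses an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward

total = run_episode(CorridorEnv(), RandomAgent())
print(total)
```

A learning agent would keep the same loop and additionally use each observed reward to improve how it chooses its next action.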

Elements of RL

Policy:- A policy defines the learning agent’s way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be taken when in those states. The policy is the core of a reinforcement learning agent in the sense that it alone is sufficient to determine behavior. A policy is often denoted by the symbol 𝛑.
Reward Function:- A reward function defines the goal in a reinforcement learning problem. Roughly speaking, it maps each perceived state (or state-action pair) of the environment to a single number, a reward, indicating the intrinsic desirability of that state. A reinforcement learning agent’s sole objective is to maximize the total reward it receives in the long run. The reward function defines what are the good and bad events for the agent.
Value Function:- The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow, and the rewards available in those states. To make a human analogy, rewards are somewhat like pleasure (if high) and pain (if low), whereas values correspond to a more refined and farsighted judgment of how pleased or displeased we are that our environment is in a particular state.
Model:- A model is the agent's representation of the environment: it predicts how the environment will respond to the agent's actions. Learning can be of two types: model-based and model-free. In model-based learning, the agent uses a model, either given or learned from previous experience, to plan its actions, whereas in model-free learning the agent has no such model and relies on trial-and-error experience to find the right action. Say you want to reach your office from home faster. In model-based learning, you use previously learned knowledge (a map) to plan the fastest route, whereas in model-free learning you have no map, so you try different routes and settle on the fastest one.
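The four elements can be made concrete on a tiny example. The sketch below assumes a hypothetical corridor with positions 0 to 4 and a goal at position 4, where every step costs a reward of -1. The names `policy`, `transition`, `reward`, and `state_value` are chosen for illustration only.

```python
GOAL = 4
STATES = range(GOAL + 1)

# Policy: a mapping from each non-terminal state to an action
# (+1 = move right, -1 = move left). This one always moves right.
policy = {s: +1 for s in STATES if s != GOAL}

def transition(state, action):
    """Model: the next state for a given state-action pair."""
    return min(max(state + action, 0), GOAL)

def reward(state, action):
    """Reward function: every step costs 1, so shorter routes score higher."""
    return -1.0

def state_value(state, policy, max_steps=100):
    """Value: total reward collected from `state` when following `policy`."""
    total = 0.0
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = policy[state]
        total += reward(state, action)
        state = transition(state, action)
    return total

values = {s: state_value(s, policy) for s in STATES}
print(values)  # {0: -4.0, 1: -3.0, 2: -2.0, 3: -1.0, 4: 0.0}
```

The immediate reward is the same everywhere, yet the values differ: states closer to the goal are worth more because less cost lies ahead of them. Note that `state_value` relies on `transition`, so it is a model-based calculation. A model-free agent would have to estimate the same numbers from episodes it actually experiences.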

Reinforcement learning environment

Everything an agent interacts with is called the environment. The environment is the outside world: it comprises everything outside the agent. Environments can be broadly divided into two categories:

Deterministic environment:- An environment is said to be deterministic when the outcome of an action is fully determined by the current state. For instance, in chess, we know exactly what the board will look like after moving any piece.
Stochastic environment:- An environment is said to be stochastic when the outcome cannot be determined from the current state and action alone, so there is a greater level of uncertainty. For example, we never know what number will show up when rolling a die.
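The difference between the two categories shows up in the environment's step function. Below is a minimal sketch in which the state is a position on a line. The function names and the `slip_prob` value of 0.2 are assumptions made for this illustration.

```python
import random

def deterministic_step(state, action):
    """The same state and action always produce the same next state."""
    return state + action

def stochastic_step(state, action, rng, slip_prob=0.2):
    """With probability slip_prob (an assumed value) the move is reversed."""
    if rng.random() < slip_prob:
        action = -action
    return state + action

rng = random.Random(0)
det_outcomes = {deterministic_step(0, 1) for _ in range(1000)}
sto_outcomes = {stochastic_step(0, 1, rng) for _ in range(1000)}
print(det_outcomes)  # {1}
print(sto_outcomes)
```

Repeating the same action from the same state yields a single outcome in the deterministic case and more than one possible outcome in the stochastic case, which is why a stochastic environment is harder to predict.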