INTRODUCTION to REINFORCEMENT LEARNING

Hilal Müleyke YÜKSEL
4 min read · Dec 5, 2022

--

Reinforcement learning problems involve learning what to do, that is, how to map states to actions so as to maximize a numerical reward signal. Importantly, they are closed-loop problems, because the learning system's actions influence the inputs it receives later. Moreover, as with many forms of machine learning, the learner is not told which actions to take; instead it must discover which actions yield the most reward by trying them.

IMPORTANT TERMS in REINFORCEMENT LEARNING

  • Agent: An entity that can perceive/explore the environment and act upon it.
  • Environment: The situation the agent is placed in or surrounded by. In Reinforcement Learning we assume a stochastic environment, meaning it behaves randomly.
  • Action: Actions are the moves taken by an agent within the environment.
  • State: State is a situation returned by the environment after each action taken by the agent.
  • Reward: Feedback returned to the agent by the environment to evaluate the agent's action.
  • Policy: The strategy the agent uses to choose its next action based on the current state.
  • Value: The expected long-term return under a policy, with discounting, as opposed to the immediate short-term reward.
  • Q-Value: Similar to the value, but it takes one additional parameter: the current action.
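To see how these terms fit together, the sketch below runs a generic agent-environment loop. The `Environment` and `Agent` classes, their methods, and the toy dynamics are all invented for illustration; they do not come from any particular library.

```python
import random

class Environment:
    """Toy stochastic environment: states 0..4, episode ends at state 4."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 usually moves the agent forward, action 0 stays put.
        if action == 1 and random.random() < 0.8:
            self.state += 1
        reward = 1.0 if self.state == 4 else -0.1   # reward signal
        done = self.state == 4
        return self.state, reward, done

class Agent:
    """Agent with a simple stochastic policy: mostly move forward."""
    def act(self, state):
        return 1 if random.random() < 0.9 else 0    # action chosen by the policy

env, agent = Environment(), Agent()
state, done, total_reward = env.reset(), False, 0.0
while not done:
    action = agent.act(state)                  # policy maps state -> action
    state, reward, done = env.step(action)     # environment returns next state and reward
    total_reward += reward
print("episode return:", total_reward)
```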

ELEMENTS of REINFORCEMENT LEARNING

We can identify four main subelements of a reinforcement learning system:

  • Policy
  • Reward Signal
  • Value Function
  • Model of Environment (Optional)

POLICY

A policy defines the learning agent’s way of behaving at a given time. The policy may be a simple function or lookup table. The policy is the core of a reinforcement learning agent in the sense that it alone is sufficient to determine behavior.
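For a small discrete state space, a policy can literally be a lookup table. The grid states and action names below are made up purely to illustrate the idea.

```python
# A deterministic policy as a plain lookup table: state -> action.
# The states and actions are hypothetical labels for a tiny grid world.
policy = {
    "start":     "move_right",
    "corridor":  "move_right",
    "junction":  "move_up",
    "near_goal": "move_up",
}

def select_action(state):
    return policy[state]

print(select_action("junction"))  # -> "move_up"
```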

REWARD SIGNAL

A reward signal defines the goal in a reinforcement learning problem. On each time step, the environment sends to the reinforcement learning agent a single number, a reward. The agent’s sole objective is to maximize the total reward it receives over the long run. The reward signal thus defines what are the good and bad events for the agent.

VALUE FUNCTION

A value function specifies what is good in the long run. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow, and the rewards available in those states.

For example, a state might always yield a low immediate reward but still have a high value because it is regularly followed by other states that yield high rewards. Or the reverse could be true. Rewards are basically given directly by the environment, but values must be estimated and re-estimated from the sequences of observations an agent makes over its entire lifetime. In fact, the most important component of almost all reinforcement learning algorithms we consider is a method for efficiently estimating values. The central role of value estimation is arguably the most important thing we have learned about reinforcement learning over the last few decades.
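One common way to estimate values from sequences of observations is a temporal-difference style update, sketched below. The episode data, step size, and discount factor are arbitrary placeholders; treat this as a rough illustration of value estimation rather than a complete algorithm.

```python
from collections import defaultdict

# V[s] is the current estimate of the long-term value of state s.
V = defaultdict(float)
alpha, gamma = 0.1, 0.9   # step size and discount factor (arbitrary choices)

def td0_update(state, reward, next_state):
    """Nudge V[state] toward the observed reward plus the discounted value of the next state."""
    target = reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])

# Hypothetical observed transitions: (state, reward, next_state).
episode = [("A", -0.1, "B"), ("B", -0.1, "C"), ("C", 1.0, "terminal")]
for s, r, s_next in episode:
    td0_update(s, r, s_next)
print(dict(V))
```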

MODEL of ENVIRONMENT

A model of the environment is something that mimics the behavior of the environment or, more generally, allows inferences to be made about how the environment will behave.

For example, given a state and an action, the model might predict the resultant next state and next reward. Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced. Methods for solving reinforcement learning problems that use models and planning are called model-based methods, as opposed to simpler model-free methods that are explicitly trial-and-error learners, viewed as almost the opposite of planning.
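A minimal sketch of such a model, assuming a small discrete problem, is a table that maps a (state, action) pair to a predicted next state and reward; planning then means querying the model instead of the real environment. All states, actions, and numbers below are invented for illustration.

```python
# A tabular model: (state, action) -> (predicted next state, predicted reward).
model = {
    ("s0", "left"):  ("s0", -1.0),
    ("s0", "right"): ("s1", 0.0),
    ("s1", "left"):  ("s0", 0.0),
    ("s1", "right"): ("s2", 1.0),
}

def plan_one_step(state, actions=("left", "right")):
    """Pick the action whose model-predicted reward is highest,
    without ever acting in the real environment."""
    return max(actions, key=lambda a: model[(state, a)][1])

print(plan_one_step("s1"))  # -> "right"
```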

APPROACHES of REINFORCEMENT LEARNING

There are mainly three ways to implement Reinforcement Learning:

1. VALUE-BASED

The value-based approach tries to find the optimal value function, i.e., the maximum value achievable at a state under any policy. The agent therefore aims for the highest long-term return at each state under the policy it follows.
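One well-known value-based method is Q-learning, which learns action values and derives its behavior from them. The sketch below shows only the core update rule; the states, actions, and hyperparameters are arbitrary placeholders.

```python
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)] -> estimated action value
alpha, gamma = 0.1, 0.9           # step size and discount factor (arbitrary)
ACTIONS = ["left", "right"]       # hypothetical action set

def q_learning_update(s, a, r, s_next):
    """Move Q[s, a] toward the reward plus the best estimated value of the next state."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def greedy_action(s):
    """The implicit policy: pick the action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])

q_learning_update("s0", "right", 1.0, "s1")
print(greedy_action("s0"))
```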

2. POLICY-BASED

The policy-based approach tries to find the optimal policy for maximum future reward directly, without using a value function. Here the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward.
The policy-based approach has mainly two types of policy (a short sketch follows the list):

  • Deterministic: The same action is produced by the policy at any state.
  • Stochastic: In this policy, probability determines the produced action.
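The difference is easy to see in code. Both policies below act on made-up states and actions, and the probabilities are arbitrary.

```python
import random

def deterministic_policy(state):
    """Always returns the same action for a given state."""
    table = {"s0": "right", "s1": "up"}
    return table[state]

def stochastic_policy(state):
    """Samples an action from a probability distribution that depends on the state."""
    probs = {"s0": {"right": 0.8, "left": 0.2},
             "s1": {"up": 0.6, "down": 0.4}}
    actions, weights = zip(*probs[state].items())
    return random.choices(actions, weights=weights)[0]

print(deterministic_policy("s0"), stochastic_policy("s0"))
```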

3. MODEL-BASED

In the model-based approach, a virtual model is created for the environment, and the agent explores that environment to learn it. There is no particular solution or algorithm for this approach because the model representation is different for each environment.
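In the simplest tabular case, "learning the model" can just mean recording the transitions the agent has experienced and then planning against that record. The transitions below are invented placeholders, and this is only a rough sketch of the idea.

```python
# Build a model from experienced transitions: (state, action) -> (next_state, reward).
experience = [
    ("s0", "right", "s1", 0.0),
    ("s1", "right", "s2", 1.0),
    ("s1", "left",  "s0", 0.0),
]

learned_model = {}
for s, a, s_next, r in experience:
    learned_model[(s, a)] = (s_next, r)   # last observed outcome for each (state, action)

# The agent can now "imagine" outcomes without touching the real environment.
print(learned_model[("s1", "right")])     # -> ("s2", 1.0)
```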
