REINFORCEMENT LEARNING

5 min readJul 16, 2023

What is reinforcement learning?

Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones. In general, a reinforcement learning agent is able to perceive and interpret its environment, take actions and learn through trial and error.

What is reinforcement learning?

How does reinforcement learning work?

In reinforcement learning, developers devise a method of rewarding desired behaviors and punishing negative behaviors. This method assigns positive values to the desired actions to encourage the agent and negative values to undesired behaviors. This programs the agent to seek long-term and maximum overall reward to achieve an optimal solution.

These long-term goals help prevent the agent from stalling on lesser goals. With time, the agent learns to avoid the negative and seek the positive. This learning method has been adopted in artificial intelligence (AI) as a way of directing unsupervised machine learning through rewards and penalties.

Example:

The problem is as follows: We have an agent and a reward, with many hurdles in between. The agent is supposed to find the best possible path to reach the reward. The following problem explains the problem more easily.

The above image shows the robot, diamond, and fire. The goal of the robot is to get the reward that is the diamond and avoid the hurdles that are fired. The robot learns by trying all the possible paths and then choosing the path which gives him the reward with the least hurdles. Each right step will give the robot a reward and each wrong step will subtract the reward of the robot. The total reward will be calculated when it reaches the final reward that is the diamond.

Main points in Reinforcement learning –

Input: The input should be an initial state from which the model will start
Output: There are many possible outputs as there are a variety of solutions to a particular problem
Training: The training is based upon the input, The model will return a state and the user will decide to reward or punish the model based on its output.
The model keeps continues to learn.
The best solution is decided based on the maximum reward

Types of Reinforcement:

There are two types of Reinforcement:

Positive: Positive Reinforcement is defined as when an event, occurs due to a particular behavior, increases the strength and the frequency of the behavior. In other words, it has a positive effect on behavior.

Advantages of reinforcement learning are

Maximizes Performance
Sustain Change for a long period of time
Too much Reinforcement can lead to an overload of states which can diminish the results

2. Negative: Negative Reinforcement is defined as strengthening of behavior because a negative condition is stopped or avoided.

Advantages of reinforcement learning:

Increases Behavior
Provide defiance to a minimum standard of performance
It Only provides enough to meet up the minimum behavior

Reinforcement learning elements are as follows:

Policy
Reward function
Value function
Model of the environment

Policy: Policy defines the learning agent behavior for given time period. It is a mapping from perceived states of the environment to actions to be taken when in those states.

Reward function: Reward function is used to define a goal in a reinforcement learning problem.A reward function is a function that provides a numerical score based on the state of the environment

Value function: Value functions specify what is good in the long run. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.

Model of the environment: Models are used for planning.

Various Practical Applications of Reinforcement Learning –

RL can be used in robotics for industrial automation.
RL can be used in machine learning and data processing
RL can be used to create training systems that provide custom instruction and materials according to the requirement of students.

Application of Reinforcement Learnings

1. Robotics: Robots with pre-programmed behavior are useful in structured environments, such as the assembly line of an automobile manufacturing plant, where the task is repetitive in nature.

2. A master chess player makes a move. The choice is informed both by planning, anticipating possible replies and counter replies.

3. An adaptive controller adjusts parameters of a petroleum refinery’s operation in real time.

RL can be used in large environments in the following situations:

A model of the environment is known, but an analytic solution is not available;
Only a simulation model of the environment is given (the subject of simulation-based optimization)
The only way to collect information about the environment is to interact with it.

Advantages and Disadvantages of Reinforcement Learning

Advantages of Reinforcement learning

1. Reinforcement learning can be used to solve very complex problems that cannot be solved by conventional techniques.

2. The model can correct the errors that occurred during the training process.

3. In RL, training data is obtained via the direct interaction of the agent with the environment

4. Reinforcement learning can handle environments that are non-deterministic, meaning that the outcomes of actions are not always predictable. This is useful in real-world applications where the environment may change over time or is uncertain.

5. Reinforcement learning can be used to solve a wide range of problems, including those that involve decision making, control, and optimization.

6. Reinforcement learning is a flexible approach that can be combined with other machine learning techniques, such as deep learning, to improve performance.

Disadvantages of Reinforcement learning

1. Reinforcement learning is not preferable to use for solving simple problems.

2. Reinforcement learning needs a lot of data and a lot of computation

3. Reinforcement learning is highly dependent on the quality of the reward function. If the reward function is poorly designed, the agent may not learn the desired behavior.

4. Reinforcement learning can be difficult to debug and interpret. It is not always clear why the agent is behaving in a certain way, which can make it difficult to diagnose and fix problems.

import gym

import numpy as np

# Define the Q-table and learning rate

q_table = np.zeros((state_size, action_size))

alpha = 0.8

gamma = 0.95

# Train the Q-Learning algorithm

for episode in range(num_episodes):

state = env.reset()

done = False

while not done:

# Choose an action

action = np.argmax(

q_table[state, :] + np.random.randn(1, action_size) * (1. / (episode + 1)))

# Take the action and observe the new state and reward

next_state, reward, done, _ = env.step(action)

# Update the Q-table

q_table[state, action] = (1 - alpha) * q_table[state, action] + \

alpha * (reward + gamma * np.max(q_table[next_state, :]))

state = next_state

# Test the trained Q-Learning algorithm

state = env.reset()

done = False

while not done:

# Choose an action

action = np.argmax(q_table[state, :])

# Take the action

state, reward, done, _ = env.step(action)

env.render()

REINFORCEMENT LEARNING

What is reinforcement learning?

What is reinforcement learning?

How does reinforcement learning work?

Reinforcement learning elements are as follows:

Advantages and Disadvantages of Reinforcement Learning

Written by Faruk Hussain Abdullahi