Reinforcement Learning (RL) is a machine learning technique in which an agent learns in an interactive environment by trial and error, using feedback from its own actions and experiences. Dogs, the most faithful friends we humans have, turn out to be a good illustration of it: we train them with treats and scolding, and they learn which behavior pays off. So let’s get ready to learn what Reinforcement Learning actually is and how dogs showcase it.

In this introduction to Reinforcement Learning, we’ll walk through:

  1. What is Reinforcement Learning in simple words?
  2. The components of a Reinforcement Learning problem.
  3. Distinguishing between Reinforcement Learning, Supervised Learning, and Unsupervised Learning.
  4. Algorithms used for implementing RL.
  5. Practical implementation of Reinforcement Learning.
  6. Ways used for learning.
  7. The disadvantages of Reinforcement Learning.
  8. Applications of Reinforcement Learning around us.
  9. Real-world implementation of Reinforcement Learning.

Reinforcement Learning in Simple Words

In simple words, Reinforcement Learning is learning the best actions on the basis of rewards and punishments. When we put on our technical goggles, Reinforcement Learning is defined using three basic concepts: states, actions, and rewards.

Here the “state” is the situation in which the agent finds itself. The agent performs “actions”, and based on those actions it receives either rewards or punishments.

Consider the dog example: there is the owner and the dog itself, which is the agent. The owner is in the garden with the dog and throws a ball. The thrown ball is the “state” for the agent, and the dog running after the ball is the “action”.

If the dog fetches the ball, the owner responds with praise or food, which is the “reward” for that action. If the dog does not go after the ball and takes some other action instead, it may get a “punishment”. This is what Reinforcement Learning is all about. Next, we’ll look at the terminology used in Reinforcement Learning.
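The interaction described in this section can be sketched as a small loop. This is only an illustration under stated assumptions: the names `ACTIONS`, `owner_feedback`, and `play`, the state label `"ball_thrown"`, and the reward values +1 and -1 are all hypothetical and chosen for this example.

```python
import random

# Hypothetical action names for the dog example; not from any library.
ACTIONS = ["fetch", "watch", "jump"]

def owner_feedback(state: str, action: str) -> int:
    """Environment: +1 (treat) for fetching a thrown ball, -1 (scolding) otherwise."""
    if state == "ball_thrown" and action == "fetch":
        return 1
    return -1

def play(rounds: int, seed: int = 0) -> dict:
    """Let the dog try random actions and total the feedback per action."""
    rng = random.Random(seed)
    totals = {action: 0 for action in ACTIONS}
    for _ in range(rounds):
        state = "ball_thrown"                   # the owner throws the ball
        action = rng.choice(ACTIONS)            # trial and error
        reward = owner_feedback(state, action)  # reward or punishment
        totals[action] += reward
    return totals

totals = play(rounds=30)
best_action = max(totals, key=totals.get)
print(totals, best_action)
```

After enough rounds, the action with the highest accumulated feedback is the one the owner rewards, which is the behavior the dog ends up preferring.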

Components of Reinforcement Learning Problem

Every Reinforcement Learning problem has a set of standard components that help in representing and understanding the problem. The components are as follows:

Agent: The agent is the one that takes actions. As mentioned earlier, in our example the dog is the agent.

Action (A): The agent has a set of actions A from which it selects the action to perform, just like the dog deciding whether to go after the ball, just look at the ball, or jump on the spot.

Discount Factor: The discount factor is multiplied with the future rewards the agent expects, so that a reward counts for less the further in the future it arrives. To put it simply, the discount factor makes future rewards less valuable than immediate rewards. It is typically a value between 0 and 1. The lower the discount factor, the less future rewards matter and the more the agent focuses on short-term goals; the higher it is, the more weight long-term rewards carry.
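To see the effect of the discount factor in numbers, here is a minimal sketch. It assumes the common convention that the reward arriving t steps in the future is multiplied by the discount factor raised to the power t; the name `gamma` and the function `discounted_return` are our own choices for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum the rewards, weighting the reward at step t by gamma ** t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# The same three rewards, valued by a far-sighted and a short-sighted agent.
rewards = [1.0, 1.0, 1.0]
far_sighted = discounted_return(rewards, gamma=0.9)
short_sighted = discounted_return(rewards, gamma=0.1)
print(far_sighted, short_sighted)
```

With a discount factor of 0.9 the later rewards still contribute most of their value, while with 0.1 the total is dominated by the first, immediate reward.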

Environment: The environment is the surroundings in which the agent acts. In the dog example, the environment consists of the owner and the garden in which the dog is present. The environment takes the agent’s current state and action as inputs and gives the agent its reward as output.

State: A state is the immediate situation in which the agent finds itself in relation to other important things in its surroundings, such as tools, obstacles, enemies, and prizes/rewards. In the dog example, the state is the ball having been thrown by the owner.

Reward (R): The reward is the feedback the agent receives in response to its actions. For example, the dog (agent) receives dog food as a reward if it brings back the ball, and a scolding as a punishment if it does not.

Policy: The policy is the strategy the agent uses to decide which action to take in the current state. In other words, the policy maps states to actions, and the agent looks for the mapping that yields the maximum reward. In the dog example, once the dog learns that bringing back the ball earns dog food, it shapes its own policy to reap maximum rewards.
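A policy can be pictured as a lookup from states to actions. The sketch below builds one by picking, for each state, the action with the highest estimated reward. The table `estimated_reward`, its numbers, the extra state `"no_ball"`, and the function `greedy_policy` are all hypothetical and made up for the dog example.

```python
# Hypothetical reward estimates the dog has built up: (state, action) -> value.
estimated_reward = {
    ("ball_thrown", "fetch"): 1.0,
    ("ball_thrown", "watch"): -1.0,
    ("ball_thrown", "jump"): -1.0,
    ("no_ball", "fetch"): -1.0,
    ("no_ball", "watch"): 0.0,
    ("no_ball", "jump"): -1.0,
}

def greedy_policy(estimates):
    """Map each state to the action with the highest estimated reward."""
    best = {}
    for (state, action), value in estimates.items():
        if state not in best or value > estimates[(state, best[state])]:
            best[state] = action
    return best

policy = greedy_policy(estimated_reward)
print(policy)
```

Here the resulting policy tells the dog to fetch when the ball has been thrown and to simply watch when there is no ball.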

Markov Decision Processes (MDPs) are mathematical frameworks for describing the environment in reinforcement learning, and almost all RL problems can be formalized as MDPs.

Basically, an MDP consists of a finite set of environment states S, a set of possible actions A(s) in each state, a real-valued reward function R(s), and a transition model that describes how actions move the agent from one state to another.
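The pieces of an MDP can be written down as a small data structure. This is a sketch under assumptions, not a standard API: the class `MDP`, its field names, the `is_valid` check, and the two-state dog example with its probabilities and rewards are all invented for illustration. The transition model is assumed to give, for each state and action, a probability for every possible next state.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list       # finite set of states S
    actions: dict      # A(s): state -> list of actions available in that state
    reward: dict       # R(s): state -> real-valued reward
    transitions: dict  # (state, action) -> {next_state: probability}

    def is_valid(self) -> bool:
        """Check that transitions use known states/actions and sum to 1."""
        for (s, a), dist in self.transitions.items():
            if s not in self.states or a not in self.actions.get(s, []):
                return False
            if any(s2 not in self.states for s2 in dist):
                return False
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                return False
        return True

# Hypothetical two-state version of the dog example.
dog_mdp = MDP(
    states=["ball_thrown", "ball_returned"],
    actions={"ball_thrown": ["fetch", "watch"], "ball_returned": []},
    reward={"ball_thrown": 0.0, "ball_returned": 1.0},
    transitions={
        ("ball_thrown", "fetch"): {"ball_returned": 1.0},
        ("ball_thrown", "watch"): {"ball_thrown": 1.0},
    },
)
print(dog_mdp.is_valid())
```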

Algorithms used for Implementing RL

To put reinforcement learning and its fundamental concepts into practice, we need algorithms. Let’s have a look at two of them:

Q-Learning: Q-learning is one of the most widely used reinforcement learning algorithms. With this algorithm, the agent learns the quality (Q-value) of each action in each state, based on how much reward the environment returns.

Q-learning uses a table (the Q-table) to store the Q-value of every state-action pair.
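A single tabular Q-learning update can be sketched as follows. The source does not spell out the update rule, so this uses the standard textbook form; the learning rate `alpha`, the discount factor `gamma`, their default values, and the state and action names are assumptions made for this example.

```python
from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state, next_actions,
                      alpha=0.5, gamma=0.9):
    """Move Q(s, a) toward reward + gamma * max over a' of Q(s', a')."""
    # A state with no available actions contributes 0 future value.
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]

# The Q-table: (state, action) -> Q-value, starting at 0 for unseen pairs.
Q = defaultdict(float)
q_learning_update(Q, "ball_thrown", "fetch", 1.0, "ball_returned", [])
print(Q[("ball_thrown", "fetch")])
```

Repeating this update over many interactions gradually fills the table, and the agent can then act by picking the action with the highest Q-value in its current state.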

SARSA (State-Action-Reward-State-Action): SARSA closely resembles Q-learning. The key difference is that SARSA updates the Q-value using the action actually chosen next by the current policy, whereas Q-learning updates it using the greedy action, i.e. the one with the highest Q-value in the next state.
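The difference between the two updates is easiest to see side by side. The sketch below uses the standard textbook update rules; the names `sarsa_update`, `q_learning_update`, and `fresh_table`, the parameters `alpha` and `gamma`, and the toy states and values are assumptions made for illustration.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """SARSA: bootstrap from the action a_next the policy actually takes next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, next_actions, alpha=0.5, gamma=0.9):
    """Q-learning: bootstrap from the best next action, whichever is taken."""
    target = r + gamma * max(Q[(s_next, b)] for b in next_actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def fresh_table():
    """Hypothetical Q-table in which state s1 has one good and one bad action."""
    Q = defaultdict(float)
    Q[("s1", "good")] = 1.0
    Q[("s1", "bad")] = -1.0
    return Q

Q_sarsa, Q_qlearn = fresh_table(), fresh_table()
# Suppose the current policy explores and picks "bad" in s1.
sarsa_update(Q_sarsa, "s0", "go", 0.0, "s1", "bad")
q_learning_update(Q_qlearn, "s0", "go", 0.0, "s1", ["good", "bad"])
print(Q_sarsa[("s0", "go")], Q_qlearn[("s0", "go")])
```

With the same experience, SARSA lowers its estimate because the policy really did take the bad action, while Q-learning raises it because it assumes the best action will be taken next.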



StepUp Analytics

StepUp Analytics is a community of creative, highly energetic Data Science and Analytics professionals and data enthusiasts.