Reinforcement Learning Simplified

In simple terms, Reinforcement Learning is learning from experience.

Just like humans, machines can learn from their interaction with the environment, and Reinforcement Learning is how they do it. It is the branch of Machine Learning in which the learner is not shown the correct answers (as it is in supervised learning). Instead, it has to learn from its own experience by interacting with the environment. This interaction consists of taking actions through trial-and-error search and receiving feedback (positive or negative) from the environment. Reinforcement Learning has the following elements:

  1. Agent: The learner and decision maker. It learns and makes decisions by interacting with its environment.
  2. Environment: Everything outside the agent that the agent cannot directly control. It responds to the agent's actions by giving feedback and presenting new states to the agent.
  3. Reward function: It defines the immediate reward the agent receives for taking an action in a given state. In other words, it tells the agent what reward it will get if it takes a particular action.
  4. Policy: It defines the agent's behavior. It tells the agent which actions to take and which to avoid in each state in order to achieve its goal.
  5. Value function: It evaluates a state, or an action taken in a particular state, by the total reward the agent can expect to collect in the future. It therefore tells the agent about the long-term consequences of its actions.
  6. Model of the environment (optional): The agent's own representation of the environment. It predicts the next state and reward the environment will produce in response to an action, which lets the agent plan ahead.
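The "future rewards" that the value function considers are often made precise as a discounted return. One common formulation, used for example in Sutton and Barto's textbook, adds up the rewards that follow time step t and shrinks each later reward by a discount factor γ between 0 and 1: G<t> = r<t+1> + γ·r<t+2> + γ²·r<t+3> + ..., where r<t+1> is the reward received right after the action taken at time t. The discount factor, the function name `discounted_return` and the +1 reward for a win are assumptions for illustration. The sketch below computes G<t> for a finite list of rewards:

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G<t> = r<t+1> + gamma*r<t+2> + gamma**2*r<t+3> + ...

    `rewards` lists the rewards received after time step t, in order.
    `gamma` is the discount factor (0 <= gamma <= 1); 0.9 is an arbitrary default.
    """
    total = 0.0
    # Work backwards using the recursion G<t> = r<t+1> + gamma * G<t+1>
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Assuming a reward of +1 for a win and 0 otherwise,
# a win three steps away is worth less than an immediate win:
print(discounted_return([1]))        # 1.0
print(discounted_return([0, 0, 1]))  # about 0.81
```

With γ below 1, the agent prefers rewards that arrive sooner. With γ = 1, all future rewards count equally.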

I will illustrate the idea behind each element through the popular childhood game of tic-tac-toe.

tic-tac-toe game

Tic-tac-toe is a two-player game played on a 3x3 board. The players take turns placing their mark (X or O) in an empty square. The first player to get three of their marks in a row, horizontally, vertically or diagonally, wins the game. If the board fills up and neither player has done so, the game is a draw. The figure above shows three Xs in a row diagonally.
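The winning rule can be written down directly. The sketch below assumes a board encoded as a 9-character string read row by row, with 'X', 'O' and '.' for an empty square. This encoding and the name `game_result` are illustrative choices, not part of the game's definition:

```python
# The eight winning lines on a 3x3 board, with squares indexed 0..8 row by row
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def game_result(board):
    """Return 'X' or 'O' for a winner, 'draw' for a full board with no winner,
    or None if the game is still in progress."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    if "." not in board:
        return "draw"
    return None


print(game_result("X.O.XO..X"))  # X (three Xs on a diagonal)
print(game_result("XOXXOOOXX"))  # draw
print(game_result("........."))  # None
```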

Now consider two players, Player A and Player B, playing against each other. Player A is an imperfect player (semi-skilled and occasionally makes mistakes), and Player B is the one who can learn from experience. In this case the elements are:

Agent: Player B, because it learns and makes decisions based on its interaction with the environment.

Environment: Everything other than Player B, including Player A and the board, is the environment, as it gives feedback and presents new states to Player B.

Reward function: It encodes Player B's goal, which in this case is to win the game, by rewarding Player B for a win.

Policy: Which move should Player B make in a given state, i.e. for a given arrangement of Xs and Os on the board?

Value function: Which moves, and the board positions they lead to, are good or bad for Player B in the long term?

Model of the environment (optional): Player B's own representation of the environment, for example a prediction of how Player A is likely to respond to a move, which Player B can use to plan ahead.
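One concrete way to realize Player B's value function and policy follows the approach Sutton and Barto describe for this same game. Keep a table that maps each board position to an estimate of Player B's chance of winning from it. Usually pick the move that leads to the highest-valued position, occasionally pick a random move to explore, and after a move nudge the earlier position's value toward the later one. The sketch below follows that idea. The board encoding (a 9-character string of 'X', 'O' and '.'), the class name `ValueTable`, the step size `alpha`, the exploration rate `epsilon` and the 0.5 starting value are all assumptions for illustration:

```python
import random


class ValueTable:
    """Tabular value function plus epsilon-greedy policy for Player B.

    Boards are 9-character strings of 'X', 'O' and '.' (assumed encoding).
    Each value estimates Player B's probability of winning from that board.
    """

    def __init__(self, alpha=0.1, epsilon=0.1, default=0.5):
        self.values = {}        # board -> estimated probability of winning
        self.alpha = alpha      # step size: how far each update moves a value
        self.epsilon = epsilon  # probability of making a random, exploratory move
        self.default = default  # estimate for boards not seen before

    def value(self, board):
        return self.values.get(board, self.default)

    def choose_move(self, board, mark):
        """Policy: return the index of the empty square to mark."""
        moves = [i for i, cell in enumerate(board) if cell == "."]
        if random.random() < self.epsilon:
            return random.choice(moves)  # explore
        # Exploit: pick the move whose resulting board has the highest value
        return max(moves, key=lambda i: self.value(board[:i] + mark + board[i + 1:]))

    def update(self, board, next_board):
        """Move V(board) a fraction alpha of the way toward V(next_board)."""
        current = self.value(board)
        self.values[board] = current + self.alpha * (self.value(next_board) - current)


table = ValueTable(alpha=0.5, epsilon=0.0)
table.values["XXX.O.O.."] = 1.0             # Player B (playing X) has won on this board
print(table.choose_move("XX..O.O..", "X"))  # 2, the winning square
table.update("XX..O.O..", "XXX.O.O..")
print(table.value("XX..O.O.."))             # 0.75
```

In this scheme, boards Player B has won would be fixed at 1.0 and boards it has lost or drawn at 0.0, so those values propagate back through the updates. In Sutton and Barto's version, the update is applied only after greedy moves, not after exploratory ones.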

Now that we have an overview of the elements of reinforcement learning, let me explain how they interact.

Agent-Environment Interaction

At each time step t, the environment presents the agent with its current state s<t>. In the tic-tac-toe example, s<t> is the configuration of the whole board, i.e. which squares hold an X, an O, or nothing. The agent then takes an action a<t> based on s<t>. In tic-tac-toe, a<t> is the move Player B makes after observing the board. As a consequence of the agent's action, at time step t+1 the environment sends back a numerical reward r<t+1> and presents a new state s<t+1>. This interaction continues until the episode ends, which in tic-tac-toe means a win, a loss or a draw.
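The loop just described can be sketched in a few lines. The sketch assumes a hypothetical, simplified environment interface: `reset()` returns the first state, and `step(action)` returns the next state, the reward and a flag saying whether the episode is over. `CountdownEnv` is a made-up stand-in used only to show the loop running. A tic-tac-toe environment would instead apply Player B's move, let Player A reply, and report the result:

```python
def run_episode(env, policy, max_steps=100):
    """Run one episode of the agent-environment loop and return the
    trajectory as a list of (s<t>, a<t>, r<t+1>) triples.

    Assumes a hypothetical interface: env.reset() -> initial state and
    env.step(action) -> (next_state, reward, done).
    """
    trajectory = []
    state = env.reset()                            # environment presents s<0>
    for _ in range(max_steps):
        action = policy(state)                     # agent picks a<t> from s<t>
        next_state, reward, done = env.step(action)  # environment returns s<t+1>, r<t+1>
        trajectory.append((state, action, reward))
        state = next_state
        if done:                                   # episode over (e.g. win, loss or draw)
            break
    return trajectory


class CountdownEnv:
    """Hypothetical toy environment: the state counts down from 3 and
    reaching 0 gives a reward of +1 and ends the episode."""

    def reset(self):
        self.n = 3
        return self.n

    def step(self, action):
        self.n -= 1
        done = self.n == 0
        return self.n, (1 if done else 0), done


print(run_episode(CountdownEnv(), policy=lambda state: "tick"))
# [(3, 'tick', 0), (2, 'tick', 0), (1, 'tick', 1)]
```

Keeping the loop separate from the environment means the same `run_episode` works for any environment that follows the assumed `reset`/`step` interface.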



PS: This is my first online post, and I wrote it based on my own understanding of Reinforcement Learning. Any suggestions for improving the content and/or the writing style will be appreciated.