Intrinsically Motivated Reinforcement Learning

An individual’s behavior is a function of its consequences

Raghvi Saxena
Clique Community
5 min read · Sep 4, 2021


Photo by Pietro Jeng on Unsplash

Motivation to do something is intrinsic in nature if you derive pleasure from the activity itself rather than from any external benefit. For example, if you are reading this blog out of your own curiosity, you are experiencing intrinsic motivation. This sense of spontaneous exploration and curiosity is not limited to us. This blog explores how the same idea can be applied to machines.

The origin of reinforcement learning

Reinforcement is anything that increases the likelihood that a response will occur. The reinforcement theory of motivation was proposed by B.F. Skinner and his associates. This theory focuses entirely on what happens to an individual when they take some action. For example, a child might be given a bar of chocolate immediately after putting away their toys. Because the desired behavior is reinforced, the child becomes more likely to repeat it in the future.

This theory is most often used for behavior modification in living beings. Now consider applying the same idea to machines, so that they can teach themselves based on the results of their own actions. Sounds fascinating, doesn’t it? That’s the essence of reinforcement learning (RL) [3].

Fig-1: Example of Reinforcement Learning in Autonomous Parking

Introduction to intrinsically motivated reinforcement learning

When we talk about rewards, we can go back to Skinner’s theory, which classifies the motivation to perform a particular action, and consequently the rewards gained, into two categories.

Table-1: Difference between Extrinsic and Intrinsic Motivation

In reinforcement learning, we mostly train our agent with an extrinsic reward: a tangible signal that the agent can compute and attain through exploration and exploitation of various policies. What happens when we also factor intrinsic rewards and internal sensations into the equation? How do we define intrinsic joy for a machine?

Intrinsically motivated reinforcement learning (IMRL) [1] is an extension of RL in which an agent is rewarded for behaviors other than those strictly related to the task being accomplished, e.g., for exploring or playing with elements of its environment. The field studies exactly the questions posed above and improves RL’s results through open-ended learning and skill acquisition along the way.

Fig-2: Agent-Environment Interaction in RL. A: The usual view. B: An elaboration

The critic, or judge, sits inside the agent’s internal environment: it evaluates what happens before and after the agent acts and generates the reward signal the learner sees. The main departure from the usual application of RL is that the agent maintains a knowledge base of skills that it learns using intrinsic rewards. IMRL focuses on which skills the agent should learn and how it can learn them efficiently.
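
To make the split between the external and internal environments concrete, here is a minimal Python sketch. The class name InternalCritic, the weight beta, and the count-based novelty bonus are illustrative assumptions, not the paper’s implementation; the point is only that the intrinsic term is produced inside the agent and added to the extrinsic reward arriving from the external environment.

```python
import math

class InternalCritic:
    """Sketch of a critic in the agent's internal environment: it turns
    observations plus the extrinsic reward into the reward signal that
    the learning algorithm actually optimizes."""

    def __init__(self, beta=1.0):
        self.beta = beta          # weight on the intrinsic term (assumed)
        self.visit_counts = {}    # how familiar each observation is

    def intrinsic_reward(self, observation):
        # Illustrative choice: a count-based novelty bonus, so novel
        # (salient) observations are worth more than well-known ones.
        key = tuple(observation)
        self.visit_counts[key] = self.visit_counts.get(key, 0) + 1
        return 1.0 / math.sqrt(self.visit_counts[key])

    def reward(self, observation, extrinsic_reward):
        # The learner optimizes the sum of both reward streams.
        return extrinsic_reward + self.beta * self.intrinsic_reward(observation)
```

Here the intrinsic term is a simple novelty bonus that fades as observations become familiar; Singh et al. instead derive it from the agent’s prediction errors for salient events, as the playroom example below makes clearer.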

The playroom domain: an IMRL scenario

One of the most prominent and interesting pieces of research in the area illustrates how an agent can learn broad competence in a simple “playroom” environment [1]. The objects in the playroom all have potentially interesting characteristics.

Fig-3: The playroom domain
  • The bell rings once and moves to a random adjacent square if the ball is kicked into it.
  • The light switch controls the lighting in the room.
  • The colors of the blocks in the room are visible only if the light is on; otherwise, they all appear gray.
  • Pressing the blue block turns the music on, while pressing the red block turns it off.
  • Either block can be pushed and, as a result, moves to a random adjacent square.
  • The toy monkey makes frightening sounds if, simultaneously, the room is dark, the music is on, and the bell is rung.

These objects were designed to vary in how difficult they are to engage.

The agent has an eye, a hand, and a visual marker. The agent’s sensors tell it what objects are under the eye, hand, and marker. In addition, if both the eye and hand are on some object, then natural operations suggested by the object become available, e.g., if both the hand and the eye are on the light switch, then the action of flicking the light switch becomes available, and if both the hand and eye are on the ball, then the action of kicking the ball becomes available. The agent has no prior knowledge of the objects present in the room or their functionalities.
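
A toy Python sketch of these dynamics might look like the following. The object names, state layout, and step logic are simplified guesses for illustration, not the original environment code (the visual marker is omitted for brevity); the key point is that object-specific actions become available only when the eye and hand line up on the same object.

```python
import random

class Playroom:
    """Toy sketch of the playroom dynamics described above."""

    OBJECTS = ["light_switch", "ball", "bell", "blue_block", "red_block", "monkey"]

    def __init__(self):
        self.light_on = False
        self.music_on = False
        self.eye = "light_switch"    # object currently under the eye
        self.hand = "light_switch"   # object currently under the hand

    def available_actions(self):
        """Object-specific actions appear only when eye and hand line up."""
        actions = ["move_eye", "move_hand"]
        if self.eye == self.hand == "light_switch":
            actions.append("flick_switch")
        if self.eye == self.hand == "ball":
            actions.append("kick_ball")
        if self.eye == self.hand and self.hand in ("blue_block", "red_block"):
            actions.append("press_block")
        return actions

    def step(self, action):
        """Apply an action and return the name of any salient event."""
        if action == "flick_switch":
            self.light_on = not self.light_on
        elif action == "press_block":
            self.music_on = (self.hand == "blue_block")   # blue: on, red: off
        elif action == "kick_ball":
            # Kicking the ball into the bell rings it; if the room is dark
            # and the music is on, the monkey is also set off.
            if self.music_on and not self.light_on:
                return "monkey_sounds"
            return "bell_rings"
        elif action == "move_eye":
            self.eye = random.choice(self.OBJECTS)
        elif action == "move_hand":
            self.hand = random.choice(self.OBJECTS)
        return None
```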

You might wonder what makes this different from standard reinforcement learning. Usually, in a game that uses RL, the action-value pairs are pre-defined: there are preset actions (move left/right or up/down) and the reward for taking each action is known. The playroom, by contrast, is completely uncharted territory for the machine. The action-value pairs form as the agent explores the playroom.
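
One way to picture this is a tabular Q-learning loop whose action entries are created only as the agent discovers which actions a state affords. This is a hedged sketch with made-up hyperparameters, not the authors’ algorithm, which builds the options framework on top of this kind of update:

```python
import random
from collections import defaultdict

# Tabular Q-learning in which the action set is NOT known up front:
# actions enter the table only once the agent discovers that they are
# available in a given state.
Q = defaultdict(dict)                 # Q[state][action] -> estimated value
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def choose_action(state, available_actions):
    for a in available_actions:       # register newly discovered actions
        Q[state].setdefault(a, 0.0)
    if random.random() < epsilon:     # explore occasionally
        return random.choice(available_actions)
    return max(Q[state], key=Q[state].get)

def update(state, action, reward, next_state):
    best_next = max(Q[next_state].values(), default=0.0)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```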

When the agent encounters an unpredicted event (a salient event) a few times, its updated action-value function drives it to repeatedly attempt to achieve that salient event. There are two interesting effects of this:

  1. As the agent tries to repeatedly achieve the salient event, learning improves both its policy for doing so and its option model that predicts the salient event.
  2. As its option policy and option model improve, the intrinsic reward diminishes, and the agent gets “bored” with the associated salient event and moves on.

As a result, the option policy and model become accurate in states the agent encounters frequently. Occasionally, the agent encounters the salient event in a state that it has not encountered before, and it generates an intrinsic reward again — it is “surprised”.
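
This “surprise then boredom” cycle can be sketched as an intrinsic reward proportional to the prediction error of the option model for a salient event: when the model predicts the event well, the reward vanishes, and an unexpected occurrence in a new state makes it spike again. The class name and the update rule below are illustrative guesses, not the exact form used in the paper.

```python
class SalientEventOption:
    """Sketch of the surprise/boredom dynamic: intrinsic reward is
    proportional to the error of the model that predicts a salient event."""

    def __init__(self, scale=1.0, learning_rate=0.1):
        self.scale = scale
        self.lr = learning_rate
        self.predicted_prob = {}    # state -> predicted chance of the event

    def intrinsic_reward(self, state, event_occurred):
        p = self.predicted_prob.get(state, 0.0)
        error = abs(float(event_occurred) - p)    # prediction error = surprise
        # Improve the option model toward what actually happened.
        self.predicted_prob[state] = p + self.lr * (float(event_occurred) - p)
        # As the model becomes accurate in familiar states the error shrinks
        # and the agent gets "bored"; an unexpected occurrence in a new state
        # produces a large error, so the agent is "surprised" again.
        return self.scale * error
```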

Inference

Fig-4: The effect of intrinsically motivated learning when an extrinsic reward is present.

Fig-4 shows how the number of steps between successive extrinsic rewards changes as more extrinsic rewards are collected. As you can infer from the graph, when we use intrinsic as well as extrinsic rewards, the number of steps taken to achieve the reward decreases significantly.

Conclusion

The agent explores the entire playroom and exploits every new event it encounters on its journey to acquire new skills. This self-driven version of RL has immense potential for real-world applications. New algorithms based on these studies are emerging in areas such as the optimal exploration problem, skill acquisition, and maze problems. The work opens new horizons for improvement in both technology and research.

References:

[1] Singh, Satinder, Andrew G. Barto, and Nuttapong Chentanez. “Intrinsically Motivated Reinforcement Learning.” University of Massachusetts Amherst, Department of Computer Science, 2005.

[2] Oudeyer, Pierre-Yves, and Frederic Kaplan. “What Is Intrinsic Motivation? A Typology of Computational Approaches.” Frontiers in Neurorobotics 1 (2009): 6. https://doi.org/10.3389/neuro.12.006.2007.

[3] https://www.kaggle.com/discussion/221161
