Intrinsic Motivation and Open Ended Learning

My learnings from the 3rd IMOL (Intrinsic Motivation and Open Ended Learning) Workshop, held in Rome on 4–6 Oct. 2017.

Last week I attended an impeccable event in Rome where philosophers, biologists, psychologists, physicists interested in artificial life, and computer scientists discussed the barriers and the current state of the art in these topics. Below are some learnings summarizing my view and the program.

Despite AI advancing faster than ever, papers are still being rejected at NIPS today if they use real robots instead of comparing against Atari game baselines. These papers are criticized for using “non-reproducible” data. One round table aimed at designing an open-ended learning benchmark to avoid this kind of problem. An analogous problem appears on the opposite side, where CoRL papers were rejected for not using datasets from ‘simulated’ environments. Simulation can be more reproducible, sure, but a second recurring point I picked up at the workshop is that, if you do research with robots, your methods may work much better and faster in real life than in simulated environments. So where is the middle ground, and how can we build the missing datasets? A first approach discussed design issues; for instance, tidying up a kitchen may be an open-ended enough environment to validate our algorithms against.

After three days of high-quality content that demanded full attention, we realized in the very last session that researchers with different backgrounds differ subtly in what they understand as intrinsic vs. extrinsic reward. Some think extrinsic reward is everything that allows agents to live: for humans, this can be food or money. Intrinsic reward is then everything else that makes us curious, learn, explore, and pursue either longer-term or simple exploration goals. Another branch of researchers thinks there is no such difference between intrinsic and extrinsic rewards, as they are all required for humans to live. Where do you set your boundaries? Now I understand why many researchers come from a branch where NIPS/ICML is not their main conference, but rather ICDL-EpiRob or artificial life venues. And I wonder why the intersection between the machine learning and developmental conferences seems so small so far, when the two seem so crucially complementary.

Other learnings and pointers I take home:

  • Action and perception can be studied separately, but due to their intrinsic relationship, one cannot work without the other. Their joint learning spaces are considered, for instance, in the Biehl17 works.
  • Other methodologies based on information theory also consider spatiotemporal patterns for agent representation in dynamical systems, from an agent-centric perspective. This relates to the concept of empowerment, which focuses learning not only on relevant changes of state in the environment, but on the changes that the robot’s own actions can bring about. Formally, empowerment is a universal agent-centric measure of control [3].
  • The Human Brain Project’s Neurorobotics Platform provides an online simulation platform to facilitate experiments.
  • “From Unknown Sensors and Actuators to Actions Grounded in Sensorimotor Perceptions” and “Generation of Tactile Maps for Artificial Skin” show that it is possible to learn representations of non-trivial tactile surfaces, which require topologically and geometrically involved 3D embeddings, with a somatotopic map (using only intrinsic properties of the tactile data, with high-quality reconstruction via the proposed ANISOMAP (MDS) algorithm). “A sensoritopic map can be created using a sensory reconstruction method. A sensoritopic map reflects the informational geometry of the sensors, where sensors that are highly correlated are close in the map. When optical flow can be computed, the average effect of different actuators settings can be learned. This enables the robot, by motor babbling, to build a library of sensorimotor laws which specify how its possible actions affect its sensors. Using these laws, the robot can see motion flows in its visual field and then perform a movement which will have a similar effect. (…) This can be used for basic imitation or motion tracking, where the underlying principle is to guide development by informational means” [6]. A more recent paper also focused on information theory, which NIPS may come to regret not accepting, is “Deep Learning and the Information Bottleneck Principle”.
  • There are many ways to measure the learning progress associated with intrinsic motivation, in the sense of what artificial curiosity means. Normally, curiosity in AI is related to the idea of making predictions about the future state and exploring more in those areas where the agent’s prediction of the state was wrong, i.e., where the surprise was larger. For instance, the work of [9] expands on the Intrinsic Curiosity Module of [7], providing more stable variance among predictions.
  • Work similar to our recent paper on state representation learning [8] can take a task-centric point of view, as in few-shot classification via task representation and communication. Araya also uses predictive models for learning dynamics.
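
To make the empowerment idea from the list above more concrete, here is a minimal sketch, assuming a small discrete world and a single time step. In [3], empowerment is the channel capacity between the agent's actions and its subsequent sensor states. The function names below (empowerment_bits, _log_ratio) are my own, hypothetical choices, and the Blahut-Arimoto iteration is simply one standard way to compute a channel capacity, not necessarily what was presented at the workshop.

```python
import numpy as np


def _log_ratio(p, marginal):
    # log(p(s'|a) / p(s')), defined as 0 wherever p(s'|a) = 0
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(p > 0, np.log(p / marginal), 0.0)


def empowerment_bits(p_next_given_action, iters=500, tol=1e-12):
    """One-step empowerment: capacity (in bits) of the action -> next-state channel.

    p_next_given_action[a, s] is the probability of reaching state s after action a.
    (Hypothetical helper; the array layout is an assumption of this sketch.)
    """
    p = np.asarray(p_next_given_action, dtype=float)
    q = np.full(p.shape[0], 1.0 / p.shape[0])  # action distribution, start uniform
    for _ in range(iters):
        # Blahut-Arimoto step: reweight actions by how distinguishable their outcomes are
        kl = np.sum(p * _log_ratio(p, q @ p), axis=1)
        new_q = q * np.exp(kl)
        new_q /= new_q.sum()
        converged = np.max(np.abs(new_q - q)) < tol
        q = new_q
        if converged:
            break
    # Mutual information I(A; S') under the final action distribution, in nats
    mutual_info_nats = np.sum(q[:, None] * p * _log_ratio(p, q @ p))
    return float(mutual_info_nats / np.log(2))
```

For example, four actions that each lead deterministically to a different state give 2 bits of empowerment, while actions that all produce the same outcome distribution give 0 bits: the agent has no control over what it senses next.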

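The curiosity-as-surprise idea in the list above can be sketched in a few lines. This is a toy illustration under my own assumptions: a linear forward model trained online, with the squared prediction error used as the intrinsic reward. It is not the Intrinsic Curiosity Module of [7] nor the method of [9] (which use learned, nonlinear models), and the class and method names are hypothetical.

```python
import numpy as np


class ForwardModelCuriosity:
    """Toy curiosity signal: intrinsic reward = prediction error of a forward model."""

    def __init__(self, state_dim, action_dim, lr=0.1):
        # Linear forward model: next_state is predicted as W @ [state, action]
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def reward_and_update(self, state, action, next_state):
        x = np.concatenate([state, action])
        error = np.asarray(next_state, dtype=float) - self.W @ x
        reward = 0.5 * float(error @ error)  # larger surprise -> larger reward
        # One gradient step on 0.5 * ||error||^2, so familiar transitions get boring
        self.W += self.lr * np.outer(error, x)
        return reward


# Usage: the same transition, seen repeatedly, becomes less and less rewarding.
model = ForwardModelCuriosity(state_dim=2, action_dim=1)
state, action, next_state = np.array([1.0, 0.0]), np.array([1.0]), np.array([0.0, 1.0])
rewards = [model.reward_and_update(state, action, next_state) for _ in range(50)]
```

Because the reward shrinks as the model improves, an agent maximizing it is pushed toward transitions it cannot yet predict, which is the exploration behaviour described above.
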
Finally, several robotic cognitive architectures were presented, for which we still need to work on a standard benchmark for unification:

  • CORBYS (Cognitive Control Framework for Robotic Systems)
  • DREAM Project Architecture [2]
  • GRAIL: A goal-discovering robotic architecture for intrinsically-motivated learning [5]
  • IMGEP (Intrinsically motivated goal exploration processes): Unsupervised multi-goal reinforcement learning formal framework [10].
  • STeLLA: A Scheme for a Learning Machine. J.H. Andreae (1963).
  • PURR-PUSS: A new mechanism for a brain. PURR-PUSS was taught to behave like a universal Turing machine. J.H. Andreae (1976).
  • ERA (Epigenetic Robotic Architecture) is a hybrid cognitive architecture that dynamically generates spreading activation models (IA and IAC) not dissimilar to those hard-wired in early Connectionism. ERA models learn continuously, attempting to predict multimodal and sensorimotor contingencies. ERA makes use of the SOM and ESN modules within Aquila and grows dynamically as new streams of input arrive at its incoming YARP port. It is part of Aquila 2.0: Software Architecture for Cognitive Robotics.
  • A survey of robotic simulation software tools is here (Thanks Timothee!)


[1] Implementation of dynamical systems, function approximators, dynamical movement primitives, and black-box optimization with evolution strategies such as DMP (Dynamical Movement Primitives).

[2] Duro, R. J., Becerra, J. A., Monroy, J. “Design of a Long Term Memory Structure for the DREAM Project Cognitive Architecture” IMOL17

[3] Klyubin, A. S., Polani, D., and Nehaniv, C. L. (2005). Empowerment: A Universal Agent-Centric Measure of Control. In Proc. CEC 2005. IEEE.

[4] Intrinsic motivations and open-ended development in animals, humans, and robots: an overview. Baldassarre et al., 2014.

[5] GRAIL: A Goal-Discovering Robotic Architecture for Intrinsically-Motivated Learning, Santucci et al., 2016.

[6] From Unknown Sensors and Actuators to Actions Grounded in Sensorimotor Perceptions, Olsson et al.


[8] Unsupervised state representation learning with robotic priors: a robustness benchmark. Timothée Lesort, Mathieu Seurin, Xinrui Li, Natalia Díaz Rodríguez, David Filliat. ENSTA ParisTech, France. ArXiv.

[9] Magrans de Abril, I., Kanai, R. “Intrinsically-motivated reinforcement learning for control with continuous actions”

[10] Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning, Sébastien Forestier, Yoan Mollard, Pierre-Yves Oudeyer, ArXiv 2017.