Bridging Cognitive Science and Reinforcement Learning Part 1: Enactivism

In the Cognitive Sciences there are a number of competing theoretical frameworks that attempt to explain intelligent behavior. One perspective I have always been particularly fond of is the embodied/enactive approach, which a diverse array of individuals, from French phenomenologists to modern-day neuroscientists, claim to follow. What ties them all together is the belief that the body and the environment, not just the brain, play a central role in perception, action, and the generation of meaning for an organism. While this idea has gained a fair amount of traction within Cognitive Science and Neuroscience, we don't hear it talked about much in the context of AI. I'd like to change that.

I recently finished reading Action in Perception by Alva Noë, one of the advocates of the Enactivist approach. According to Noë, action is fundamentally for perception. We as intelligent beings wouldn’t be able to truly have a perceptual experience if not for the skillful use of our sensorimotor capacities. In this way, through action in an environment, we enact the perceptual world. These skills depend critically on our having a body, and as such this is a theory of the importance of embodiment as well. Reading his book while working on the reinforcement learning algorithms I’ve shared recently, I couldn’t help but think of some of the ways the two disciplines could inform one another.
In the context of AI, bodies and environments are far from being given things. For every robot or self-driving car, there is a disembodied Siri or other chatbot that we may encounter on a daily basis. More interestingly, we have AlphaGo, which is able to play the game of Go at the level of the world champion despite having no physical body (unless you consider a server farm a body, which I don't, for reasons I will save for another post). In this article I want to explain that while AlphaGo and other similar agents may not have a literal physical body, they have a kind of virtual body that exists by virtue of the reinforcement learning environment. It is this virtual embodiment that allows AIs to learn skills in a way similar to that of living organisms.
Perception… then Action?
There is traditionally a tendency to think of the different parts of the brain as accomplishing separate tasks. The visual system is for visual perception, and higher-level parts of the brain are for action and planning, so the dogma goes. This is referred to as the modularity argument, and it is often promoted in the popular media. Whenever a study finds the "area of the brain responsible for X," there is an assumption that that area exists independently from the rest of the brain, which is responsible for other functions. If we were to think of this in terms of neural network architecture, it would be akin to building and training one network for vision, and then connecting its output to a separate network for decision making.

When we examine how things actually are, however, this distinction between perception and action blurs. The reasons are simple. For one thing, vision by itself has no guiding purpose: why see things any one specific way when any other way is just as good? Furthermore, the brain, like so many things in the world, is densely interconnected. The idea of a brain region in isolation is an artificial construct.
Perception is for Action
The two-step neural network outlined above is never used in the real world, because it simply doesn't work very well. Instead, we have deep Convolutional Neural Networks in which perception and decision are trained as part of the same network. With enough training, these networks are able to quite successfully identify the objects within a scene when presented with an image of that scene. In this way the network has a specific purpose, and its vision is harnessed toward a goal. These networks learn to be sensitive to aspects of the visual scene that are relevant to the task at hand. They learn sensitivity to outlines and luminance at the lower layers, and to things like faces and shapes at the higher layers. Outlines and faces become not merely neutral features of the scene to the network, but rather meaningful in relation to their potential for signifying a given object.
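To make the contrast with the two-step setup concrete, here is a minimal sketch of a jointly trained network. It is written in PyTorch, and the layer sizes, data, and class count are illustrative stand-ins rather than any particular published model; the point is simply that a single loss shapes both the convolutional "perception" layers and the "decision" head.

```python
import torch
import torch.nn as nn

# Perception (conv layers) and decision (linear head) live in one network,
# so a single loss shapes both. All sizes here are illustrative.
class SeeAndDecide(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.perceive = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.decide = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):              # x: (batch, 3, 32, 32)
        features = self.perceive(x)
        return self.decide(features.flatten(1))

model = SeeAndDecide()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on random stand-in data: the classification loss at the
# "decision" end updates the convolutional "perception" layers too.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
loss = loss_fn(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```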

Only organisms that need to identify objects care about the outlines of objects, or the meaningful markers that might distinguish one object from another. In a neural network, the process by which this happens is backpropagation. For those unfamiliar, backpropagation uses a loss function defined at the highest layer of the network to determine how "wrong" the network was about a given sample, and sends a signal backwards through the network to update the connections in order to be a little more "right" next time. It is through the training process of backpropagation that the objective defined at the final layer of the network is able to affect all earlier layers. Backpropagation is a powerful idea, and at least one of the leading researchers in Deep Learning believes that it is what drives learning in the brain too!
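As a rough illustration of the mechanism, here is a toy two-layer network trained by hand-written backpropagation on made-up data (numpy, purely for exposition, and certainly not a model of the brain). The loss is defined only at the output, yet the update it produces reaches all the way back to the first layer's weights.

```python
import numpy as np

# A toy two-layer network trained by backpropagation.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 samples, 3 input features
y = rng.normal(size=(4, 1))          # regression targets
W1 = rng.normal(size=(3, 5)) * 0.1   # first ("perception") layer
W2 = rng.normal(size=(5, 1)) * 0.1   # second ("decision") layer

for _ in range(100):
    # Forward pass
    h = np.tanh(x @ W1)              # hidden activations
    pred = h @ W2
    loss = np.mean((pred - y) ** 2)  # the loss lives only at the output

    # Backward pass: the output error is sent back through the network
    d_pred = 2 * (pred - y) / len(x)
    d_W2 = h.T @ d_pred
    d_h = d_pred @ W2.T
    d_W1 = x.T @ (d_h * (1 - h ** 2))  # chain rule through the tanh

    # Even the first layer is updated by a signal that began at the loss.
    W1 -= 0.1 * d_W1
    W2 -= 0.1 * d_W2
```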

Turning to reinforcement learning agents, we discover the same logic, but in a form even more strikingly similar to Noë's argument. Now there is an environment within which the agent is always embedded. Furthermore, the agent is now explicitly using its perceptions for action in that environment. AlphaGo, for example, is able to learn to see the Go board not just in any way, but in a way that is directly conducive to acting in the game. In this way the agent learns to make sense of the world through a particular engagement within it. It learns not some neutral representation of the world that is then acted upon, but rather a world that is from the beginning filled with meaning. This way of thinking about human experience has a history in the phenomenological tradition of philosophy. Chief among the phenomenologists was Maurice Merleau-Ponty, who almost a century ago wrote:
For the player in action the football field is not an “object,” that is, the ideal term which can give rise to an indefinite multiplicity of perspectival views and remain equivalent under its apparent transformations. It is pervaded with lines of force (the “yard lines”; those which demarcate the “penalty area”) and articulated in sectors (for example, the “openings” between the adversaries) which call for a certain mode of action and which initiate and guide the action as if the player were unaware of it. The field itself is not given to him, but present as the immanent term of his practical intentions; the player becomes one with it and feels the direction of the “goal,” for example, just as immediately as the vertical and the horizontal planes of his own body. It would not be sufficient to say that consciousness inhabits this milieu. At this moment consciousness is nothing other than the dialectic of milieu and action. Each maneuver undertaken by the player modifies the character of the field and establishes in it new lines of force in which the action in turn unfolds and is accomplished, again altering the phenomenal field. — Maurice Merleau-Ponty (Structure of Behavior 1963)
Merleau-Ponty points out the way in which a soccer game is not experienced in a so-called "objective" world. To a skilled soccer player, every perception is from the beginning a field of meaning with points of attraction and repulsion. Moreover, this field is fundamentally geared toward action. Reinforcement learning allows for this possibility, and it does so because the agent is embodied. The world is at stake for the agent, since each action is given a meaning through the rewards the agent receives. A figure from a paper by Wang et al. shows what the agent "sees" in the Atari game Enduro: the network highlights the area around the car just ahead as meaningful for action. The agent must avoid that car to maintain a high score, and as such the car becomes a point of repulsion for the agent.
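One way such a picture can be produced is to take the gradient of the network's value estimate with respect to the input pixels: wherever that gradient is large, the value the agent assigns to the state is sensitive to what appears there. The sketch below uses a toy Q-network and a random stand-in frame, not the exact architecture or visualization procedure from the Wang et al. paper.

```python
import torch
import torch.nn as nn

# A toy Q-network over stacked 84x84 frames; sizes are illustrative.
q_net = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 9 * 9, 6),             # 6 actions, as in many Atari games
)

frame = torch.rand(1, 4, 84, 84, requires_grad=True)  # stand-in game frames
state_value = q_net(frame).max()           # value of the greedy action
state_value.backward()

# Pixels with a large gradient magnitude are the ones the value estimate is
# most sensitive to -- e.g. the car just ahead of the agent in Enduro.
saliency = frame.grad.abs().max(dim=1)[0]  # (1, 84, 84) map over pixels
```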

Action is for Perception
Alva Noë goes much further in his book, however, than simply pointing out that perception is for action. He makes the more radical claim that action is for perception. What he means by this is that without action there could be no true meaningful experience of the world. The book sets out to explore just how it is that action makes perception possible. Noë points out that our retinal images provide us with very little at any given moment, and what they do provide doesn't correspond to how we experience the world. A glance at a bowl creates an elliptical impression on our retina. How does this elliptical impression become the experience of a circular bowl? In order to answer this question, Noë draws on the concept of sensorimotor skills, which allow us to understand the way in which the phenomenal world changes with respect to us and our environment. It is because we can move around the bowl, and watch how its appearance changes as we do, that we are able to understand that it is circular. By building up sensorimotor skills, we acquire the capacity to give meaning to an otherwise meaningless world.

This can be taken to a more abstract level when thinking about complex behavior in the world. We act in order to give ourselves better perceptions, which improve our understanding of the world, thus increasing our capacity to act. Imagine that you want to find a lost dog, and you suspect it is behind a car you see in the distance. You know that by moving around the car, you can put yourself in a position to obtain new kinds of experiences (namely, of the other side of the car and what it had occluded). This knowledge of how to skillfully discover new meaning within an environment is a key aspect of human intelligence. When we look at modern AI, however, we find it somewhat lacking.

Current Reinforcement Learning approaches are often limited by short-sighted agents. If an agent isn't able to see how to achieve a reward given a certain context, it has little means of putting itself in a better context where it could. These agents have the capacity to put perception in the service of action, but they lack the capacity to meaningfully use action in order to uncover new kinds of perceptions. The Reinforcement Learning community certainly isn't unaware of this deficit. In fact, it is an active area of research, even if researchers don't call it "Action in Perception."
Current Research Along These Lines
The fundamental problem of getting an RL agent to care about experiences that aren't directly rewarding, but rather open up the possibility of greater rewards, has been tackled in a number of ways. The first is to find ways to reward an agent for exploring its environment. The thinking along these lines is that by encouraging exploration, the agent will stumble onto novel parts of its environment that allow for greater rewards. Research into this approach has attempted to develop different kinds of bonuses for exploring. The size of this bonus is typically thought of as the level of surprise an agent experiences at finding itself in a novel situation.
Given this general definition, there are a number of ways the surprise factor can be calculated. One group determines the difference between the reward an agent expected to receive for its action and the true reward. By exploiting what they call Prioritized Replay, an agent trains itself more frequently on experiences that yielded unexpected rewards. Another way to gauge surprise is to look at the difference not in what the agent expects as a reward, but in what the agent expects to see next. Research along these lines uses a learned model of the environment, and the difference between the model's prediction and the true environment is used as the surprise bonus.
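Here is a hedged sketch of the second idea: a small forward model predicts the next state, and its prediction error becomes an intrinsic bonus added to the environment reward. The network sizes, the scaling factor beta, and the interface are illustrative choices of mine, not taken from any specific paper.

```python
import torch
import torch.nn as nn

# A learned forward model: given (state, action), predict the next state.
state_dim, action_dim = 8, 4
forward_model = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
    nn.Linear(64, state_dim),
)
model_optimizer = torch.optim.Adam(forward_model.parameters(), lr=1e-3)

def augmented_reward(state, action_onehot, next_state, env_reward, beta=0.1):
    """Return the environment reward plus a bonus for the model's surprise."""
    predicted = forward_model(torch.cat([state, action_onehot]))
    surprise = (predicted - next_state).pow(2).mean()

    # Train the forward model so familiar transitions stop being surprising.
    model_optimizer.zero_grad()
    surprise.backward()
    model_optimizer.step()

    return env_reward + beta * surprise.item()

# Example call on a single made-up transition.
s, s_next = torch.rand(state_dim), torch.rand(state_dim)
a = torch.zeros(action_dim); a[2] = 1.0        # one-hot action
r = augmented_reward(s, a, s_next, env_reward=0.0)
```

The reward-prediction variant works the same way in spirit: the error between expected and received reward plays the role of the surprise, and Prioritized Replay samples the most surprising transitions more often.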
Another approach is Hierarchical Planning. The idea here is to develop agents that can learn to decompose tasks into smaller subtasks. By rewarding an agent for completing subtasks that are necessary but not inherently rewarding in themselves, the designer of the agent is able to mold the agent's behavior toward discovering perceptions that allow new kinds of actions to be taken. Researchers have built an agent using this approach that performs well on the Atari game Montezuma's Revenge. This game is difficult for traditional RL agents because the agent must find a key and use it to unlock a door in each room in order to receive a reward. This may sound simple to us, but it requires the agent to perform a long series of intentional behaviors before it ever receives a reward.
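The sketch below captures the shape of this idea with a toy key-and-door corridor: a meta-controller hands the agent a subgoal, and an intrinsic reward is paid for completing it long before the environment pays anything. The environment, the subgoal names, and the random actions standing in for learned policies are all hypothetical placeholders, not the actual system used on Montezuma's Revenge.

```python
import random

# Toy environment: the agent must first reach the key, then reach the door.
# The environment only pays an extrinsic reward at the very end.
class KeyDoorCorridor:
    def __init__(self, length=10):
        self.length, self.key_at, self.door_at = length, 2, length - 1
        self.reset()

    def reset(self):
        self.pos, self.has_key = 5, False
        return (self.pos, self.has_key)

    def step(self, action):                # action: -1 (left) or +1 (right)
        self.pos = max(0, min(self.length - 1, self.pos + action))
        if self.pos == self.key_at:
            self.has_key = True
        done = self.has_key and self.pos == self.door_at
        reward = 1.0 if done else 0.0      # extrinsic reward only at the end
        return (self.pos, self.has_key), reward, done

env = KeyDoorCorridor()
state = env.reset()
subgoal = "get_key"                        # the meta-controller's first choice
for _ in range(500):
    action = random.choice([-1, 1])        # stand-in for a learned low-level policy
    state, extrinsic, done = env.step(action)

    # Intrinsic reward for the current subgoal, available long before any
    # extrinsic reward appears; a learning agent would train its low-level
    # policy on (extrinsic + intrinsic).
    if subgoal == "get_key" and state[1]:
        intrinsic, subgoal = 1.0, "open_door"
    elif subgoal == "open_door" and done:
        intrinsic = 1.0
    else:
        intrinsic = 0.0

    if done:
        break
```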

While all of these approaches are promising, there does not yet seem to have been an "aha" moment for the community to rally behind. It is still unclear how to allow action to influence perception in a natural, generalizable way that fits into neural architectures as intuitively as the perception -> action relationship does. All of these lines of research are in their infancy, however, and there is little doubt that a solution of this kind will be discovered one day. This capacity is, after all, essential to what allows organisms to survive in the world, and it will be essential to true AI in the future.
I hope you've enjoyed the first entry in this new series of discussions. In the future I plan to write more about the role of philosophical and psychological concepts like embodiment, phenomenology, and desire in AI and Deep Learning. My hope is that by examining the two fields in light of one another, new approaches to AI can be developed to push the field forward. If you'd like to keep up with the series, follow me here on Medium (Arthur Juliani), or on Twitter (https://twitter.com/awjuliani).