Empirical Enactivism: How our brains simulate the world in order to experiment on the future

A single cognitive strategy might help brains do everything from resolving visual ambiguity to empathizing with others to performing strategic planning

Jeremy Gordon
The Spike

--

“the brain of each one of us does literally create his or her own world” — J Z Young (1951)

I want to take the view of generative predictive models illustrated in a previous post and add a new perspective that I find unreasonably compelling: cognition as simulation. This idea is tightly linked to several models of cognition including predictive coding, embodied cognition, and enactivism, among others. For simplicity, I’ll refer to the ideas presented here simply as enactivism. My goal is to visually present the most compelling aspects of this family of ideas, and then speculate (likely with excessive optimism) on just how far it can take us in explaining many aspects of what our brains do.

To start, here’s the key idea: our brains build complex models of the world around us, capable of producing accurate simulations of both unperceived aspects of the present and not-yet-perceived aspects of (multiple) potential futures. I want to argue that these simulations are not one tool in the cognitive shed, pulled out only when complex strategic problems surface, but rather operate continuously and directly compose cognitive tasks at every level, from the lowest subconscious ones, such as scanning a visual scene, to the highest forms of social awareness. Understanding how these simulations are created, updated, and leveraged for decision-making may give us insights into an array of functions which we know to be key to human intelligence, such as causal induction, prediction, planning, and even empathy.

Previously I suggested that, according to one theory of cognition, our brains continuously:

  1. Identify and represent increasingly abstract patterns in sensory input
  2. Build a model of the world that generates downward flowing predictions
  3. Calculate prediction errors to correct the model
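
To make that loop concrete, here is a minimal sketch, using an invented linear ‘world’ and made-up numbers, of how a guess at a hidden cause can be corrected by the prediction errors it generates:

```python
# A minimal toy sketch (my own, not a model from the post) of the three-step
# loop: a guessed hidden cause generates a downward prediction, the mismatch
# with actual input is the prediction error, and the error corrects the guess.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))          # assumed mapping from hidden causes to sensations
true_cause = np.array([1.0, -0.5])   # the state of the world, unknown to the 'brain'
sensation = W @ true_cause           # what the sensors actually report

estimate = np.zeros(2)               # the brain's running guess at the hidden cause
for _ in range(200):
    prediction = W @ estimate        # downward flowing prediction
    error = sensation - prediction   # prediction error
    estimate += 0.1 * (W.T @ error)  # nudge the guess to reduce the error
print(np.round(estimate, 2))         # the estimate approaches the true cause [1.0, -0.5]
```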

In this post, we will make these ideas ‘enactive’ by highlighting the adaptive benefits for a living thing able to model its own physical agency, that is, the way the world responds to its actions. We will start by looking at affordances as building blocks of an enactive environment and see how they may aid the identification of causes and structure in ambiguous input. We’ll then reframe prediction as simulation and test the extent to which this concept can shed light on a range of cognitive tasks.

The Windowless Control Room: The Challenge of the Infant Brain

In Intuition Pumps And Other Tools for Thinking [Ch 23], Daniel Dennett poses a thought experiment that I frequently come back to as fuel for intuition about the brain’s daunting task. It goes like this. One day, instead of waking up in your own bed, you wake up in a small windowless room. Two of the walls are covered in thousands of lights blinking seemingly randomly, and the opposite two walls are covered in equally many buttons, none of them labeled. A note informs you that you’re in the head of a large robot, and warns you to be careful, because the robot’s home is full of hazards that could kill it (and you) if you make a mistake. But don’t worry! The lights on the wall are connected to thousands of sensors, cameras, etc. on the robot’s body, and each button controls some actuator or behavior (like moving a leg or arm, or turning the robot’s head).

Inside the robot brain.

In this thought experiment, you (as a personified stand-in for an infant’s brain) must somehow learn how to choose actions to make your way through the world. The problem this illustrates is that, as stated, the task is impossible: though you may notice some patterns in the lights (groupings, repeating series, even regularities that look like causal relationships), you have no direct access to the outside world (no window), and as such there’s no way to identify when the robot is in the vicinity of good things or bad things, when it’s hanging precariously close to a cliff, or when a useful object lies in front of it.

Now imagine how much the problem changes if we say that one of those lights among the thousands glows green (while the others are red), and our note tells us that this light indicates that the robot is experiencing reward — something good is happening. Intuitively, this feels a bit more tractable since we now have a starting point: a single dimension which we can use to provide a context of valence (good thing/bad thing) to patterns we notice in the other lights. That said, this remains an incredibly challenging problem, requiring our brains to develop a system to ‘understand’ the world in sufficient detail to aptly move within it, all without the brain ever directly ‘seeing’ itself or its environment. Our best evidence that this questionably feasible problem is solvable is that each of us did survive infancy, and whatever the solution might be, we benefit from our brain’s implementation of it every millisecond.

Active Transformations as Affordances

The affordance is a concept from psychology that, if familiar, probably came to you via the world of user experience design, where it has become ubiquitous (and more than a little jargon-y). As defined by the concept’s originator, JJ Gibson: “The affordances of the environment are what it offers the animal, what it provides or furnishes, either for good or ill… It implies the complementarity of the animal and the environment.” [Gibson 1979]. In enactivism, the internal representation of such opportunities for interaction plays a critical role. We will expand the definition accordingly: enactive affordances are cognitive modules that have learned to both recognize and facilitate transitions in sensory input, whether we physically cause these transitions or simply observe them in the environment.

We can think of enactive affordances as being grouped into two species: motor affordances learn contextualized transitions realized by motor commands, while hidden affordances learn sensory transitions realized by external causes in the world. It is useful here to distinguish two types of sensory information: exteroception (sensitivity to stimuli originating outside of the body) and interoception (sensitivity to stimuli originating inside of the body, such as affective/emotional responses). A button affords (in the motor sense) pushing, i.e., I can push the button to realize the transformation Button.Off → Button.On. Similarly, a dog affords (in the hidden, external sense) barking, i.e., the dog may (of its own volition) bark, which has an immediate impact on the sensory environment: Noise.Quiet → Noise.Barking. Though it is sometimes easier to think about these exteroceptive transitions, a given enactive affordance can learn transitions of either type.

Recruiting affordances: learning to perceive and generate perceptual transitions in the world. Left: the motor affordance ‘push’ learns the transition from Button.Off to Button.On. Right: the hidden affordance ‘bark’ learns the transition from Noise.Quiet to Noise.Barking.

Let’s consider how such afforded transitions might be learned, returning to the grid of lights from Dennett’s thought experiment. Since we are told these lights indicate sensed information from the environment, we expect there to be structure in the signal (both spatial: groups of lights that are often on simultaneously, and temporal: relationships in which one group of lights frequently turns on after another). Assume you notice that a group of lights (which we’ll call S1) is often (but not always) followed by another group (which we’ll call S2). Imagine now that you notice a third group of lights that appears to predict whether S1 will be followed by S2. You might infer that this third group (illustrated as P or B above) is responsible for (a cause of) the transition S1 → S2. If, as in the ‘Push’ example, P is linked to the motor system, you have just learned to detect a useful affordance: in the present context, when perceiving S1, if I take action P, I’ll perceive S2.
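
As a rough illustration (the names and numbers below are invented rather than drawn from any experiment), the control-room occupant could simply tally how often S1 is followed by S2 when P is active versus when it is not; a large gap is evidence that P affords the transition:

```python
# A hypothetical sketch of affordance detection: estimate how often S1 is
# followed by S2, with and without the candidate cause P (e.g. a button press).
from collections import Counter

def transition_stats(history):
    """history: list of (s1_on, p_on, s2_next_on) observations."""
    counts = Counter()
    for s1, p, s2 in history:
        if s1:
            counts[(p, s2)] += 1
    def prob(p_on):
        hits = counts[(p_on, True)]
        total = hits + counts[(p_on, False)]
        return hits / total if total else 0.0
    return prob(True), prob(False)

# Toy data: the transition S1 -> S2 almost always happens when P is pressed.
history = [(True, True, True)] * 9 + [(True, True, False)] * 1 \
        + [(True, False, True)] * 2 + [(True, False, False)] * 8
with_p, without_p = transition_stats(history)
print(f"P(S2 | S1, P) = {with_p:.2f}, P(S2 | S1, no P) = {without_p:.2f}")
# A large gap suggests P affords the transition S1 -> S2 in this context.
```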

As we’ll see shortly, the exact same networks that detect such afforded perceptual transitions may themselves allow the brain to generate the perception of (i.e., simulate) the transition, regardless of the state of the outside world.

Context: Maps of Associated Affordances

One of the key ideas behind enactivist theories is that these affordances are the building blocks of our perceptive environment: we do not perceive individual features or discrete objects, only the opportunities for action they offer. When we view an object in a particular state, our brains automatically present us with several, or perhaps a multitude of, possible states that could be achieved via our learned affordances. As we take action, we move into new hierarchical contexts and our affordance repertoire is updated as well, offering a new set of possibilities for bringing the capabilities of our motor system in line with the external world. The result might be visualized as a map or, more formally, a graph.

A simple robot living in a trivial ‘ring world’, by learning affordances in a given context, can generate an internal map useful for prediction, and perhaps action selection.

Above we can see a trivial case of such an affordance map, that of a robot in a ring world. It can rotate left and right to move between three visual states (red, green, and blue). Having learned the simple dynamics of this world, when the robot perceives red, this map indicates that two affordances are available: the robot can change its state in the world to perceive green (by turning left), or blue (by turning right). In this illustration, only an overt action (actually turning either right or left) can produce the visual transformation. The concept of an affordance map becomes much more useful when we consider how this relates to prediction, and hence simulation. Our robot, even with a trivial cognitive capability that can be fully represented in the four nodes at left, ‘knows’ something about the world, e.g., it can predict its future from its present.
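
One way to write this knowledge down is as a lookup table of afforded transitions; the sketch below is just an encoding of the ring-world diagram, not a claim about how a brain stores it:

```python
# A minimal sketch of the ring-world affordance map: state -> {action: next_state}.
AFFORDANCE_MAP = {
    "red":   {"turn_left": "green", "turn_right": "blue"},
    "green": {"turn_left": "blue",  "turn_right": "red"},
    "blue":  {"turn_left": "red",   "turn_right": "green"},
}

def predict(state, actions):
    """Roll the map forward: predict the percept after a sequence of actions."""
    for action in actions:
        state = AFFORDANCE_MAP[state][action]
    return state

print(predict("red", ["turn_right", "turn_right"]))  # -> "green"
```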

Simulation & Active Inference

In the previous post I framed the problem of causal induction (identifying the right cause based on the data currently available) as one of pattern recognition, generation of predicted sensory inputs, and finally a calculation of prediction error when the input disagrees with the expectation. Here I want to suggest a subtle adjustment to this perspective, one I alluded to before.

Once it has learned a causal graph (a group of enactive affordances), our brain can realize a powerful tactic: each affordance can be inverted to generate its expected effects, and by chaining these, not just predict but simulate the future. In the absence of sensory contradiction, the generative simulation is allowed to run forward under its own power. And even more usefully, we are not limited to a single passive projection of what’s likely to happen next: we can leverage our array of cognitive and motor affordances to simulate multiple futures, each unfolding probabilistic step after probabilistic step as a result of a different contingency, and each with a distinct expected sensory result.
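
To make this concrete, here’s a hedged sketch (the states, actions, probabilities, and valences are all invented for illustration) in which a learned transition model is rolled forward under two candidate actions and the imagined futures are scored before anything happens in the world:

```python
# Sketch of 'empirical' simulation: sample many futures from a learned
# probabilistic transition model under each candidate action, then compare
# the average imagined outcome before committing to an overt action.
import random

# P(next_state | state, action), hypothetical values learned from experience.
MODEL = {
    ("near_dog", "reach_out"): {"dog_barks": 0.6, "dog_calm": 0.4},
    ("near_dog", "step_back"): {"dog_calm": 0.9, "dog_barks": 0.1},
}
VALUE = {"dog_barks": -1.0, "dog_calm": +1.0}   # interoceptive valence of each outcome

def simulate(state, action, n_rollouts=1000):
    """Average imagined outcome value over many sampled (one-step) futures."""
    outcomes = MODEL[(state, action)]
    states, probs = zip(*outcomes.items())
    total = 0.0
    for _ in range(n_rollouts):
        total += VALUE[random.choices(states, weights=probs)[0]]
    return total / n_rollouts

for action in ("reach_out", "step_back"):
    print(action, round(simulate("near_dog", action), 2))
# The agent can pick whichever simulated future scores best, before acting.
```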

The beauty (and surrealism) of the representation suggested here is that our brains both model and experience every aspect of the world with the only raw materials available: there is no light, no sound, just billions of networked neurons. Within this representation, our brains don’t need to wait for the dog to jump; they can imagine it. We don’t need to grab a scalding pot handle; we can activate the Handle.Grab affordance prior to action and physically experience a low-level simulation of the (painful) interoceptive results. It’s easy to allow the misnomer of ‘imagination’ to fool us here — there is no theater in the brain to project dramatized images of reality for some intelligent decider to act upon — the experience of the simulation is made of the exact same stuff as the physical (grounded) sensation.

These simulations are a new look at the generative piece of the generative predictive model, and seeing them as such will help illustrate the immense functional benefits of simulation within this theory of empirical enactivism.

“When used in the forward direction (from action to outcome), these associations permit the prediction of the sensory consequences of possible actions. When used in the backward direction (from outcome to action), the implicit mapping enables the selection of an action that produces the (desired) consequences.” — Pezzulo et al 2015

Our simulations are things of change — we do not model and investigate frozen worlds — they unfold temporally in parallel with (but importantly not constrained by) our perception of time. This cognitive strategy can, like a DJ with a turntable, run simulations faster than physics allows; it can skip the predictable nodes in the affordance map and focus on the most ambiguous anticipated moments. And this is where empiricism comes in: it can simulate multiple alternative decisions (or, more generally, competing hypotheses), weigh the outcomes of each, and choose a path accordingly.

Neuroscience has provided evidence for neural representations (observed as correlations of activity measured by fMRI) of everything from low-level visual properties like color, to words, to specific people. Simulation theory posits that not only do these neurons activate in response to a conscious experience of these concepts, but that a key role of each representation is generation. In this way, coalitions of activating contexts produce perceptual simulations that unfold in time in a way that is consistent with the learned probabilistic dynamics (the most likely sequences) in the world. This shouldn’t be too surprising since we can verify our brain’s incredible knack for this easily: imagine driving a nail into a two-by-four. What happened? Not only did you just confirm that it’s possible to get a cylinder of iron into a block of wood, but you probably visualized that to do so you’d typically use a hammer, and you probably experienced an echo of the sensations you’d feel when performing this action: the handle pushing back against your hand, the sound of metal against metal, and possibly even a hint of fear as you consider a painful miss.

Simulation in Perception, Prediction, and Planning

Ocular simulation as perceptual ‘filling in’ — abstract representations of potential gaze targets produce the experience of a complete and rich visual field

One of the fascinating potential extensions of simulation theory is in the realm of perception. Could it be that the magical ‘filling in’ of our perceptual world (remember that the fovea of your eye provides detail over only a shockingly small radius, despite what appears to be a rich visual scene) can be explained by simulation as well? Consider that just as we have motor affordances such as extending a finger to push a button, actions as subtle and typically subconscious as eye movements (saccades) are also generated by our motor system. Contexts that present themselves to us visually may, then, present ocular affordances: the percept of the trunk of a tree affords the ocular motor commands necessary to produce a visual transition to its branches above. If simulations are indeed happening constantly, and exploring (and generating) multiple futures conditioned on a variety of afforded actions, by extension we must at all times be simulating the effects of plausible saccades — imagining the visuals we’d experience if we were to look up, left, wherever.

Let’s look at the case of simulating visual actions in a little more detail. Imagine viewing a printed letter through a microscope and seeing the cropped image in the left box below. You’d likely guess you were looking at an ‘i’ or a ‘j’, or note the ambiguity — it’s one or the other, but impossible to tell which.

Visual perception as ocular simulation. Saccade.Down produces conflicting futures (‘bottom of i’ and ‘bottom of j’) that demand resolution via action. Saccade.Up is simulated with consensus from both hypotheses, and does not require validation via action.

Resolving what we might call causal ambiguity is a crucial and rarely conscious part of perception. When viewing the ambiguous image of the cropped letter, we might recruit two possible actions: we can move our eyes slightly up or slightly down to reveal the content above or below the current field of view. Simulations of these two afforded actions partially activate the same networks we’d use in the overt action (the actual saccade), but without executing them. Because we’ve seen the letters i and j many times before, we have a memory of the features that compose each. Saccade.Up is guaranteed to produce the percept of a dot atop a letter, but the simulation of Saccade.Down is ambiguous. To use a term coined by the father of active inference, Karl Friston, the downward saccade has high epistemic value: it is very likely to resolve a conflict in our simulation, the ambiguity of ‘I’m looking at i’ vs. ‘I’m looking at j’.
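
Out of curiosity, we can put a number on that intuition. The sketch below treats epistemic value as the expected reduction in uncertainty over the two hypotheses; the observation model is a toy I have invented for this example, not Friston’s formal treatment:

```python
# Toy epistemic value for the i/j example: expected reduction in entropy over
# the hypotheses {"i", "j"} for each simulated saccade. Observation
# probabilities below are invented for illustration.
import math

PRIOR = {"i": 0.5, "j": 0.5}

# P(observation | hypothesis, saccade): both letters have a dot above,
# but what lies below differs between them.
LIKELIHOOD = {
    "saccade_up":   {"i": {"dot": 1.0},         "j": {"dot": 1.0}},
    "saccade_down": {"i": {"bottom_of_i": 1.0}, "j": {"bottom_of_j": 1.0}},
}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def epistemic_value(saccade):
    """Expected entropy reduction over hypotheses after the simulated saccade."""
    prior_h = entropy(PRIOR)
    observations = set()
    for obs_dist in LIKELIHOOD[saccade].values():
        observations.update(obs_dist)
    expected_posterior_h = 0.0
    for obs in observations:
        p_obs = sum(PRIOR[h] * LIKELIHOOD[saccade][h].get(obs, 0.0) for h in PRIOR)
        if p_obs == 0:
            continue
        posterior = {h: PRIOR[h] * LIKELIHOOD[saccade][h].get(obs, 0.0) / p_obs
                     for h in PRIOR}
        expected_posterior_h += p_obs * entropy(posterior)
    return prior_h - expected_posterior_h

print("saccade_up:  ", epistemic_value("saccade_up"))    # 0.0 bits: resolves nothing
print("saccade_down:", epistemic_value("saccade_down"))  # 1.0 bit: fully disambiguates
```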

Though it is well beyond the scope of this post, some recent formulations of enactive cognition propose that the brain’s decision-making can be modeled by a simultaneous drive for epistemic value (resolving ambiguity in the world) and direct (intrinsic) reward.

Implications

There are numerous fascinating implications of enactivism and the concept of cognitive embodied simulation, especially when we apply these ideas to higher level planning on longer time scales, as well as simulations of the actions of others. For me this theory also frames an important reality: the stuff of simulation, its primary driver, might be summed up as cognitive bias. Give a brain a seed of a few abstract concepts (hammer, nail, two-by-four… go), and it will not only generate, but experience futures that have not yet (and maybe never will) become reality. Imagining the future by running a simplified model in this way is a computationally powerful tool, but getting it was an expensive bargain, the cost of which is a propensity for an array of misconceptions, overconfident projections, and every kind of prejudice. At times, our simulations are paper thin, and this highlights a jarring weakness in what may be a key mechanism behind each decision we make.

Want to read more or see some data?

See here for a partial list of papers sharing empirical results related to enactivism or simulation theory.
