An illustrated overview of how our brains (might) think — the fascinating intuition of the generative predictive model

“Perception is controlled hallucination.”
— Andy Clark

The last fifteen years have seen growing agreement across a surprisingly wide spectrum of academic fields on the topic of how our minds work. Theories of perception from psychology, neuroscience, philosophy and machine learning are beginning to resolve into something that feels more unified than ever before, and surprisingly intuitive. The generative predictive model (as Andy Clark calls it) posits that our brain’s primary function is to reduce surprise by developing an increasingly nuanced model of the world. It is constantly predicting the future (on multiple time scales), noticing incorrect predictions, and updating its model to explain away the difference.

Perception mediates every aspect of human experience, and so any change to the way we understand perception comes with a huge breadth of implications, both for the way we see ourselves as individuals and for how we understand human institutions. My purpose here is to share, through visualization, the intuition of this model in the hope that others will find it similarly fascinating.

As a disclaimer, much of the below is still speculative theory. Though we have a growing body of evidence from human behavior, and computational models have proven effective, evidence of a generative model at the neural level remains elusive. Only future research will tell us how well this model lines up with measurable human brain function.

Building Blocks — Neurons and Spikes

The brain’s generative model is built upon its basic computing units, the neurons, and the hierarchy of connections they make. We’ve been aware of the significant role neurons play in how brains (and the entire nervous system) work since the 1800s, thanks to the work of Santiago Ramón y Cajal, sometimes known as the father of modern neuroscience.

Those looking to understand the neuron in order to analyze and predict its function have created an array of simplified models that abstract away some or all of the lower-level biochemical dynamics. Artificial neural networks, and their modern evolution, deep learning (and deep reinforcement learning), are a category of machine learning techniques that have delivered increasingly impressive computational feats (e.g. beating humans at image recognition, Chess, Go, old-school video games, and Texas Hold’em, among others). These networks are built from a vastly simplified neuron that does nothing more than take a bunch of inputs, perform some addition and multiplication, and pass the result on. In these cases, of course, it is the structure of the network (the way the neurons are connected) and the algorithm by which these networks learn that enable advanced capabilities.
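To make the "addition and multiplication" concrete, here is a minimal sketch of such a simplified neuron. The step threshold, the weights and the bias are all assumptions picked by hand for illustration; real networks learn these values and typically use smoother activation functions.

```python
def artificial_neuron(inputs, weights, bias=0.0):
    """Multiply each input by its weight, add everything up, then threshold."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0  # step activation (an assumption for this sketch)

# Hypothetical hand-picked values: the neuron needs both inputs to be active.
weights = [0.6, 0.6]
bias = -1.0

print(artificial_neuron([1, 1], weights, bias))  # both inputs active
print(artificial_neuron([1, 0], weights, bias))  # only one input active
```

With these particular values the unit behaves like a logical AND: it passes a signal on only when both inputs are active at once.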

Three views of a neuron. Left: real neuron, calcium imaging, center: illustration, right: schematic.

In reality, an incredibly complex system of biochemical pathways is at work in any given neuron, but it can be useful to zoom out a bit and view neurons as information processing units.

Information is passed in from other neurons as voltage spikes through the vast branching network of roots (dendrites), and moves towards the cell body. If the right concurrence of these signals arrives at the right time, the cell will be activated, and pass on this voltage spike down its axon (the main outgoing channel which can be up to a meter in length for motor neurons), to the dendrites of other downstream neurons.

Propagation of action potential (voltage spike) down an axon

The neuron’s anatomy allows it to precisely detect patterns, possibly many thousands of distinct ones, and to pass on each detection to the other neurons it is connected with.
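The idea that a cell fires only when the right signals arrive at the right time can be sketched with a toy "leaky" neuron. This is a deliberate simplification, not a biophysical model: the weight, leak and threshold values are hypothetical, and time moves in discrete steps.

```python
def simulate_neuron(spike_times, n_steps=20, weight=0.4, leak=0.5, threshold=1.0):
    """Return the time steps at which the toy neuron fires.

    spike_times lists the step at which each input spike arrives
    (repeat a value to model several spikes arriving together).
    """
    potential = 0.0
    fired_at = []
    for t in range(n_steps):
        potential *= leak                            # stored charge leaks away each step
        potential += weight * spike_times.count(t)   # each arriving spike adds a bump
        if potential >= threshold:
            fired_at.append(t)                       # pass a spike down the axon
            potential = 0.0                          # reset after firing
    return fired_at

print(simulate_neuron([5, 5, 5]))    # three spikes arriving together
print(simulate_neuron([2, 8, 14]))   # the same three spikes spread out in time
```

Three coincident spikes push the cell over its threshold, while the same three spikes spread out in time leak away before they can add up, so the cell stays silent.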


There is good evidence that the vertebrate brain, and particularly the neocortex (the part that evolved most recently), is organized hierarchically, with the lower levels handling raw sensory input (data from our eyes, ears, nose, skin, muscles, etc.) and each higher region taking inputs from the outputs of the regions below. There is also evidence that each region, regardless of which sense it receives information from, is performing exactly the same kinds of processing: A) finding and encoding relevant structure in its inputs, B) building a model to explain the structure seen, and C) using this model to predict future events.

A) Finding Structure (pattern recognition)

In the nomenclature of Hawkins and colleagues at Numenta, finding structure is a process of spatial pooling. Individual neurons in a region learn spatial patterns of incoming inputs. For example, a neuron can learn to detect the coincidence of a few thousand incoming messages from the region below. It does so by learning associations over time. There are multiple mechanisms by which this learning occurs, and the actual algorithms our biology uses are a topic of debate and current research, but a basic form called Hebbian learning occurs when ‘neurons that fire together, wire together’.

Finding structure in a hierarchically lower region. In this simple example, the highlighted cell takes input from three neurons in the region below, and therefore activates when a pattern involving those 3 cells is detected. Following diagrams represent cells and regions in 2D, as shown on the right.

In the above diagram, if neurons a, b and c consistently fire together over time, their connections to d (in region 2) will strengthen, and as a result d will learn to represent or summarize this feature of the activity in R1. Because the neurons in R2 learn to detect similar coincidences (or more complex patterns) seen frequently in R1, R2 forms a representation or summary of its inputs using fewer neurons. In math, this is called dimensionality reduction, and it means that the region has efficiently encoded the information coming in.
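As a rough sketch of ‘fire together, wire together’, the snippet below strengthens the connections onto d whenever d fires alongside its inputs. Several things here are assumptions for illustration: the extra unrelated cell e, the firing schedules, the premise that d is driven whenever the a-b-c pattern is present, and the bounded update rule (a common variant that keeps weights between 0 and 1, not a claim about what biology does).

```python
def hebbian_train(n_steps=300, learning_rate=0.05):
    """Learn connection strengths from R1 cells a, b, c, e onto R2 cell d."""
    weights = {"a": 0.0, "b": 0.0, "c": 0.0, "e": 0.0}
    for t in range(n_steps):
        pattern_present = (t % 2 == 0)       # a, b and c fire together on even steps
        pre = {
            "a": int(pattern_present),
            "b": int(pattern_present),
            "c": int(pattern_present),
            "e": int(t % 3 == 0),            # e fires on its own unrelated schedule
        }
        post = int(pattern_present)          # assumption: d fires when the pattern is present
        for name in weights:
            # Bounded Hebbian update: nothing changes unless d fires; when it
            # does, each weight moves toward that input's current activity.
            weights[name] += learning_rate * post * (pre[name] - weights[name])
    return weights

for name, w in hebbian_train().items():
    print(name, round(w, 3))
```

The weights from a, b and c end up close to 1, while the weight from e settles at a much lower value, because e is active for only a third of the moments when d fires.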

Feed forward pattern detection. The cell in the top region is set up to be activated by brightness in the top of the image, and inhibited by brightness at the bottom. Result: ultra simplified detection of images that are bright at top and dark on bottom. (Key — green: active excitatory neuron, red: active inhibitory neuron, gray/white: inactive, rings: spatial source or ‘receptive field’ of each cell, each ring passes the brightness from that section of the image to the cell above it)

The illustration above demonstrates the way these feed-forward connections can allow a higher region to ‘summarize’ the raw activity below in a more abstract representation. As a direct result of this abstraction, these representations change less frequently over time, and as such are called invariant representations. In the words of Hawkins:

“Each region of the cortex learns sequences, develops what I call ‘names’ for the sequences it knows, and passes these names to the next region higher in the cortical hierarchy … This ‘name’ is a group of cells whose collective firing represents the set of objects in the sequence … By collapsing predictable sequences into ‘named objects’ at each region in our hierarchy, we achieve more and more stability the higher we go. This creates invariant representations.”
— Hawkins, On Intelligence

To illustrate this, imagine we are scanning a scene, and our eyes saccade (rapidly move from one point to another) between small pieces of the scene. Imagine our scene contains a dog and a tree, each with a number of its own features.

Feed-forward invariant representation.

In the illustration at left, we have cells that have learned to represent ‘dog’ sensory inputs (highlighted orange), and those that have learned the features of ‘tree’ (highlighted green). The top region in the hierarchy stays active while any of its learned inputs are active. Here the input object (the thing in our view) alternates between tree and dog, and while each is in view, the encodings of its features (bottom layer) activate in random order (e.g. as the observer scans the object).

Just as neuroscientists identified a ‘Bill Clinton’ neuron in the brains of study participants (which is fascinating in and of itself), the top right neuron in this illustration activates when our network is exposed to a dog.
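The "stays active while any of its learned inputs are active" behavior can be sketched in a few lines. The feature names and the scanning order below are made up for illustration.

```python
# Hypothetical feature sets that each top-level cell has learned.
LEARNED = {
    "dog": {"tail", "collar", "fur", "snout"},
    "tree": {"leaf", "bark", "branch", "trunk"},
}

def top_region(active_feature):
    """A top-level cell is active while any one of its learned features is active."""
    return {name: active_feature in features for name, features in LEARNED.items()}

def count_changes(sequence):
    """Count how often consecutive items in a sequence differ."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)

# Our eyes saccade across a dog, then across a tree.
scan = ["tail", "fur", "collar", "snout", "leaf", "trunk", "branch"]
top_states = [top_region(feature) for feature in scan]

print(count_changes(scan))        # 6: the bottom level changes at every saccade
print(count_changes(top_states))  # 1: the top level changes only when the object does
```

The bottom layer is in constant flux, yet the top layer changes just once, at the moment our gaze moves from the dog to the tree. That stability is what makes the representation invariant.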

B) Modeling & Prediction

Detecting patterns and structure in the region below is one important step in model-building, but to build a model based solely on present sensory input would be to ignore all the cues of present context, and the history that led to the present moment. That’s why actual brain regions are deeply recurrent — they take inputs not only from below, but also receive extensive feedback from regions above, as well as lateral inputs from other cells in the same region.

Our brains are well adapted for perpetually noisy and incomplete information, and as a result, we aggressively extrapolate and interpolate hints from our senses into complete pictures (models) of our environment. Going back to the previous example, if a higher region has built a model of its environment that includes ‘dog’, it likely does two things:

  1. It biases other cells in the same region (and thus at a similar level of abstraction) that have learned probabilistic links to ‘dog’ through historical association. Perhaps in 30% of recent situations where we perceived dog, we were also in a park. These biases might prime certain networks of cells in the region (learned patterns such as ‘frisbee’, ‘grass’, ‘picnic blanket’) to be more likely to activate.
  2. This group of associated patterns in a region passes down contextual feedback, in the form of predictions or expectations, to lower regions. If this brain region could talk, it may say: “I’m perceiving a dog, so lower auditory region, you may experience a bark, and lower visual region, you may see a tail or a collar”.
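Both steps can be sketched with simple bookkeeping. The episode list and the table of expectations below are invented for illustration (the episodes are arranged so that ‘park’ shows up in 30% of the ‘dog’ situations, as in the example above), and counting co-occurrences is only a stand-in for whatever the brain actually does.

```python
from collections import Counter

# Hypothetical recent episodes: high-level patterns that were perceived together.
episodes = [
    {"dog", "park", "frisbee"}, {"dog", "sofa"}, {"dog", "park", "grass"},
    {"dog", "street"}, {"dog", "park", "picnic blanket"}, {"dog", "vet"},
    {"dog", "sofa"}, {"dog", "street"}, {"dog", "kitchen"}, {"dog", "sofa"},
    {"cat", "sofa"},
]

# Hypothetical expectations a region passes down to lower regions.
TOP_DOWN = {"dog": {"auditory": ["bark"], "visual": ["tail", "collar"]}}

def lateral_bias(active, episodes):
    """Step 1: estimate how likely other patterns are, given the active one."""
    relevant = [e for e in episodes if active in e]
    counts = Counter(p for e in relevant for p in e if p != active)
    return {pattern: n / len(relevant) for pattern, n in counts.items()}

def top_down_predictions(active):
    """Step 2: look up what lower regions should expect to sense next."""
    return TOP_DOWN.get(active, {})

bias = lateral_bias("dog", episodes)
print(bias["park"])                  # 0.3: park co-occurred in 3 of 10 dog episodes
print(top_down_predictions("dog"))
```

Patterns with a higher estimated probability would be primed more strongly, and the lower regions would receive ‘bark’, ‘tail’ and ‘collar’ as things to expect.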

From Clark, in Surfing Uncertainty: “…naturally intelligent systems do not passively await sensory stimulation. Instead, they are constantly active, trying to predict (and actively elicit…) the streams of sensory stimulation before they arrive. Before an ‘input’ arrives on the scene, these pro-active cognitive systems are already busy predicting its most probable shape.”

Information as Error

Clark offers a number of compelling arguments explaining that the information traveling up through the hierarchy may be more efficiently encoded as surprise — deviations from expectation — rather than pure positive information. Various forms of this prediction error mechanism have gained traction, and notably, the 2017 Brain Prize went to three researchers focused on, among other things, demonstrating the role of dopamine in communicating prediction error.

Prediction error (deviation from the expected position of the circle) is shown in red. Our simplistic model learns to predict linear motion, but is ‘surprised’ by each bounce, resulting in a spike in prediction error, which triggers an attempt to revise our model.
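A minimal sketch of the animation's logic, assuming a point bouncing back and forth along a single axis and a model that simply extends the last observed step in a straight line. The wall positions and speed are arbitrary.

```python
def bounce_positions(n_steps=30, lo=0, hi=10, start=0, velocity=1):
    """Positions of a point bouncing between two walls at constant speed."""
    positions, x, v = [], start, velocity
    for _ in range(n_steps):
        positions.append(x)
        if not lo <= x + v <= hi:   # the next step would cross a wall, so reverse
            v = -v
        x += v
    return positions

def prediction_errors(positions):
    """Predict each position by extending the previous step in a straight line."""
    errors = []
    for t in range(2, len(positions)):
        predicted = positions[t - 1] + (positions[t - 1] - positions[t - 2])
        errors.append(abs(positions[t] - predicted))
    return errors

positions = bounce_positions()
errors = prediction_errors(positions)
surprising_steps = [t + 2 for t, e in enumerate(errors) if e > 0]
print(surprising_steps)  # [11, 21]: the steps right after each bounce
```

The error is zero everywhere except immediately after a bounce, which is exactly where a smarter model would have something new to learn.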

Conveniently, this paradigm matches intuition on how one might design an instrument like the brain to efficiently encode a changing world. Since the state of the world tends to change smoothly (and rarely abruptly), much of our perception at any given moment in time is similar to what we experienced (saw, heard, felt) at the previous moment. This approach is heavily leveraged in video streaming software. Imagine the data required for your favorite streaming service to show a few frames of video of a hawk flying through a blue sky. Rather than transmit the color of each pixel of your screen (mostly blue, still mostly blue, still mostly blue), many times per second, it transmits only the changes from the previous frame (perhaps a handful of pixels become brown and black in front of the hawk, and blue behind).
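Here is a toy version of the hawk example, with a one-dimensional strip of sky standing in for a frame. Real video codecs are considerably more sophisticated; this sketch only illustrates the idea of sending changes instead of whole frames, and the frame contents are made up.

```python
def encode_delta(prev_frame, frame):
    """Keep only the pixels that changed, as {pixel index: new color}."""
    return {i: new for i, (old, new) in enumerate(zip(prev_frame, frame)) if old != new}

def apply_delta(prev_frame, delta):
    """Rebuild the next frame from the previous frame plus the changes."""
    frame = list(prev_frame)
    for i, color in delta.items():
        frame[i] = color
    return frame

# A hypothetical 12-pixel strip of sky with a 2-pixel hawk moving one pixel right.
frame1 = ["blue"] * 12
frame1[3:5] = ["brown", "black"]
frame2 = ["blue"] * 12
frame2[4:6] = ["brown", "black"]

delta = encode_delta(frame1, frame2)
print(delta)                                  # {3: 'blue', 4: 'brown', 5: 'black'}
print(apply_delta(frame1, delta) == frame2)   # True
```

Three changed pixels are enough to reconstruct all twelve, and the savings grow with the size of the unchanging sky.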

To see the complete theory at work in an example, imagine that at any given moment in time, each region of our brain exists in a continuous loop:

As we’ve seen, unlike video services, which know very little about the meaning of what they’re displaying, each region of our brain leverages its model of the dynamics of the world to predict the incoming stream of information. As we watch a hawk cross the sky, our model tells us that this is an object in front of a distant background moving to the right. Each level of hierarchy already has its predictions for exactly what it’s going to ‘see’ next, and thus, in order to stay in sync with the world, all we need to do is adjust our models based on the error between our top-down and lateral predictions and the upward flow of sensory input: that is, our surprise.
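The predict, compare and adjust cycle described above can be sketched for a single region tracking one moving object. The model here is just an estimated velocity, and the learning rate and the hawk's speed are arbitrary choices; this is an illustration of error-driven updating in general, not of any specific neural mechanism.

```python
def track(observations, learning_rate=0.5):
    """Predict each observation, measure the surprise, and adjust the model."""
    position = observations[0]
    velocity = 0.0                           # the model starts out assuming nothing moves
    errors = []
    for observed in observations[1:]:
        predicted = position + velocity      # prediction made before the input arrives
        error = observed - predicted         # surprise: where the input disagrees
        velocity += learning_rate * error    # revise the model to explain the error away
        position = observed                  # stay in sync with the world
        errors.append(abs(error))
    return velocity, errors

# A hypothetical hawk moving right at 3 units per time step.
observations = [3 * t for t in range(12)]
velocity, errors = track(observations)
print(round(velocity, 3))
print([round(e, 3) for e in errors])
```

The first prediction is badly wrong, but each error nudges the velocity estimate toward the hawk's true speed, so the surprise halves at every step and soon there is almost nothing left to report.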


When we zoom out from all of this, we can see that our brains are in essence prediction machines that strive to minimize surprise by recognizing patterns and associating them with other patterns. Since the information we receive is noisy and often extremely incomplete, we’ve adapted to aggressively fill in gaps or generalize out from a small set of perceptions. We’re not always successful, but overall we’re shockingly good at these tasks. We are constantly predicting what’s about to happen, and because our predictions are never exactly right we continuously update an ever-changing model of our ever-changing environment.

Most if not all of our cognitive biases flow naturally from this understanding of perception. Many such biases are not weaknesses or failures of the system evolution has developed for us to understand the world, but instances of maladaptation to the modern context.

Another surprising implication is an intuitive reframing of a recently popular argument — that we never experience raw reality. What we think of as perception is a ‘controlled hallucination’ to borrow Clark’s words, which our brains have designed to be precisely as simple and useful to us as possible.

Events that are predicted eventually fade from our conscious experience when they’re no longer useful — we call this habituation. Again, our experience of the world is itself the downward flowing predictions and feedback generated in our model. Think about the well-known experience of preparing to touch a static-y doorknob (you ‘feel’ the feared spark well before your hand contacts metal). Our brain’s model integrates and predicts across all of our senses, which is why we can close our eyes and continue walking a surprisingly long distance without tripping (our model of the road is good enough for quite a while).

Regardless of whether the AI prophets are right, we are most certainly living in a simulation — one that evolved over a few hundred million years, and which each of us has been updating continuously since the day we were born.


Things to read on this and similar models of perception

  • Surfing Uncertainty, Andy Clark (2015).
  • On Intelligence, Jeff Hawkins (2005). Introduction to the early thinking on a model that is now called Hierarchical Temporal Memory.
  • Consciousness Explained, Daniel Dennett (1992). Dennett introduces his multiple drafts model which is full of parallels with the theories above.

Other resources