Revisiting the AGI Capability Roadmap

Carlos E. Perez
Published in Intuition Machine
Jul 10, 2021
Roadmap by VQGAN & CLIP

Three years ago I proposed a deep learning capability maturity model (see: A New Capability Maturity Model for Deep Learning). Today I will revisit this maturity model (i.e. a roadmap, in more pedestrian terms) to see how it holds up in light of my more developed understanding of general intelligence.

Let’s begin with the premise that “reasoning and learning are two sides of the same coin and should be treated equivalently.” This notion can be illustrated in this diagram:

All models are wrong, but some models are useful. The usefulness of a well-informed capability model is that it gives us an accurate sense of progress in AI. This remains critically important today. Our policymakers should have the language to describe, or even make sense of, AI development. We cannot understand progress in AI (and anticipate its benefits and dangers) when our notion of intelligence is wrapped in vague, ambiguous, and imprecise language.

My 2018 proposal is based on six evolutionary levels, each of which depends on the foundation established by the level below it.

Let me remark, however, that just like the development of any living organism, its design is not like the clean designs that humans construct. Newer capabilities feed back into previously established capabilities. So the diagram above simply depicts how I see AGI progress unfolding over time.

A major stumbling block of analytic endeavors like science and engineering is the immense appeal of reductionist methods. However, if there is anything we have learned about complex systems, it is that we have to understand them in a holistic manner. The epistemic cuts we make to divide systems can lead to an impoverished understanding of them. Reductionism has led to the dominance of ‘brain in a vat’ research into the nature of cognition.

Therefore, to gain a complete understanding of cognition, we need to frame cognition not just in terms of the self, but in terms of a self embedded within its context:

See: https://medium.com/intuitionmachine/moravecs-paradox-implies-that-agi-is-closer-than-we-think-9011048bc4a1

Level Zero — Handcrafted (Non-Intuitive Programmed)

When John von Neumann wrote his proposal for how to design a universally programmable computer, he employed the McCulloch-Pitts model of a neuron as motivation for his design. The invention of computers was the catalyst for exploration in Artificial Intelligence. It was a widely held belief for decades that if we could just program computers to perform logic and symbol manipulation, we could achieve human-level intelligence in artificial form. This led to decades of effort in what is known today as Good Old Fashioned AI (GOFAI) (see: “Tribes of Artificial Intelligence”).

The fallacy of the approach originates from a very shallow understanding of human-level cognition. Furthermore, the GOFAI approach was never able to implement useful learning algorithms. In hindsight, it is obvious that algorithms that cannot learn are not algorithms that lead to human-level intelligence.

At this level of development, we begin at the base camp where universally programmable symbolic machines have been invented. What is clear is that humans lack the intelligence to understand how to program computers to behave with even the sophistication of the common honey bee. Billions of years of knowledge accumulated via biological evolution is beyond the grasp of present-day human capabilities.

Level One — Stimulus Response (Intuitive Learning)

DL CMM Level 1 — Representation originates from the environment

The past decade has seen the emergence of machines capable of learning. These machines are implemented with feedforward deep learning networks that are able to capture the regularities found in immense training data. The combination of more powerful computers and larger data sources has led to algorithms with surprisingly strong predictive capabilities. Deep Learning networks are universal function approximators that can fit any complex function within a finite domain.
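To make the universal-approximation claim concrete, here is a minimal sketch, assuming PyTorch, in which a small feedforward network fits a nonlinear function on a finite interval; the target function, layer sizes, and training settings are arbitrary illustrative choices rather than anything prescribed by the maturity model.

```python
# Minimal sketch: a small feedforward network fitting a nonlinear
# function on a finite interval. Architecture and hyperparameters
# are illustrative assumptions only.
import torch
import torch.nn as nn

# Target: a "complex" function on the finite domain [-pi, pi]
x = torch.linspace(-torch.pi, torch.pi, 512).unsqueeze(1)
y = torch.sin(3 * x) + 0.5 * torch.cos(5 * x)

# A two-hidden-layer MLP is enough to approximate it closely
model = nn.Sequential(
    nn.Linear(1, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(5000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"final fit error (MSE): {loss.item():.5f}")
```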

A surprising development for deep learning networks is that not only are they very good universal approximators, but they are also extremely useful in natural language applications. This is counter-intuitive because language is made of discrete tokens; it is very surprising that algorithms formulated using continuous mathematics would have such profound relevance in the domain of natural language.

The other equally counter-intuitive development of these universal approximators is their ability to generate data. When a neural network is coupled with an adversarial neural network, the competitive dynamics of learning can lead to agents that generate data. The most advanced form of these generative agents is known as the Generative Adversarial Network (GAN). GANs are now pervasive in smartphone applications that allow users to render themselves at different ages, as cartoons, or even singing popular tunes.
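The adversarial setup can be sketched in a few lines. The following toy example, assuming PyTorch and an invented one-dimensional data distribution, shows the competitive generator/discriminator dynamic; real image-generating GANs differ enormously in scale and architecture, but the training loop has the same shape.

```python
# Toy GAN sketch: a generator learns to mimic a 1-D Gaussian by
# competing against a discriminator. Sizes and hyperparameters are
# illustrative assumptions only.
import torch
import torch.nn as nn

def real_batch(n=128):
    # "Real" data: samples from N(3, 0.5)
    return 3.0 + 0.5 * torch.randn(n, 1)

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # noise -> sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # sample -> real/fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(3000):
    # Discriminator: label real samples 1, generated samples 0
    real, fake = real_batch(), G(torch.randn(128, 8)).detach()
    d_loss = bce(D(real), torch.ones(128, 1)) + bce(D(fake), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator label its output as real
    fake = G(torch.randn(128, 8))
    g_loss = bce(D(fake), torch.ones(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

samples = G(torch.randn(1000, 8))
print(f"generated mean={samples.mean().item():.2f}, "
      f"std={samples.std().item():.2f}  (target: 3.00, 0.50)")
```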

Level Two — Dual Process (Intuitive Extrapolation)

DL CMM Level 2 — Representation from the environment, actions influenced by internal world model.

The limitation of level one deep learning networks is that the internal models they learn are not explicit. These algorithms behave as a consequence of pure habit. But what happens when we design these algorithms to create representations of the task they are trained to perform? Representations serve as useful stand-ins for objects to achieve tasks. An everyday example of this is Arabic numerals. We are able to learn how to multiply numbers because Arabic numerals have spatial properties that allow us to visualize the calculation. This differs from a Roman numeral representation, in which multiplication is very cumbersome to perform. The key to good representations is that the rules for their manipulation are simple and thus easy to learn.
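A small, purely illustrative example of this point: long multiplication over positional (Arabic) digits reduces to a handful of simple local rules, whereas there is no comparably simple rule set for Roman numerals, which is why the practical ‘algorithm’ for them is to convert to a positional form first.

```python
# Illustrative only: positional (Arabic) digits admit simple local rules
# for multiplication; Roman numerals do not.

def long_multiply(a_digits, b_digits):
    """Grade-school multiplication over little-endian digit lists."""
    result = [0] * (len(a_digits) + len(b_digits))
    for i, a in enumerate(a_digits):
        carry = 0
        for j, b in enumerate(b_digits):
            total = result[i + j] + a * b + carry
            result[i + j] = total % 10   # keep one digit in place
            carry = total // 10          # carry the rest leftward
        result[i + len(b_digits)] += carry
    return result

# 47 x 23 = 1081, written as little-endian digit lists
print(long_multiply([7, 4], [3, 2]))  # -> [1, 8, 0, 1]

# In Roman numerals the same product is XLVII x XXIII = MLXXXI; there is
# no simple digit-by-digit rule, so in practice one converts to a
# positional representation first.
```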

How do we create algorithms that conjure up useful models of their world? The first step toward that goal is to merge handcrafted traditional algorithms (see: level zero) with intuition-based agents (see: level one) (see: “Coordinating Intuition and Rational Intelligence”). A lot of recent research involves attempts to merge System 1-like and System 2-like cognition.

An impressive example of this hybrid approach is DeepMind’s AlphaGo and AlphaZero, which combine traditional Monte Carlo Tree Search (MCTS) with a conventional deep learning network. These model-free agents are capable of a kind of abductive reasoning to build their internal world models. Tree search is effectively a systematic way to perform experiments on an internal world model. The world model here is captured explicitly by the programmer.
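The following is a highly simplified sketch of that combination, not DeepMind’s implementation: a search loop in which a hypothetical learned policy/value network (policy_value) guides tree search over an explicitly programmed game model (game_model). Both names are stand-ins assumed for illustration.

```python
# Simplified AlphaZero-style search loop (illustrative, not DeepMind's code).
# `game_model` and `policy_value` are hypothetical stand-ins: the game rules
# (the world model) are hand-coded, while the priors and value estimates
# come from a learned network.
import math

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}        # action -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # PUCT: balance the network prior against observed search values
    def score(action, child):
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.value() + u
    return max(node.children.items(), key=lambda kv: score(*kv))

def mcts(root_state, game_model, policy_value, simulations=200):
    priors, _ = policy_value(root_state)
    root = Node(prior=1.0)
    root.children = {a: Node(p) for a, p in priors.items()}
    for _ in range(simulations):
        node, state, path = root, root_state, []
        # 1. Selection: walk down the tree using PUCT
        while node.children:
            action, node = select_child(node)
            state = game_model.step(state, action)  # programmed rules
            path.append(node)
        # 2. Expansion + evaluation: ask the network instead of a rollout
        priors, value = policy_value(state)
        node.children = {a: Node(p) for a, p in priors.items()}
        # 3. Backup the network's value estimate along the path
        # (sign flipping for two-player games is omitted to keep this short)
        for n in [root] + path:
            n.visits += 1
            n.value_sum += value
    # Choose the root action with the most visits
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```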

Level Three — Interventional (Intuitive Causal Reasoning)

DL CMM Level 3 — Predictions originate from World Model, Representation driven by interaction.

The next level in the maturity model (or roadmap) is reached when the world model is learned rather than explicitly defined by the programmer. This is what DARPA describes as “Contextual Adaptation”. This is also the second rung in the causality ladder that Judea Pearl describes. The agents at this level must reconstruct a causal model of the world.

Embodied and interactive learning are essential at this level. These agents employ abductive reasoning to build internal models of reality. Interventional agents learn by interacting with the world. These agents employ Pearl’s “do-calculus” to refine their world models. The distinction from the previous maturity level is that at this level the models are explicit (but not necessarily transparent).
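A toy illustration of why observing is not the same as intervening, using an invented three-variable structural causal model in plain Python; a real interventional agent would have to estimate such a model from its own interactions rather than being handed it.

```python
# Toy structural causal model: observing X=1 is not the same as do(X=1).
# Variables and probabilities are invented for illustration.
import random

def sample(intervene_x=None):
    z = random.random() < 0.5                      # hidden confounder
    if intervene_x is None:
        x = random.random() < (0.8 if z else 0.2)  # Z influences X
    else:
        x = intervene_x                            # do(X=x): cut the Z -> X arrow
    y = random.random() < (0.9 if (x and z) else (0.4 if (x or z) else 0.1))
    return x, y

N = 200_000
obs = [sample() for _ in range(N)]
p_y_given_x1 = sum(y for x, y in obs if x) / sum(1 for x, y in obs if x)

do = [sample(intervene_x=True) for _ in range(N)]
p_y_do_x1 = sum(y for _, y in do) / N

print(f"P(Y=1 | X=1)     ~ {p_y_given_x1:.2f}")  # inflated by the confounder
print(f"P(Y=1 | do(X=1)) ~ {p_y_do_x1:.2f}")     # the actual causal effect
```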

The process of interaction with an internal world model (or a mental model) should be sample-efficient. Once an abstract model is created that represents the causality of the real world, such an agent is able to predict the cause and effect of its actions prior to actual execution. This abstract world model is introspective (i.e. a reflective world model).

This “What-if” capability motivates the need for an inside-out architecture. That is, it is important to notice the inversion of the cognitive process (the black dot in the diagram signifies the starting point). What this means is that an agent learns through subjective interaction with the world. A level one agent learns through external supervision. A level two agent learns through competitive interaction. An agent at this level learns by generating internal world models and comparing these world models against its interactions with the environment.

Level Four — Counterfactual (Intuitive Ingenuity)

DL CMM Level 4 — World model includes representation of self and goals.

This is the third and final rung in Pearl’s causality ladder. Humans are capable of imagining a world and performing thought experiments (i.e. Gedankenexperiment) to create higher and more abstract hypotheses. This level differs from the previous one in that it has developed a vocabulary, and thus a language, to express hypothetical world models. So the world model is of sufficient expressiveness to reflectively capture the self, its goals, and the context it resides in.
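To make the third rung concrete, here is a tiny, invented structural causal model evaluated through Pearl’s abduction, action, and prediction steps: infer the background conditions consistent with what actually happened, impose a hypothetical action, and replay the model to answer a ‘what would have happened if’ question.

```python
# Counterfactual query on a tiny structural causal model (invented example):
# "The treatment was given and the patient recovered; would they have
#  recovered without it?"  Answered by abduction -> action -> prediction.

def model(u_treat, u_recover, do_treatment=None):
    # Structural equations with exogenous background terms u_*
    treatment = u_treat if do_treatment is None else do_treatment
    recovered = treatment or u_recover   # recovery: treatment OR natural causes
    return treatment, recovered

# Observed fact: treatment = True, recovered = True.
# 1. Abduction: find the background settings consistent with what we saw.
consistent = [(ut, ur) for ut in (False, True) for ur in (False, True)
              if model(ut, ur) == (True, True)]

# 2 & 3. Action + prediction: replay each consistent world under do(treatment=False).
outcomes = [model(ut, ur, do_treatment=False)[1] for ut, ur in consistent]
print("worlds consistent with the evidence:", consistent)
print("counterfactual recoveries under do(treatment=False):", outcomes)
# The answer depends on the unobserved u_recover: the counterfactual is only
# determined up to our uncertainty about the background conditions.
```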

Counterfactual agents answer “Why?” Judea Pearl argues that it is this level of capability that differentiates Homo sapiens from the rest of the animal kingdom. At this level, the narrative self emerges, in that the agent is able to generate expressions of itself in relation to the world.

Level Five — Conversational (Intuitive Empathy)

DL CMM Level 5. Context includes reasoning about other selves.

Michael Tomasello has argued that shared intentionality is what differentiates humans from the great apes. I’ve always argued that the conversational cognition required by social animals is exhibited by the most complex of agents. At this level, more expressive world models are learned through agents interacting via shared behavior and complex conversations. Conversations imply language and thus cultural learning. Capabilities at this level inform the Social Self. The emergent effect of conversational agents is that they drive towards explanatory models. Agents here are empathic and are driven by what I would compactly describe as the Empathy Prior, which is described in my book ‘Empathy Emergence’.

Level Six — Linguistic Culture

There is a new level that I have come to realize is very important to intelligence: the role of symbols in cognition.

This has the odd feel that the gap between level zero and level six is precisely what others have described as the semantic gap:

But the most counter-intuitive notion is that symbols are fundamental to general intelligence. Practitioners of GOFAI have misunderstood the notion of symbols and focused incorrectly on symbol manipulation systems while ignoring the importance of symbol grounding.

Summary

But it must remain clear that implementing AGI is immensely complex. We must be aware that the map is not the territory. Furthermore, even when you attempt to identify the specific capabilities required for AGI, that map explodes in size, with many details left vague and ill-specified:

In the three years since I proposed this maturity model (or roadmap), it has surprisingly survived the test of time. Three years ago, some organizations actually believed that AGI was achievable in five years:

Despite tremendous progress, they were overly optimistic. It’s still going to take some time, as indicated by the map above. It’s important, though, that we do have a map, and my proposal is a good one for guiding your understanding of the continuing progress towards AGI. Let me know if you’ve stumbled upon a better one!
