Carlos E. Perez
Jul 28, 2018
Photo by yang miao on Unsplash

How can we understand progress in Deep Learning without a map? I created one such map a couple of years ago, but it needs a drastic overhaul. In “Five Capability Levels of Deep Learning Intelligence”, I proposed a hierarchy of capabilities that was meant to chart the progress of Deep Learning development. In that proposal, my classification was based on the structural components that I suspected should exist at each level:

Five Capability Levels of Deep Learning (Now Revised!)

Specifically, the first level consists of feedforward networks. It is followed by memory-enhanced networks, such as LSTMs and the Neural Turing Machine (NTM). Next come networks that are able to ingest knowledge bases (bridging the semantic gap). The level after that encompasses systems capable of handling imperfect or partial information, and the hierarchy culminates in a society of mind as described by Minsky.

When the above classification was proposed, there was a lot of hype about the promise of DeepMind’s NTM. It was once thought that climbing up the Chomsky hierarchy would lead to more advanced Deep Learning capabilities. Enthusiasm for this approach has since died down, as the capabilities of the NTM proved disappointingly unscalable. Chomsky’s hierarchy only reveals what can be computed; it provides no insight into the nature of learning.

“Reasoning and learning are two sides of the same coin.”

A lot of detail was missing from my original proposal. The most glaring omission was any account of how to evolve (not necessarily design) learning systems. My original capability model was not inspired by the paradigm that learning is primarily driven by intuition machines:

Although I did capture the existence of the semantic gap, I said little about “contextual adaptation”. The idea of “embodied learning” was absent from my proposal. I was unaware of the importance of an inside-out (predictive coding) architecture. Finally, I was ignorant of the necessity of a hierarchy of self-awareness. An emphasis on learning must drive any capability model of intelligence. Reasoning (or inference) and learning are, incidentally, two sides of the same coin and therefore must be treated equivalently.

“The usefulness of a well-informed capability maturity model is that it allows us to get an accurate sense of progress in AI.”

The usefulness of a well-informed capability model is that it allows us to get an accurate sense of progress in AI. This is critically important today because our policymakers do not have the language to describe, or even make sense of, AI development. We cannot understand progress in AI (and anticipate its benefits and dangers) when our notion of intelligence is wrapped in vague, ambiguous and imprecise language.

Now it is time to unveil my revised Capability Maturity Model for Deep Learning:

To better understand the distinctions between the maturity levels, I use a conceptual diagram that captures the general problem of cognition:


Level Zero — Handcrafted (Non-Intuitive Programmed)

These are present-day programmed systems. Good Old-Fashioned AI (GOFAI) systems (see: “Tribes of Artificial Intelligence”) that are unable to learn from experience fall within this class. These systems perform sense-making through well-established deductive reasoning algorithms.

Level One — Stimulus Response (Intuitive Perception)

DL CMM Level 1 - Representation originates from environment

These are present-day feedforward deep learning networks that learn regularities found in training data. They are universal function approximators. This is what DARPA would describe as “statistical” systems. Conventional machine learning methods such as kernel methods, decision trees, and probabilistic graphical models are also covered by this class. These systems make sense of the world through inductive reasoning. The most advanced systems in this class are generative models such as Generative Adversarial Networks (GANs).

An intermediate form consists of state-based models such as RNNs and the NTM, and some refinements of these state-based networks are Turing complete. In my older classification, I created two levels: one for classification-only networks and another for memory-based networks. One can always decompose this level according to the conventional Chomsky hierarchy, with memoryless functions at the bottom and Turing-complete machinery at the top. It is important to realize that a cognitive machine must be at the same level of the Chomsky hierarchy as its environment. Furthermore, if the environment is Turing complete, then inductive reasoning has a limitation: only anti-causal reasoning is possible (i.e., predicting cause from observed effect).
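The sketch below is a minimal illustration of Level 1 learning. It uses the classic perceptron learning rule to induce a simple regularity (logical AND) from labelled examples. Two simplifications are assumed for brevity. First, a single-layer perceptron is far simpler than a deep network and is not a universal function approximator. Second, the names `DATA`, `predict` and `train_perceptron` are hypothetical. The sketch does show the stimulus-response character of this level: the mapping is learned entirely from examples supplied by the environment, with no memory and no internal world model.

```python
from typing import List, Tuple

Example = Tuple[Tuple[int, int], int]

# Hypothetical training data: the regularity to induce is logical AND.
DATA: List[Example] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(w: List[float], b: float, x: Tuple[int, int]) -> int:
    """Stimulus -> response: a threshold on a weighted sum, with no memory or world model."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(data: List[Example], lr: float = 1.0,
                     max_epochs: int = 100) -> Tuple[List[float], float]:
    """Perceptron learning rule: shift the weights toward every misclassified example."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in data:
            err = target - predict(w, b, x)
            if err != 0:
                errors += 1
                w[0] += lr * err * x[0]
                w[1] += lr * err * x[1]
                b += lr * err
        if errors == 0:  # a full pass with no mistakes: the regularity has been induced
            break
    return w, b


w, b = train_perceptron(DATA)
print([predict(w, b, x) for x, _ in DATA])  # [0, 0, 0, 1]
```

Swapping the AND labels for XOR labels would make this loop run to `max_epochs` without converging, because no single linear threshold separates XOR. That is one reason practical Level 1 systems stack many nonlinear layers.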

Level Two — Dual Process (Intuitive Extrapolation)

CMM Level 2 — Representation from environment, actions influenced by internal world model.

These include systems that merge handcrafted traditional algorithms with intuition-based systems (see: “Coordinating Intuition and Rational Intelligence”). Today’s most advanced systems are in this class. Examples include DeepMind’s AlphaGo and AlphaZero, which combine the traditional Monte Carlo Tree Search (MCTS) algorithm with a conventional deep learning network. These model-free systems are capable of a kind of abductive reasoning to build their internal world models. Tree search is effectively a systematic way to perform experiments on an internal world model. These world models are non-reflective and opaque; only at the next level is a causal world model generated. The rational part of this system is programmed, as in Level 0.
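Here is a minimal sketch of a dual-process loop that makes “experiments on an internal world model” concrete. It rests on heavy simplifying assumptions and is not AlphaGo. A plain depth-limited exhaustive search stands in for MCTS, and a hand-written scoring function stands in for the trained value network. The number-line world and the names `internal_model`, `intuition` and `lookahead` are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical toy world: an agent on a number line wants to reach GOAL.
GOAL = 3
ACTIONS: List[int] = [-1, +1]


def internal_model(state: int, action: int) -> int:
    """The agent's internal world model: predicts the next state without acting."""
    return state + action


def intuition(state: int) -> int:
    """Stand-in for a learned value network: a quick guess of how good a state is."""
    return -abs(GOAL - state)


def lookahead(state: int, depth: int,
              model: Callable[[int, int], int],
              value: Callable[[int], int]) -> Tuple[int, List[int]]:
    """Depth-limited search: try every action sequence on the internal model
    and score the leaves with the intuitive value estimate."""
    if depth == 0 or state == GOAL:
        return value(state), []
    candidates = []
    for a in ACTIONS:
        score, plan = lookahead(model(state, a), depth - 1, model, value)
        candidates.append((score, [a] + plan))
    return max(candidates, key=lambda c: c[0])


score, plan = lookahead(0, depth=3, model=internal_model, value=intuition)
print(score, plan)  # 0 [1, 1, 1]
```

The search loop is the programmed, rational half. In a real Level 2 system, the `intuition` function would be a learned network, which is what lets the search prune a game tree far too large to enumerate.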

Level Three — Interventional (Intuitive Causal Reasoning)

DL CMM Level 3 — Predictions originate from World Model, Representation driven by interaction.

This is what DARPA describes as “Contextual Adaptation”. It is the second rung in the causality ladder that Judea Pearl describes. Embodied and interactive learning are essential for this level. These systems employ abductive reasoning to build internal models of reality. Interventional systems learn by interacting with the world, and they employ Pearl’s “do-calculus” to refine their world models. What distinguishes this level from the previous one is that the models here are explicit (but not necessarily transparent).
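The difference between seeing and doing can be sketched with a small, hypothetical structural causal model (SCM). In it, a confounder Z drives both a cause X and an effect Y, and all probabilities are invented for illustration. Conditioning on X = 1 only filters the data. Intervening with do(X = 1) replaces X's mechanism with a constant, which cuts the Z → X arrow. The sketch enumerates this one graph exactly; it is not a general do-calculus engine.

```python
from itertools import product
from typing import Optional

# Hypothetical SCM (numbers invented for illustration):
#   Z -> X, Z -> Y, X -> Y   (Z is a confounder)
P_Z1 = 0.5                        # P(Z = 1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(X = 1 | Z = z)


def p_y1(x: int, z: int) -> float:
    """P(Y = 1 | X = x, Z = z): both X and the confounder Z raise Y."""
    return 0.1 + 0.3 * x + 0.5 * z


def bernoulli(p_one: float, value: int) -> float:
    """Probability of a binary variable taking `value` when P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(z: int, x: int, y: int, do_x: Optional[int] = None) -> float:
    """P(Z=z, X=x, Y=y). If do_x is given, X's mechanism is replaced by the
    constant do_x (graph surgery), so X no longer depends on Z."""
    pz = bernoulli(P_Z1, z)
    if do_x is None:
        px = bernoulli(P_X1_GIVEN_Z[z], x)
    else:
        px = 1.0 if x == do_x else 0.0
    return pz * px * bernoulli(p_y1(x, z), y)


def prob_y1_given_x1(do: bool) -> float:
    """P(Y=1 | X=1) when do=False (seeing), P(Y=1 | do(X=1)) when do=True (doing)."""
    num = den = 0.0
    for z, x, y in product((0, 1), repeat=3):
        p = joint(z, x, y, do_x=1 if do else None)
        if x == 1:
            den += p
            if y == 1:
                num += p
    return num / den


print(f"P(Y=1 | X=1)     = {prob_y1_given_x1(do=False):.2f}")  # 0.80
print(f"P(Y=1 | do(X=1)) = {prob_y1_given_x1(do=True):.2f}")   # 0.65
```

The observational estimate (0.80) is inflated because seeing X = 1 also makes Z = 1 more likely. The interventional estimate (0.65) equals the back-door adjustment Σ_z P(z) P(Y = 1 | X = 1, z) for this particular graph.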

The process of interacting with an internal world model (i.e., a mental model) is sample efficient. Once a system has built a more abstract model that represents the causal structure of the real world, it can imagine the cause and effect of its actions before actually executing them. This abstract world model can also be introspective (i.e., a reflective world model). This “what-if” capability motivates the need for an inside-out architecture. Notice the inversion of the cognitive process (the black dot in the diagram signifies the starting point). Achieving Level 3 bridges the semantic gap between sub-symbolic and symbolic systems. This leads not only to an explosion of applications but also to truly autonomous cognition.

Level Four — Counterfactual (Intuitive Ingenuity)

DL CMM Level 4 — World model includes representation of self and goals.

This is the third and final rung in Pearl’s causality ladder. Humans are capable of imagining a world and performing thought experiments (i.e., Gedankenexperiment) to create higher-level and more precise mental world models. If a cause is removed from a world model, how would the model behave? I would describe this as intuitive ingenuity, that is, the ability to explore the world and invent new tools and models that more efficiently transform or predict the environment. Counterfactual systems can answer “Why?” questions. Judea Pearl argues that it is this level of capability that differentiates Homo sapiens from the rest of the animal kingdom. At this level, the Narrative Self emerges. Here’s a nice TEDx talk describing imagination and knowledge.
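The three steps Pearl uses to answer counterfactual questions (abduction, action, prediction) can be sketched in a few lines. The sketch assumes a hypothetical linear SCM, X := U_x and Y := 2X + U_y, and the observed values are invented for illustration. A Level 3 interventional query averages over the population. A counterfactual instead keeps the noise inferred from one specific observed case and asks what would have happened to that case had the cause been removed.

```python
from dataclasses import dataclass


@dataclass
class LinearSCM:
    """Hypothetical SCM with additive noise: X := U_x ;  Y := slope * X + U_y."""
    slope: float

    def y(self, x: float, u_y: float) -> float:
        """Structural equation for Y."""
        return self.slope * x + u_y

    def counterfactual_y(self, observed_x: float, observed_y: float, new_x: float) -> float:
        # 1. Abduction: recover the exogenous noise U_y consistent with this observed case.
        u_y = observed_y - self.slope * observed_x
        # 2. Action: override X with the hypothetical value, i.e. do(X = new_x).
        # 3. Prediction: rerun the structural equation with the same U_y.
        return self.y(new_x, u_y)


scm = LinearSCM(slope=2.0)
# We observed X = 1, Y = 5. Had the cause been removed (X = 0), what would Y have been?
print(scm.counterfactual_y(observed_x=1.0, observed_y=5.0, new_x=0.0))  # 3.0
```

The answer depends on the noise inferred for this particular case (U_y = 3). A different observed pair (X, Y) would yield a different counterfactual, even though the intervention do(X = 0) is the same.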

Level Five — Conversational (Intuitive Empathy)

DL CMM Level 5. Context includes reasoning about other selves.

This final level is what is needed to achieve what Brenden Lake describes as intuitive psychology. It is what Michael Graziano describes in his Attention Schema Theory, and what I describe as Conversational Cognition. Perhaps Minsky’s Society of Mind or compositional game-theoretic models are required to achieve this maturity level. At this level, we go beyond learning world models through an individual’s imagination. We reach a level where world models are learned through the interaction of many conversations.

This new capability model is more functional, whereas my previous proposal was more structural. The problem with a structural definition is that it is unclear what kinds of capabilities each new structure enables. Furthermore, it is unknown precisely what kinds of cognitive structures are required to arrive at a higher level of cognition.

It’s interesting to note that many companies that brand themselves as having “Artificial Intelligence” are only at Level 0 in this capability model. Firms that do data science employ only Level 1 tooling. Firms like Google have deployed Level 2 capabilities such as language translation and have also demonstrated sophisticated game-playing research projects (see: AlphaGo).

Our own human cognitive capability is our most reliable guide to achieving artificial cognitive systems. In Computer Science, Chomsky’s hierarchy is a guide to more complex computational machinery. Unfortunately, Chomsky’s hierarchy doesn’t offer any resolution beyond the Hybrid level (Level 2). Turing completeness is a necessary requirement for advanced cognition. Despite the universality of Turing machines, it remains unknown what will be needed to achieve Levels 3 through 5. Even so, this maturity model is useful because it expresses the capabilities needed at the higher levels.

This maturity model, therefore, should be a good enough guide for understanding how near or far civilization is from achieving an artificially “human complete” system. It is important, however, to be aware of the pace of progress. The first Intuitive-Rational Hybrid systems (Level 2) were demonstrated effectively in 2015 (see: “Sputnik moment”). The Deep Learning breakthrough came in 2012, and self-play GANs were introduced in 2014. In 2018, we are now seeing signs of Interventional systems (i.e., Level 3) in the form of what is known as “Relational Deep Learning”, or by the easy-to-overlook name “Graph Networks”. This is why it’s a very exciting time for Deep Learning research. The breakthroughs are going to be extremely fast and furious!

What’s interesting, though, is how the nature of information changes as you go up the capability maturity ladder.

A panel discussion with Judea Pearl and other DL experts discussing the causality ladder (levels 1, 3 and 4)
Explore Deep Learning: Artificial Intuition: The Improbable Deep Learning Revolution


Exploit Deep Learning: The Deep Learning AI Playbook

Intuition Machine

Deep Learning Patterns, Methodology and Strategy
