Modular Deep Learning could be the Penultimate Step to Consciousness

We are all aware that consciousness exists, yet we don’t have an adequate explanation for its emergence. I’ve written previously about what I see as five capability levels of Deep Learning:

1. Classification Only (C)

2. Classification with Memory (CM)

3. Classification with Knowledge (CK)

4. Classification with Imperfect Knowledge (CIK)

5. Collaborative Classification with Imperfect Knowledge (CCIK)

I purposely left out the penultimate level, that of achieving self-awareness and consciousness. I did so because it is such a wide chasm, one that will eventually need to be crossed, yet one we know so little about. I would like to think that my investigations are evidence-based rather than based on thought experiments.

There are, however, some theories out there that I think are worth studying. Not because there is any solid evidence, but rather because the thinking behind them seems to make sense. We have no verification that any of these theories are true, but we can still use the attributes they predict, possibly employing them as a guide in our Deep Learning research.

Before we begin exploring consciousness, I would like to point out the kind of architectures that we are using as a starting point. It is my conjecture that to achieve artificial consciousness, you have to have as your base an architecture of independent DL agents that interact in a decoupled manner. I discussed this briefly in my post “The End of Monolithic Deep Learning”, and described an approach for coordinating agents in “Market Driven Coordination”. I call this Modular Deep Learning.

I would like to discuss two theories that I find promising. The first is Integrated Information Theory (IIT) by Giulio Tononi and the second is the set of theories from Jurgen Schmidhuber.

“From the Phenomenology to the Mechanisms of Consciousness: Integrated Information Theory 3.0”, or IIT 3.0, describes the theory (here is a TED talk about this):

IIT begins with several axioms and translates them into postulates, which are conditions that must be satisfied to achieve consciousness. The postulates are as follows:

EXISTENCE: Mechanisms in a state exist. A system is a set of mechanisms.
COMPOSITION: Elementary mechanisms can be combined into higher order ones.

and, for individual mechanisms:

INFORMATION: A mechanism can contribute to consciousness only if it specifies ‘‘differences that make a difference’’ within a system. That is, a mechanism in a state generates information only if it constrains the states of a system that can be its possible causes and effects — its cause-effect repertoire.
INTEGRATION: A mechanism can contribute to consciousness only if it specifies a cause-effect repertoire (information) that is irreducible to independent components.
EXCLUSION: A mechanism can contribute to consciousness at most one cause-effect repertoire.

The theory also extends to sets of mechanisms (“systems of mechanisms”):

INFORMATION: A set of elements can be conscious only if its mechanisms specify a set of ‘‘differences that make a difference’’ to the set — i.e. a conceptual structure.
INTEGRATION: A set of elements can be conscious only if its mechanisms specify a conceptual structure that is irreducible to non-interdependent components (strong integration).
EXCLUSION: Of all overlapping sets of elements, only one set can be conscious — the one whose mechanisms specify a conceptual structure that is maximally irreducible (MICS) to independent components.

The following graphic captures these axioms and postulates in even greater detail:


IIT proposes that consciousness is a matter of degree and proposes a measure of consciousness. In other words, many systems are already conscious, but to varying degrees. The theory is quite elaborate. The key takeaway, though, is its emphasis on an information structure that captures causality, and the idea that the richness of that causal structure indicates a measure of consciousness. Note that causality, cause and effect, and Bayes’ rule are all related to mutual information.
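To make the link between mutual information and "irreducibility" slightly more concrete, here is a minimal sketch in Python. It is emphatically not IIT 3.0's actual measure, which is far more elaborate. It only compares how much a whole two-node system's past tells you about its future with how much each node's own past tells you about its own future. The function names (`mutual_information`, `whole_minus_parts`) and the two toy systems (`swap`, `self_copy`) are illustrative assumptions, not anything taken from the IIT paper.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Mutual information I(X;Y) in bits, treating the (x, y) pairs as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in joint.items()
    )


def whole_minus_parts(update):
    """Toy integration score for a deterministic two-node binary system.

    Assumption: a uniform distribution over the four past states.
    Returns I(past; future) for the whole system minus the sum of
    I(past_i; future_i) for each node taken in isolation.
    """
    states = list(product((0, 1), repeat=2))
    transitions = [(s, update(s)) for s in states]
    whole = mutual_information(transitions)
    parts = sum(
        mutual_information([(s[i], t[i]) for s, t in transitions])
        for i in range(2)
    )
    return whole - parts


def swap(state):
    """Each node takes on the other node's previous value."""
    a, b = state
    return (b, a)


def self_copy(state):
    """Each node keeps its own previous value; the nodes never interact."""
    return state


print(whole_minus_parts(swap))       # 2.0 bits: the whole carries information no part carries alone
print(whole_minus_parts(self_copy))  # 0.0 bits: the system splits into independent parts
```

In the `swap` system, all of the cause-effect information lives in the interaction between the nodes, so cutting the system into its parts destroys it. In the `self_copy` system, nothing is lost by the cut. That contrast is the intuition behind the INTEGRATION postulates, under the simplifying assumptions stated above.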

Schmidhuber is an interesting character because he is pretty sure that the nature of consciousness has been solved. His theory combines elements that are more familiar to DL practitioners; however, he discusses consciousness within the context of what he labels “Gödel machines.” His claim is that AI gained consciousness way back in 1991:

I would like to claim we had little, rudimentary, conscious learning systems for at least 25 years. Back then, already, I proposed rather general learning systems consisting of two modules.
One of them, a recurrent network controller, learns to translate incoming data — such as video and pain signals from the pain sensors, and hunger information from the hunger sensors — into actions.
Since 1990, our agents have tried to do the same thing, using an additional recurrent network — an unsupervised module, which essentially tries to predict what is going to happen. It looks at all the actions ever executed, and all the observations coming in, and uses that experience to learn to predict the next thing given the history so far. Because it’s a recurrent network, it can learn to predict the future — to a certain extent — in the form of regularities, with something called predictive coding.
As the data’s coming in through the interaction with the environment, this unsupervised model network learns to discover new regularities, or symmetries, or repetitions, over time. It can learn to encode the data with fewer computational resources — fewer storage cells, or less time to compute the whole thing. What used to be conscious during learning becomes automated and subconscious over time.
One important thing about consciousness is that the agent, as it is interacting with the world, will notice that there is one thing that is always present as it is interacting with the world — which is the agent itself.
I’m pretty convinced that all the basic ingredients to understand consciousness are there, and have been there for a quarter-century. It’s just that people in neuroscience who maybe don’t know so much about what is going on in artificial neural network research, they are not yet so aware of these simple basic principles.
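The two-module arrangement described in the quote above (a controller that maps observations to actions, plus a second module that learns to predict what happens next) can be sketched in a few lines of Python. This is a deliberately crude toy under stated assumptions: a lookup table stands in for the recurrent predictor, the controller is a random policy rather than a learned network, and the ring-shaped world and all names (`WorldModel`, `Controller`, `environment_step`, `RING_SIZE`) are hypothetical. It says nothing about consciousness; it only shows prediction errors drying up as the world becomes familiar.

```python
import random

RING_SIZE = 8  # hypothetical toy world: positions 0..7 arranged in a ring


def environment_step(position, action):
    """Deterministic toy world: the action (-1 or +1) moves the agent around the ring."""
    return (position + action) % RING_SIZE


class WorldModel:
    """Predicts the next observation from (observation, action).

    Assumption: a lookup table stands in for the unsupervised recurrent network.
    """

    def __init__(self):
        self.table = {}

    def predict(self, obs, action):
        return self.table.get((obs, action))  # None means "no prediction yet"

    def update(self, obs, action, next_obs):
        """Store the observed outcome; return 1 if the prediction was wrong, else 0."""
        surprise = int(self.predict(obs, action) != next_obs)
        self.table[(obs, action)] = next_obs
        return surprise


class Controller:
    """Maps observations to actions. Assumption: a random policy, not a learned network."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, obs):
        return self.rng.choice((-1, 1))


def run(steps=500, seed=0):
    """Let the agent interact with the world; return the per-step surprise signal."""
    model, controller = WorldModel(), Controller(seed)
    obs, surprises = 0, []
    for _ in range(steps):
        action = controller.act(obs)
        next_obs = environment_step(obs, action)
        surprises.append(model.update(obs, action, next_obs))
        obs = next_obs
    return surprises


surprises = run()
print("total surprises:", sum(surprises))
```

Because this toy world is deterministic, each (observation, action) pair can surprise the model at most once, so the total can never exceed `2 * RING_SIZE` no matter how long the agent runs. That is a very loose analogue of the remark that what needed attention during learning becomes automated over time.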

Jurgen Schmidhuber’s conjecture is that two recurrent networks, one responsible for actions and a second responsible for predicting the world, are the basic ingredients for achieving consciousness. That is, CCIK in my classification scheme leads to consciousness. This is extremely intriguing; in fact, the second network is actually performing a kind of meta-learning (see: “Meta Meta-Model”). Here is Schmidhuber talking about this:

What is striking about both models of consciousness is that they make similar claims: consciousness already exists in simple mechanisms, and human-level consciousness is just a matter of degree. Both theories claim that there is no need for a new kind of mechanism to achieve consciousness. The conceptual missing link to explain consciousness is already known.

An additional similarity is that both include a mechanism that handles temporal causality. IIT revolves around being able to create internal models that capture the causality between concepts. Schmidhuber’s approach employs RNNs that are able to recognize patterns in time.

I am uncertain whether the mechanisms proposed in IIT are present in advanced modular DL systems; we are in the early stages of discovering this. On the other hand, Schmidhuber is saying we don’t need to look far. At the most recent NIPS conference, Schmidhuber argued that GANs (“The coolest thing in the last 20 years”) were identical to the system in his 1992 paper, “Learning Factorial Codes by Predictability Minimization”. This paper describes a system that involves “two opposing forces”. I can’t tell if this paper is about the same thing that he claims has achieved consciousness. I don’t know if Schmidhuber is claiming that GANs are conscious!

What bothers me about all this is that we could be very near to Artificial General Intelligence (AGI) if these two theories are correct. Alternatively, we could be very far away, not knowing what the missing link may be. Either way, one can’t predict when we will arrive at AGI, and that is very disconcerting.


“While neuroscience might shed light on the input and output functions of the brain, the quantification for integrated information we have presented here implies that it will be unable to shed light on the complex tangle that is core consciousness.”
