One good way to frame the question of the limits of Deep Learning is in the context of Stephen Wolfram’s Principle of Computational Equivalence. Wolfram showed that simple cellular automata can exhibit complex behaviour that cannot be predicted from their initial conditions or from the simple rules that specify their incremental behaviour. Certain kinds of cellular automata exhibit complexity that cannot be reduced to a mathematical model capturing their behaviour in closed form. Wolfram’s examples of ‘irreducible’ systems that exhibit this complex behaviour include the brain and weather systems. He classifies these kinds of systems as exhibiting “Universality”.
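Wolfram’s canonical example is Rule 110, a one-dimensional cellular automaton that has been proven Turing-complete. A minimal sketch in Python (the grid width, step count and display characters are arbitrary choices for illustration):

```python
# Rule 110: each cell's next state depends only on its own state and its
# two neighbours'. The 8-bit number 110 encodes the entire rule table:
# bit k of 110 gives the next state for neighbourhood pattern k.
RULE = 110

def step(cells, rule=RULE):
    """Advance the automaton one generation (wrap-around boundaries)."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

# A single live cell is enough to produce intricate, non-repeating
# structure -- behaviour you cannot read off the rule table itself.
cells = [0] * 64
cells[32] = 1
for _ in range(20):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```

The update rule fits in one lookup table, yet the only general way to learn what the system does at step N is to run all N steps; this is what “computational irreducibility” means in practice.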
A Deep Learning system that has memory belongs to this class of universal machines; however, this does not imply that such a system can replicate the behaviour of another universal machine. Bernhard Schölkopf draws this conclusion in the paper “Towards a Learning Theory of Cause and Effect”. That is, a learning system can only infer causes from the effects it observes. This tells you that a learning system can’t learn the mechanisms of, say, how DNA manufactures specific proteins. A fundamental limitation of any system that learns from data is that it cannot predict effect from cause. This limitation is analogous to the “Halting Problem” in Computability Theory.
One criticism I often hear about Deep Learning is that it doesn’t capture the biological mechanisms of the brain. This is a fair criticism. However, Wolfram’s Universality explains why it need not be a major obstacle to achieving AGI. Deep Learning systems may have capabilities equivalent to those of a biological brain, albeit via different computational mechanisms. At the fundamental level, all of these are computational systems built from three building blocks: computation, memory and signalling. Complex behaviour is emergent; as in cellular automata, it arises from very simple rules.
A useful schema for understanding a system’s capability to learn or to discover unknowns can be stated as follows:
Knowable knowns are models that will converge on training data. Knowable unknowns are trainable models that can make accurate predictions on non-training data. Unknowable knowns reflect an inability to learn a known system; this covers the problem of performing predictions of other irreducible universal machines. Unknowable unknowns are a machine’s inability to discover what it does not know. We break this down in more detail:
(1) Knowable Knowns — Given a large enough set of knowns (i.e. training data) we can get good convergence of our prediction errors.
(2) Knowable Unknowns — This expresses the concept of generalization. With good generalization, a machine can make accurate predictions on test data it has never encountered in its training set.
(3) Unknowable Knowns — However, there are certain classes of system that Deep Learning can never learn. These are in the class of computationally irreducible systems. A Deep Learning system may detect the direction of causality and thereby know that it is able to learn from a system. What it will not be able to know (the unknowable) is whether a system is in the class of computationally irreducible systems.
(4) Unknowable Unknowns — Finally, there is a class of total ignorance. This is really a metaphysical statement: the class of unknowables that cannot be known is itself unknowable. Think of it as the “Great Firewall” of knowledge.
The reader may have encountered a similar classification before, from an infamous defense secretary: “Known knowns”, “Known unknowns”, “Unknown knowns” and “Unknown unknowns”. “Known knowns” are what we currently know. “Known unknowns” are what we know that we don’t know. “Unknown knowns” reflect willful ignorance, or what politicians may call “alternative facts”. “Unknown unknowns” is simply ignorance of what one does not know. A good cliché is the “Black Swan”: the belief that black swans don’t exist when in fact they do. This schema differs from the “knowable” schema in that it expresses the current state of understanding rather than an ability to learn.
For the study of learning machines it is more important to understand what is knowable rather than the current state of knowledge.
Even though the above schema of ignorance looks complicated, it is only the tip of the iceberg of understanding ignorance. Consider further complexities such as misinformation, detecting model bias, ambiguity, disagreement, or accommodating change.
♡ Heart if an unknown has become known to you!