What Constructor Theory reveals about Deep Learning
“[They are not w]hat should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us.” (Richard Sutton)
What is an explanation? I’ve been circling this question for quite some time. Peter Sweeney wrote an article about AI inevitability in which he stated that “explanations are the unity of interpretations, formalisms and predictions” and presented a diagram from David Deutsch to illustrate it.
I had heard of David Deutsch (who pioneered work in universal quantum computation) but had never really read his work. What is enlightening is how Deutsch decomposes a theory, and thus the idea of an explanation, into four distinct moving parts, and in doing so explains what explanations are about. In further conversation with Peter (note: this is why it’s valuable to leave questions in the comments), I came to realize that Deutsch was working on a unifying theory for evolution, Turing universality, knowledge, and quantum physics. This piqued my interest enough to dig deeper into Deutsch’s theories. In 2012, Deutsch and his colleague Chiara Marletto proposed Constructor Theory. (Note: Peter has more detailed coverage of Deutsch in his article about Explaining AI.)
The critical characteristic of Constructor Theory is that it is a different “mode of explanation” from conventional physics. Once you grasp what an explanation is, you are armed with a new meta-concept that leads to the insight that there could be many ways to explain things. This is where Deutsch explains the difference between a good and a bad explanation:
“for, whenever it is easy to vary an explanation without changing its predictions, one could just as easily vary it to make different predictions if they were needed.”
In my previous article, “A Generative Model for Discovering the Unknown,” I discussed the difference between discriminative, descriptive, and generative models of the world. Generative models are the mode of explanation that is also used by Constructor Theory. Constructor Theory is defined in terms of an algebra of tasks, where a task is specified by a pair of input and output states. The transformation that a task specifies is either possible or impossible. Constructor Theory is a theory of transformations, describing those that can be caused and those that cannot, and why. It’s a mode of explanation that is bottom-up and computational. Deutsch says in an interview that “we have to make the Turing principle fundamental in physics.”
This mode of explanation is not entirely new. The idea of a computational universe dates back to Konrad Zuse. Stephen Wolfram further refined the idea in “A New Kind of Science,” where he explored a kind of physics in which computation supersedes the conventional mathematics that physics has traditionally used for explanation. What Wolfram showed was that Turing universality exists in even the simplest of machinery, such as cellular automata. Turing had previously shown that there exists a class of automata whose behavior has limits to predictability (i.e. undecidability). Constructor Theory is a generalization of computation. A constructor-theoretic capability that illustrates this is (see: Constructor Theory):
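The algebra of tasks described above can be made concrete with a small sketch. This is only a toy illustration, not Deutsch and Marletto's formalism: the `Task` class, the integer "energy" encoding of states, and the `is_possible` rule (which treats energy conservation as the only principle in play) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical state encoding: one integer energy value per subsystem.
State = Tuple[int, ...]


@dataclass(frozen=True)
class Task:
    """A task as a single (input state, output state) pair."""
    source: State
    target: State

    def then(self, other: "Task") -> "Task":
        """Serial composition: the output of this task feeds the next one."""
        if self.target != other.source:
            raise ValueError("first task's output must equal second task's input")
        return Task(self.source, other.target)

    def parallel(self, other: "Task") -> "Task":
        """Parallel composition: both tasks act side by side on a joint system."""
        return Task(self.source + other.source, self.target + other.target)


def is_possible(task: Task) -> bool:
    """Toy principle (assumption): a task is possible iff total energy is conserved."""
    return sum(task.source) == sum(task.target)


heat_up = Task((1,), (2,))      # a subsystem gains one unit from nowhere
cool_down = Task((2,), (1,))    # a subsystem loses one unit to nowhere
exchange = heat_up.parallel(cool_down)  # (1, 2) -> (2, 1): the unit moves across

for name, task in [("heat_up", heat_up), ("cool_down", cool_down), ("exchange", exchange)]:
    print(name, "possible" if is_possible(task) else "impossible")
```

Under this toy rule each one-sided task is impossible on its own, while their parallel composition is possible. The point of the sketch is that the statement about what can and cannot be caused lives in the principle (`is_possible`), not in any initial condition or law of motion.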
A possible task can be composed of impossible ones, as in the case of conservation laws…
The intuition that everything is computation is a strong one. Constructor Theory provides a more general framework for describing computation that expands understanding beyond classical computation. In fact, Constructor Theory is able to explain the concept of information rather than treat it as an axiom, as almost every other theory does. This forces us to ask the question “what kind of computation are intuitive machines like Deep Learning and biological cognition performing?” Are intuitive machines performing “non-classical” computation (different from both quantum and classical computation)? Can Constructor Theory provide a new perspective for explaining the emergent complexity?
Deutsch and Marletto’s Constructor Theory bootstraps physics from principles such as clone-ability, interoperability and distinguishability. It employs information-centric principles and uses them to reconstruct physics from scratch. As an example, quantum systems are information media in which exact copies of information are forbidden. From his earlier exploration of ‘what is a good explanation,’ Deutsch arrives at an alternative way of explaining reality that is more powerful than the conventional way. The conventional mode of explanation is “to predict what will happen from initial conditions and laws of motion.” Deutsch and Marletto argue that initial states are not fundamental. They have subsequently used this theory as a unifying mechanism to explain disparate topics such as quantum physics, probability, information, thermodynamics, and life.
Constructor Theory works at the level of principles, that is, laws that constrain other laws. An example of this kind of law in physics is the conservation of energy. This may valuably inform a specific problem with generative models, which I’ve discussed in “The Delusion of Infinite Precision”. There I explored the possibility of employing Deep Learning to generate physically realistic phenomena. Deep Learning is astonishingly capable at mimicry (see: “Deep Learning solves the Uncanny Valley Problem”). The pragmatic question in the sciences and engineering, though, is whether this mimicry can be used to simulate real physics, chemistry or biology. Constructor Theory hints at a methodology that is driven by principles similar to the conservation of energy.
Coincidentally, Deutsch has argued against the probabilistic mode of explanation. In a lecture about “Physics without Probability” he says:
The world just is not probabilistic it’s an illusion and the probabilistic mode of explanation has about the same status as the Flat Earth theory. You might find it useful to use the predictions of the Flat Earth theory when you’re planning out your garden. But, when you’re thinking about what the world is like and even more when you’re thinking about what the laws of nature are, it would be hopeless, that the theory of the Flat Earth, would just be an impediment to understanding what’s out there. So same is true of probability.
I have made similar controversial remarks about probability being a crutch that is impeding our understanding of general intelligence (see: “Should probabilistic inference be thrown under the bus?”). My argument was much narrower in applicability than Deutsch’s. I was questioning whether this mode of explanation made sense for understanding general intelligence. I did not, however, think that this mode of explanation was also flawed in many other domains. I had never imagined that it was also a crutch in the field of quantum physics!
Deutsch recognizes that the use of probability as a mode of explanation is one that is extremely flawed and in fact has no basis in reality. Deutsch offers a decision-theoretic approach as a replacement for probabilistic reasoning. I entirely agree with this stance. The proper way to extract value from the ideas of probability is to interpret it as just “levels of support”. There are many interpretations of what probability is: frequency, propensity, measure of belief, measure of certainty, etc. The debates about different interpretations have been going on for a very long time. The key to making progress is Judea Pearl’s causality calculus, in which we build explanations of the world through graphs that represent causality. From there, we can apply testability to our explanations. At the highest rung of Pearl’s “ladder of causation,” he places counterfactual criticism. Deutsch perhaps makes a similar argument for Constructor Theory:
Explanatory theories with such counterfactual implications are more fundamental than predictions of what will happen.
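To make the counterfactual rung concrete: Pearl's recipe for answering a counterfactual query is usually stated in three steps (abduction, action, prediction). The following is a minimal sketch of those steps, assuming a toy deterministic causal model. The variables `rain` and `sprinkler`, the structural equation `grass_wet`, and the helper `counterfactual_grass_wet` are hypothetical names chosen for illustration, and because every variable is observed, the abduction step is trivial here.

```python
from typing import Dict


def grass_wet(rain: bool, sprinkler: bool) -> bool:
    """Structural equation (assumed): grass is wet if it rained or the sprinkler ran."""
    return rain or sprinkler


def counterfactual_grass_wet(observed: Dict[str, bool], intervention: Dict[str, bool]) -> bool:
    # 1. Abduction: fix background facts to what was actually observed.
    world = dict(observed)
    # 2. Action: override the intervened variables, ignoring their usual causes.
    world.update(intervention)
    # 3. Prediction: re-evaluate the structural equation in the modified world.
    return grass_wet(world["rain"], world["sprinkler"])


observed = {"rain": False, "sprinkler": True}
factual = grass_wet(**observed)
counterfactual = counterfactual_grass_wet(observed, {"sprinkler": False})

print("grass wet as observed:", factual)
print("grass wet had the sprinkler been off:", counterfactual)
```

The factual answer is read off the data, while the counterfactual answer requires the causal model itself. A purely predictive fit to observations has no way to answer the second query.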
This difference between prevailing modes of explanation is illustrated by Marletto in the domain of emergent phenomena:
Darwin’s Theory: The appearance of arbitrarily complex design can have come about through simple steps in the absence of an intentional design process.
This is different from the prevailing mode of explanation:
The appearance of design will, or will probably arise in the universe given certain initial conditions and the laws of motion.
The shift in understanding this difference in explanation is based on Turing’s discovery of undecidability in computation and on Gödel’s proof of incompleteness for formal mathematical systems. Prokopenko et al. have a paper, “Self-referential basis of undecidable dynamics”, that explores this relationship between computation, formal proofs, and dynamical systems. The paper investigates the importance of undecidability, universality, diagonalization, and self-reference in each of these formulations and arrives at several requirements for these systems. These are (1) the capability of expressing negation, (2) an infinite computational substrate, and (3) program-data duality.
The determination of the impossible in Constructor Theory (CT) relies on the conceptualization of systems that depend on all three of these characteristics. The requirement for an infinite computational substrate obviously cannot be met in the physical world. In fact, this observation is why Deutsch argues that all abstract proofs are dependent on physics. Negation is necessary for logic and also for expressing the impossible (i.e. what is not possible). Here, CT defines something fundamental about information: information is the counterfactual properties of physical systems. The final requirement, program-data duality, is the most curious since it is far from obvious. However, it is important to notice that Prokopenko et al.’s analysis of Turing undecidability and Gödel incompleteness depends on very simple notions: (1) information, (2) computation and (3) programs expressible as information.
Deep Learning learns a crude form of causality graph that is solely based on induction. This is where Deutsch is also extremely informative in that he points out the limits of induction. Induction is unable to replace erroneous explanations. In short, induction alone is missing critical components required for knowledge discovery. It’s basically missing what I would call “ingenuity”. This ingenuity is expressed in explanations that are counterfactual in nature and not inductive.
Daniel Dennett describes both Turing universality and evolution as an “inversion of reasoning” (alternatively, competence without comprehension). This inversion also shares commonality with cognition. How does evolution lead to greater complexity from the pressures of natural selection, genetic drift, mutation and gene flow? How do the same evolutionary processes lead to human-complete cognition? According to Constructor Theory, self-reproduction in designer-free contexts such as evolution and cognition requires discrete computational models. This leads one to ask: where are the discrete models in Deep Learning or in cognition? (See: “Are Biological Brains made of Discrete Logic?”)
There is a relationship between the requirement for discrete computational models and the concept of emergence. I previously wrote about the emergence of higher complexity (see: “The Emergence of Modularity”). The gist of emergence is that there is an accumulation of adjacent-possible knowledge that serves as the stepping stones to the next level of capability. According to Constructor Theory, this is apparently only possible when knowledge is reliably cloneable and thus of discrete form. Chiara Marletto writes in “Life without Design”:
it is a fundamental idea of constructor theory that any transformation that is not forbidden by the laws of physics can be achieved given the requisite knowledge. There is no third possibility: either the laws of physics forbid it, or it is achievable. This accounts for another aspect of the evolutionary story. Ever better constructors can be produced, without limit, given the relevant knowledge, instantiated in digital recipes.
The evolution of life, its intentional stance, and therefore any cognition require the existence of digital information. This is the revelation of Constructor Theory. In today’s information age, this conclusion is surprisingly intuitive. That is, anything with software and digital knowledge is also an intentional agent. The problem remains that fast and frugal reasoning is very different from fast and error-free reasoning. I’m not going to give GOFAI thinking a free pass here. Understanding the connection between these two disparate worlds is key (see: Bridging the Semantic Gap). To what extent is information clone-ability applicable to biological cognition? I would argue that a robust intentional stance requires algorithms that do well in the context of uncertainty as well as the existence of self-models. Perhaps Deutsch's decision-theoretic approach has something to lend in clarifying this.
Then there is the question of what drives complex multi-cellular life forms. This is analogous to the question of what drives complex cognitive behavior. Both can perhaps be understood within the framework of self-play or constraint-closures. I’ve sketched this previously in “Artificial Life, Constraint-Closure and Deep Learning.”
I am currently working on a narrative of the emergence of cognition from the perspective of a purely information-centric approach. I’ve been calling this “Generative Modularity”, where I describe how modularity, through generative construction, evolves towards greater complexity. I begin with the Big Bang and eventually work my way up to human cognition (see: Part I and Part II). Deep Learning-inspired computation likely employs a kind of information that resides somewhere in between Constructor Theory’s Superinformation and classical information. Here’s my follow-up essay, “A New Theory of Information Emergence.”