A New Theory of Information Emergence
The Constructor Theory of Information (Deutsch and Marletto) is one of the more compelling explanations of the nature of information that I’ve encountered. Conventionally, information and computation are lumped together. Both are treated as abstractions that are instantiated in physical form through technologies like computers and perhaps through intentional agents such as living things. However, before Constructor Theory (CT), the origin of information was rarely explored in a rigorous and formal manner.
CT provides a unified framework for both classical and quantum information. For purposes of this discussion, it will be sufficient to confine myself to classical information. Constructor Theory argues that information is not a mathematical, logical or abstract concept but rather, surprisingly, something that can be derived from the laws of physics. Just as conservation laws in physics apply to different physical substrates, information has the same ‘substrate independence.’
Information has the property that it can be cloned from one substrate into another. Furthermore, information is necessary for preparation and measurement. These are tasks that convert information between the abstract and the physical. That is, preparation converts information from the abstract to the physical. Measurement converts the physical to information in the abstract.
Shannon’s theory of information is inadequate in that (1) it does not describe quantum information and (2) it deals with information carried in distinguishable states but overlooks the need to specify what distinguishing consists of physically.
Constructor Theory (CT) consists of an algebra over tasks. Tasks are capable of transforming input attributes into output attributes. The theory provides a definition of a possible and impossible task (i.e., related perhaps to Turing undecidability). CT is a superset of the classical theory of computation. Unlike classic computation where the definitions of information and computation are circularly defined, CT breaks that cycle and defines information from computation (i.e., the Task algebra).
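To make the idea of an algebra over tasks a little more concrete, here is a minimal Python sketch. It assumes a task can be modeled as a set of (input attribute, output attribute) pairs, and that tasks combine in series and in parallel. The names Task, task, serial and parallel are my own illustrative choices and not CT’s notation, and the sketch ignores substrates and the question of which tasks the laws of physics actually permit.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, Tuple


@dataclass(frozen=True)
class Task:
    """A task modeled as a set of (input attribute, output attribute) pairs.

    This is an illustrative simplification, not CT's formal definition.
    """
    pairs: FrozenSet[Tuple[Hashable, Hashable]]

    def inputs(self):
        return {a for a, _ in self.pairs}

    def outputs(self):
        return {b for _, b in self.pairs}


def task(*pairs):
    """Convenience constructor: task((0, 1), (1, 0)) is a NOT on one bit."""
    return Task(frozenset(pairs))


def serial(first, second):
    """Serial composition: the outputs of `first` become the inputs of `second`."""
    if not first.outputs() <= second.inputs():
        raise ValueError("every output of the first task must be an input of the second")
    return Task(frozenset(
        (a, c)
        for a, b in first.pairs
        for b2, c in second.pairs
        if b == b2
    ))


def parallel(left, right):
    """Parallel composition: both tasks act side by side on a composite substrate."""
    return Task(frozenset(
        ((a1, a2), (b1, b2))
        for a1, b1 in left.pairs
        for a2, b2 in right.pairs
    ))


NOT = task((0, 1), (1, 0))
IDENTITY = task((0, 0), (1, 1))

# Two NOTs in series leave the bit unchanged.
double_not = serial(NOT, NOT)

# NOT on the first bit, nothing on the second.
not_first_bit = parallel(NOT, IDENTITY)
```

In this toy model, double_not is the same task as IDENTITY, and not_first_bit maps the composite attribute (0, 1) to (1, 1). Whether a given task is possible or impossible is a separate statement that the laws of physics make about it, which this sketch deliberately leaves out.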
Measurement and Preparation
Classical information is essentially a clonable computation variable. CT further defines the meaning of distinguishability (an object cannot carry information unless it can be in different states) in terms of possible tasks capable of transforming attributes in a substrate into a clonable information variable. CT defines interoperability in terms of possible tasks that combine information from two information substrates. An input variable is defined as measurable when there is a possible task that transforms input from an original substrate to an information substrate. Preparation is the reverse: there is a possible task that takes information and generates a physical instantiation of it in a physical substrate.
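The three tasks described above (cloning, measurement and preparation) can be illustrated with a toy Python model. Everything here is an assumption made for illustration: the Substrate class, the use of None for a blank state, and the switch with its "up"/"down" attributes are hypothetical and are not part of CT’s formalism.

```python
class Substrate:
    """Hypothetical substrate: a named holder of a single attribute."""

    def __init__(self, name, attribute=None):
        self.name = name
        self.attribute = attribute  # None stands for a blank state


def clone(source, target):
    """Copy task: (x, blank) -> (x, x) for every value x of the variable."""
    if target.attribute is not None:
        raise ValueError("the target substrate must be blank")
    target.attribute = source.attribute


def measure(physical, encoding, register):
    """Measurement: map a physical attribute to an information attribute."""
    register.attribute = encoding[physical.attribute]


def prepare(register, decoding, physical):
    """Preparation: map an information attribute back to a physical attribute."""
    physical.attribute = decoding[register.attribute]


# Illustrative encoding of a two-state physical variable as a bit.
ENCODING = {"down": 0, "up": 1}
DECODING = {bit: state for state, bit in ENCODING.items()}

switch = Substrate("switch", "up")
register = Substrate("register")
measure(switch, ENCODING, register)        # physical -> information

copy = Substrate("copy")
clone(register, copy)                      # information is clonable

second_switch = Substrate("second switch")
prepare(copy, DECODING, second_switch)     # information -> physical
```

After these steps the register and its copy both hold the bit 1, and the second switch ends up in the same "up" state as the first. The point of the sketch is only that the information survives the move between substrates.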
Entropy is a measure of information that finds its way into the vocabulary of macroscopic phenomena and of communication capacity. Both Boltzmann and Shannon (years later) defined entropy (a measure of a kind of information) in different fields but with similar equations. Boltzmann, in his pioneering work in Statistical Mechanics, defined entropy as a measure of statistical disorder. Shannon was unaware of the similarity of his measure of information with Boltzmann’s thermodynamic entropy. However, when Shannon was searching for a name for his measure, Von Neumann reportedly recommended “entropy,” reasoning that “nobody knows what entropy is, so in a debate you will always have the advantage.” Thus these two separate measures of information became permanently linked.
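To see how similar the two equations are, here is a short Python sketch. It uses the Gibbs form of statistical-mechanical entropy, S = -k_B Σ p ln p (which reduces to Boltzmann’s S = k_B ln W when all W microstates are equally likely), and Shannon’s H = -Σ p log2 p. The function names are my own, and the example distributions are arbitrary.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def shannon_entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gibbs_entropy(probs):
    """Gibbs entropy S = -k_B * sum(p * ln(p)), in J/K."""
    return -K_B * sum(p * math.log(p) for p in probs if p > 0)


fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]

h_fair = shannon_entropy_bits(fair_coin)   # one bit for a fair coin
s_fair = gibbs_entropy(fair_coin)          # k_B * ln(2)

# The two measures differ only by the constant factor k_B * ln(2).
ratio = gibbs_entropy(biased_coin) / shannon_entropy_bits(biased_coin)
```

For any distribution the ratio comes out as k_B ln 2: the two quantities have the same functional form and differ only in units and in the base of the logarithm.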
CT has its own generalization of thermodynamic entropy. The second law of thermodynamics is conventionally described with respect to the direction of change in entropy (i.e., increasing). In contrast, CT derives the second law from a notion of ‘adiabatic possibility,’ where the interoperability of heat and work media informs the direction of change. Furthermore, CT disentangles the second law from the arrow of time. Therefore, CT provides a more fundamental explanation of the second law that does not rely on a measure that ‘nobody knows.’
CT’s definition of information is more formal than our intuitive notion of it. However, the formality allows us to transcend the usual vagueness commonly found in defining information. CT has made notable progress in exploring related areas such as Probability, Thermodynamics and the requirements for Life. I will skip discussion of the two former concepts and head straight to the CT of life. CT concludes that life is dependent on digital information (i.e., DNA) to support self-reproduction, replication and natural selection.
CT has a formal definition of knowledge. It is information that acts as a constructor and causes itself to remain instantiated in physical substrates. So replicators found in life must contain all the knowledge about how to construct and sustain themselves. CT argues that accurate constructors can emerge under ‘no-design’ laws. In life, this requires replication and self-reproduction, which in turn require the replicator-vehicle logic first speculated about by Von Neumann. There need to be mechanisms for highly accurate cloning (i.e., replication) and a complementary mechanism required for survival in challenging environments (i.e., modularity). We shall see this idea play out repeatedly at higher levels of complexity.
Stuart Kauffman’s concept of the adjacent possible is an intuitive explanation of emergence. I define emergence as the property of complex systems whereby the behavior of the collective can’t be derived from the behavior of its parts. However, the adjacent possible does not give us insight into why there is this seemingly constant force towards the evolution of gratuitously complex or extravagant systems. We can argue that biological evolution has no bias towards complex systems, yet we do see these extravagant organisms arise. Gratuitously complex organisms require more resources to organize their own complexity than simpler organisms do. For example, a cockroach has a higher likelihood of surviving a nuclear holocaust than a human. What then is the driving force that favors the emergence of excessive extravagance?
CT’s treatment of thermodynamics introduces a notion of irreversible processes. More specifically, a process is irreversible when the task that transforms x -> y is possible, but the reverse task y -> x is impossible. Now, let’s consider the task that prepares the creation of a constructor. This kind of generative task is likely to be irreversible. That is, there is no evolutionary motivation to discover a task that takes a constructor and derives its construction recipe. Let’s then assume that there is a contingent modification of the original construction recipe that is either inert or necessary for fitness.
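The definition of irreversibility above can be written down in a few lines of Python. This is only a sketch: the POSSIBLE table is a hypothetical stand-in for what the laws of physics permit, and its entries (including the conjecture that recipe -> constructor is possible while the reverse is not) are assumptions taken from the argument in the text, not results of CT.

```python
# Hypothetical table of possible tasks, written as (input, output) pairs.
# The entries are illustrative assumptions, not derived from any physical law.
POSSIBLE = {
    ("x", "y"),                   # x -> y is possible, y -> x is not listed
    ("a", "b"), ("b", "a"),       # a <-> b is possible in both directions
    ("recipe", "constructor"),    # the generative task conjectured in the text
}


def is_possible(source, target):
    """A task is possible if the (hypothetical) laws list it."""
    return (source, target) in POSSIBLE


def is_irreversible(source, target):
    """Irreversible: the task is possible but its reverse is impossible."""
    return is_possible(source, target) and not is_possible(target, source)


irreversible_tasks = sorted(
    pair for pair in POSSIBLE if is_irreversible(*pair)
)
```

With this table, irreversible_tasks contains ("recipe", "constructor") and ("x", "y"), while the a <-> b pair is reversible. Note that irreversibility here is a statement about which tasks are possible, not about the direction of time.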
Let’s then consider that this contingent modification is caused by another preparation task. Let’s say the self-correcting mechanisms of the constructor ignore the newly introduced dependency. Furthermore, all new task dependencies are preserved and replicated to descendants. The consequence is that an ever-growing bureaucracy of tasks is replicated and substitutes for the original constructor. The complexity of this bureaucracy increases to a level where returning to a simpler constructor becomes impossible. It is like haphazard software development where new features are incorporated without ever managing technical debt. Constructor theory requires that there is a way to repeatedly recreate the Constructor. There is no definition of a Constructor other than the transformation task that it performs. Therefore, contingent modifications of an existing Constructor are the force that drives greater complexity. This is how CT can explain the evolution of complexity.
What CT does not cover in its exploration of life is the characteristic of living things (or agents) of having what is known as an intentional stance. Intentional stance is a term coined by Daniel Dennett to refer to agents whose behavior is due to having cognitive capabilities. I use the term in the broadest of senses, one that also includes the most primitive of cognitive capabilities (i.e., stimulus-response). Physics has the law of conservation of energy, which is basically an invariant property with respect to time. Analogously, biological agents have the intentional stance of preserving self, or the conservation of self (aka survival and replication). Thus, information about self (see: AGI using self-models) is essential for all biological life. Self is a Constructor, and the essence of life is the preservation and replication of this Constructor.
Preparation and measurement characterize all communication in Shannon’s framework. However, when we begin to introduce agents with an intentional stance, we find a new kind of information. This is information aboutness. It goes beyond how information is codified and into the semantics of information. We know this as icons in semiotics and as references or pointers in computation. Information aboutness is of value only for intentional agents with memory. These are signs that capture similarity to what has been observed. Mimicry is an example of a procedural form of information aboutness. Information aboutness makes possible the recall of information that was previously encoded in memory. It is important to note that recall from memory by biological systems doesn’t have the same kind of fidelity and precision as found in computers. Rather, when we see what information is about, we relive a fuzzy rendition of the original experience that is in memory.
Intentional agents learn the aboutness of information through environmental interaction. That is, learning and model building are achieved through intervention in the world. An organism acquires its expectations by testing conjectures and predictions. Biological organisms have what is known as the efference copy or efferent copy. This is information about an organism’s own movement. This information explains why we can’t tickle ourselves and why we rub ourselves when we get hurt. Information about our own movements reduces the sensitivity of our sensors to stimuli that are caused by our own actions. It allows us to maintain the stability of what we see despite the movement of our heads and eyes. This information aboutness is what is known as symbol grounding, a missing feature of GOFAI symbolic approaches.
The next tier of information is what Gibson would describe as ecological affordances and what semiotics would describe as indexical signs. The value of this information is that it conveys to an intentional agent the possibilities and impossibilities that are available in a context. Constructor Theory revolves around the identification of possibilities and impossibilities of transformational tasks. In the realm of intentional agents, the recognition of possible and impossible potential actions is ideal for the preservation of self (i.e., survival). Learning about affordances requires the development and querying of internal mental models of the world. This allows an organism to “see” what is possible or impossible without actual interaction with the world. Douglas Hofstadter has proposed that analogies are the fuel and fire of thinking. Information affordances are what make analogies possible. This is indexical information that binds concepts together and permits the combination of new concepts. This is what I would call “ingenuity,” and it is entirely lacking in current Deep Learning models.
Let’s now delve deeper into the concept of information aboutness and usefulness with respect to self-models. That is, what can we say about information about the self that is referential and ultimately useful? Self-referential information involves higher information abstractions such as autonomy, introspection, and reflection. Brian Cantwell-Smith describes these kinds of information as based on notions of self as unity, self as a complicated system and self as an independent agent. Autonomy is information on the self that recognizes its self-direction and agency. Introspection is the observation of a self’s cognitive processes. It allows reasoning about our own thoughts. Reflection is a detached view of a self’s cognitive processes and reasoning from a perspective beyond the self. There is ever-increasing complexity in the different kinds of information required in the self-referential exploration of self.
But we are not yet done with our emerging ontology of information. No man is an island, and no organism is independent of its ecology. In the Pi-calculus of distributed computation, information aboutness is known as a channel or vocative name. The Pi-calculus employs aboutness information for information coordination. Human civilization employs information coordination as a means of resource allocation. Money is an example of this kind of information. Money is essentially information about obligations and is ultimately related to trust. But what is trust from the perspective of information? Trust is something I would place in the same category as information usefulness. However, trust is information about the self that is conveyed in interaction and communication with other selves.
One problem with Bayesian approaches is that they are unable to express “known unknowns”. In other words, one cannot express “I do not know”. This is problematic in that expressing this is necessary for rational decision making and also for knowledge discovery. Itzhak Gilboa writes “The Bayesian approach is good at representing knowledge, poor at representing ignorance.” A theory of information must be able to express different states of knowledge.
Furthermore, a theory of information must have something to say as to how new knowledge is to be acquired.
There is an asymmetry with knowledge that is obvious but often opaque to most theories of information. This is what I would call the asymmetry of information discovery. It is easy to discover what is already known. You can always query your memory or query human knowledge. It is unknown how long it will take to discover what is unknown. Querying reality for new information is an open-ended task. The asymmetric cost for discovery clearly is fundamental. How can CT lend formality to this fundamental characteristic of all information? The cost to look up old information (aboutness or usefulness) is less than the cost to derive new corresponding information.
Furthermore, the cost to discover the possibilities of a task is greater than the cost to discover the impossibilities. A constructor can certainly be constructed to query known information, but is there a constructor for discovering new information? The latter appears to be impossible under the definitions of CT, assuming that there is no replicable process of knowledge discovery. I suspect there is an opportunity here to better formalize knowledge discovery using the tools in CT.
What is missing in the CT formulation are details about the construction of constructors. Tasks in CT can be composed within networks of tasks, but this is subject to the constraint that there are no loops. The loops in CT are the constructors; these are the engines of transformation. Constructors are defined as objects that can repeatedly perform transformations subject to a certain level of accuracy. One can perhaps describe CT formulations as a recipe for identifying these loops. Loops, however, are interesting in their importance in amortizing inference.
I’ve hinted at this observation of strange loops in deep learning. Essentially, GANs, with their interplay between discriminative and generative networks, and self-play Reinforcement Learning are embodiments of constructors in CT. One could furthermore classify discriminative networks as information measurement tasks and generative networks as information preparation tasks. To elucidate this even further, one can apply CT’s thermodynamic treatment to the stochastic gradient descent learning process. Deutsch has proposed that humans and computers are both universal constructors; his definition is that these are constructors that can transform matter into other kinds of matter. I would, however, like to have a definition that is more specific to the kinds of cognition that may be labeled as constructors.
Conversational Shared Experience
Sharing information context is an essential component of human cognitive development. Human eyes, specifically the whites of our eyes, allow others to recognize what we are attending to. Human eyes have also evolved to detect subtle changes in the color of our faces. These evolved capabilities reveal the importance of shared contextual information. A human’s ability to share their own experiences even without verbal language is an essential tool that accelerates cognitive development. One can even make the general assertion that the essence of being human is in the activity of sharing human experience. One therefore cannot comprehend human-complete intelligence without having some understanding of human experiential sharing. In art, there is a concept of the “beholder’s share.” That is, beauty is in the eye of the beholder: the interpretation of art is performed by its perceiver. However, good art is the kind of art where the artist is able to share an experience with the beholder. The smile of Da Vinci’s Mona Lisa, as an example, is sufficiently ambiguous that it can morph to the preference of the beholder.
I’ve written several times that achieving AGI requires achieving conversational cognition. In my capability model, conversational cognition is at the highest step. Conversational cognition makes cultural evolution possible. The notion in psychology of dual-inheritance theory proposes that human cognitive development is both biological and cultural. A recent position paper from DeepMind, “Emergence of Innovation from Social Interaction”, addresses specifically this perspective. What’s interesting, though, is that humans were able to converse for thousands of years with barely any technological progress. That is, the same tendency toward bureaucratic organization also exists in the development of human society and cognition.
The scaling of shared human experience is what Richard Dawkins described as memes. Memes are the basis of an evolutionary model for cultural information transfer. Analogous to a gene, the meme was conceived as a “unit of culture” (an idea, belief, pattern of behavior, etc.). Memes are “hosted” in the minds of many individuals, and reproduce themselves by transferring from the mind of a person to the mind of another. It can be regarded as an idea-replicator that replicates itself by influencing the adoption of new beliefs by many individuals. Analogous to genetics, the success of a meme is dependent on the ubiquitous use and replication of its host. Daniel Dennett proposes that the development of human languages is a consequence of the spreading of memes.
Memes, analogous to biological cells and viruses, have a two-level structure (i.e., a replicator-vehicle logic). Douglas Hofstadter in Metamagical Themas describes the structure of an effective meme:
X1: Anyone who does not believe System X will burn in hell,
X2: It is your duty to save others from suffering.
If you believed in System X, you would attempt to save others from hell by convincing them that System X is true. Thus System X has an implicit ‘hook’ that follows from its two explicit sentences, and so System X is a self-replicating idea system.
There is a bait that conceals the hook that allows a meme to propagate. Let’s frame this structure using the replicator-vehicle logic in CT. The vehicle, unlike in biology, is not physical but somewhat abstract. If we are to understand a vehicle as the mechanism that ensures sustainability, then it is the hook that is relevant here. The hook is the deception of information usefulness in the meme that encourages its own propagation. The concept of “unknown knowns” or willful ignorance has utility in the propagation of memes (and thus knowledge). It is indeed interesting that deception is a characteristic of information that originates not just in memes but in more primitive biological organisms. Camouflage and viruses are examples of deceptive strategies in biology. The physical replicator here would be the host mind that accepts the truthfulness and thus the utility of the original meme.
In 1999, Carl Shapiro and Hal Varian wrote Information Rules, which described many of the emerging characteristics of the information economy. This included ideas about pricing, versioning, rights-management, lock-in, network effects, and compatibility standards. These are all emergent kinds of information that are a consequence of the dispersion of information as well as the development of information modularity.
CT explains the ubiquity of modularity in biology. The eukaryotic cell is constructed in a modular fashion with many subcomponents (e.g., nucleus, mitochondria, ribosomes). Similarly, multicellular organisms have differentiated modules that serve different functions (e.g., lungs, hearts, brains). The same modularization found in biology is also mirrored in information. In Shapiro and Varian’s Information Rules, many of the characteristics of the information economy are a consequence of modularity. Modularity implies information partitioning and encapsulation. That is, information is encapsulated in a manner that regulates its interaction with other information in an ecosystem. To illustrate this, music is protected by copyright law to restrict arbitrary copying. That is, the characteristics of physical things are being imposed by law on information.
However, modularity by fiat is not the main driver of information modularity. Rather, we see that the value of information modularity is its usefulness in the control of information complexity. Information modularity is a consequence of the limits of human cognition. There is no intrinsic force that requires information to be organized and categorized. However, from the human perspective, this is necessary. In fact, in certain domains like software development, information modularity is essential not only for containing complexity but also for accelerating the recombination of components to build new capabilities. Modularity allows frictionless composition and extension of information, which drives innovation.
Meta Level Theories
Humanity has invented many modes of explanation, or meta-level theories. Physics theories are formulated from initial conditions and the laws of motion. Incidentally, you can derive the laws of motion from a higher principle known as the Principle of Least Action. Bayesian probabilistic reasoning is another meta-level theory. The main flaw of the Bayesian approach is that it frames a theory of subjective inference that purports to explain everything but actually explains nothing. The problem is that formulating a Bayesian prior is analogous to formulating a homunculus explanation. Deutsch in his lecture “Physics without Probability” mentions several modes of scientific explanation:
… via initial conditions and laws of motion
… via prediction of probabilities
… via principles of nature (e.g. general covariance, unitarity)
… via emergent laws (e.g. laws 0,2,3 of thermodynamics)
… via variation and selection (evolution principle)
… via anthropic selection
… via human creativity (and intentions, etc.)
Constructor Theory is a meta-level theory. It reminds me of Category Theory. Category Theory is a formalization of mathematics in terms of a directed graph called a category that contains nodes (known as objects), edges (known as morphisms or maps) and the composition between morphisms. It has been used to formalize other abstract math such as sets, rings, and groups. Category theory informally is a general theory that studies these morphisms or maps. These morphisms are ‘structure preserving’ transformations between objects in the category.
Constructor Theory shares Category Theory’s focus on transformations between objects. CT calls these morphisms tasks, and these tasks are characterized as possible or impossible. The definition of possibility borrows from the notion of Turing computability. Thus CT describes systems in terms of what is possible (curiously enough, the Halting problem tells you that determining this is itself not always possible!) instead of describing what a system does. What a system actually does is an emergent property. I suspect that there is potential in applying the tools found in Category Theory within the framework of Constructor Theory. The obvious difference is in how morphisms are treated: in CT, not all morphisms are possible. This synergy could lead to richer theories of information.
When one encounters Category Theory or Constructor Theory, one can fail to recognize the meta-level nature of these approaches. These two approaches explore higher-level conceptual structures that appear to be uninformative about the lower-level conceptual structures that they describe. For example, you can have a linear algebra category that includes objects such as matrices, but it will not tell you how matrix multiplication is performed! Similarly, you can have a possible task in Constructor Theory, and it won’t tell you the details of the transformations performed by the task. There are no imperative recipes in these meta-level theories, only high-level principles. That is, in the case of CT, laws about laws (known as principles). Strange as it may seem, this might be the proper level of inquiry for any complex system. This is because we want to be able to reason about complex systems using vocabularies that can express emergence. Almost all modes of explanation in science cannot explain emergence.
What is the fundamental cause of the creation of advanced human technology like the LED screen that you are presently using to read this text? Technology is a consequence of science. Science is a way of thinking that finds its origins in the Enlightenment. This is a period in human history when humans began to break free of the inefficiencies of thought that emerge from information bureaucracy. This is when the usefulness of information changed from being based on authority to being based on testable evidence. This phase transition, which introduced a mechanism to revisit existing human knowledge and to clean out its gratuitous and extravagant complexity, is the driver of exponential technological change.