“the miraculous fact on which the rest of AI stands is that when you have a circuit and you impose constraints on your circuits using data, you can find a way to satisfy these constraints by iteratively making small changes to the base of your neural network until its predictions satisfy the data” — Ilya Sutskever
Artificial Life is one of those long-running research endeavors with no obvious purpose but with extremely intriguing results. The latest is Bert Chan’s exploration with continuous cellular automata:
I’ve mentioned artificial life before in the “Quest for AGI.” What I find critically important about these cellular automata systems is that new kinds of stable, persistent and non-stationary structures are discovered only by chance. This is true despite the simplicity of these cellular automata. These systems, with their internal feedback loops and non-equilibrium dynamics, appear to be opaque to conventional analysis (we already know that non-equilibrium dynamics are intractable). Yet understanding how we can arrive at automation that is dynamic, persistent and even adaptable is crucial to our understanding of life and ultimately cognition.
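As a concrete reference point, even Conway’s discrete Game of Life exhibits such persistent, non-stationary structures. A minimal NumPy sketch of one update step and the classic glider (the grid size and starting pattern are just for illustration):

```python
import numpy as np

def life_step(grid):
    """One synchronous update of Conway's Game of Life on a wrapping grid."""
    # Count the eight neighbours of every cell via array shifts.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A live cell survives with 2-3 neighbours; a dead cell is born with exactly 3.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(np.uint8)

# The classic glider: stable in shape, yet non-stationary.
grid = np.zeros((8, 8), dtype=np.uint8)
for y, x in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[y, x] = 1

step4 = grid
for _ in range(4):
    step4 = life_step(step4)
# After 4 steps the glider reappears shifted one cell down and one cell right.
```

The glider is exactly the kind of structure the text describes: it persists and moves, yet nothing in the update rule announces its existence; it was found by observation, not derivation.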
One can think of cognition as a massively parallel constraint satisfaction problem. In fact, both learning and inference are constraint satisfaction problems (see: Machine Learning: A Constraint-Based Approach). An abstract schema of this perspective is depicted below:
Here we have an agent model (i.e. the self) observing its environment (i.e. context) and selecting decisions to achieve its objectives (i.e. goals). It’s a continuous constraint satisfaction problem in which the agent regulates its actions with respect to its environment. Meaning, awareness, and intentions are all internal mental constructs that an agent employs to solve the constraint problem. Effectively, our biological brains have evolved to derive a constraint model from their perceived context. Performing constraint satisfaction requires a model of constraints.
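This continuous regulation can be caricatured in the spirit of the Sutskever quote above: iteratively make small changes until the constraints are satisfied. A minimal sketch with two toy linear constraints (the constraints, step size, and iteration count are all illustrative):

```python
import numpy as np

# Two toy linear constraints on a 2-D state x:
#   x[0] + x[1] = 1   and   x[0] - x[1] = 0
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])
b = np.array([1.0, 0.0])

def violation(x):
    """Total squared violation of the constraints A @ x = b."""
    r = A @ x - b
    return float(r @ r)

x = np.zeros(2)          # initial state, violating both constraints
for _ in range(200):
    # Take a small step down the gradient of the violation.
    x -= 0.1 * (2 * A.T @ (A @ x - b))

# x converges to (0.5, 0.5), the unique state satisfying both constraints.
```

The point is not the algorithm (this is just gradient descent) but the framing: each small change is judged only by how much it reduces constraint violation.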
Previously, I wrote about the ‘strange loop’ in deep learning. This is a curious recurring pattern that we find in deep learning architectures. Douglas Hofstadter coined the term in his book “I am a Strange Loop”:
In the end, we are self-perceiving, self-inventing, locked-in mirages that are little miracles of self-reference.
Ladder Networks, introduced way back in July 2015, and the networks of “Tagger: Deep Unsupervised Perceptual Grouping” have layers that loop back into themselves. The Generative Adversarial Network (GAN) also has its own loop, not explicit in its architecture but rather as part of its training. A CycleGAN is a specific kind of GAN with an explicit loop:
The crux of the CycleGAN method is the use of a ‘cycle-consistency loss.’ This loss ensures that the network can perform the forward translation and the reverse translation with minimal loss. Thus, the network must learn not only how to translate the original image, but also the inverse (or reverse) translation. The CycleGAN leverages the loss function (i.e. objective function) to ensure a loop. That is, a cycle-consistency constraint is enforced.
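A minimal sketch of the cycle-consistency idea, with the two translators reduced to toy 1-D linear maps rather than deep networks (all names and values here are illustrative):

```python
import numpy as np

# Toy translators between two domains: G maps X to Y, F maps Y back to X.
# In a real CycleGAN both are learned deep networks.
W_g = np.array([[2.0]])      # G(x) = 2x
W_f = np.array([[0.5]])      # F(y) = y / 2, the exact inverse of G

def G(x): return x @ W_g
def F(y): return y @ W_f

def cycle_consistency_loss(x_batch, y_batch):
    """L1 loss between inputs and their round-trip translations:
    mean|F(G(x)) - x| + mean|G(F(y)) - y|."""
    forward_cycle = np.abs(F(G(x_batch)) - x_batch).mean()
    backward_cycle = np.abs(G(F(y_batch)) - y_batch).mean()
    return float(forward_cycle + backward_cycle)

x = np.array([[1.0], [2.0]])
y = np.array([[4.0], [6.0]])
# Because F inverts G exactly, the round trip is lossless and the loop closes.
print(cycle_consistency_loss(x, y))   # 0.0
```

Training drives this loss toward zero, which is precisely what forces the two translators into an explicit loop: each must learn to undo the other.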
The key innovation in CycleGANs is that these systems learn in an unsupervised manner (i.e. without explicit human-created labels). It’s almost like a cognitive perpetual motion machine: automation appears to be able to dream up new variations of labeled data, and as a consequence non-intuitively bootstraps itself with more data and greater complexity. These networks play a kind of simulation game with themselves, and with enough gameplay, become experts at it. Bootstrapping is an impossibility in the physical world, but that doesn’t mean it is impossible in a virtual world.
It is analogous to how AlphaGo is able to develop new Go strategies through self-play. When automation is embedded with a feedback loop and is able to simulate (some would call this ‘imagination’) many different scenarios and self-test those scenarios for correctness, then we are at the cusp of some extremely potent technology.
I wrote earlier about how evolution can seemingly also bootstrap itself using an iterative loop of successive modularizations:
The evolutionary cycle is sustained in a way similar to successively increasing gameplay difficulty. In the context of learning, the evolutionary gameplay is between predictive machines and their environment (which also includes other predictive machines). Evolution is like a self-play process in which the agents and their environment are in an arms race towards greater capabilities.
Stuart Kauffman would call this “radical emergence.” Kauffman’s definition of emergence is one that I find very convincing: radical emergence is when a new innovation comes into existence and new future possibilities emerge that did not exist before. We see this all the time in technology; for example, websites were not a future possibility prior to the invention of the World Wide Web.
The bootstrapping methods of CycleGANs, AlphaGo, and evolution are all examples of radical emergence. These systems progressively build up sophistication by incrementally satisfying constraints. Each new solution progressively opens up new future possibilities. So in the AlphaGo self-play scenario, the game-playing system progressively becomes better at Go. It incrementally discovers new strategies that did not exist when it was just learning to play the game. Knowledge just has an uncanny habit of building itself up from other knowledge. This is intuitively true; why it is true, however, is not apparent to me! Radical emergence is the same process in deep learning, the brain, evolution, and technological innovation.
Constraint closure is another idea that Stuart Kauffman is exploring as a means of building a better understanding of radical emergence. We begin by asking ourselves: what is the nature of work? Physicists describe work as a force applied to move an object over a distance. A more general definition is the energy expended to direct an object in a constrained direction. To do any work (in the physics sense), a constraint is applied on a process, and that process creates yet another constraint.
A constraint closure is a set of non-equilibrium processes that form a loop.
A biological process is a work process (unless it violates the laws of physics). Such a process takes environmental stimuli and constraints as inputs to create its work products. These products become output constraints that serve as inputs to a subsequent non-equilibrium process. A loop is formed when a final constraint is used as an input to the initial process.
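The description above can be sketched as a toy loop of three “work processes,” each consuming the constraint the previous one produced. The process names and update rules here are invented for illustration; they are not taken from Kauffman’s formalism:

```python
# Each process consumes one constraint and emits another; the tuple is
# (kind of constraint, units of work done so far in the cycle).

def process_a(constraint):
    """Uses an input constraint to do work, emitting a new constraint."""
    return ("channel", constraint[1] + 1)

def process_b(constraint):
    return ("gate", constraint[1] + 1)

def process_c(constraint):
    # The final process emits the constraint that process_a consumes:
    # this re-entry is what closes the loop.
    return ("boundary", constraint[1] + 1)

constraint = ("boundary", 0)       # initial condition for the cycle
for cycle in range(3):
    constraint = process_a(constraint)
    constraint = process_b(constraint)
    constraint = process_c(constraint)

# After 3 full cycles, 9 units of work have been done, and the constraint
# that sustains process_a has been regenerated on every pass.
print(constraint)                  # ("boundary", 9)
```

The essential feature is not the arithmetic but the topology: no process in the loop can run without a constraint that only another process in the same loop produces.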
This is a good abstract framework to use in defining the ‘strange loops’ we find in Deep Learning. It is not enough to recognize loops. What is needed is to recognize loops that are stitched together using constraints. Constraint closures work by modifying their environment to create artifacts that are required to sustain their cycle.
The difference between the CycleGAN and a constraint closure is that the CycleGAN learns to create intermediate latent models rather than constraints. To get cognitive processes that are equivalent to biological processes, we instead map latent models into constraints. A recent paper, “Deep Generative Models with Learnable Knowledge Constraints,” explores such an idea of generating learnable constraints as the inputs and outputs of a neural network:
The above research is a first step in thinking of representations as being equivalent to constraints.
Here’s an even more impressive system that combines a source image and poses to generate a complete rendering:
Constraints in a conventional neural network are represented both by the architecture of the network (ex: a convolutional network is invariant to translation) and by the regularization terms that make up its objective function. One can also take the perspective that the input features of a training set are constraints. Data that is part of the training set is constrained to its intrinsic semantics; training data is not random data, it is in fact constrained data. Constraint satisfaction in learning involves the training data, the learning curriculum, the network architecture, and the objective function. At a meta-model level, all of these distinct structures and algorithms can be treated as neural networks.
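A toy objective makes the point concrete: the data term and the regularization term are both constraints folded into the same minimization. The dataset, penalty weight, and closed-form solution below are illustrative of 1-D ridge regression, not of any particular paper:

```python
import numpy as np

# Toy linear model: the data term constrains predictions to match the
# targets, while the L2 penalty constrains the weights themselves.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])           # targets: y = x

def objective(w, lam=0.1):
    data_term = np.mean((X @ w - y) ** 2)    # constraint imposed by the data
    reg_term = lam * np.sum(w ** 2)          # constraint imposed by regularization
    return float(data_term + reg_term)

# The penalty pulls the minimizer below the unconstrained solution w = 1.
# Closed form for this 1-D ridge problem: w* = (X'y) / (X'X + n * lam).
xx = float(X.ravel() @ X.ravel())
xy = float(X.ravel() @ y)
w_star = np.array([xy / (xx + len(y) * 0.1)])
# w_star is roughly 0.97: the two constraints trade off against each other.
```

Neither term alone determines the solution; what is learned is whatever best satisfies both constraints simultaneously.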
Treating representations as constraints is in fact not a radical idea. In programming, data can also be treated as code; Lisp is an example of a language where code and data are one and the same. Code (or a program) is the same thing as a constraint, whether the language is declarative or imperative. So treating neural-network internal representations as constraints is conceptually identical to treating data as code. In fact, there should be no reason why nature would even make the distinction: distinguishing data from code is a convenience heuristic applied for the benefit of humans. Therefore, there is no reason why a latent space in deep learning should be treated solely as data (i.e. a manifold) and not as code (some discrete representation that encodes another representation).
From this perspective, a network’s latent spaces are in fact constrained code spaces that are learned. Variational Autoencoders and Beta-VAE learn constrained latent spaces that are meant to conform to a Gaussian distribution or to a few important dimensions. There is no natural law that prevents these latent spaces from being treated as code spaces.
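The Gaussian constraint on a VAE latent space is enforced by a KL-divergence term; Beta-VAE simply weights that constraint more heavily. A sketch of that term for a diagonal-Gaussian posterior (the dimensions and values are illustrative):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ): the latent-space constraint."""
    return float(0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var))

def beta_vae_loss(recon_error, mu, log_var, beta=4.0):
    # beta > 1 tightens the Gaussian constraint on the latent space,
    # which is how Beta-VAE encourages a few disentangled dimensions.
    return recon_error + beta * kl_to_standard_normal(mu, log_var)

# A posterior already matching the prior pays no penalty...
print(kl_to_standard_normal(np.zeros(3), np.zeros(3)))   # 0.0
# ...while a displaced posterior does: 0.5 * (1 + 1 - 1 - 0) per dimension.
print(kl_to_standard_normal(np.ones(3), np.zeros(3)))    # 1.5
```

The constraint is soft, enforced only through the objective, which is exactly why the resulting latent space can be read either as data (a manifold shaped toward the prior) or as code.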
The goal in creating a persistent, non-stationary and adaptable system is to create a constraint closure that takes the outputs of its internal processes and feeds them back into itself. It’s just like an engine that works in a cycle. This is, however, a cognitive process, so the cycle is virtual rather than physical in nature. Constraints, in the most abstract sense, are what stitch this loop together. So what we have is a constraint satisfaction problem that folds back into itself through the use of constraints.
It is, however, insufficient to have a system that only strives towards equilibrium. Rather, equilibrium needs to be disrupted to arrive at greater sophistication. In AlphaZero self-play, there is a selective process that picks the strongest players, but this doesn’t break the symmetry of an equilibrium state, that is, the state in which all players are equally good. However, just as in evolution, a sufficiently valuable mutation leads to an accidental leveling up in capability. I suspect there are additional ‘leveling up’ or ‘symmetry breaking’ mechanisms beyond random mutation.
The constraints in cellular automata are identical to their current state: cellular automata by construction use constraints as both their input and their output. Unfortunately, as in Conway’s Game of Life, the discovery of more complex persistent artificial life remains (given our current limited understanding) a matter of sheer luck.