Deep Learning is Non-Equilibrium Information Dynamics


There are essentially several camps studying neural-like systems. One camp insists on a biologically inspired approach; it includes firms like Numenta and Vicarious, and researchers in the Connectome field. Another camp consists of adherents of the Bayesian religion: people who believe that a theorem invented in the 18th century is the key to unlocking our understanding of intelligence. Then there are the alchemists, who don't really care about theory and are more than happy to conjure up the latest Residual or Attention model. If the results show "state-of-the-art", then that concoction must be the right approach.

The present reality of Deep Learning research is that the alchemists are winning and it’s not even a close contest!

Why is this so? Why is our comprehension of Deep Learning in such a poor state? Could it be that the biological theorists and the Bayesian zealots are using the wrong toolbox?

One major shortcoming of our present-day mathematical toolbox is that it is relevant only under equilibrium conditions.

Unfortunately, learning does not happen in an equilibrium state; it happens in a state of non-equilibrium. Measuring only when a system is in equilibrium (or assuming the central limit theorem) is like taking measurements after the fact rather than while it is happening: you make your observations only after the entire play is over. To understand Deep Learning, one needs a grasp of what happens in non-equilibrium, at the transition between order and chaos.

Deep Learning systems are neither biological systems nor physical systems, yet many researchers derive their intuition from one of those two contexts. If you have grounded yourself in Newton's classical mechanics, the likelihood of you ever discovering Quantum Mechanics is next to nil, unless you take a close look at the experimental data and realize that your worldview is actually flawed. Deep Learning systems are information systems, not biological and not physical, and they should be studied as such. That is why understanding the dynamics of information is of high importance.

Information systems (alternatively, computational systems) have 3 fundamental capabilities. These are:

Information storage — Memory

Information transfer — Signaling

Information modification — Computation

It is that simple. The Cellular Automaton Rule 110 that I described in a previous post has all 3 of these capabilities. Universal Machines emerge from these 3 operators.
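To make the three operators concrete, here is a minimal Rule 110 sketch (the grid width, seed position, and wrap-around boundary are my own illustrative choices): the row of cells is the storage, the neighbor lookups are the signaling, and the rule table is the computation.

```python
# Rule 110 elementary cellular automaton: each new cell depends only
# on its left, center, and right neighbors.
RULE = 110  # binary 01101110 encodes the outcomes of all 8 neighborhoods

def step(cells):
    """Apply one Rule 110 update with wrap-around boundaries."""
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (center << 1) | right  # neighborhood as 0..7
        out.append((RULE >> idx) & 1)              # look up the rule bit
    return out

# Start from a single live cell and evolve a few generations.
row = [0] * 15
row[7] = 1
history = [row]
for _ in range(6):
    row = step(row)
    history.append(row)

for r in history:
    print("".join(".#"[c] for c in r))
```

Despite this simplicity, Rule 110 is known to be Turing complete, which is exactly the point about universal machines emerging from simple operators.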

Now you may be telling yourself that it can't be this simple! The notion that a complex system requires complex constituents is an entirely false assumption. The key to understanding the capabilities of complex systems lies in the 3 operators. In fact, in my previous post about "5 Capability Level of Deep Learning Intelligence", the levels are just different combinations of these 3 operators at different levels of sophistication.

Deep Learning systems are of course capable of much more than universal computation. They are capable not only of learning but also of meta-learning. The two core computational (information modification) capabilities are matching and selection. Deep Learning systems consist of ensembles of self-similar matching and selection units, stacked in multiple layers and routed via signaling (information transfer). To make an analogy with another AI technique, it's just like a swarm of simple matching and selection machines.
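One way to picture such a unit, as an illustrative sketch rather than a description of any particular architecture: matching as dot products against stored templates, and selection as a softmax that routes the signal toward the best match.

```python
import math

def match(x, templates):
    """Matching: score the input against stored patterns (dot products)."""
    return [sum(xi * ti for xi, ti in zip(x, t)) for t in templates]

def select(scores):
    """Selection: a softmax turns raw scores into a routing distribution."""
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Two stored patterns (memory); the input signal is routed by similarity.
templates = [[1.0, 0.0], [0.0, 1.0]]
weights = select(match([0.9, 0.1], templates))
print(weights)  # most of the weight lands on the first template
```

Stacking many such units in layers, with each layer's selection output feeding the next layer's matching, gives the ensemble-of-units picture described above.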

The key question, however, is how these systems learn. This is a complex research subject, but we certainly know one thing: these systems aren't learning when they are in equilibrium. In fact, if we study biological systems, we know that in the non-equilibrium state the evolution of a system tends toward minimizing relative entropy. That is the same optimization direction as minimizing the KL divergence (a measure of the difference between two distributions). Furthermore, we know that phase transitions occur near regions of high mutual information in these models. This implies that the all-too-convenient assumption of i.i.d. data needs to be thrown in the dustbin. The study of DL must be in the regime of non-equilibrium states and not in the mathematically convenient regime of equilibrium.
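The KL divergence mentioned above is straightforward to compute for discrete distributions; here is a minimal sketch (the distributions are made up for illustration):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.
    Measures how much distribution Q diverges from distribution P."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A model distribution q being pulled toward the data distribution p:
p = [0.5, 0.3, 0.2]
q_far = [0.8, 0.1, 0.1]
q_near = [0.5, 0.3, 0.2]

print(kl_divergence(p, q_far))   # positive: the distributions differ
print(kl_divergence(p, q_near))  # 0.0: the distributions match
```

Minimizing this quantity with respect to the model's parameters is the usual training objective (equivalent to maximum likelihood), which is what links the relative-entropy picture to Deep Learning practice.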

One final thought: you may also be wondering whether physics itself can be captured in an information dynamics (aka computational mechanics) framework. There have actually been several papers covering that area, specifically in information-theoretic terms. This is possibly where the entire notion of reality being a simulation comes from. It is one of those topics that I, like Elon Musk, would rather avoid!

BTW, the image above is of the surface of a liquid in a non-equilibrium state. What does it remind us of that we find in biology?


Related:

Deep Unsupervised Learning using Nonequilibrium Thermodynamics — provides a sample technique inspired by non-equilibrium statistical mechanics

Towards biologically plausible deep learning

Equilibrium free energy differences from nonequilibrium measurements: a master equation approach

Extremal principles in non-equilibrium thermodynamics

Transfer entropy in continuous time, with applications to jump and neural spiking processes

The Complexity of Simplicity

Additional Commentary

It occurs to me that many readers with an interest in AI don't seem to understand how mathematics is used to model reality. Math doesn't model the world; you fit math so that it looks like the world. It is the same idea as curve fitting: you hypothesize that a certain formula fits the world, and if it does, then you are lucky.
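That curve-fitting idea in miniature (the data here is fabricated so that a line happens to fit exactly): you hypothesize a form, y = a*x + b, and solve for the parameters that best match the observations.

```python
# Least-squares line fit: hypothesize y = a*x + b, then solve for the
# (a, b) that minimize the squared error against the observed data.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated by y = 2x + 1
print(fit_line(xs, ys))      # the hypothesized form fits: (2.0, 1.0)
```

When the world is not actually linear, the same procedure still returns an (a, b), which is precisely the trap: the math fits something whether or not the hypothesis is right.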

As a matter of convenience, then, the math formulas that are easy to work with are the ones that get used. Furthermore, because of the limitations of mathematics, simplified systems are used for analysis. The universe has no requirement that a closed-form equation exists to model its behavior.

Thermodynamics equations are based on empirical observations; unlike those of other branches of physics, they are not derived from first principles. They describe systems in equilibrium, and their variables are aggregate measures of a system. Statistical Mechanics is the branch of physics with techniques for studying the behavior of large collections of interacting particles. If you think it uses statistics because of its name, then that too is a misconception. Within Statistical Mechanics there is Non-Equilibrium Statistical Mechanics, which studies systems outside of equilibrium. This is the regime where Nobel Prize winner Prigogine did his work, and it is where biological processes and physics meet.

DL systems, however, are not biological systems, and they are not physical systems either. So the closest thing that can model their behavior while retaining properties similar to biology is Information Dynamics in a state of Non-Equilibrium.

Explore more in this new book:

Pre-release: Artificial Intuition: The Deep Learning Revolution