Even before it started, 2019 was announced as the year in which AI applications would rise to the surface to disrupt industry and academic research. Halfway through 2019, an article with a rather blaring title appeared. It reported a remarkable result obtained with a neural learning model. The neural net in question had not just reproduced the evolution of the Universe from its early beginnings; it had done so in a matter of seconds, much faster than traditional simulation approaches, which might take up to hundreds of hours. It plotted, nearly faultlessly, the emergence of matter over time as gravitational densities and their displacement in an imaginary sector of the Universe that stretches across hundreds of millions of light-years. As the researchers shared with the article’s author: “Nobody knows how it does this, and it’s a great mystery to be solved.”
But that’s not all. Having trained their deep neural network on the output of 8000 accurate simulations produced by established methods, the researchers were utterly surprised to discover that it could project future states correctly even for cosmological parameters other than those it was trained on. Somehow, it had identified something fundamental in the way the Universe (and, I might add, reality) unfolds. This appeared to have dazzled the researchers: “Why [does] this model extrapolate so well, why [does] it extrapolate to elephants instead of just recognizing cats and dogs.”
In this essay, I’ll explore an answer to these questions. I’ll build on a way of seeing reality that I introduced earlier in a book. But here, I’ll rely even more on a phenomenon that had fascinated the physicist Richard Feynman since high school and that influenced his research later in life.
What the researchers brought to light
In “Learning to predict the cosmological structure formation”, a down-to-earth logbook of steps taken, the researchers solidly state their case. In line with cosmological observations, they started by assuming the Universe to be homogeneous and isotropic, that is, sharing the same properties at every location and in all directions. This seemed relevant to my exploration because it comes with the promise that my insight might apply broadly. The following observations put me on a path that led to a logic that could explain the behavior of the neural learning model discussed by the researchers.
Not focused on things
Until now, “N-body simulation” was typically used to produce accurate projections of the structure of the Universe. These kinds of simulations predict the motion of millions of individual particles, and this necessarily involves calculating the effect of a multitude of particle-particle interactions. As one can imagine, this is a laborious task that takes even computers many hours to complete. To speed up the process, at the cost of a bearable percentage of errors, mathematicians developed shortcuts “to approximate” N-body simulations. The researchers used such improved approaches and compared their output with what their deep neural network produced. Here is what struck me. The improved approaches, however ingenious, all did their work by focusing on the job the way we humans do. They all saw the structure of the Universe as an advancing integration, if not assembly, of interacting individual particles. The deep neural network appeared to be looking at something else.
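To give a feel for why particle-particle bookkeeping is so laborious, here is a minimal sketch of the brute-force idea behind an N-body step. It is not the code the researchers used: production simulations rely on far more refined schemes, and the units (G = 1), the softening length, the particle count, and the function names `pairwise_accelerations` and `leapfrog_step` are all my own assumptions for illustration.

```python
import numpy as np

def pairwise_accelerations(pos, mass, G=1.0, softening=1e-2):
    """Direct summation: every particle feels the pull of every other particle."""
    # diff[i, j] is the vector pointing from particle i to particle j
    diff = pos[None, :, :] - pos[:, None, :]
    # Softened squared distance avoids a division by zero for close pairs
    dist2 = (diff ** 2).sum(axis=-1) + softening ** 2
    inv_d3 = dist2 ** -1.5
    np.fill_diagonal(inv_d3, 0.0)  # a particle does not pull on itself
    # a_i = G * sum_j m_j * (r_j - r_i) / |r_j - r_i|^3
    return G * (diff * (mass[None, :, None] * inv_d3[:, :, None])).sum(axis=1)

def leapfrog_step(pos, vel, mass, dt):
    """Advance all particles by one kick-drift-kick time step."""
    vel_half = vel + 0.5 * dt * pairwise_accelerations(pos, mass)
    new_pos = pos + dt * vel_half
    new_vel = vel_half + 0.5 * dt * pairwise_accelerations(new_pos, mass)
    return new_pos, new_vel

# Hypothetical toy universe: 200 equal-mass particles, initially at rest
rng = np.random.default_rng(0)
n = 200
pos = rng.uniform(-1.0, 1.0, size=(n, 3))
vel = np.zeros((n, 3))
mass = np.full(n, 1.0 / n)

for _ in range(10):
    pos, vel = leapfrog_step(pos, vel, mass, dt=0.01)

# Number of distinct particle pairs that each force evaluation has to consider
pair_count = n * (n - 1) // 2
```

Even this toy version handles 19,900 pairs per force evaluation for only 200 particles. The pair count grows with the square of the particle number, which is why approximations become attractive once millions of particles are involved.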
Trained on displacement
As the researchers write in their paper, an evolving distribution of particles is often pictured by traditional simulation approaches as either gravitational densities (density fields) or their displacements (displacement fields). In the end, the researchers decided to train their deep neural network with displacement fields because, when fed density fields, it failed to produce results that were comparable to the results of traditional approaches. Clearly, a density field says something about local qualities (how much matter sits where), while a displacement field hints at motion- or path-related qualities (how far and in which direction matter has moved). This might explain why, to paraphrase the researchers, different displacement fields can produce identical density fields when displacements become a bit chaotic. This was noteworthy, given the logic of cause and effect that I intended to follow.
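The remark that different displacement fields can end up as identical density fields can be made concrete with a toy example. The sketch below is my own one-dimensional illustration, not taken from the paper: the box size, the cell count, the two displacement fields, and the helper `density_from_displacement` are assumptions chosen to keep the arithmetic visible.

```python
import numpy as np

def density_from_displacement(initial_positions, displacement, n_cells, box_size):
    """Move each particle by its displacement, then count particles per cell."""
    final_positions = (initial_positions + displacement) % box_size  # periodic box
    counts, _ = np.histogram(final_positions, bins=n_cells, range=(0.0, box_size))
    return counts

box_size = 8.0
n_cells = 8
# Eight particles start at the cell centres of a uniform one-dimensional grid
initial = np.arange(n_cells) + 0.5

# Field A: particle 2 moves one cell to the right and joins particle 3
disp_a = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
# Field B: particle 2 overtakes particle 3 while particle 4 falls back one cell
disp_b = np.array([0.0, 0.0, 2.0, 0.0, -1.0, 0.0, 0.0, 0.0])

density_a = density_from_displacement(initial, disp_a, n_cells, box_size)
density_b = density_from_displacement(initial, disp_b, n_cells, box_size)
```

Both fields leave cell 2 empty and put two particles in cell 3, so `density_a` and `density_b` are identical even though the particles travelled different paths. The density field has forgotten who came from where, while the displacement field still carries that information.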
Searching for higher order parallels
As I explore elsewhere, we humans typically try to distinguish differences. We learn to spot differences as early as kindergarten and, as we grow up, we classify differences to create ‘science’. Neural networks, on the other hand, have no idea of differences. They distinguish matters by searching for parallels. “For example, to identify a cat as a cat, they munch through a multitude of cat photos to record the parallels, eventually to arrive at a set of parallels that these cat photos have in common.” For their experiment, the researchers adapted the architecture of a neural learning model that was originally developed for biomedical image segmentation, for example, to identify the growth of certain cells under a microscope. After their adapted model had searched for parallels in the output of 8000 accurate simulations for certain cosmological conditions, it was able to predict the future state of an imaginary universe almost flawlessly. The big surprise was that it was also able to achieve useful predictions for cosmological conditions other than those it was trained on. This suggests that the neural learning model, convolutional as it was, did not just identify parallels but also parallels of parallels that might be ‘innate’ to how the Universe (or reality) unfolds.
Knack for non-linearity
As the physical chemist Ilya Prigogine demonstrated when studying the transformation and exchange of energy, non-linear or chaotic behavior is a precursor of “order”. When a thin layer of liquid is heated from underneath, the behavior of molecules in the liquid becomes chaotic at first: the nonlinear regime. Soon after, molecules start following orderly paths to transport heat from the bottom to the surface more efficiently. These paths are visible at the surface of the liquid as a honeycomb-like pattern of cells, so-called Bénard cells or convection cells. This odd phenomenon is crucial to our existence. It explains the granules on the surface of the sun, the 20-odd tectonic plates on Earth, and the high- and low-pressure weather patterns that we experience each day, to name just a few examples. In the light of this, it seemed relevant when the researchers noted that their neural learning model outperformed established simulation methods when predicting cosmic structures “in the nonlinear regime”. Could there be an innate order, if not “naturalness”, in how the Universe (or reality) unfolds?
Toward explaining deep neural network behavior
To paraphrase the physicist Sabine Hossenfelder, as she explores matters such as “naturalness”, one should not get “lost in math”, at least not at this stage. Then again, as she added elsewhere, “certain coincidences scream for explanation”. So, logic, a philosophy of cause and effect, is needed here. Of course, in the search for logic, thermodynamic principles must be the safety net. Nature reduces an energy inequality by some form of motion, if possible, heading for the equilibrium that the Zeroth Law defines. Bookkeeping is needed to trace the transformation of energy involved so nothing gets lost: First Law. We should not try to turn back the clock because it’s not possible to recoup the energy that is lost as entropy (chaotic motion) during energy transformation: Second Law. In my experience, however, the crux of the problem when identifying logic is in our angle of view. Our usual perspective is culturally bound, deeply anchored in history as it is.
Despite wave-particle duality, which shows that quantum objects behave as waves as well as things, we interpret our world predominantly as an assembly of things. This is not at all surprising. Aristotle already insisted that our world is made up of things that may or may not be in motion. If that was not enough, in the seventeenth century, René Descartes, a predecessor of Isaac Newton, inspired his generation when he explained our world by dissecting it into corpuscles, another term for things. In the meantime, we made sure that our children follow this line of reasoning. For decades now, we have been teaching them (some of them future physicists at CERN) to assemble their world, either factual or imaginary, using Lego bricks, ‘things’ of a kind.
As a result, we watch our world through an imaginary funnel, looking at it from the wide end. What we actually see is what the narrow end reveals. Wherever we point the narrow end, we see bits and pieces but not the whole. To see the whole, we need to assemble it in our mind first, using the things that we observed. This is why we all perceive our world differently. When we focus the funnel’s narrow end on something that moves, we can only keep it in focus by moving the funnel too. The funnel, this way, dictates to us the motion of something in terms of speed or acceleration. It’s how Aristotle, Descartes, Newton, and so on defined our world: motion as an attribute, measured by an observer.
Neural networks have no idea of the things that we observe. Nor do they have a notion of the odd assemblies in our mind: the funnel is inverted now. They watch our world from the funnel’s narrow end and keep count of parallels at the wide end. What’s more, when the researchers trained their neural learning model with displacement data, motion was no longer an attribute and thus replaced ‘things’ as defining items of our world. So, while we continue to see things that we hope to assemble in our mind, the model keeps count of parallel displacements that it compounds as choreographies. What are these about and why do they make a difference?
Physics demands that nothing happens without an energy inequality or gradient, that is, neighboring areas of relative surplus or shortage. If this is the case and the situation permits, a flow develops when energy in some form or shape drifts to bridge these areas. Eventually, as the Second Law of thermodynamics predicts, an inequality or gradient is reduced this way. As Prigogine demonstrated, molecules start following orderly paths under certain circumstances to reduce a temperature gradient more efficiently. Richard Feynman was particularly interested in the motion involved or, rather, in the path of motion. In 1964, in his nineteenth video-taped lecture, Feynman passionately explains how his high school teacher, Mr. Bader, inspired him for life. Having noticed that his student had gotten bored, Bader presented Feynman with the Principle of Least Action. When a particle moves from one place to another, its average kinetic energy, that is, the energy due to its motion, less its average potential energy, that is, the energy due to its position and state in relation to other particles, “is as little as possible for the path of an object going from one point to another.” The path travelled by a particle, in other words, is a path that always takes the least action and not necessarily the shortest route or, even, the least time.
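For readers who like to see the principle at work, the following sketch compares the action of a few candidate paths for a ball thrown straight up and caught again two seconds later. It is only an illustration under my own assumptions: uniform gravity, a unit mass, a discretised path of 201 points, and the hypothetical helper `action`, which sums kinetic minus potential energy over small time steps.

```python
import numpy as np

def action(path, dt, mass=1.0, g=9.81):
    """Discretised action: sum of (kinetic - potential) energy times dt."""
    velocity = np.diff(path) / dt                    # speed on each time step
    midpoint_height = 0.5 * (path[:-1] + path[1:])   # height halfway each step
    kinetic = 0.5 * mass * velocity ** 2
    potential = mass * g * midpoint_height
    return np.sum((kinetic - potential) * dt)

g = 9.81
total_time = 2.0
t = np.linspace(0.0, total_time, 201)
dt = t[1] - t[0]

# Path that Newton's laws predict: up and down along a parabola in time
true_path = 0.5 * g * t * (total_time - t)
# Shortest route between the same end points: the ball never leaves the hand
straight_path = np.zeros_like(t)
# Trial paths with the same end points, bent away from the true path
bump = np.sin(np.pi * t / total_time)
trial_actions = {eps: action(true_path + eps * bump, dt)
                 for eps in (-0.5, -0.1, 0.1, 0.5)}

true_action = action(true_path, dt)
straight_action = action(straight_path, dt)
```

In this setup, `true_action` is smaller than every entry in `trial_actions` and also smaller than `straight_action`. The ball does not take the shortest route, which would be staying put; it takes the route on which kinetic energy less potential energy, summed over time, is as little as possible.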
As Feynman confides to his audience, he has never stopped exploring this principle, spending ample time to prove that it applies not just to the motion of objects but also to the motion of particles and photons. “It isn’t that a particle takes the path of least action but that it smells all the paths in the neighborhood and chooses the one that has the least action.” In view of the power to predict, Feynman added one more crucial point: “Every subsection of the path must also be a minimum. And this is true no matter how short the subsection. Therefore, the principle that the whole path gives a minimum can be stated also by saying that an infinitesimal section of path also has a curve such that it has a minimum action.” In fact, Feynman’s findings suggest that, from the very beginning, particles tread paths of least action. They instantly know, as it were, which paths to take to get somewhere most efficiently. This is a pivotal statement, especially in view of the neural learning model used by the researchers. Having been trained with the output of 8000 displacement simulations, the model necessarily identified least-action-path patterns and, as a result, compounded least-action choreographies.
In sum, gradients, such as temperature gradients and initial conditions, induce motion on least-action paths. These paths depend on the behavior of the actors involved, such as molecules and gravitational densities. Because a least-action path is itself about displacement, it is bound to produce a gradient in the shape of a relative shortage or surplus. This looped relationship between gradient and least-action path, shown as a pale-red backward arrow in the figure below, is an example of what I’d call circular ontology because each item causes the other. Circular ontology explains how a least-action choreography multiplies and spreads after a haphazard agitation somewhere triggers it.
Prigogine’s research provided the thermodynamic premise for least-action choreographies of convective heat-transfer cells in a broad range of natural phenomena, such as granules at the surface of the sun, the high- and low-pressure weather patterns that we experience each day, and the dynamics of Earth’s tectonic plates. While the environmental parameters differ greatly for these phenomena, they share the same least-action choreography. Such choreographies can, therefore, be said to be innate in the sense that they represent the behavioral ‘signature’ of certain actors under different environmental conditions.
This recalls the bedazzlement of the researchers when they discovered that their neural learning model had correctly predicted the gravitational displacement patterns for cosmological conditions other than those it had been trained on. The pattern of gravitational displacements, as predicted by the neural learning model, necessarily also represents a choreography of least-action displacements. This pattern, in other words, is the behavioral signature of gravitational density fields. Having identified this signature, the neural learning model understandably needs little time to predict the future state of the Universe under different cosmological conditions.
If this is true, innate, least-action choreographies must also exist for the behavior of other actors. Examples that come to mind are traffic flows, healthy neuron cluster behavior, epileptic neuron cluster behavior, cancer cell growth behavior, and organizational behavior, the latter being a topic that I personally investigated. Once a deep neural network identifies these signatures, it will be able to predict a future state quickly.
Physics offers an explanation of how a neural learning model, trained with gravitational displacement data, might predict in little time a future state of the Universe, in terms of gravitational densities, for different cosmological parameters without additional training. The clue is in gravitational displacement that, like motion, necessarily unfolds on paths of least action. The pattern of paths identified by the model therefore represents a least-action choreography. This innate choreography is the behavioral signature of gravitational densities. I arrived at this view by relating the finding of the researchers to the work of Prigogine on the emergence of orderly behavior. I made use of Feynman’s fascination with motion unfolding on least-action paths. If the logic holds, then the innate, least-action choreographies of other actors are waiting to be identified as signatures too. These signatures may help us quickly discover and predict matters such as cancerous growth, epilepsy, traffic jams, and organizational failure.