Physical and Complexity Theory Preambles to the Chaotic Limits of Deep Learning’s Approach to AI
A famous quote from 1900, or some variation of it, has been attributed to the Scottish physicist, mathematician, and engineer Lord Kelvin. It reads as follows:
“There is nothing new to be discovered in physics now. All that remains is the more and more precise measurement.”
Setting aside whether the attribution to him is fair, I want to concentrate on what the quote actually means. It reflects a feeling, widespread at the time, that physics had essentially all been discovered. From then on, the only new achievements would be more precise instruments for measuring physical phenomena, adding more significant figures to fundamental constants, variables, etc.
This implies that fundamental research was pretty much over. New achievements would come only from implementing and exploiting known physical phenomena for the benefit of humans, e.g. improving the steam engine or Étienne Lenoir’s internal combustion engine.
Some scientists of the time were probably biased because they were living in an “age of (physics) implementation”. We will return to this idea later.
Whilst there is no doubt about the importance of improving precision in science and technology, an ambitious discipline like physics could surely not be condemned to that alone.
History teaches us that unforeseen events can sometimes change its course. Five years after the “nothing new to be discovered” quote, modern physics was founded. We now remember 1905 as Einstein’s celebrated Annus mirabilis, and for good reason.
In 1905, special relativity was born, and it led directly to General Relativity. In the same year, Einstein’s explanation of the photoelectric effect in terms of light quanta helped lay the foundations of quantum mechanics. These ideas broke with the paradigm of 19th-century Newtonian mechanics, electromagnetism and thermodynamics, and ushered in a new revolution in physics.
Albert Einstein went on to arrive at a beautiful and solid theory of gravity: General Relativity (1915). He developed it by combining apparently unconnected branches of knowledge. On one side, he studied Riemannian geometry, i.e. how two points can be connected on curved surfaces and in curved spaces (not by a straight line, by the way). On the other, Einstein required physical laws to keep the same form in every frame of reference, for all physical systems. This included electromagnetism (EM), which James Clerk Maxwell had unified decades earlier.
In EM, electric and magnetic fields propagate through space as waves moving at the speed of light. It was known at the time that a changing electric field generates a magnetic one, and vice versa. This describes perfectly the nature of light, which we now know is an electromagnetic wave.
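Maxwell’s equations fix the speed of these waves through two constants of the vacuum: c = 1/√(μ₀ε₀). The short Python sketch below simply evaluates that formula. The constant values are CODATA 2018 recommended values, entered by hand for illustration; they do not come from the text.

```python
import math

# CODATA 2018 recommended values (entered by hand for this sketch)
MU_0 = 1.25663706212e-6        # vacuum permeability, N/A^2
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m

# Maxwell's equations give electromagnetic waves the speed 1 / sqrt(mu_0 * epsilon_0)
wave_speed = 1.0 / math.sqrt(MU_0 * EPSILON_0)

print(f"predicted wave speed: {wave_speed:,.0f} m/s")
```

This value agrees with the measured speed of light, and that agreement is the historical clue that light itself is an electromagnetic wave.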
Nevertheless, the role of frames of reference in EM was not clear at the time. It is much more complicated than the “man running on top of a train” thought experiment. Suppose a train travels at 100 km/h and a man runs on top of it at 30 km/h in the direction of travel. A third observer watching from the train station frame of reference will see the running man moving at 130 km/h (or 70 km/h if he runs in the opposite direction). For a passenger on the train, he is just running at 30 km/h. Both descriptions are consistent, because adding the train’s speed converts one into the other.
Visualizing this in electrodynamics is difficult, and it was even more so for physicists at the time. We have to imagine a magnetic field moving at high speed towards an electric one. Suppose an observer ‘rides’ the magnetic field: it will appear that an electric field is generated, moving towards the magnetic one. If the observer rides the electric field instead, it will appear that a magnetic field is generated, moving towards the electric one. Somehow the observers are no longer equivalent. So where is the train station in this system? Well, at that time it was called the luminiferous aether. Literally.
It was not until Hendrik Lorentz that things started to get a bit clearer. He introduced a ‘transformation’: a mathematical change of coordinates that kept Maxwell’s equations invariant when switching between (inertial) frames of reference. In other words, by applying these transformations, the equations of electromagnetism are valid for all (inertial) observers. This solves the problem of non-equivalent observers.
Invariance under a change of frame of reference was already known in Newtonian mechanics, in the form of Galilean transformations (the man-on-the-train example). But Maxwell’s equations were not invariant under Galilean transformations. Something had to be wrong. Either Lorentz’s method was not the correct one for physical phenomena, or the whole of the magnificent Newtonian mechanics, which is invariant under Galilean transformations, was wrong. At this point physics was in crisis. Einstein had to make a decision in order to continue with his analysis, and despite how absurd the second option seemed,
he trusted Lorentz.
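To make the dilemma concrete, here is a minimal Python sketch that compares the two rules for changing frame. It works in one space dimension and SI units. A light pulse that covers x = c·t in the station frame is re-described from a frame moving at v = 0.5c. The Galilean rule gives the pulse a speed of c − v, which contradicts Maxwell’s equations. The Lorentz transformation keeps it at c and also preserves the spacetime interval (ct)² − x². The function names are illustrative choices, not taken from any library.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def lorentz_factor(v):
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

def galilean(t, x, v):
    """Newtonian change of frame: same clock, positions shifted by v*t."""
    return t, x - v * t

def lorentz(t, x, v):
    """Special-relativistic change of frame (1D boost with speed v)."""
    g = lorentz_factor(v)
    return g * (t - v * x / C ** 2), g * (x - v * t)

def interval(t, x):
    """Spacetime interval (c t)^2 - x^2."""
    return (C * t) ** 2 - x ** 2

v = 0.5 * C
t, x = 1.0, C           # a light pulse: after 1 s it has travelled C metres

tg, xg = galilean(t, x, v)
tl, xl = lorentz(t, x, v)
print("Galilean frame: light speed / c =", xg / tg / C)   # 0.5, i.e. c - v
print("Lorentz frame:  light speed / c =", xl / tl / C)   # 1.0 up to rounding

event = (2.0, 1.0e8)    # an arbitrary event: t = 2 s, x = 10^8 m
print(interval(*event), interval(*lorentz(*event, v)), interval(*galilean(*event, v)))
```

In the last line, the Lorentz-transformed event has the same interval as the original event, while the Galilean one does not.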
Einstein re-interpreted the physical implications of Lorentz transformations: space and time are altered by high-speed motion.
This overturns the paradigm of time (days, minutes, seconds…) and space (meters, km, cm…) as absolute quantities, because it makes them depend on the speed of an inertial observer. Lengths and durations are therefore relative to an inertial frame of reference, with a universal speed limit, of course. Not surprisingly,
Lorentz transformations for velocities much slower than the speed of light behave like Galilean transformations.
So, at low speeds, Newtonian physics is recovered as an excellent approximation of relativistic physics. And in this ‘special relativity’ framework, nothing can travel faster than the speed of light.
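These effects can be put into numbers with two formulas: the Lorentz factor γ = 1/√(1 − v²/c²) and the relativistic velocity-addition rule (u + v)/(1 + uv/c²). The Python sketch below is a minimal illustration that works in km/h to match the train example. It shows three things. For a train, both rules give the Newtonian 130 km/h and 70 km/h. Time dilation only becomes noticeable near the speed of light. Combining two speeds of 0.9c still stays below c. The function names are illustrative.

```python
import math

C_KMH = 299_792_458.0 * 3.6  # speed of light in km/h

def lorentz_factor(v):
    """gamma = 1 / sqrt(1 - v^2 / c^2), with v in km/h."""
    return 1.0 / math.sqrt(1.0 - (v / C_KMH) ** 2)

def galilean_add(u, v):
    """Newtonian rule: speeds simply add."""
    return u + v

def relativistic_add(u, v):
    """Special-relativistic rule for collinear speeds."""
    return (u + v) / (1.0 + u * v / C_KMH ** 2)

train, runner = 100.0, 30.0
print(galilean_add(train, runner), relativistic_add(train, runner))    # both 130 to many decimals
print(galilean_add(train, -runner), relativistic_add(train, -runner))  # both 70 to many decimals
print(lorentz_factor(train) - 1.0)     # about 4 parts in 10^15: no visible time dilation on a train
print(lorentz_factor(0.8 * C_KMH))     # about 5/3: 1 s on a clock moving at 0.8c lasts ~1.67 s for the station
print(relativistic_add(0.9 * C_KMH, 0.9 * C_KMH) / C_KMH)  # about 0.9945: still below the speed of light
```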
Continuing with the story, some years later Paul Dirac made the quantum mechanical equations of the time consistent with special relativity, i.e. invariant under the same Lorentz transformations. He interpreted the new results in an unprecedented way and predicted the existence of antimatter. Four years later, another Nobel Laureate, Carl D. Anderson, confirmed the existence of antimatter by detecting positively charged electrons: positrons. Many other measurements and discoveries followed, leading to the point of development we have today.
What is clear is that, looking back over more than 100 years, the development of modern physics has made that 1900 quote completely obsolete.
A good read for going deeper into this subject is Thomas Kuhn’s book “The Structure of Scientific Revolutions”(1), which discusses problem-solving as a central element of science. For Kuhn, progress is achieved through scientific revolutions. “As a paradigm stretches to its limits, anomalies start to accumulate”, to the point where the accepted approach cannot explain certain observed behaviour. Then comes the most exciting moment in science. To paraphrase a line usually attributed to Isaac Asimov, it is not when someone yells “Eureka!”, but when someone stops for a while and says: “well, that’s funny…”
Does an emerging (proto-?) science, like data science, also follow a structure of scientific revolutions?
What is the role of this science in the development of Artificial Intelligence?
I believe that to answer these questions, we have to ask ourselves what the limits of the current paradigms that shape this discipline are, just as physicists had to confront the limits of frames of reference before Einstein. The following pages of this paper therefore concentrate on describing some of the limits of one of the main paradigms of AI: Deep Machine Learning.
Many Variable Systems and Chaotic Behaviours
But first, let’s go back to physics and the way it has been developed!
Physics is good at describing systems with few variables. Take Hydrogen, the simplest atom: one electron + one proton. The observed Hydrogen spectrum matches almost exactly the spectrum predicted by the equations of quantum mechanics(2). To compare the calculated theoretical behaviour of such a system of few variables with reality, experimental techniques have to be refined and improved so that they better approach the “indisputable” analytic result. But if we jump to Helium (two protons + two electrons), analytic calculation problems start to emerge.
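For Hydrogen, the simplest textbook form of that prediction is the Rydberg formula, 1/λ = R_H (1/n₁² − 1/n₂²). The Schrödinger equation reproduces this formula when fine structure is neglected. The Python sketch below evaluates it for the visible (Balmer) lines. The constants are CODATA 2018 values entered by hand. The reduced-mass correction is a standard refinement and is not taken from the text.

```python
# Rydberg constant for an infinitely heavy nucleus and proton/electron mass ratio (CODATA 2018)
R_INF = 10_973_731.568160            # 1/m
PROTON_ELECTRON_MASS_RATIO = 1836.15267343

# Reduced-mass correction for Hydrogen's finite-mass proton
R_H = R_INF / (1.0 + 1.0 / PROTON_ELECTRON_MASS_RATIO)

def wavelength_nm(n_lower, n_upper):
    """Vacuum wavelength (nm) of the photon emitted in the n_upper -> n_lower transition."""
    inverse_wavelength = R_H * (1.0 / n_lower ** 2 - 1.0 / n_upper ** 2)   # 1/m
    return 1e9 / inverse_wavelength

# Balmer series (transitions ending on n = 2): the visible lines of Hydrogen
for n in range(3, 7):
    print(f"n={n} -> 2: {wavelength_nm(2, n):.2f} nm")
```

The first two lines come out at about 656.5 nm (red) and 486.3 nm (blue-green), the familiar H-alpha and H-beta lines.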
Because of the larger number of variables, analytical solutions exist only for certain limiting cases. For systems like this, calculations are in general only possible numerically (non-analytically). One must then balance these approximate numerical results against experimental measurements to properly understand the system’s behaviour.
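A classic illustration of such an approximation is the variational estimate of Helium’s ground-state energy. Assume both electrons sit in Hydrogen-like 1s orbitals with an adjustable effective nuclear charge ζ. The textbook energy, in hartree, is then E(ζ) = ζ² − 2Zζ + 5ζ/8. The Python sketch below minimizes this energy numerically with a golden-section search. It compares the result with the accepted nonrelativistic value of about −2.9037 hartree. The trial wavefunction is a deliberate simplification, not a state-of-the-art method.

```python
import math

Z = 2                  # nuclear charge of Helium
E_REFERENCE = -2.9037  # accepted nonrelativistic ground-state energy of He, in hartree (approximate)

def trial_energy(zeta, z=Z):
    """Energy (hartree) of two electrons in 1s orbitals with effective charge zeta:
    kinetic zeta**2, electron-nucleus attraction -2*z*zeta, electron-electron repulsion 5*zeta/8."""
    return zeta ** 2 - 2 * z * zeta + 5 * zeta / 8

def golden_section_minimize(f, lo, hi, tol=1e-9):
    """Shrink the bracket [lo, hi] around the minimum of a unimodal function f."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d      # the minimum lies in [a, d]
        else:
            a = c      # the minimum lies in [c, b]
    return (a + b) / 2

zeta_best = golden_section_minimize(trial_energy, 1.0, 2.0)
e_best = trial_energy(zeta_best)
error = (e_best - E_REFERENCE) / abs(E_REFERENCE)
print(f"optimal effective charge: {zeta_best:.4f}")
print(f"variational energy: {e_best:.4f} hartree (reference {E_REFERENCE})")
print(f"relative error: {error:.1%}")
```

The optimum ζ = 27/16 ≈ 1.69 is below Z = 2 because each electron partially screens the nucleus from the other. It gives about −2.848 hartree, roughly 2% above the reference value. This is a useful but approximate answer that has to be weighed against experiment.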
A numerical approximation of a system’s behaviour plays a role equivalent to the intrinsic inaccuracy of any experimental measurement.
The theory explains the body of knowledge about a system, the calculation describes the behaviour of the system under certain conditions, and the experiment brings evidence of such behaviour. But sometimes an anomaly is found, i.e. the theoretical model fails to describe the experimental evidence. Then ‘corrections’ must be added to the original model so that it can explain the anomaly and still agree with older experimental evidence.
However, anomalies may stack up, and corrections to the model may start to contradict new (or old) calculations and experimental evidence. At that point a stretch limit is reached: the science enters a crisis and a new paradigm is needed.
To illustrate this, we will keep scaling up the Hydrogen example. Let’s use a bit of imagination and add more protons and electrons to form other atoms like Oxygen (O), Carbon (C) and Nitrogen (N). These very common atoms, together with Hydrogen, make up around 95% of the human body. The additional protons and electrons interact with each other. As a result, the theoretical description and analytical behaviour of these individual elements (C, H, O, N) become more diffuse, and their experimental measurement less and less precise. This continues as we keep increasing the complexity of the system by adding more independent variables to describe the new protons and electrons.
Let’s go big. Now imagine a molecule such as a simple amino acid, i.e. a complex arrangement of Carbon, Nitrogen, Oxygen and Hydrogen atoms. The computational power needed to describe it is already that of a 1980s computer. This scales so quickly that a chain of just a hundred simple amino acids, like a protein, can barely be simulated with today’s 150-core supercomputers.
Then there are the 350 proteins that form the simplest known cell, which to date remain a massive challenge to simulate with our computational technology(3). Imagine now a colony of those cells, or a colony of slightly more complex ones like sea sponges, the “first to branch off the evolutionary tree from the common ancestor of all animals”, sharing nutrients and working together for survival. Imagine an individual from a ‘newer’ species like a cat, a dog or a human. Now consider the interactions between such individuals and their ecosystem, and so on.
Systems like these involve so many equations with so many mathematical variables, and therefore so many parameters to calculate at a ‘Quantum Mechanical’ level, that such a calculation would exceed the computational power of the entire planet.
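A deliberately crude model shows why the cost explodes. Suppose each of n particles were just a two-level quantum system; real electrons in molecules need far more states than that. Storing the full quantum state would already require 2ⁿ complex amplitudes. The Python sketch below assumes 16 bytes per amplitude, i.e. standard double-precision complex numbers, and counts the memory needed. It also finds how many such particles it takes for the number of amplitudes to exceed 10⁸⁰, a commonly quoted order-of-magnitude estimate of the number of atoms in the observable universe.

```python
BYTES_PER_AMPLITUDE = 16  # one double-precision complex number

def state_vector_bytes(n_particles):
    """Memory needed to store the full state of n two-level particles (2**n amplitudes)."""
    return (2 ** n_particles) * BYTES_PER_AMPLITUDE

def particles_to_exceed(threshold):
    """Smallest n for which the number of amplitudes 2**n exceeds `threshold`."""
    n = 0
    while 2 ** n <= threshold:
        n += 1
    return n

for n in (10, 30, 50):
    print(f"{n:>3} particles -> {state_vector_bytes(n):,} bytes")

print("particles needed to exceed 10^80 amplitudes:", particles_to_exceed(10 ** 80))
```

Ten particles fit in 16 KiB, thirty need 16 GiB and fifty need 16 PiB. A few hundred two-level particles already outnumber the atoms available to build any computer.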
Basically, the remark is:
From the laws of quantum mechanics at the atomic level it is not possible to ‘climb all the way up’ to explain disciplines like sociology.
We are all made of atoms, and the equations describing the world’s “sociologic macro system” and its interactions are highly interdependent. Even so, replicating the evolution of this system over time and predicting its future behaviour would require simulating a series of events that happen exactly as they happened in the original macro system.
Tracking it all the way back, this involves a series of events that depend on the initial conditions of our Universe itself, from the Big Bang to our days…
This complexity makes the system itself highly dependent on initial conditions, which, in short, is the definition of chaos.
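The logistic map, x_{n+1} = r·x_n·(1 − x_n), is a standard toy model of this sensitivity; it is an illustrative choice and is not taken from the text. In the Python sketch below, r = 3.9, which is a chaotic regime. Two starting points differ by only 10⁻¹⁰, far below any realistic measurement precision. Yet the two trajectories become completely unrelated after a few dozen steps.

```python
R = 3.9  # a parameter value for which the logistic map behaves chaotically

def trajectory(x0, steps):
    """Iterate x_{n+1} = R * x_n * (1 - x_n) and return every value, starting with x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(R * xs[-1] * (1.0 - xs[-1]))
    return xs

a = trajectory(0.2, 100)
b = trajectory(0.2 + 1e-10, 100)       # the same system, measured 10^-10 differently
gaps = [abs(p - q) for p, q in zip(a, b)]

for n in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {n:>3}: difference = {gaps[n]:.1e}")
```

Each step stretches the gap by at most a factor R, so the difference stays tiny for the first ten or so steps. After that it grows roughly exponentially until it is as large as the values themselves.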
Of course, there are much smaller systems that are also chaotic in this sense, and to predict the behaviour of such systems, a new paradigm is needed.
We are entering the domain of complexity theory.
The economist Friedrich Hayek makes a distinction between the human capacity to predict the behaviour of simple systems and the capacity to anticipate or predict the behaviour of complex systems through modeling (1978). He believed that economics, and the sciences of complex phenomena in general, could not be modeled starting from the laws of sciences that deal with essentially isolated, simple phenomena, like physics. In his view, these sciences of complex phenomena included biology, psychology, and so on.
Notably, Hayek would explain that modeling complex phenomena allows only pattern predictions, as opposed to the precise predictions that can be made for non-complex phenomena.
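Hayek’s distinction can be illustrated with a standard toy chaotic system: the logistic map x_{n+1} = r·x_n·(1 − x_n) at r = 3.9. This is an illustrative choice, not Hayek’s. In the Python sketch below, two nearly identical starting points produce trajectories that cannot be predicted point by point. Yet the fraction of time each trajectory spends in each tenth of the interval [0, 1], which is a pattern, comes out essentially the same.

```python
R = 3.9  # chaotic regime of the logistic map

def step(x):
    return R * x * (1.0 - x)

def visit_histogram(x0, steps=200_000, bins=10, burn_in=1_000):
    """Fraction of iterations spent in each of `bins` equal sub-intervals of [0, 1]."""
    x = x0
    for _ in range(burn_in):          # discard the transient
        x = step(x)
    counts = [0] * bins
    for _ in range(steps):
        x = step(x)
        counts[min(int(x * bins), bins - 1)] += 1
    return [c / steps for c in counts]

def value_after(x0, n):
    """A precise prediction: the exact value of the trajectory after n steps."""
    x = x0
    for _ in range(n):
        x = step(x)
    return x

h1 = visit_histogram(0.2)
h2 = visit_histogram(0.2 + 1e-10)
print("values after 100 steps:", value_after(0.2, 100), value_after(0.2 + 1e-10, 100))
print("largest histogram difference:", max(abs(p - q) for p, q in zip(h1, h2)))
```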
I see a similar behaviour in how Artificial Intelligence develops nowadays, albeit at a much faster pace. The similarities between complex phenomena, pattern prediction, variable parameterization and the outputs delivered by complex arrays of artificial neurons are clear examples of Hayek’s definition of complex phenomena.
Here, the engineering tools that have been developed play a major role. Deep Machine Learning, however, is only one of many techniques for approaching Artificial Intelligence.
Like physics research and physics implementation, AI research is very different from AI-based solutions
As Kai-Fu Lee, a former AI researcher and now venture capitalist, has stated, Artificial Intelligence is going through an “age of implementation”, just like the late steam engines and early electrical engineering back in Lord Kelvin’s time. Even so, we cannot condemn AI to the path of Deep Learning alone. A paradigm shift will hopefully come soon. Perhaps it will arrive a few months after someone states that “there is nothing new to be discovered in AI now. All that remains is more and more pattern recognition accuracy”.
Chaos in Deep Learning
When a paradigm is stretched to its limits…
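Back in 2016, Ian Goodfellow, Yoshua Bengio and Aaron Courville published a book I enjoyed reading, “Deep Learning”. The book covers many interesting topics, but one of them is a particularly clear example of chaos. It takes the form of an equation:

$$x + \epsilon \, \mathrm{sign}\big(\nabla_x J(\theta, x, y)\big), \qquad \epsilon = 0.007$$

where x is the input image, y its label, θ the parameters of the classifier and J the loss used to train it.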
In the figure above, an image classifier was trained to assign images to a number of categories, one of which is “panda”. Given the input image x, the classifier determines with 57.7% confidence that it is a panda. Now, suppose we add a small perturbation that looks like colourful TV static to the original panda image. The classifier then falls into a different category, “gibbon”, with 99% confidence.
This is an extreme case, since the perturbation was specifically crafted to trick the original classifier. Still, it is clear that a small perturbation can make the classifier jump in category space. The perturbation is invisible to the human eye and, by the way, is weakly classified as a nematode when taken on its own. This is analogous to the behaviour of a chaotic system, where a small change in initial conditions generates a big difference in the dynamics of the system at later times.
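The perturbation in the panda example is usually computed with the fast gradient sign method (FGSM) of Goodfellow and colleagues. Every pixel is moved by a tiny step ε in the direction that most hurts the current prediction, given by the sign of the gradient of the loss. The Python sketch below applies FGSM to a hypothetical toy linear “panda vs gibbon” classifier on random 10,000-pixel “images”. The weights, the image and ε = 0.007 are assumptions chosen for illustration, not the network used in the published example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, grad, eps):
    """Fast gradient sign method: x + eps * sign(grad), clipped to the valid pixel range [0, 1]."""
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]

rng = random.Random(0)
N_PIXELS = 10_000
w = [rng.uniform(-1.0, 1.0) for _ in range(N_PIXELS)]   # hypothetical classifier weights
x = [rng.uniform(0.0, 1.0) for _ in range(N_PIXELS)]    # hypothetical "panda" image
b = 0.3 - dot(w, x)   # bias chosen so the clean image gets P(panda) = sigmoid(0.3), about 0.574

def p_panda(image):
    """Toy logistic-regression classifier: probability of 'panda' (otherwise 'gibbon')."""
    return sigmoid(dot(w, image) + b)

# Cross-entropy loss J for the true label y = 1 ("panda"); its gradient w.r.t. the input is (p - y) * w
y = 1
p = p_panda(x)
grad_x = [(p - y) * wi for wi in w]

eps = 0.007
x_adv = fgsm(x, grad_x, eps)
x_noise = [min(1.0, max(0.0, xi + eps * rng.choice((-1.0, 1.0)))) for xi in x]  # random noise, same size

print(f"clean image:         P(panda) = {p_panda(x):.3f}")
print(f"random +/-eps noise: P(panda) = {p_panda(x_noise):.3f}")
print(f"FGSM perturbation:   P(panda) = {p_panda(x_adv):.2e}")
```

A random ±ε perturbation of the same size barely moves the prediction. The gradient-aligned perturbation flips it to “gibbon” with near certainty. In this linear toy, the effect comes from thousands of imperceptible per-pixel changes that all push the score the same way.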
The most widely accepted definition, in simpler terms, reads: “Chaos refers to the issue of whether or not it is possible to make accurate long-term predictions of any system if the initial conditions are known to an accurate degree”.
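One common way to quantify this is the Lyapunov exponent λ. An initial error δ₀ grows roughly like δ₀·e^{λn}, so accurate predictions last only about n ≈ ln(tolerance/δ₀)/λ steps. The Python sketch below estimates λ for the logistic map x_{n+1} = r·x_n·(1 − x_n), a standard toy model chosen here for illustration. The estimate is the average of ln|r(1 − 2x)| along a trajectory. It is positive (chaotic) for r = 3.9 and negative (predictable, periodic) for r = 3.2.

```python
import math

def lyapunov_exponent(r, x0=0.2, steps=100_000, burn_in=1_000):
    """Average of ln|f'(x)| along a trajectory of f(x) = r x (1 - x)."""
    x = x0
    for _ in range(burn_in):                          # let the transient die out
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(steps):
        total += math.log(abs(r * (1.0 - 2.0 * x)))   # local stretching factor f'(x)
        x = r * x * (1.0 - x)
    return total / steps

def prediction_horizon(lam, initial_error, tolerance):
    """Steps until an error growing like initial_error * exp(lam * n) reaches `tolerance`."""
    return math.log(tolerance / initial_error) / lam

lam = lyapunov_exponent(3.9)
print("lambda(3.9) =", round(lam, 3))
print("lambda(3.2) =", round(lyapunov_exponent(3.2), 3))
print("horizon, error 1e-10:", round(prediction_horizon(lam, 1e-10, 0.1), 1), "steps")
print("horizon, error 1e-13:", round(prediction_horizon(lam, 1e-13, 0.1), 1), "steps")
```

The horizon grows only logarithmically with precision. Measuring the initial condition 1,000 times more accurately buys just ln(1000)/λ extra steps.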
Let’s see another interesting case.
In this example, (a) is the input image to be processed by the AI classifier, in this case for autonomous driving. In (b) we can clearly see that the network has made some very good predictions: it has detected people, pavement, trees, and their contours. Next, in (c) a very sophisticated ‘small perturbation’ is applied. Notice that it is somewhat perceptible to the human eye, but in no way enough to confuse the scene with the classifier’s prediction in (d). Not even the most inexperienced driver would be fooled.
As I mentioned before, this is a very particular perturbation, specifically designed to fool the autonomous driving classifier. In Machine Learning we call this an adversarial attack.
An adversarial attack has to be custom made for the original classifier, depending on its accuracy and on the dataset used to train it. It can be built, for example, by an optimization procedure or by another neural network especially trained to find the classifier’s ‘weak points’. This is only possible due to the chaotic nature of Neural Networks, where, as mentioned before, a small difference in initial conditions ends up in a widely different result. This is very similar to the panda-nematode-gibbon case discussed above. Basically, that ‘nematode’ is specifically designed, ‘custom made’ noise that makes the animal classifier network think the panda is a gibbon.
That noise was not random. It was crafted, in the panda case from the classifier’s own gradients, to “fool models through malicious input” by exploiting chaos, one of the limits of the current approach to AI.
There is a vast number of examples of this Deep Learning exploit. A particularly interesting one is the possibility of sending ‘hidden messages’ to speech recognition systems. In this example, a piece of music like Bach’s Concerto for Two Violins can be slightly ‘perturbed’ in a way that is imperceptible to the human ear. Yet for speech recognition AIs like Siri, that adversarial perturbation could be a voice whispering information such as a cooking recipe, the correct answers to a physics exam, or a detailed plan to rob a bank.
How can we be sure that hackers are not sending each other adversarially encoded messages over the radio? They would only need to share the same pretrained model to do this, undetected by anyone else.
Is this limit of AI the starting point of a new wave of indecipherable cryptography?
Today an AI can be trained to interpret such custom made perturbations to reproduce almost any concealed message. It’s all about information theory now.
This actually gets more confusing. Think about it: the perturbed Concerto for Two Violins can mean different things to different AIs if they were trained differently. For AI1, the perturbed Bach composition could be a love letter written by someone. For AI2, the same perturbed tune could be the arrangements for a surprise birthday party taking place next week. For AI3, it could be the details of the escape plan after robbing that neighbourhood bank.
Perhaps someone is already using YouTube videos to send these types of messages, and I wouldn’t bet they are all love letters.
Suppose each category of a Deep Learning image classifier is actually a word; to keep it simple, say 20 thousand words. Then a series of small perturbations like that ‘nematode’, one per image in a sequence of images, can hide a very complex message.
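The following toy Python sketch illustrates that idea and the AI1/AI2/AI3 ambiguity described above. It skips the hard part entirely: it assumes each hidden perturbation has already been decoded into a class index. Each hypothetical trained network is modelled simply as its own private mapping from class index to word, fixed here by a random seed that stands in for “training”. The vocabulary is also hypothetical. With 20,000 classes, every frame carries log₂(20,000) ≈ 14.3 bits.

```python
import math
import random

# Hypothetical 20,000-word vocabulary: a few real words padded with placeholder tokens
VOCAB = ["happy", "birthday", "party", "next", "week"]
VOCAB += [f"token{i}" for i in range(20_000 - len(VOCAB))]

class ToyClassifier:
    """Stand-in for a trained network: its training fixes which word each class index means."""

    def __init__(self, training_seed):
        order = list(range(len(VOCAB)))
        random.Random(training_seed).shuffle(order)   # different training -> different mapping
        self.class_to_word = [VOCAB[i] for i in order]
        self.word_to_class = {w: c for c, w in enumerate(self.class_to_word)}

    def encode(self, words):
        """Class index that each hidden frame perturbation would have to trigger."""
        return [self.word_to_class[w] for w in words]

    def decode(self, class_indices):
        return [self.class_to_word[c] for c in class_indices]

ai_1, ai_2 = ToyClassifier(training_seed=1), ToyClassifier(training_seed=2)
message = ["happy", "birthday", "party", "next", "week"]
frames = ai_1.encode(message)

print("bits per frame:", round(math.log2(len(VOCAB)), 2))
print("AI1 reads:", ai_1.decode(frames))
print("AI2 reads:", ai_2.decode(frames))
```

The same sequence of frames is a party invitation for AI1 and meaningless tokens for AI2. The “key” is simply the training of the model.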
Yes: the complete transcription of a Martin Luther King speech could be hidden in that old family videotape. Only AI6 could extract it, because for AI7 it is nothing more than the first few pages of One Thousand and One Nights, etc.
Sometimes a limitation is an advantage: it allows creativity to switch paradigms. We may still be far from a self-conscious system, or not capable of reaching our current definition of Artificial Intelligence. Even so, all the intermediate steps, models and architectures developed along the way will have their side implementations and use-case applications.
In sciences like physics, a paradigm stretched to its limits marks the need for a new way of interpreting the Universe. In data science, the behaviour seems to be different. Perhaps it is still too early to answer the question of the role of this science in the development of Artificial Intelligence.
The limits of Deep Learning’s approach to AI do not end here. This approach has other limitations apart from chaos, and additional limitations mean additional advantages.
They will be discussed in the second part of this paper, to be published on the 4th of August.
(1) A more modern review is Caroline S. Wagner’s “The Collaborative Era in Science: Governing the Network”, 2018.
(2) The now obsolete, Nobel Prize-winning Rutherford-Bohr model of Hydrogen, 1913.
(3) S. Tonn, “Simulation shows how transporter proteins do their work in cells”, Stanford University, Phys.org, 2017, https://phys.org/news/2017-04-simulation-proteins-cells.html
(4) See the publication and adversarial examples in: arxiv.org/abs/1705.07115.