Deep Learning Underspecification and Causality

Carlos E. Perez
Intuition Machine
Nov 17, 2020



An excellent paper from Google discusses the robustness of deep learning models when deployed in real domains: "Underspecification Presents Challenges for Credibility in Modern Machine Learning."

The issue is described as 'underspecification'. The analogy the authors make is to a system of linear equations with more unknowns than equations: the excess degrees of freedom admit many equally valid solutions, which leads to differing behavior across networks trained on the same dataset.
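To make the analogy concrete, here is a minimal sketch (NumPy only, with arbitrary synthetic data) of an underdetermined linear system: two solutions that satisfy the "training" equations exactly yet disagree on a new input.

```python
import numpy as np

rng = np.random.default_rng(0)

# 3 equations, 5 unknowns: A @ w = y is underdetermined.
A = rng.normal(size=(3, 5))
y = rng.normal(size=3)

# Solution 1: the minimum-norm solution via the pseudo-inverse.
w_min_norm = np.linalg.pinv(A) @ y

# Solution 2: the same fit plus an arbitrary vector from the null space of A.
null_basis = np.linalg.svd(A)[2][3:].T      # (5, 2) basis of the null space
w_alternative = w_min_norm + null_basis @ rng.normal(size=2)

# Both "models" satisfy the training equations exactly...
print(np.allclose(A @ w_min_norm, y), np.allclose(A @ w_alternative, y))

# ...but they disagree on a new, unseen input.
x_new = rng.normal(size=5)
print(x_new @ w_min_norm, x_new @ w_alternative)
```

Both solutions fit the data perfectly; nothing in the data itself tells you which one to trust off-distribution. That is underspecification in miniature.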

This is one of the rare papers with practical significance for the production deployment of deep learning. I've alluded to this problem previously with respect to physical simulations. See: The Delusion of Infinite Precision Numbers

I wrote, “The main argument against DL models is that they don’t represent any physics, although they seem to generate simulations that do look realistically like physics.”

Conventional computational models are constrained to reflect actual physics. Proper DL methods therefore also have to be constrained similarly in their dynamics.

Although we have seen some impressive progress in this area, it is difficult to do well in domains like NLP and electronic health records (EHR). This is because our models of these domains are themselves underspecified: we do not know which constraints would help pin down our models. So for edge cases absent from our training set, we are unaware of the emergent behavior.

This problem, however, is not unique to DL models. It also exists in conventional computational models; that is why ensembles of models are routinely used to predict weather patterns. The Church-Turing hypothesis is simply unavoidable for complex systems.
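Here is a hedged sketch of that ensemble idea, using scikit-learn on a synthetic dataset of my own choosing: train the same architecture under several random seeds and let the spread of the members' predictions flag where underspecification bites.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(200, 1))
y_train = np.sin(3 * X_train[:, 0]) + 0.1 * rng.normal(size=200)

# An out-of-distribution probe, well outside the training range.
X_probe = np.array([[2.5]])

# The same architecture trained several times with different random seeds.
members = [
    MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=seed)
    .fit(X_train, y_train)
    for seed in range(5)
]

predictions = np.array([m.predict(X_probe)[0] for m in members])
print("ensemble mean :", predictions.mean())
print("member spread :", predictions.std())  # a large spread flags untrustworthy extrapolation
```

The members agree where the training data constrains them and diverge where it does not, which is exactly the signal an ensemble is meant to surface.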

This problem is also related to Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." In other words, as soon as you use an "objective" measure to assess performance, compensation, etc., people will figure out how to game the system (see: https://en.wikipedia.org/wiki/Goodhart%27s_law). Machine learning systems are trained to find solutions to narrow problems, and ML will often find a short-cut to a solution. This short-cut typically games the objective measure rather than solving the actual problem. One way to mitigate this is to train the model to solve multiple tasks simultaneously.
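As a rough illustration of that mitigation (not the paper's method), here is a small PyTorch sketch with invented placeholder tasks and data: a shared representation trained against two objectives at once, so a shortcut that games one task's measure is penalized by the other.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data: the same inputs labeled for two different tasks.
X = torch.randn(256, 10)
y_task_a = (X[:, 0] > 0).float().unsqueeze(1)   # task A: a binary label
y_task_b = X.sum(dim=1, keepdim=True)           # task B: a regression target

shared = nn.Sequential(nn.Linear(10, 32), nn.ReLU())  # shared representation
head_a = nn.Linear(32, 1)                              # task A head
head_b = nn.Linear(32, 1)                              # task B head

params = list(shared.parameters()) + list(head_a.parameters()) + list(head_b.parameters())
optimizer = torch.optim.Adam(params, lr=1e-2)
bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()

for step in range(200):
    optimizer.zero_grad()
    h = shared(X)
    # A combined objective: a shortcut that games only task A's measure
    # is penalized by task B, and vice versa.
    loss = bce(head_a(h), y_task_a) + mse(head_b(h), y_task_b)
    loss.backward()
    optimizer.step()
```

The shared encoder cannot latch onto a feature that satisfies only one loss; the second task acts as an extra constraint on the otherwise underspecified representation.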

Apophenia is the tendency to perceive patterns between otherwise unrelated things. The term (German: Apophänie) was coined by psychiatrist Klaus Conrad to describe the beginning stages of schizophrenia. The short-cuts that an ML algorithm discovers are analogous to the fictitious patterns that humans might perceive. Humans can create entire imaginary models of fiction through an overly imaginative interpretation of the world. Any pattern-discovery system can hallucinate patterns of causality.

The path forward for robust AI will always depend on good explanatory models that capture the relevant causality of the system being predicted. Advanced AI should not be driven naively by curve fitting, but rather by relevance realization.

That is, true intelligence is a multi-scale phenomenon and the issue of judgment is critical to its effective deployment. Judgment requires the appropriate formulation of causality at a relevant level of analysis. This is different from causation.

What is the difference between causation and causality? The former is a consequence of a generative model and the latter is a consequence of a descriptive model.

Causation is the emergent partial ordering induced by computation. Theoretically, all the characteristics of computation such as universality and the halting problem are inherited by the concept of causation.

Causality, however, is a different thing. It is a process that approximates the causal behavior of complex processes. Approximates in the sense that it describes the process; this is different from simulating the process, which has its own intrinsic limitations (Church-Turing).

Brains do not simulate the world. Rather they create approximate models of the world. The more consistent these approximations of reality are, the more competent a brain is in navigating its world.

Therefore, when we speak of brains being able to discover causality in the world, we are really referring to brains building approximate models of the world. But perhaps 'approximate' is not the best word.

The adjective 'useful' is better: organisms create useful models of reality. What this implies is that an organism develops an algorithm that looks like a mere heuristic because it is based on an incorrect model of the world. Heuristics leverage useful models of the world.

Civilization was not encumbered when it used a flat-earth model of the world. The model only became a hindrance when global navigation became available and the counter-intuitive notion of traveling in a straight line and returning to the same point became imaginable.

In other words, the fidelity of our models becomes more important as our capabilities increase. Moths navigate by light, but they die when that light is a flame. Birds, on the other hand, have a more advanced model for navigation.

The same thing can be said about human models of the world. The Pirahã tribe of the Amazon have a peculiar model of reality, wherein only the now is of importance. (see: What's the Bottom of the Knowledge Structure Stack Look Like? Dan Everett and the Pirahã)

Humans can develop different models of the world. For the Pirahã it was only the present moment. For most of civilization, it was the present and the past that were important; understanding of the future was bequeathed to the gods.

The scope of understanding of modern civilization involves agency over our collective future. It is surprising that many in our society still believe this is a domain for the gods. Similar to the Pirahã, there is a belief in a 'natural order' of things.

Living things have generative models that are constructed through past experiences. These models influence a living thing’s possible actions in the present so that it can condition its intentional future.

Brian Cantwell Smith describes computation as the interplay of intention and mechanism. Organisms intend to survive into the future, and they generate their actions based on past amortized experience.

The translation of an intention into action is a generative model. However, the constraints imposed on this generative model are a consequence of a descriptive model of reality.

Descriptive models become more useful when they are consistent across more interpretations. A moth has a much more limited awareness of this world and thus a narrow interpretation; it is unable to distinguish a lighted candle from the light of the moon.

Humans have a high-density fovea of around 7 megapixels. We see enough detail in the world to be aware of fine distinctions. We know that a flame is different from the moon because it simply looks different; a moth cannot distinguish these signals.

The solution, then, for underspecification is judgment that arises from constraint satisfaction across many partial descriptive models of the world. There is never really just one model; rather, humans perform judgment by balancing many competing models and arriving at a compromise.

This kind of cognition is best described by Bakhtin's Dialogical Imagination:

gum.co/empathy
