What Makes Neural Networks Fragile

The Epistemology at the Root of their Failures

Published in The Startup · 11 min read · Jul 11, 2020


What do the images below have in common?

Most readers will quickly catch on that they are all seats, as in places to sit. It may have taken you less than a second to recognize this common characteristic. If I heed Andrew Ng’s suggestion that anything a human can do in less than a second can be automated by a Neural Network, then I should be able to create an image classifier that recognizes seats.

I could write a standard classifier using off-the-shelf Python libraries. I can’t predict how well-calibrated its confidence scores will be. One thing I do know is that, regardless of the amount of data I feed it, the result will be fragile. It will break when tested on an image of a seat that deviates from what it has previously seen. I’ll have to show it a lot of images to cover all cases — it will be data-hungry.
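
To make the "off-the-shelf" approach concrete, here is a deliberately minimal sketch using scikit-learn. The data below is random noise standing in for flattened seat images; the array shapes and binary labels are purely illustrative, not a real dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for real image data: each row is a flattened 64x64 "image",
# each label is 1 for "seat", 0 for "not a seat".
rng = np.random.default_rng(0)
X = rng.random((200, 64 * 64))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a plain logistic-regression classifier on the pixel values.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out fake images
```

Swap the noise for real photographs (or a convolutional network for the logistic regression) and the recipe is the same; the fragility described above comes from the recipe, not from any one model choice.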

In the decade since Neural Nets proliferated across the tech industry, a pressing concern has come to dominate the field of Machine Learning: “why are Neural Networks so brittle, so narrowly bounded, so poor at transferring what they have learned from one situation to a similar one?”

This article proposes that the answer lies in the field’s foundation, in the basic assumptions we make when we use Neural Networks. It’s also why some in the field are suggesting that if we are to overcome this hurdle, the field as a whole has to reinvent itself.

The Epistemology of a Neural Network

In the classification example above, I chose to use a Neural Network to recognize seats, and in so doing, I made an unspoken assumption: that somewhere in the arrangement of coloured pixels, I can locate and construct the concept, seat. I might find it in the images themselves, or perhaps in the objects represented in those images. Either way, I believed that if I parsed enough images, enough data, maybe even 3D models of the objects themselves, I could eventually unearth what it is that makes each of them a seat.

Consider the following quote, from a University of Montreal review of Representation Learning, that summarizes this assumption:

“An AI must fundamentally understand the world around us, and we argue that this can only be achieved if it can learn to identify and disentangle the underlying explanatory factors hidden in the observed milieu of low-level sensory data.”

Every Neural Network classifier has, at its core, this assumption: that the source of truth is found in the statistical structure of the world, and consequently can be discovered by parsing the data objectively. Concepts are treated as high-level features. Human brains are even presumed to be machines that discover such objective patterns and create mental models based on them. A later quote from the same review says:

“…this hypothesis is consistent with the idea that humans have named categories and classes because of such statistical structure (discovered by their brain and propagated by their culture)”

Unfortunately, this assumption is not only poorly supported, it is wrong. Whether or not you realize it, you, like everybody else, project your own concepts onto the world you see. You are, with every concept, forcing the world into your mental mold, not the other way around.

Look back at the images of the seats above. Although you hadn’t seen those images before, you immediately recognized them as seats. How? Did you dig up and decode some pattern that was hidden in the images themselves?

Dig a little deeper. Why do you even have a concept of a seat? Why did you invent this idea, or learn it, give it a name, recognize it in the world? And what do all the objects in the images have in common that makes you unify them under that concept?

The answer should be fairly obvious: if you were ever feeling tired, and looking for somewhere to sit and rest, you would seek out one of these objects. A seat serves a purpose; it solves a problem for you.

That’s strange. It seems that the definition of seat is not inherent in the objects themselves, it is based on where you would like to sit. Your interactions with the objects, not the objects themselves, define the concept for you. If any of the objects pictured did not feel like a welcome place to sit — if you felt you were too clean to sit on a dirty rock — you might not consider it a seat. It would stand out as an exception.

Or take the concept of food. If you browsed a dataset of pictures of food, it might include a picture of an insect. Whether you felt that image belonged in the dataset would depend on whether you yourself ate insects, as some people do. Or if, due to some improbable biological mutation, the entire human species should suddenly be unable to digest pineapples, then images of pineapples would cease to be classified as food, despite the fact that pineapples themselves haven’t changed. Indeed, I can make food look like anything I wish, and if it becomes popular, then Google’s image classifiers must begin classifying it as food.

If even concepts like seat and food are so malleable and subjective, what can you say about concepts like beauty, far, or intelligent? It seems none of these are actually located in the world itself. They exist in your motivations. They can’t be found in the data. Looking for them there is a futile effort.

The Alien in The Machine

Imagine you met an alien whose body and biology were drastically different from yours. Would you expect it to recognize a seat in the same way you do? What about food and beauty?

If it didn’t sit the way you sat, eat what you ate, and if it wasn’t attracted to the same types of creatures you were, it would initially fail quite badly at recognizing seat, food or beauty from a set of pictures.

To compensate for this disconnect, say you train the alien to recognize food and chairs. You use flashcards, showing it one image at a time. In this way you hope to prepare it for decent human society. It’s an arduous process. Deep down, you know that there’s a good chance it will embarrass you if it finds itself in a situation that falls outside its training, and it ends up offering your guests scented candlesticks to eat.

How much easier this process would be if the alien ate food in the same way you did. As with a human child, almost no effort would be required to teach it the concept. It would drive its own education by its desires; indeed, satisfying its desires would itself be that education. This is the advantage of learning as a human being does.

Every Neural Network classifier is like that alien: with no motives comparable to yours, it has arrived on earth and been conscripted into service as a classifier. It has no legs to rest, no hunger to satiate, and no desires to quench, yet it is stuck trying to find patterns in data that it can use to recognize seat, food, and beauty. As a data scientist, you get upset when it fumbles, or bases its decisions on spurious features in the data. It seems to lack any semblance of common sense.

A Mountain of Band-Aids

In the face of these setbacks, you may try to boost its intelligence. You increase the number of equations it can process, the number of transformations it performs, the number of samples it memorizes, and the nuance of interpolations between them.

You employ adversarial training to try to bring it, or rather force it, back into an acceptable range of behaviours. You use LIME to spot and correct unjustified correlations. But these are only newer, better training tools that give the network the semblance of understanding. You are adding a meta-band-aid onto a heap of band-aids.
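
The first step of adversarial training, generating perturbed inputs that fool the model so they can be fed back into the training set, can be sketched with the fast gradient sign method (FGSM) on a toy linear classifier. The weights, input, and step size below are all illustrative stand-ins, not a real pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=10)  # weights of a toy logistic-regression classifier
b = 0.0

def predict(x):
    """Probability that x belongs to the positive class."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def fgsm(x, y, eps=0.5):
    """Fast gradient sign method: nudge the input in the direction
    that increases the logistic loss for the true label y."""
    grad = (predict(x) - y) * w  # d(loss)/dx for the logistic loss
    return x + eps * np.sign(grad)

x = rng.normal(size=10)
x_adv = fgsm(x, y=1.0)

# x_adv differs from x by at most eps per feature, yet scores lower on
# the positive class; adversarial training would now add (x_adv, 1.0)
# back into the training data and refit.
print(predict(x), predict(x_adv))
```

The point of the essay stands either way: each such technique patches the symptom (a perturbation the model mishandles) rather than supplying the motive that would make the concept meaningful to the model.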

All the while, the meaning of the concept is still in the trainer’s mind, in your mind. There was never anything in the data that could be used to help. The A.I. is playing a game called “Guess What My Favourite Number Is” with humans who get upset when it makes a mistake.

This is why Neural Network classifiers are fragile. They are adrift in a sea of data that is meaningless to them. For every concept, they lack a core focus, an opinion that unifies the phenomena. They can’t make sense of the data because they have no motives out of which to make sense. Look again at those last two words. They are no coincidence; you don’t “find” sense, you “make” sense.

Objective Truth

Have you ever walked around an old neighbourhood from your childhood, and seen the same places you used to know, but in a different light? As a teenager I didn’t really notice that my home was across the street from a daycare. As an adult who might end up having children, I look at the same building, and I note that the condos I lived in are in a good location due to their proximity to that daycare.

My motivations shape what I see and how I see it. Concepts like “family-friendly” which I would have had no reason to consider, start to seep into my awareness. Someone could have vainly tried to teach me these concepts as a teenager, but I wouldn’t have fully grasped them until I could empathize with their driving motivation.

Each new goal or motivation you adopt changes the world in which you live, as if you had experienced a miniature paradigm shift.

“What a man sees depends both upon what he looks at and also upon what his previous visual-conceptual experience has taught him to see.”

“…though the world does not change with a change of paradigm, the scientist afterward works in a different world.”

When Thomas Kuhn wrote the statements above, he recognized that even when it came to apparently objective scientific facts, what you see is shaped by what you are looking for.

The mere act of labeling an image in a dataset implies a ground truth. This is unwarranted. Every label is a choice on the part of the labeller, according either to his personal inclination or the prevailing social consensus¹.

Browse the ImageNet dataset. You’ll find that the image labels are largely defined by arbitrary, contemporary tastes. They are highly mutable. Take the concept of spatula: I have genuinely debated with friends as to what does and does not count as one. I found a label called Wedgewood. That’s a brand. It includes anything the Wedgewood company decides to produce. These are just two examples I immediately spotted browsing the utensil section.

If even concrete concepts like those are arbitrary and susceptible to wide-scale revision as cultural tastes change, what can we conclude about “objectivity” except that it is at best a provisional consensus?

A Philosophical Choice

In this article I’ve shown many examples that demonstrate the motivations and subjectivity at the root of all concepts. Despite all this, there are some readers who will be unable to accept the notion that concepts, like beauty, are born in the eye of the beholder. These readers can’t be convinced away from the belief that there are patterns in reality itself which underlie every concept. “The concept of food”, they’ll say, “is a part of the fabric of the universe, not something subjective to me”. They may even look at their loved ones and say “I find you beautiful, not because of my personal attachments, but as a consequence of the symmetry in your face”. Exceptions to these rules will not faze them.

I have no doubt such readers will mull over the dilemma in the image above, then brush it off as insignificant. I’d remind those readers that science only progresses when people detect and investigate exceptions to the dominant paradigm.

The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka!” but “That’s funny …” — Unknown

The idea that concepts arise from within reality itself, and hence can be sought out in data, is a comfortable position to hold, regardless of the evidence to the contrary. It lets a person think that their beliefs match the truth, and are therefore beyond doubt or reproach. And since such people think they are being objective, they can’t easily be convinced otherwise².

Such people, unfortunately, have to be left behind.

For the rest of us, if we want neural networks to overcome this next great challenge, to make them robust, reliable, and meaningful, we have to make the agent’s motivations a fundamental factor in its calculus of concepts.

Once you make this epistemological leap, a lot of seemingly insurmountable problems quickly vanish. The symbol-grounding problem is resolved: concepts are based around motivations, and symbols are instantiations of concepts in a specific context. A.I. no longer needs to be fed reams of labeled data to cover all possible edge cases — it can define its knowledge by itself, based on what will help it achieve its goals. It can self-correct in cases of uncertainty. It develops common sense.

These benefits are not without their costs. We can no longer approach training models as if they were ingesting unordered, bland, uniform samples. To truly learn concepts, not like an alien but like a human, a classifier must interact with a world that is richer than what we currently provide. This will enable it to define concepts in the context of its motives. An agent must experience what tiredness is before it learns to identify a seat. It must, to a large degree, be a free agent in its own world. All this is difficult to implement, but an agent built this way is far easier to intuit about.

Nor can we avoid this challenge. There is a prevailing, naive, hope that at some point we’ll be able to look at Neural Networks from such an angle or with such a pair of glasses that the answers will become clear; concepts will emerge as epiphenomena from within the tangle of perceptrons, perhaps in the same way they emerge in our own intricate cerebrum.

We can’t hide behind this naive optimism about the Neural Network black box. We should finally let that dream go, and dedicate ourselves to the more difficult task of creating meaning, of making sense.

Thanks to Graham Holker for reviewing this, and whose discussion prompted many of the arguments in this article.

[1] This article is not intended merely to argue that “subjectivity” exists. That premise was never in doubt. Rather I argue that the unifying principle behind every concept is always rooted in the observer’s motivations. Nor do I suggest that reality plays no part at all in concepts, for where else would your mind derive its motivations from? If we liken reality to potting soil, then a motivation is the plant, and a concept is the flower.

[2] It’s worth noting that the statistical approach, i.e. that truth can be found in data, is still useful, and has been proven so over centuries of scientific research. During that time, scientists have defined concepts, made hypotheses about them, then tested them against reality, that is, data. This approach has only recently become a liability, when it has been applied to cognition. We are trying to conscript data into defining concepts and hypotheses themselves, a task that used to be reserved for humans. The previous assumptions therefore no longer apply.




I lead an applied A.I. team at a robotics company. On the side I develop A.I. that uses motivated reasoning and invents abstract concepts from scratch.