The Trap of Classification

Daniel Fenton
4 min read · Nov 17, 2018

This post is about the practice of classifying data into finite classes, and the trap of specifying a right answer.

If high-dimensional representations are the principle behind the success of neural networks, then the practice of discrete classification should by now be redundant. A network model should not be forced to collapse its output into a sparse vector such as the labels ‘0–9’. I don’t believe the brain ever processes discrete categories of digits ‘0–9’, and likewise it is the wrong approach for network models. The challenge for models, and what the brain is able to do, is to communicate the output ‘3’ while still maintaining a distributed representation of that concept.

Essentially, it is to obey the principle that there are no categorical variables, only continuous variables that need modelling with additional parameters.
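To make that concrete, here is a minimal sketch in Python (using numpy). The embedding width and the randomly initialised lookup table are hypothetical stand-ins for what a trained network would actually learn:

```python
import numpy as np

# A minimal sketch: a learned lookup table of continuous vectors in
# place of one-hot categories. The width (64) and the random values
# are illustrative stand-ins for learned parameters.
rng = np.random.default_rng(0)

EMBED_DIM = 64
digit_embeddings = rng.normal(size=(10, EMBED_DIM))

def represent(digit: int) -> np.ndarray:
    """Return the continuous representation of a digit class."""
    return digit_embeddings[digit]

# '3' becomes a dense point in a continuous space, able to sit near
# or far from other concepts, instead of a one-hot spike that is
# equidistant from everything else.
three = represent(3)
print(three.shape)  # (64,)
```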

Consider the figure below.

Figure 1. Squiggles on a page, a telecommunications company logo, How It Feels To Chew ‘3’ Gum, Piranha Plant sprites, or, five cases of the digit ‘3’

Currently, with one-hot encoded vectors, we tell machine learning systems that the above 5 cases are identical. However, in many ways the 5 cases have very different contexts and properties. The generality of a learning system is crippled by pretending that these 5 cases are completely identical.
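A small sketch of the problem, with a hypothetical one_hot helper standing in for the usual label encoding:

```python
import numpy as np

# Hypothetical: five very different images that all depict a '3'.
# Under one-hot encoding they receive the *same* 10-dim target,
# erasing everything that distinguishes them.
def one_hot(label: int, num_classes: int = 10) -> np.ndarray:
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

cases = ["squiggle", "telecom logo", "gum ad", "piranha plant", "plain digit"]
targets = np.stack([one_hot(3) for _ in cases])

# Every row is identical: the training signal says these are one and
# the same object, not five objects that happen to share one property.
print(np.all(targets == targets[0]))  # True
```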

A high-dimensional vector would capture more of the totality of each of these objects. In at least one dimension, the vectors for these 5 images would share similar values: the dimension that captures that, in a particular context, each of these cases represents the same numerical digit. But that is just one dimension of the many that carry general information about these objects.

For a crude visualization, consider the figure below.

Figure 2. Hypothetical array containing 5 vectors representing the 5 cases of ‘3’ above
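Since the figure is only a crude visualization, here is an equally crude toy version in code. The vector width and the choice of dimension 0 as the ‘digit identity’ axis are invented for illustration:

```python
import numpy as np

# Toy version of Figure 2: five hypothetical 8-dim vectors for the
# five cases of '3'. Dimension 0 is imagined as the 'digit identity'
# dimension; the others encode style, medium, context, and so on.
rng = np.random.default_rng(1)

vectors = rng.normal(size=(5, 8))
vectors[:, 0] = 3.0  # all five agree along the digit axis
                     # (exact agreement is a simplification)

print(vectors[:, 0].std())   # 0.0: identical on the shared dimension
print(vectors[:, 1:].std())  # > 0: they differ everywhere else
```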

This modelling strategy is consistent with the theory of mind that holds cognition is built on metaphor or analogy. In high-dimensional space, vectors can share values across obscure pairs of objects. For example, you can imagine commonalities between the mint herb and mint ice-cream, likewise between mint ice-cream and frozen meat, between frozen meat and grazing cattle, and between grazing cattle and mint herbs.
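A toy illustration of that chain, using four invented feature dimensions (minty, sweet, cold, animal) with made-up values. Each adjacent pair in the chain overlaps somewhere, even though the ends of the chain share almost nothing:

```python
import numpy as np

# Invented 4-dim feature space: [minty, sweet, cold, animal].
concepts = {
    "mint herb":      np.array([1.0, 0.1, 0.0, 0.0]),
    "mint ice-cream": np.array([0.9, 0.9, 0.8, 0.0]),
    "frozen meat":    np.array([0.0, 0.0, 1.0, 0.8]),
    "grazing cattle": np.array([0.2, 0.0, 0.0, 1.0]),
}

def cos(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

pairs = [("mint herb", "mint ice-cream"),
         ("mint ice-cream", "frozen meat"),
         ("frozen meat", "grazing cattle"),
         ("grazing cattle", "mint herb")]

# Each neighbouring pair shares at least one dimension of overlap.
for a, b in pairs:
    print(f"{a} ~ {b}: {cos(concepts[a], concepts[b]):.2f}")
```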

The challenge in discussing, and implementing, this approach is that in the back-end of such a system none of these categories would exist as individually specified vectors; the entire model domain would be more like a continuous surface. By specifying a set of co-ordinates, less one degree of freedom, you could return a result from the model, but otherwise the entire mind of the model would remain one continuous domain.
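As a loose sketch of that querying idea, imagine the model as an arbitrary smooth surface. Pinning down all co-ordinates but one degree of freedom returns a slice of results; the particular function below is a stand-in, not anything a network would learn:

```python
import numpy as np

# The model as one continuous surface z = f(x, y), rather than a set
# of discrete class vectors. This function is an arbitrary smooth
# stand-in for whatever a trained system would represent.
def surface(x: float, y: float) -> float:
    return np.sin(x) * np.cos(y) + 0.1 * x * y

# Querying: pin down all co-ordinates but one degree of freedom,
# then read results off the remaining continuous slice.
x_query = 1.5
answers = [surface(x_query, y) for y in np.linspace(0, 3, 5)]
print(answers)  # one continuous slice of the model's 'mind'
```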

Relating this back to classification, specifying a finite set of possible classes is an unscalable approach. In specifying a desired label such as ‘0–9’, you train the model to extract just one property of the image. And if an image of a dinosaur is given to a model trained on MNIST, the model will still attempt to extract the ‘digit’ dimension from the image of the dinosaur.
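The failure mode is easy to see in code: a softmax over ten digit classes must distribute a total probability of 1.0 among the digits, whatever the input. The logits below are random stand-ins for a real model’s activations on an out-of-distribution image:

```python
import numpy as np

# A softmax head over 10 digit classes is forced to answer
# 'which digit?' even for an image of a dinosaur.
def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(2)
dinosaur_logits = rng.normal(size=10)  # stand-in activations for an OOD image

probs = softmax(dinosaur_logits)
print(probs.sum())     # 1.0 -- all probability mass goes to digits
print(probs.argmax())  # some digit, asserted regardless of the input
```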

This is closely related to the idea that it is impossible to define anything in reality with just one metric; rather, each additional defining metric contributes to narrowing down the domain, until the system gains a specific understanding of a real object.

In the same way, when defining any objective or policy to set an agent’s actions, the fewer the dimensions or parameters involved, the greater the domain of unintended consequences can be. Just as hand-crafted classification functions could not deliver the results that deep learning could, so hand-crafted objective functions cannot deliver the results that a learned, high-dimensional objective function can. Humans cannot equip their hand-written judgements with the context a machine would need to get the point of a goal in reality.

I believe this is what Einstein was referring to when he said:

“As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.” ― Albert Einstein, 1921

I think a demonstration of this idea is the observation that there are no reliable models for ‘real’ fields like economics or sociology, and even biology. These are real systems where narrowing things down to a few dimensions cannot create reliable models. Here the best model is actually one that is imprecise, and that maintains high-dimensional context and ambiguity.

“It is better to be roughly right than precisely wrong.” ― John Maynard Keynes

Essentially, the best answer is no particular answer. The best answer can’t be reduced.
