What is a chair? — part I. Why AI is far removed from real “understanding”
Let me ask you a rather mundane question. What is a chair?
Take a minute to answer. Here’s a picture in case… well, just in case —
I guess your answer looks something like this:
- A chair has four legs.
- A chair is something to sit on with support for your back.
- A chair is a physical object, with the purpose of supporting your bottom.
- A chair has a horizontal plane perpendicular to its legs.
That seems simple. But is it?
Human understanding is very sophisticated
Let us have a closer look at these descriptions, and find out what it means for humans to really understand what a chair is. (The “what is a chair” question is a classic one in AI education to illustrate the complexity of the world and the sophistication of human understanding.)
Understanding is multi-modal and multi-faceted
When we think of a chair, a very sophisticated and rich concept comes to mind. We can point out the purpose of a chair (2, 3), or define its structural properties (1, 4). We can imagine what it looks like (vision), what material it is made of and how it feels (haptics), how much it weighs, or what it sounds like when it is moved (hearing). We do all this without effort.
One of humans’ strengths is the capacity to combine many multi-modal and multi-faceted aspects of a concept in one coherent mental “picture”.
Concepts are ambiguous
One of the definitions (2) mentions support for our back. Is this really required, or rather optional? After all, isn’t a stool also a chair?
We could go on like this. According to the descriptions above, a pouffe is not a chair, but a table is! And in a sense a table sometimes is a kind of chair, depending on the context.
And to be really precise, one should also define the other concepts used in these descriptions. So, what is a leg? How should a computer know that we mean the inanimate object, and not the leg of an animal?
Understanding requires common sense reasoning
All of the descriptions except number 3 leave out the rather important fact that a chair is a physical object and, as we all know, therefore obeys the laws of physics! For example, it will not float around, and we know it cannot simply “disappear” (what cognitive psychology calls “object permanence”). Infants develop this understanding at around seven months of age.
Though this may seem straightforward to us, it is one of the challenges that self-driving cars face: how to combine pixel information from computer vision systems with knowledge about the world, such as the laws of physics?
Definitions depend on context
Now imagine an old man who is exhausted and asks for a chair to sit on. In this case, we can guess the intention behind his request and figure that a low table will surely do. Now compare this situation with the request to buy six chairs for your living room. We see that the definition of a chair depends on the context in which the word is used.
Today’s AI systems, however, typically employ a single, context-free representation (e.g. a tensor such as a word embedding).
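As a minimal sketch of what “context-free” means here (a toy lookup table with made-up numbers, not any particular library or model), the same vector comes back for “chair” no matter which sentence it appears in:

```python
import numpy as np

# Toy "embedding table": one fixed vector per word.
# (Hypothetical numbers; real embeddings are learned from large text corpora.)
embeddings = {
    "chair": np.array([0.12, -0.40, 0.88]),
    "table": np.array([0.10, -0.35, 0.91]),
    "sit":   np.array([0.55,  0.20, 0.10]),
}

def embed(word):
    # The representation is context-free: the surrounding sentence plays no role.
    return embeddings[word]

print(embed("chair"))  # "buy six chairs for the living room" -> same vector
print(embed("chair"))  # "the exhausted old man asks for a chair" -> same vector
```

A contextual model would compute a different vector for each occurrence; the point here is only that a single static representation cannot capture the situation-dependent meaning discussed above.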
Let’s look at another example. Imagine you are baking cookies. The recipe tells you that they are ready when they are “light brown”. What colour is this exactly? The answer will depend on experience, i.e. which physical colour corresponds to a delightful cookie. And it will be personal, according to the individual’s preference (the ideal cookie).
Definitions interact and are inter-related (Gestalt)
But even within a well-defined context (which is in general impossible to determine), coming up with a precise definition is very hard. For example, when considering structural descriptions like “a chair consists of four legs”, it turns out to be impossible to define a leg without explaining what a chair is, and vice versa. We see that human understanding exhibits Gestaltist characteristics: the parts cannot be understood without the whole.
Understanding requires a rich & holistic understanding of the world
So far, we neglected an important question: why do chairs exist at all? Why do we sit?
To understand the function of a chair, we need to understand what it means to be human: that we walk, that we get tired, and need rest.
The list never ends. There are so many recursive and mutual dependencies between concepts that we need to understand the world in a very rich way to understand even a — seemingly — very simple concept like “a chair.”
So what about “computer understanding”?
Past efforts in AI to model all these facts and rules of how the world works (the so-called symbolic approach), however interesting, have not led to real understanding. But blaming the failure on the approaches themselves would ignore a deeper problem:
What exactly do we expect when we hope for a computer “to understand” what a chair is? Answering this question is itself very hard, and it shows the need for human mental models to co-evolve with technology.
For this reason, instead of trying to solve the “intelligence” question, AI typically takes a shortcut: define a task (with an outcome that can be measured) that “proves” a particular aspect of understanding.
For example, we could give a computer a particular “input” (e.g. an image) and let it decide “whether it is a chair or not”. Clearly, this is a much shallower version of understanding.
The depth/breadth of understanding is constrained from the start: the definition of the AI task assumes a particular interpretation of understanding.
This behaviourist approach, where intelligence is measured by means of extrinsic observation, is introduced to manage the complexity of the problem at hand and to allow for quantitative evaluation, but it has important consequences. When we build computer systems that implement a particular approach, we are in fact building operational models of intelligence.
In our case, it will be a kind of script, an algorithm, implemented in a program. When executed, it unambiguously decides whether the numbers at its input represent a chair (1) or not (0).
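As a sketch of what such an operational model boils down to (purely hypothetical weights and input, not a real chair detector), the decision reduces to arithmetic on numbers: the image is just an array of pixel values, and “chair or not” becomes a thresholded weighted sum:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 8x8 greyscale "image": to the machine, just 64 numbers between 0 and 1.
image = rng.random((8, 8))

# Made-up weights standing in for an already "trained" model.
weights = rng.normal(size=64)
bias = -0.1

def is_chair(img):
    # Operationalized "understanding": a weighted sum, squashed and thresholded.
    score = img.flatten() @ weights + bias   # just arithmetic on numbers
    probability = 1 / (1 + np.exp(-score))   # squash to a value between 0 and 1
    return int(probability > 0.5)            # 1 = "chair", 0 = "not a chair"

print(is_chair(image))  # prints 0 or 1; the meaning is supplied by us, not by the machine
```

Nothing in this computation refers to sitting, legs, or rest; those interpretations live entirely on our side of the boundary.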
Numbers have no meaning to a machine
Answering the seemingly simple question “chair or not?” now looks far from easy. To have a computer answer this not-so-mundane question, we ultimately need to define a mathematical formula that makes the decision. Both input and output will be numbers, which can represent anything.
Within the machine, these numbers have no meaning. But once they leave the machine and someone acts upon them in the real world, the numbers acquire meaning.
With the increasing use of algorithms, we cross this interface or boundary between meaningless machine manipulation and value and meaning in human lives ever more often. We increasingly make judgements on the basis of information obtained through “meaning-unaware” algorithmic interventions.
For example, we may deny someone a loan based on a credit score, severely impacting that person’s life. Or, less harmfully, we might do an image search for chairs and decide to buy the most common one.
At this moment, the exact operationalized definition of a chair, in the form of a mathematical formula or a computer program, matters. And implicitly, whether we want it or not, assumptions and choices have been made about what a chair is.
And it is very probably not what we think it is…