Artificial Ignorance – why computers can’t make sense of us

Vilde Reichelt · Published in Bakken & Bæck · 7 min read · Apr 15, 2020

“Call me an ambulance, now”
– ”From now on, I’ll call you ‘An ambulance’. OK?”

Picture a world where humans can communicate with machines — one where a soothing voice reads you the weather report, orders food and makes shopping lists, without you needing to lift a finger. We’re nearly there, with the emergence of voice assistants. Just one minor detail is holding us back — Siri, Alexa, Google Assistant and other speech robots don’t really understand anything.

And how could we expect them to?

Technically speaking

Today our interaction with everyday helpers, such as mobile phones, cars, speakers, fridges and even light bulbs, goes both ways. The machines are employed by people to do anything from setting alarms to booking hairdresser appointments for us. As great as the futuristic visions of interface-less machines are, the reality can lead to serious confusion. For instance, when they hand out nicknames instead of calling emergency services, as in the scenario above.

Robots are speech impaired, and they need a lot of help in order to help us, and even more help to avoid mishaps while trying to do so. That is to say, language is more advanced than we give it credit for, and Natural Language Processing (NLP) is literally easier said than done.

We use all our senses to process speech — turning a blind eye to irrelevant information is vital

Being human

When we hear spoken language, our brains automatically and immediately pick the most probable interpretation of the words, given the sounds and the situation. We also look past the fact that some words are ambiguous, e.g. call in the example above, and go beyond them to catch the real meaning of the query. People don't just recognise speech sounds and analyse grammatical structures; we naturally interpret utterances and find their composed meaning in context.

When we talk to people, we take all kinds of human insight for granted — knowledge we have acquired by merely talking to others and expressing emotions and thoughts. For machines, the story is much more complicated. They aspire to produce human-like text that responds, senses, reasons, acts and adapts, in order to simulate human thinking — artificially.

For robots to talk, their computer systems need to train on a collected and curated set of data. The software can then generalise from the data input and produce output as specified by a human programmer, e.g. "Okay, I'll call you [whatever you said after Call me]." While they use smart, self-learning algorithms, the systems are not intelligent and do not learn and gain knowledge in the traditional sense. Before machine learning, there is machine teaching.
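To make the nickname mishap concrete, here is a minimal sketch of how a hard-coded template rule could misfire on the ambulance request. The function and its rules are invented for illustration; no real assistant works exactly like this.

```python
def handle_utterance(utterance: str) -> str:
    """Pattern-match the utterance against a hard-coded command template."""
    if utterance.lower().startswith("call me "):
        # The rule blindly captures whatever follows "call me" as a nickname,
        # dropping anything after a comma. It never considers the other
        # sense of "call" (placing a phone call).
        nickname = utterance[len("call me "):].split(",")[0].strip()
        return f"From now on, I'll call you '{nickname}'. OK?"
    return "Sorry, I didn't understand that."

print(handle_utterance("Call me an ambulance, now"))
# → From now on, I'll call you 'an ambulance'. OK?
```

The template fires on the wrong sense of call because the rule only sees the string, not the urgency of the situation.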

Teaching machines language starts with letting them process text. Text technologies belong to the part of language technology called Natural Language Processing, a branch of Artificial Intelligence (AI) that uses software to generate and understand natural language (NLG and NLU, respectively).

How to order a pizza with words

Consider another example: "[Order (me)] a [large] pizza [with pepperoni & mozzarella|number 33]" consists of straightforward words that are hard to misinterpret. The verb triggers a function, namely to put in an order; some interchangeable variables determine the size, and the topping ingredients or the menu number. If the words were replaced with "[Give me] a [small] [vegetarian] pizza", it is still a grammatical sentence, computable for a pizza chatbot. Grammar and mathematics are not that different, so the machines follow us up to this point, before things get more complicated.
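The bracketed pattern above can be sketched as a toy intent-and-slot parser. The verb selects the intent, and the remaining words fill the size and topping slots. The vocabulary and slot names below are invented for illustration.

```python
# Tiny, made-up vocabulary for a hypothetical pizza chatbot.
ORDER_VERBS = {"order", "give", "get"}
SIZES = {"small", "medium", "large"}
TOPPINGS = {"pepperoni", "mozzarella", "vegetarian"}

def parse_order(utterance: str) -> dict:
    """Fill the intent, size and topping slots from words that match."""
    words = {w.strip(",.").lower() for w in utterance.split()}
    if not words & ORDER_VERBS:
        return {"intent": None}  # no order verb, no order intent
    return {
        "intent": "order_pizza",
        "size": (words & SIZES or {None}).pop(),
        "toppings": sorted(words & TOPPINGS),
    }

print(parse_order("Give me a small vegetarian pizza"))
# → {'intent': 'order_pizza', 'size': 'small', 'toppings': ['vegetarian']}
```

Swapping "small" for "large" or "vegetarian" for "pepperoni" still parses, which is exactly the sense in which grammar behaves like mathematics here.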

One thing machines and humans do have in common is a talent for classification. Something is meaningful when it denotes some property of the world; vegetarian, in this case, denotes a pizza topping. In Machine Learning, we employ algorithms to look for patterns, based on training data of the words that fit together, in the same categories that people use. The language models produce a numeric representation of the sentence and can translate it back into words.
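The simplest version of such a numeric representation is a bag-of-words vector: the sentence becomes counts over a fixed vocabulary, and the vector can be mapped back to words. The vocabulary here is made up for the example; real models use far richer representations.

```python
# Hypothetical fixed vocabulary for the pizza domain.
VOCAB = ["order", "a", "large", "small", "pizza", "vegetarian", "pepperoni"]

def to_vector(sentence: str) -> list[int]:
    """Count how often each vocabulary word occurs in the sentence."""
    tokens = sentence.lower().split()
    return [tokens.count(word) for word in VOCAB]

def to_words(vector: list[int]) -> list[str]:
    """Translate the numeric representation back into words."""
    return [word for word, count in zip(VOCAB, vector) if count > 0]

vec = to_vector("order a large pizza")
print(vec)             # → [1, 1, 1, 0, 1, 0, 0]
print(to_words(vec))   # → ['order', 'a', 'large', 'pizza']
```

Note what gets lost: the vector records which words appeared, but nothing about what they refer to, which is precisely the problem the next paragraph describes.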

The issue, however, is that words, unlike numbers, can refer to different things. Natural language, unlike programming languages, is characterised by its ambiguity. Words are vague or not meaningful in themselves, so a key part of language understanding is knowing what a name means, or rather, what it refers to in a specific context. An NLP language model can therefore simply learn the text vegetarian and map it to the concept 'ingredient' or 'person with a dietary restriction', without knowing what it means.
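One way to picture this is a lookup table of senses: the same string resolves to different concepts depending on which domain the system believes it is in. The domains and sense labels below are invented for illustration.

```python
# Hypothetical sense inventory: one string, several domain-dependent meanings.
SENSES = {
    "vegetarian": {
        "pizza_ordering": "topping choice (no meat)",
        "guest_registration": "person with a dietary restriction",
    },
    "call": {
        "naming": "give someone a nickname",
        "telephony": "place a phone call",
    },
}

def resolve(word: str, domain: str) -> str:
    """Pick a sense for the word, given an explicit domain label."""
    return SENSES.get(word, {}).get(domain, "unknown sense")

print(resolve("call", "telephony"))  # → place a phone call
print(resolve("call", "naming"))     # → give someone a nickname
```

The hard part, of course, is not the lookup but deciding which domain applies, and that is exactly what the system cannot infer from the text alone.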

Without help from a lot of input training data and supervised discourse processing, all the different interpretations of a sentence are plausible to a computer. A challenge for all types of data-driven technology is customising for the different domains and purposes of use. Whether the user wants to be nicknamed ambulance or to make an emergency phone call is unclear to a machine that is not listening in on everything else that is happening (and we don't want that scenario either).

Despite machines' capacity to store and sort information, humans still have the upper hand in communication

Weather or not

If we really want to talk with machines, we have to share more than plain text. Every natural conversation is built around spatial experience, common ground and social interaction. People who don't have a lot in common often talk about the weather, a shared reference to an unambiguous experience. However, what we communicate is more than the words and how we combine them. Our tones, pauses, errors, corrections, facial expressions, body language and gestures all reflect our mental representation of language. That's why, very often, what we mean is not the literal meaning of what we say.

The subtle, artistic and creative nature of language is not found in the grammar itself, but in the references we share when we interact with each other. In other words, ideas and concepts are masked within feelings and memories that the machines don't have access to. That means you can say "Oh! That's just great" and mean the opposite, or use a fixed expression like "it's raining cats and dogs" to convey that it's raining heavily. This works great as a cooperative game between people, but it makes language less mathematically computable.

“The meaning of a word is its use in language” — Ludwig Wittgenstein

Human language evolves constantly, quicker than the Machine Learning rules for NLP. The Internet is full of inconsistent text data from Twitter, or blogs with jokes and memes, that static code cannot figure out. Since concrete text is all it receives, it is safer to hard-code grammatical rules and supervise the machine's language learning than to let it guess the meaning of general utterances. In the language games we play with people we know well, we use more references than we realise. Imagine being a machine that has never experienced anything!

Our ability to sort, adapt and remember information made us who we are as a species

We do it better

The goal of Natural Language Processing is for machines to understand language. Unfortunately, that will require techniques that don't yet exist. Machines are improving their logic and common sense, and some systems can even recognise context and the inherent ambiguity in longer language sequences. NLP has its success stories: spell check, writing support and automatic text summaries are helpful tools. They should be seen as just that. Machine translation is a cool trick, but it cannot replace human translators any more than a thesaurus can replace writers.

More complex functions, such as information extraction, sentiment analysis and dialogue systems (for instance Siri, Alexa and the Google Assistant), should be seen as supplements, in the same way that a dictionary can help you find the right words. They are, however, learning numeric optimisation, not the social intelligence that conversations require. Natural language understanding is not intelligence, and speech robots with good natural language generation still haven't passed the Turing test.

Even if the speech robots learned our references to objects, people and experiences, and mimicked having a body, using eye gaze, body language and gestures, they would not be affected by feelings and chance like mortals are. Machines will never be able to live in a human linguistic community and learn language as people do. They are more than welcome to try, though!

All illustrations by Nicolas Vittori


Vilde Reichelt

Linguist and UX writer – it’s all semantics to me.