AIs Don’t Understand (Yet)

The software industry is infatuated with artificial intelligence. Billions are invested in research and hundreds of startups emerge, yet despite the emphasis, we remain challenged to answer the most fundamental question:

When does a computer actually understand something?

Dictionary and Encyclopedic Knowledge

If you ask someone if they know what something is, they’re likely to give a brief description that resembles a dictionary definition. The questions are usually shaped as ‘what is X?’ and the person responds ‘X is Y’. Dictionaries often contain additional information like ‘part of speech’, morphology (plural forms, tense, affixes, etc.), syllables and pronunciations. Specialized dictionaries extend the knowledge into practical or domain-specific uses. For example, a thesaurus identifies synonyms & antonyms, a legal dictionary focuses on the ‘legal domain’, an idiom dictionary on idiomatic expressions and so on.

Similar to a dictionary, encyclopedic knowledge draws relationships between concepts. Just as the dictionary focuses on the “Is” relationship, encyclopedias focus on a handful of similar relationships such as:

- “is-a-kind-of” (a tiger is a kind of animal),

- “is-a-part-of” (a steering wheel is part of a car),

- “has-a” (a man has a wallet) and

- “has attribute/adjective” (a bear is brown).

These relations focus on nouns (people, places & things) and how they relate to each other. The verb-focused relations are usually in the context of a noun. Examples include:

- “used for” (a gun is used to hunt) and

- “capable of” (a cat is capable of jumping).

Action and event relations capture data on time and sequence:

- “precedes” (eating dinner precedes cleaning dishes),

- “has sub-event” (cleaning dishes has sub-event of loading the dishwasher)

- “has prerequisite” (car must be on before you can drive it), etc.

Similarly, there are relations to capture common locations of items:

- “located near” (birds are located near trees) or “located in” (fish are located in water).

Discourse relations analyze if two events occurred in parallel, or one before the other. These clues help the computer to understand the timing of events. NLP systems that perform ‘reading comprehension’ are adept at interpreting prepositions:

- In, on, at, next to, above, below (indicating the location)

- After, during, before, while (indicating the timing)

Other keywords help us to understand why something happened:

- Because, because of, due to, since (reason or motive)

And other keywords help us to understand how something was achieved:

- By, via (what instrument or method was used)

These mini-facts (or propositions) are captured in downloadable semantic databases such as YAGO and ConceptNet, with their knowledge often obtained by reading and parsing Wikipedia or other online sources. The complex relations are often stored using a graph database or some sort of hypermedia, URL-centric method. This data is combined with commonsense factoids like ‘the girl sneezed, hence, she is alive’ (entailment relations). Large entailment databases are used to complement other knowledge acquisition techniques.
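As a rough illustration of how such mini-facts can be stored and queried, here is a minimal Python sketch. It keeps propositions as (subject, relation, object) triples in memory and follows the “is-a-kind-of” relation transitively. The relation names come from the lists above; the `TripleStore` class, its methods and the sample facts are hypothetical and are not the actual YAGO or ConceptNet schema or API.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory store of (subject, relation, object) facts.

    Hypothetical illustration only; not the YAGO or ConceptNet data model.
    """

    def __init__(self):
        # Maps (subject, relation) -> set of objects.
        self._facts = defaultdict(set)

    def add(self, subject, relation, obj):
        self._facts[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        """Return the objects directly linked to subject by relation."""
        return set(self._facts.get((subject, relation), set()))

    def ancestors(self, subject, relation="is-a-kind-of"):
        """Follow a relation transitively, e.g. tiger -> cat -> animal."""
        seen = set()
        frontier = [subject]
        while frontier:
            current = frontier.pop()
            for parent in self.objects(current, relation):
                if parent not in seen:
                    seen.add(parent)
                    frontier.append(parent)
        return seen


# Made-up sample facts in the spirit of the examples above.
store = TripleStore()
store.add("tiger", "is-a-kind-of", "cat")
store.add("cat", "is-a-kind-of", "animal")
store.add("cat", "capable-of", "jumping")
store.add("steering wheel", "is-a-part-of", "car")

tiger_kinds = store.ancestors("tiger")
print(sorted(tiger_kinds))
```

A production system would more likely use a graph database, as noted above; the dictionary of sets here only shows the shape of the data.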

Current State: ★★★★☆

Microsoft, Google and others host knowledge bases in the cloud. The databases often contain over 100 million facts.

Context & Contribution

As computers read and gather knowledge, they extract nuggets of important knowledge, but often at the expense of the bigger picture. A computer might determine that ‘people use dishwashers’ but not realize that the proposition is likely to be false in a third-world country that has no electricity. The computer might know that an automobile ‘has a horn and an engine’ but not comprehend their contribution to the ensemble. The information extracted didn’t consider the important details of the context: Where did it occur? When? For how long? Why? How? What’s the backstory?

As events occur and knowledge is obtained, humans associate items together. We associate items in space, in time, and across common attributes or objects. In what is both a weakness and a strength, we create prototypical templates of our world. We stereotype entity classes as part of our baseline. Surely, this is key to reducing the number of instances we must remember, and it reduces the processing time for logical reasoning. And yes, it increases the chances of making mistakes by categorizing group members based on a feature set.

Time/space frequency data enable us to spot that which is out of the ordinary. Should a woman be playing a piano underwater, or is that odd? Do blueberries normally glow in the dark, or is that odd? To understand something requires understanding if it’s possible, and if it’s common, or likely.
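One simple way to picture this kind of frequency data is a table of how often an activity has been observed in each context. The sketch below is only an assumption about how such a check might look: the `SceneStats` class, the 0.05 threshold and the observation counts are all invented for illustration.

```python
from collections import Counter


class SceneStats:
    """Counts how often an activity was observed in a given context."""

    def __init__(self):
        self._pair_counts = Counter()      # (activity, context) -> count
        self._activity_counts = Counter()  # activity -> count

    def observe(self, activity, context):
        self._pair_counts[(activity, context)] += 1
        self._activity_counts[activity] += 1

    def likelihood(self, activity, context):
        """Estimate P(context | activity) from raw counts; 0.0 if unseen."""
        total = self._activity_counts[activity]
        if total == 0:
            return 0.0
        return self._pair_counts[(activity, context)] / total

    def is_odd(self, activity, context, threshold=0.05):
        """Flag a scene whose estimated likelihood is below the threshold."""
        return self.likelihood(activity, context) < threshold


# Invented observations: 99 indoor piano scenes, 1 underwater.
stats = SceneStats()
for _ in range(99):
    stats.observe("playing piano", "indoors")
stats.observe("playing piano", "underwater")

print(stats.likelihood("playing piano", "underwater"))
print(stats.is_odd("playing piano", "underwater"))
print(stats.is_odd("playing piano", "indoors"))
```

Raw counts cannot tell ‘impossible’ apart from ‘never observed’, so this only covers the ‘common or likely’ half of the question.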

Current State: ★★☆☆☆

Modern AI fails to adequately capture the probability of scenes in a context; however, this too is within our grasp.

Recognize, Compare & Contrast

If I describe ‘a crescent-shaped yellow fruit’, you might guess ‘banana’. An agent/person recollects attributes (imagery, prosodic, haptic, etc.) and compares them to a baseline. Humans are extremely good at comparing things. Show me a picture with just one anomaly and my attention is immediately drawn to it. Our core intelligence allows us to loop through sets of information and to ‘diff’, that is, to find the difference between similar items. We sample lots of data, form a baseline and then compare our new piece of data to the baseline. A proper baseline enables us to give examples and counterexamples. Our AI should be able to rattle off examples of ‘situations that scare children’ or counterexamples for ‘CEOs are greedy’. To understand something requires that you understand it relative to something else.
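The ‘recognize, then diff’ idea can be sketched with plain attribute sets. This is a toy example under stated assumptions: the `BASELINE` entries are made up, and Jaccard similarity is just one convenient overlap measure, not a claim about how people or any particular AI system compare things.

```python
# Hypothetical attribute sets; a real system would learn these from data.
BASELINE = {
    "banana": {"fruit", "yellow", "crescent-shaped", "soft"},
    "lemon": {"fruit", "yellow", "oval", "sour"},
    "apple": {"fruit", "red", "round", "crisp"},
}


def jaccard(a, b):
    """Jaccard similarity: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(description, baseline=BASELINE):
    """Return the baseline item whose attributes best match the description."""
    return max(baseline, key=lambda name: jaccard(description, baseline[name]))


def diff(item_a, item_b, baseline=BASELINE):
    """Return the attributes unique to each of two baseline items."""
    a, b = baseline[item_a], baseline[item_b]
    return a - b, b - a


guess = best_match({"crescent-shaped", "yellow", "fruit"})
only_banana, only_lemon = diff("banana", "lemon")
print(guess)
print(sorted(only_banana), sorted(only_lemon))
```

The `diff` result is what makes examples and counterexamples possible: it names exactly which attributes separate two otherwise similar items.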

Current State: ★★☆☆☆

Deep learning and neural attention models are showing great promise for classifying and comparing items but often lack disambiguated precision and context.

Interrogate for Importance, Prioritize Accordingly

To understand something, we need to know not only ‘how’ it’s special but ‘why’. If it differs from others in its class, why is that difference of any importance? Our comparison engine might tell us what is significant about a giraffe but not tell us why. Sakichi Toyoda (founder of Toyota Industries) suggested that we need to ask ‘why’ up to 5 times to get to the heart of a matter. Why is the giraffe’s neck so long? So that it can eat leaves from higher branches on a tree. Why is that important? If there’s a drought and limited food, the animal can eat. Why is that important? Living beings have a survival instinct, and so on.

If you ask another ‘expert’ about a giraffe’s neck, they’ll tell you that it relates to combat and sex. Male giraffes use their necks to spar with other males; the victor gets mating rights. Now we have competing theories. The computer that contemplates a ‘giraffe’s neck’ contextualizes it around multiple possibilities. After all, ‘The mark of an educated man is the ability to entertain a thought without accepting it as truth.’ The essence of understanding seems to mandate an element of doubt or speculation about an incomplete truth. To understand something requires that we admit we don’t fully understand it. This seemingly paradoxical relationship indicates understanding is a fuzzy concept with no absolutes. There is no such thing as ‘complete understanding’, yet we can’t claim an understanding until we’ve invested in the unknown.

Intelligent entities, including computers, have limited resources. Using those scarce resources carefully, based on priority, becomes essential. The person (or computer) that fails to understand the significance and priority of something is likely to be beaten by one that does. It’s Darwinian. AI agents should prioritize their effort to understand something based on the ‘reward of knowledge’ and ‘risk of ignorance’. Understanding is, therefore, relative to itself/yourself at a different point in time, and it’s relative to another agent/person.
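To make the prioritization idea concrete, here is a small sketch that ranks open questions by combining a ‘reward of knowledge’ score with a ‘risk of ignorance’ score. Everything in it is an assumption for illustration: the function names, the choice to simply add the two scores, and the numbers attached to each question.

```python
import heapq


def priority(reward_of_knowledge, risk_of_ignorance):
    """Combine the two scores; a plain sum is an arbitrary assumption."""
    return reward_of_knowledge + risk_of_ignorance


def rank_questions(questions):
    """Return question texts ordered from highest to lowest priority.

    `questions` is a list of (text, reward_of_knowledge, risk_of_ignorance).
    """
    heap = []
    for index, (text, reward, risk) in enumerate(questions):
        # heapq is a min-heap, so negate the score; index breaks ties.
        heapq.heappush(heap, (-priority(reward, risk), index, text))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical scores on a 0.0 to 1.0 scale.
questions = [
    ("Why is the giraffe's neck so long?", 0.4, 0.1),
    ("How close can I safely stand to the giraffe?", 0.3, 0.9),
    ("How many leaves are on that tree?", 0.1, 0.0),
]

ranked = rank_questions(questions)
for text in ranked:
    print(text)
```

With these made-up scores the safety question comes first, because the cost of not knowing outweighs the modest reward of the other answers.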

Current State: ★☆☆☆☆

Modern AI primarily focuses on answering ‘what, when, how’ type questions; asking and answering ‘why’ requires contextualized knowledge and scalable reasoning engines.


In the movie ‘Good Will Hunting’, there’s a scene with Sean (the psychologist played by Robin Williams) and the lead character (Will Hunting played by Matt Damon):

“So, if I asked you about art you could give me the skinny on every art book ever written… Michelangelo? You know a lot about him I bet. Life’s work, criticisms, political aspirations. But you couldn’t tell me what it smells like in the Sistine Chapel. You’ve never stood there and looked up at that beautiful ceiling. And if I asked you about women I’m sure you could give me a syllabus of your personal favorites, and maybe you’ve been laid a few times too. But you couldn’t tell me how it feels to wake up next to a woman and be truly happy. If I asked you about war you could refer me to a bevy of fictional and non-fictional material, but you’ve never been in one. You’ve never held your best friend’s head in your lap and watched him draw his last breath, looking to you for help. And if I asked you about love I’d get a sonnet, but you’ve never looked at a woman and been truly vulnerable.”

I believe that I understand giraffes but I’ve never touched one, fed one, smelled one or witnessed the birthing process. Encyclopedias fail to communicate the emotions evoked or sensations caused. To understand a simple action like “run” might give rise to memories of a long painful run, sweating, overheating, being thirsty and yet being proud of the conviction to endure. Firsthand experience is essential to a deep understanding but an analogous experience is also useful. When a person reads about someone climbing a mountain, a task they’ve never done, they might remember a similar stimulating occasion and evoke an emotional & cognitive recollection. Firsthand experience is preferred but analogous experiences and secondhand accounts fill in the gaps.

When we experience some event, we take in small details. These tiny observations are often so small that if asked about the event, we wouldn’t report on them. Knowledge management professionals use the term ‘tacit knowledge’ to define the type of knowledge that is difficult to describe, write down or communicate to another. The zookeeper who handles a giraffe knows the proper distance to keep from the animal but if asked how many meters it is, they don’t have an exact answer. Understanding seems to call on our unconscious mind to track the indistinct details.

Current State: ★☆☆☆☆

Experience comes with age and activity. More investigation is needed on AI services that run for a decade or more (like Never Ending Language Learning) and limited autonomous AIs with multimodal sensors (cars, robots, etc.).

Studies, Experiments & Outcomes

Surely there’s a difference between an agent/person that experienced something a handful of times and one that pursues the study as a lifelong journey. The engineer, author or professor that practices their profession approaches the problem more times, and usually in more structured ways. They’re more likely to consult experts. Conclusions are often formally proven, results are published and examples provided. Understanding no longer rests on historical data; understanding is an ongoing task. Ultimately, the researcher demonstrates understanding by moving a concept from idea to application. Perhaps the pinnacle of understanding is the ability to use accumulated information to affect a future outcome. To know is the ability to control, or at least the ability to know the controllability.

Current State: ☆☆☆☆☆

Our AIs think about what we tell them to think about. AI agents that build and run their own experiments are nascent at best.

Grounding Knowledge & Communicating

If you ask me how long a meter is, I’ll hold out my hands to indicate a length. If you ask me how long 3 seconds are, I might perform the American child ritual of saying, “One Mississippi, Two Mississippi, Three Mississippi”. Humans ground our knowledge; new information is recorded relative to the initial point. Our language is filled with scalar information (freezing, cold, lukewarm, warm, hot, etc.), comparative and superlative language (tall, taller, tallest), valences (sad, gloomy, depressed), intensities (yellowish, lemon chiffon, canary yellow), and so on. The significance of something becomes apparent as it is placed (and moved) on a scale. Much of modern AI attempts to gather knowledge as a series of absolute truths rather than relative truths, which inhibits transfer learning.
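A minimal way to picture scalar, relative knowledge is an ordered list of terms in which meaning comes from position rather than from an absolute value. The sketch below uses the temperature terms from the paragraph above; the function names and the even spacing between terms are assumptions made for illustration.

```python
# Ordered scale taken from the temperature terms in the text.
TEMPERATURE_SCALE = ["freezing", "cold", "lukewarm", "warm", "hot"]


def position(term, scale=TEMPERATURE_SCALE):
    """Return a term's position on the scale, from 0.0 (lowest) to 1.0."""
    return scale.index(term) / (len(scale) - 1)


def compare(term_a, term_b, scale=TEMPERATURE_SCALE):
    """Describe term_a relative to term_b instead of as an absolute value."""
    delta = scale.index(term_a) - scale.index(term_b)
    if delta > 0:
        return f"{term_a} is above {term_b} by {delta} step(s)"
    if delta < 0:
        return f"{term_a} is below {term_b} by {-delta} step(s)"
    return f"{term_a} and {term_b} are at the same point"


print(position("lukewarm"))
print(compare("hot", "cold"))
```

Nothing here says how many degrees ‘warm’ is; the terms only make sense relative to their neighbors, which is the point of grounding knowledge on a scale.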

The human model of ‘relatively grounded knowledge’ has been proven to scale. We aren’t born knowing the length of a meter. Our mental models are then communicated to each other using a vast vocabulary. Although our language is full of ambiguous terms, we manage to convey our understanding between ourselves and use a form of ‘correction and error handling’ to resolve the differences. In this way, understanding becomes something that a single agent strives to achieve, but also something that an AI agent can communicate. Humans use procreation and inter-generational communication to ensure continuity and advancement of our species. This model promotes ‘no single point of failure’ and acts as a filter of information as data is passed between generations. As a human, I don’t look forward to death, but it sure is a brilliant model.

Current State: ☆☆☆☆☆

Renewed work in ‘curiosity exploration’ as a complement to RL looks promising. AI agents that build and run their own experiments are nascent at best, as is inter-agent communication.

Going Forward

The software industry has made great strides in artificial intelligence. However, a disproportionate amount of our recent effort has been fixated on a single technique (deep learning), with a similar fixation on a single problem domain (classification).

I believe that ‘computer understanding’ is a tractable problem. There will always be tricky areas like ‘does a computer understand life or morality’, but then again, humans remain challenged on such problems. To achieve the next level of intelligence, a convergence of complementary techniques will likely be required. The framework I’ve presented is meant to be used as inspiration. It’s fully accurate in neither coverage nor detail, but hopefully it will encourage the AI learner to continue their quest into new and exciting areas.

Jeff Schneider is CEO at Legendary.AI, an early stage venture focused on machine reading, knowledge acquisition and human-to-machine communication.
