
Why is Siri still so clueless?

Joe Hootman
Jan 27, 2019 · 6 min read

AI is progressing at a fast and furious pace. Over twenty years ago, IBM’s Deep Blue beat Garry Kasparov at chess. Eight years ago, IBM’s Watson beat Ken Jennings at Jeopardy. In 2017, DeepMind’s AlphaGo beat world champion Ke Jie at Go. And just two days ago, moving to the very apex of civilization’s ladder, DeepMind’s AlphaStar walloped two of the best human StarCraft II players.

An inside-the-mind peek at AlphaStar’s defeat of one of the best human players of StarCraft II. Are Ninja’s days numbered?

Despite all this progress, your day-to-day attempts to interact with Siri, Alexa, or Cortana often result in forehead-smacking frustration. There’s this classic:

[Screenshot: a classic Siri misfire]

Several iOS versions later, you’ll still get results like:

You: “When is Black Friday?” Siri: “It will happen Thursday.”

If you harbor any doubt, suspecting it’s maybe just you and the way you’re phrasing questions, the existence of an entire subreddit devoted to these failures should settle it: an AI may be able to beat you at chess, but it can’t yet map its way to your daughter’s ballet recital.

Jonathan Mugan of DeepGrammar spoke at Data Day Texas today about the history and current state of attempts to move natural language processing forward into a more generalized AI that can answer questions a four-year-old can answer, like: “why can you pull a wagon with a string, but not push it?”

Jonathan Mugan

I enjoyed Jonathan’s talk because it threw historical illumination upon some of my graduate work and because it raised questions about what we mean when we talk about the goal of getting AI to “understand language.” Here’s a Medium-sized summary of his talk, followed by a keen observation about the history of the field that he offered during Q&A. (Any errors are mine, and corrections are welcome.)

Historically, the NLP quest for understanding, he noted, has moved along two paths: symbolic and sub-symbolic.


The symbolic path transposes words into symbols and attempts to map relationships between those symbols. In its earliest phase, this approach chopped all the words of a book into tokens and threw them into a bag (like a salad), shook them into vectors for calculations based on word frequency, and, if feeling creative, seasoned them with external lists of meanings representing sentiment.
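The bag-of-words recipe is simple enough to sketch in a few lines. Here is a minimal illustration in Python; the toy corpus and the tiny sentiment lexicon are invented for the example, and real systems would use far larger word lists:

```python
from collections import Counter

# A toy "book": tokenize, discard word order (the "bag"), count frequencies.
text = "the cat sat on the mat the cat slept"
tokens = text.split()
bag = Counter(tokens)  # each word's count is one component of the vector

# External "seasoning": a (made-up) lexicon mapping words to sentiment.
sentiment = {"slept": 0.3, "sat": 0.1}

# Score the document by frequency-weighted sentiment.
score = sum(count * sentiment.get(word, 0.0) for word, count in bag.items())
print(bag["the"], round(score, 2))  # 3 0.4
```

Notice what the bag throws away: “the cat sat” and “sat the cat” produce identical vectors, which is exactly the limitation the later approaches below try to overcome.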

Following it were attempts at manual representations, which struck me as the left brain’s Spock-like hunger to know the right entirely on its own terms: the meaning and relationships of words could be buttoned down into a comprehensive taxonomy of parts and wholes, parents and children. This was the fecund Age of Ontology, bringing forth the ambitious Semantic Web, WordNet (where I did some work), FrameNet, ConceptNet, the pragmatic Wikipedia-based YAGO (Yet Another Great Ontology), SUMO (Suggested Upper Merged Ontology), and image schemas.

Semantic Web Layer Cake Spaghetti Monster. This’ll make sense out of language!

Taking a step toward simplicity, the world models approach recognized that people communicate against an assumed, shared model of the world, and only bubble up pertinent changes in state when needed. So the focus turned to creating those models and the dimensions of their changes: probabilistic, relational, concurrent, temporal.

Having put world models through their paces, the field then enhanced them with merged representations. A word alone can denote a thick representation of an idea like coverage (“a roof covers the house”), which can then be supplemented with inferences from the world model to answer questions like “why does it only rain outside?” (leveraging knowledge about how rain falls and the functional boundaries of “outside”).

A representation like “chicken” can be supplemented with meaning inferred from the broader world model about what birds are like, how farms operate, or how to complement waffles.

The second path forward in NLP is the sub-symbolic path, which forgoes the utopian vision of translating linguistic meaning into discrete, manipulable symbols. Instead, it takes a much more pragmatic approach of working with words as they function in actual language by running them through ML techniques, most notably neural networks. word2vec works from the assumption that a word’s meaning is grounded in the contexts in which it is used: it initializes dense vectors for words at random and trains them until they are adequately differentiated and produce a coherent internal structure. These vectors support arithmetic that surfaces concepts like “capital city”: subtracting the vector for France from the vector for Paris and adding the vector for Italy lands near the vector for Rome.
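That analogy arithmetic can be shown with a few lines of NumPy. The embeddings below are hand-invented toys, not real word2vec output (which is learned from billions of words of context), but the subtract-and-add mechanics are the same:

```python
import numpy as np

# Toy embeddings invented for illustration only; in real word2vec these
# would be learned, dense vectors of a few hundred dimensions.
vecs = {
    "france":  np.array([1.0, 0.00, 0.2]),
    "paris":   np.array([1.0, 1.00, 0.2]),
    "italy":   np.array([0.0, 0.10, 0.9]),
    "rome":    np.array([0.0, 1.10, 0.9]),
    "germany": np.array([0.5, 0.05, 0.5]),
    "berlin":  np.array([0.5, 1.05, 0.5]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(target, exclude):
    # Return the known word (other than the query words) closest in angle.
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cosine(vecs[w], target))

# "capital-of" direction: paris - france + italy should land near rome.
query = vecs["paris"] - vecs["france"] + vecs["italy"]
print(nearest(query, {"paris", "france", "italy"}))  # rome
```

The second coordinate here plays the role of a learned “is-a-capital” direction; real embedding spaces encode such directions implicitly rather than in any single axis.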

seq2seq expands beyond single words, encoding sequences of them (like sentences) into vectors which can then be decoded into other sequences; it is used heavily in machine translation. Question answering uses recurrent neural networks so that the model learns which earlier facts merit attention when answering a question. By attending to earlier statements about spatial location, this approach can produce results like:

The office is north of the yard. The bath is north of the office. The yard is west of the kitchen. How do you go from the office to the kitchen? A: south, east.
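To make the expected answer concrete, the same toy task can be solved the old symbolic way with a graph search. This sketch is mine, not the recurrent-network approach the talk described; the point of the neural models is precisely that they learn such reasoning from text rather than having the map-building hard-coded:

```python
from collections import deque

# Facts from the example, as (place, direction, place) triples:
# ("office", "north", "yard") means "the office is north of the yard".
facts = [("office", "north", "yard"),
         ("bath", "north", "office"),
         ("yard", "west", "kitchen")]

opposite = {"north": "south", "south": "north",
            "east": "west", "west": "east"}

# Build a bidirectional move graph: if A is north of B, then going south
# from A reaches B, and going north from B reaches A.
moves = {}
for a, d, b in facts:
    moves.setdefault(a, []).append((opposite[d], b))
    moves.setdefault(b, []).append((d, a))

def route(start, goal):
    # Breadth-first search over rooms, accumulating the directions taken.
    queue, seen = deque([(start, [])]), {start}
    while queue:
        room, path = queue.popleft()
        if room == goal:
            return path
        for direction, nxt in moves.get(room, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [direction]))

print(route("office", "kitchen"))  # ['south', 'east']
```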

Finally, the sub-symbolic approach discovers meaning by recognizing that we are not disembodied brains, but that we exist and learn within the physical framework of our bodies and our given environment.

Sorry, Descartes. But no dice.

Therefore, in order to learn the meaning of language about touching, holding, or grabbing an object, we need to capture the sensations experienced by touching, holding, or grabbing. The low-hanging fruit here is DNNs trained on interactions with an external world that is easy for a computer to inhabit, like playing a text-based adventure game or ingesting input from the interface of a computer game (this is one of the key approaches AlphaStar used to triumph in StarCraft II this week). The likely gains will first come in domains of the same species as the ingested data. For example, Jonathan talked about efforts to use video game data to create a “Grand Theft Auto world (with hopefully less violence).”

Physical reality presents a much more complex challenge for ingestion. To address that, the EU has developed an open-source project called iCub, which constructs a humanoid robot and captures the data from its interactions with physical reality.

iCub, meet “plant.” Plant, meet iCub.

The external-world training approach looks to me like the one most pregnant with possibility, and it will probably be necessary for the growth of virtual reality.

In conclusion, Jonathan reiterated that the symbolic and sub-symbolic categories of approaches are designed to help AI journey from language to understanding.


This raised a nagging question for me about epistemology: what exactly are we reaching for when we talk about linguistic “understanding”? Theologians and philosophers have batted that question around for centuries. At one point in the talk, Jonathan suggested that the field’s current end goal was to obtain a grasp of meaning from language that was richly grounded in the human experience gained from sensation and action, which sounded like a pretty empiricist definition to me.

When I asked him about how varying definitions of deriving “understanding” from language affected the approaches to the quest for its attainment, Jonathan offered a very pragmatic, realistic answer: it is whatever makes sense and is the most commercially viable at any given moment.

That fits in with most of the history of science, which does not “progress” in a linear abstract vacuum but is often characterized by the interplay of varying approaches led by personalities and funding opportunities. I thought about how a recent article from MIT Technology Review confirmed that truth. It examined the history of recent scholarly approaches to “artificial intelligence” in order to predict the future of its development, and found no grand inevitable progression from one step to the next. Instead, a particular paradigm seems to reign for about a decade before being superseded.

I wonder what approach the market and idea-creators like Jonathan will settle into for 2020 and beyond. Maybe they’ll discover something that Siri will finally make note of.

Data Driven Investor

empowering you with data, knowledge, and expertise


Joe Hootman

Written by

Diving into oceans of data to discover pearls that help you make wiser decisions. Predictive data analyst, machine learner, data engineer. Disciple in Austin.
