Conversational AI ‒ but where is the I?

HAL 9000 ‒ 50 years later…

I remember the first time I saw a computer: it was a Power Macintosh 5260 (with Monkey Island on it). I was around 5 years old, and I looked at it as if it belonged to another universe. It did: I was not allowed anywhere within a 5-mile radius of it; it was my older brother’s! That did not stop me. I browsed it for hours. The possibilities of computers seemed infinite, and, fuelled by the inspiration of sci-fi worlds, my dream of talking machines, machines that can assist humans, think for themselves and even have feelings, never stopped. I kept dreaming about the possibilities of the future.

Fast forward 20 years. We have seen great leaps in technology: powerful silicon chips, the rise of the internet and, of course, the end of a long AI winter with the dawn of the era of Big Data and Deep Learning. The unreachable future that was once beyond comprehension is starting to become reality: flying taxis, “Terminators” and intelligent agents, to name just a few things that once seemed unattainable. However, machines that can speak, reason and think on their own, a core part of any imagined future, are nowhere to be seen. This state of affairs, and my dreams from a young age, kept pushing me. Two years ago, I embarked on my most daring adventure: I joined the journey to help shape the future and build true conversational AI, a journey I am not treading alone, thanks to my team at Wluper.

Conversational AI is a buzzword these days. The Siris and Alexas among us are on the rise, and CES 2019 in Las Vegas was dominated by voice. But while these assistants are becoming ubiquitous and are built into ever more devices, we have to ask ourselves: where is the intelligence? “Alexa, please flush my toilet?”, really?? Deep learning is the secret sauce behind many of these incredible advances. There is no doubt about it: the advances of deep learning have been phenomenal and have pushed conversational AI and NLP to a new frontier. And yet I agree with critics such as Gary Marcus who are skeptical of deep learning. But I am not here to criticise deep learning; I am here to make a stand and call on people to be critical, to ask the hard questions and not to be satisfied with mediocrity. The future should be grand!

Turing Complete or “Turing Exhaustive”

Intelligence, in general, is a hard concept to define: it can mean so many things and has so many layers of understanding and discussion to it. I will not get started on that. A good starting point for intelligence in natural language, however, is Turing Completeness: the ability to understand and execute any statement (program). We humans are arguably Turing complete, kind of, bar our finite memory and time and our random (and sometimes stupid) transition function. Overall, this means that we can express any logical expression to one another and are able to “understand” and “execute” it. This is how we describe the world around us. This is how we can argue about Brexit, Hawaiian pizza and EMACS vs VIM (undecidable?), although I would call some unnamed people Turing Faulty, to say the least!

Now Microsoft, Google, Amazon and Apple have billions (BILLIONS!) of devices out there, or, as they call them, “IPDAs” (Intelligent Personal Digital Assistants). Intelligent? Why is nobody talking about Turing completeness in dialogue systems, let alone in IPDAs? Even Minecraft is Turing complete. Hey, but wait, you can have a while loop with Google Home, Siri and Alexa.

Let me introduce a neologism, “Turing Exhaustiveness”: today’s “intelligence” is achieved by covering every possible eventuality with a bespoke rule or action. We have all seen it: Intents and Actions! Not only do Alexa and Google provide them; many other pioneers and players use them as the core building block of voice assistants. The idea, as we see it, must be the following: describe every permutation of possibilities with a standalone action, and every possible human interaction with a standalone rule, to achieve true intelligence! Clearly, “Turing Exhaustiveness” will never lead to Turing Completeness!
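To make “Turing Exhaustiveness” a little more concrete, here is a minimal Python sketch of an intent-and-action table. The intents, patterns and action names are hypothetical, invented for illustration; they are not the actual Alexa or Google APIs. Each behaviour is a hand-written rule, and anything the rule-writers did not anticipate falls through to a fallback.

```python
import re

# Hypothetical intent table: each intent is a hand-written pattern mapped to a
# bespoke action. Every new behaviour needs a new row.
INTENTS = [
    (re.compile(r"\b(turn|switch) on the lights?\b"), lambda m: "lights_on"),
    (re.compile(r"\bset a timer for (\d+) minutes?\b"), lambda m: f"timer_{m.group(1)}min"),
    (re.compile(r"\bwhat'?s the weather\b"), lambda m: "weather_report"),
]


def handle(utterance: str) -> str:
    """Return the action of the first matching intent, else a fallback reply."""
    text = utterance.lower()
    for pattern, action in INTENTS:
        match = pattern.search(text)
        if match:
            return action(match)
    # Anything the rule-writers did not anticipate ends up here.
    return "fallback: Sorry, I don't know that one."


print(handle("Please turn on the lights"))
print(handle("Set a timer for 10 minutes"))
print(handle("Set a timer for 10 minutes, then turn on the lights"))
print(handle("Is it going to rain?"))
```

Note the third call: asking for two things at once silently triggers only the first matching rule, and the timer is lost. Covering that case means writing yet another rule, and the next combination needs one more, which is exactly the exhaustive treadmill described above.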

I am better than you (on average) ‒ or what SOTA means

But what about Machine Learning?! Yes, Machine Learning! Machine Learning and Deep Learning are taking over the job of writing rules. They are outperforming the rule-based approaches that dominate industry. Hurrah! But wait, let us have a second look at what state-of-the-art (SOTA) means.

These models are evaluated on metrics such as F1 score, exact match and accuracy. SOTAs are established by outperforming everyone and everything out there on the average score. Average?! Not on every use case. But are we humans interested in averages over infinite time horizons? I don’t think so. As long as you’ve set the new SOTA, it doesn’t matter if your system gets stuck on some use case, or if your Google Home is calling the police and you cannot make it stop.
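A toy calculation shows how an average can hide a broken use case. The two systems and their per-use-case accuracies below are made-up numbers, not results from any benchmark.

```python
from statistics import mean

# Made-up per-use-case accuracies for two hypothetical assistants.
scores = {
    "system_a": {"weather": 0.90, "music": 0.90, "timers": 0.90, "smart_home": 0.90},
    "system_b": {"weather": 0.99, "music": 0.99, "timers": 0.99, "smart_home": 0.70},
}

for name, per_case in scores.items():
    average = mean(per_case.values())
    worst = min(per_case.values())
    print(f"{name}: average={average:.4f}, worst use case={worst:.2f}")

# A leaderboard that ranks by average alone crowns the system with the blind spot.
sota = max(scores, key=lambda name: mean(scores[name].values()))
print("new SOTA:", sota)
```

Ranked by average, system_b is the new SOTA (roughly 0.92 vs 0.90), even though it fails on smart-home requests three times as often as system_a does.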

So, where exactly is the problem? What are we humans doing differently? For one thing, we can easily deal with ambiguity. Language is inherently ambiguous. Just think of “Mac”. Do I mean an Apple Mac? Or maybe the makeup company MAC? Or am I really hungry and thinking of “Mac’n’Cheese”? Currently, the best solutions out there try to solve this with confidence thresholds or black boxes. Humans, on the other hand, have interactive dialogues: we ask clarification questions and seek true understanding. SOTAs alone are not gonna cut it. A fundamental shift in natural language understanding is necessary to get us closer to Intelligence; superficial results will only draw the wrath of Yoav Goldberg.
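The difference between the two strategies can be sketched in a few lines of Python. The sense scores for “Mac” are invented, and both policies are hypothetical simplifications: the first mimics a fixed confidence threshold, the second asks a clarification question when the top two readings are too close to call.

```python
# Hypothetical scores a classifier might assign to the senses of "Mac"
# in an utterance with little context (numbers invented for illustration).
sense_scores = {"Apple Mac computer": 0.41, "MAC cosmetics": 0.33, "mac'n'cheese": 0.26}


def threshold_policy(scores: dict, threshold: float = 0.5) -> str:
    """Act on the top sense if it clears a fixed threshold, otherwise give up."""
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return f"act on: {best}"
    return "fallback: Sorry, I didn't get that."


def clarifying_policy(scores: dict, margin: float = 0.2) -> str:
    """If the top two senses are close, ask the user instead of guessing."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    first, second = ranked[0], ranked[1]
    if scores[first] - scores[second] < margin:
        return f"ask: Do you mean {first} or {second}?"
    return f"act on: {first}"


print(threshold_policy(sense_scores))
print(clarifying_policy(sense_scores))
```

With these numbers the threshold policy gives up, while the clarifying policy asks “Do you mean Apple Mac computer or MAC cosmetics?”. This is still only a heuristic on top of perception: it imitates the interactive behaviour humans use, not the understanding behind it.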

More Data, Give me More Data ‒ or How Everyone is Doing Shallow Understanding

Data is important for Machine Learning, nobody would argue otherwise, but data alone is not gonna solve our problems. One can have all the data in the world and still nothing good will come out of it, especially if one needs annotated data. The problem is that natural language understanding, dialogue state tracking, etc. are done in a shallow way by our beloved deep learning models. These models perform perception, much like machine vision models that recognise that there is an apple in a picture but have no clue what an apple actually is. The same happens in NLP: current models just perceive that a location, a person or an action was mentioned in the sentence, and the system then reacts, but it doesn’t “understand” or “know” what is going on.

More data, deeper models and hyper-parameter tuning (even of the random seed) are usually the go-to tools of NLP practitioners (if not wacky rules). Sometimes there is a leap, such as Google’s BERT model. But fundamentally, this is still perception. Understanding whether “somebody can still catch a bus” doesn’t happen via superficial perception of the words “catch” or “bus”; rather, an understanding, a model of the world and the user, needs to exist. Certainly, the user doesn’t mean catching the bus the way Superman might. More data is not the solution.
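The gap between perception and understanding can be illustrated with a deliberately naive keyword “perceiver” next to an equally toy “world model”. Both functions are hypothetical sketches, and the keyword set, times and walking distances are assumptions made up for this example.

```python
from datetime import datetime, timedelta

# Hypothetical keywords a shallow model might have learned to react to.
TRANSPORT_KEYWORDS = {"catch", "bus"}


def perceive(utterance: str) -> set:
    """Shallow 'perception': report which known keywords occur in the utterance."""
    tokens = {token.strip("?.,!").lower() for token in utterance.split()}
    return TRANSPORT_KEYWORDS & tokens


def can_still_catch(now: datetime, departure: datetime, walk_minutes: int) -> bool:
    """Toy 'world model': the user makes it if they reach the stop by departure."""
    return now + timedelta(minutes=walk_minutes) <= departure


# Same keywords, very different meanings.
print(perceive("Can I still catch the bus?"))
print(perceive("Superman can catch a bus with one hand"))

# The actual question depends on facts about the world and the user.
now = datetime(2019, 1, 10, 8, 5)
departure = datetime(2019, 1, 10, 8, 15)
print(can_still_catch(now, departure, walk_minutes=12))  # would arrive at 8:17
print(can_still_catch(now, departure, walk_minutes=8))   # would arrive at 8:13
```

Both sentences look identical to the keyword perceiver, yet answering the first one requires facts that are not in the sentence at all: the current time, the departure time and how far the user is from the stop.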

Even with the best data collection and the most advanced deep learning models, all that is happening is mere perception. Defining what understanding and meaning are is beyond the scope of this blog post, as I believe that due respect needs to be shown to the centuries of philosophers who have dedicated their time to that issue. Instead, I will limit myself to saying that the “I” will not appear by building bigger models trained on more data. I could not agree more with Kevin Gimpel’s quote from Sebastian Ruder’s post:

“I think the biggest open problems are all related to natural language understanding. […] we should develop systems that read and understand text the way a person does, by forming a representation of the world of the text, with the agents, objects, settings, and the relationships, goals, desires, and beliefs of the agents, and everything else that humans create to understand a piece of text. Until we can do that, all of our progress is in improving our systems’ ability to do pattern matching.”
– Kevin Gimpel


Machine Reasoning needs the hardest of questions answered; it needs courageous (or crazy) people. We are the underdog, we are just getting started, and we are here to solve these hard problems. Turing Exhaustiveness, shallow understanding and data-hungry models are our enemies. We, at Wluper, are critical of the approaches out there and of our own. We believe in working on the fundamental problems in order to achieve true conversational AI. Do you?


P.S.: Keep looking out for more, especially for some of the answers ;-)

If you liked this article and want to support Wluper, please share it!

Follow me @ai-nikolai and us on Twitter @wluper_

If you want to work on Conversational AI, check our careers page.