Conversational interfaces aren’t new, but they’re changing the game

It seems like everyone is making their service more human in 2016 by creating a bot, or at least a conversational interface, so you can just talk to computers the way you do with other people.

Chat apps have let us start interacting with apps, businesses and other services by just messaging them — now, for the first time you’re able to message companies like KLM to find out what went wrong with a flight, or make a change to your reservation.

What might surprise you is that they aren’t new at all. We’ve been using them, or at least trying to, for decades: conversational interfaces surfaced as early as the 1960s, and by the 1980s and 1990s they had appeared in games and as an early method of text input.

Talking to computers

Why are we suddenly seeing an explosion in conversational interfaces? It seems like all of a sudden you’re able to get help and consume information just by messaging bots — but just a year or two ago that seemed ridiculous.

The truth is that the last two years have been a time of intense competition and innovation. For at least two decades, since computers went truly mainstream, the world has been obsessed with designing the perfect visual interface: how can our apps help users, and teach them to get what they want done faster?

Graphical user interfaces, GUIs for short, were the gold standard in a world where pointing and clicking was the primary way to interact. If you’re sitting at a desk with a mouse and keyboard, it makes sense to show you something visual to guide you on your way.

With the advent of the iPhone, and smartphones as a whole, that trend continued. But as design has become simpler and flatter in recent years, things have started to change.

People also realized that they actually didn’t need all that many apps. On average people are downloading fewer apps than ever — the average phone owner now downloads zero new apps each month. Even worse? People spend more than 85 percent of their time in the most popular apps.

People discovered, almost ten years after the modern smartphone’s birth, that having a ton of apps cluttering up a phone isn’t useful and just adds to cognitive load: where do I go to get the thing I want done?

For developers, the hardest part now is getting people to download their app in the first place: it’s difficult and expensive, especially in a world where most of us spend our time inside Facebook and Snapchat.

Enter conversational interfaces. All of a sudden, they’re everywhere: every messaging app has one, and the major mobile players are shipping voice assistants inside speakers.

Conversational interfaces first blew up in Asia and are now spreading to the rest of the world, with no sign of stopping. Why is a simple conversation so compelling compared with visual design?

As it turns out, it’s often actually harder to do something visually. Ever needed to make a change to your flight reservation, only to click a dozen times through eleven different screens, eventually give up and call an actual human?

Yeah, we all have. Now imagine that, but you could just say what you wanted to do in the first place: “I want to change my flight.” A few seconds later you get a reply: “OK, when would you like to leave instead?”

Conversational interfaces are interesting not because of the way they look or feel, or even because of synthesized speech and voice recognition: they’re intelligent interfaces where all you do is type, and the computer understands what you say, regardless of exactly how you word it.

They first came to us through software like Siri, which requires you to speak your request out loud and then responds. In 2016 there are multiple ways to interact. If you use Google Allo, you can pull the company’s assistant into any chat thread just by mentioning it, and ask it to do a search. If you have a Google Home, the very same assistant responds to your voice anywhere in the home to track down packages, or check how your calendar looks today. Facebook’s M aims to fulfill your every wish without your ever leaving Messenger.

There are hundreds of these assistants, most of which didn’t even exist a year ago, already changing the game in mobile. Conversational interaction sounds like science fiction but this is the reality we live in right now — and it’s already improving the way we interact with computers in immeasurable ways.

How we got here

One of the earliest conversational interfaces — also known as a natural language processing program — was created in the 1960s by Joseph Weizenbaum at MIT. It was called ELIZA, and it simulated a conversation with a human using a method called pattern matching, which gives the illusion of understanding but is in reality a rather simple trick.

ELIZA was a basic conversational interface that ran on the command line, and a script called DOCTOR allowed the user to interact with the computer as if they were talking to a psychotherapist, who would reflect their statements back as probing questions.

ELIZA combined with DOCTOR was so effective at the time that it actually fooled some of its earliest test subjects into believing it was human. It might have been a trick, but these types of interfaces were invented, quite literally, fifty years ago.
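The trick is simple enough to sketch in a few lines of Python. The rules below are illustrative stand-ins, not Weizenbaum’s actual DOCTOR script, which was far larger:

```python
import re

# A few DOCTOR-style rules: a regex to match the user's words, and a
# response template that echoes the captured fragment back.
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"my (.*)", "Tell me more about your {0}."),
    (r"(.*)", "Please go on."),  # catch-all keeps the conversation moving
]

def respond(utterance: str) -> str:
    """Return the first matching rule's template, filled in with the capture."""
    text = utterance.lower().strip(" .!?")
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            return template.format(*match.groups())
    return "Please go on."

print(respond("I am sad"))    # How long have you been sad?
print(respond("My mother"))   # Tell me more about your mother.
```

There’s no understanding here at all: the program never models what “sad” means, it only reflects the user’s words back — exactly the trick that fooled those early test subjects.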

Back then, a conversational interface was something of a science fiction pipe dream. In the 1983 movie WarGames, David talks to the computer by just typing what he wants — but accidentally finds himself playing real global thermonuclear war instead of a game.

Playing a “game” in the 1983 movie WarGames

When the graphical user interface (GUI), developed at Xerox PARC in the 1970s, went mainstream in the 1980s, interest in conversational interfaces waned a little: you could simply show users what you wanted them to do and let them click it. It was much easier, and friendlier, to teach someone to use a computer visually and guide them through it that way.

Before then, a command line seemed imposing: a simple flashing white cursor that could in theory do anything, but was hard to train users on. How would they know what to type? The GUI changed everything, and the world embraced it vigorously.

It wasn’t until the web’s meteoric rise and the launch of smartphones that conversational interfaces were revived. How, exactly, do you fit an entire set of services, or a whole website, onto a tiny screen that fits in your hand? That isn’t easy, and many abstract concepts, like a time or a place, are hard to illustrate at all.

The CUI has another huge advantage over a GUI: It can allow people to talk about hypothetical objects or future events that have no graphical representation.
 — WIRED, 2013

There’s another key reason, coinciding with smartphones, that conversational interfaces became much more feasible: natural language processing finally reached a point where computers could accurately guess the user’s intent, rather than leaving them confused or stuck at dead ends.

Inventions like Siri, Nuance’s Dragon software and Google Assistant showed the world that you could talk, out loud, to a computer and it would understand you often enough to be useful.

We wrote about that technology in an earlier part of this series, but the key takeaway is this: computers finally got good enough at teaching themselves about the world through a technology called neural networks.

Neural networks require immense resources, and the computing power they need wasn’t feasible until recently, with the advent of cloud platforms like Amazon Web Services. The technology takes incoming data and uses algorithms to train itself to understand context, and the wider world.
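To make the intent-guessing task concrete, here’s a deliberately tiny stand-in in Python. Real assistants use neural networks trained on enormous datasets; this sketch only shows the shape of the problem — mapping free-form text to a known intent — and the intents and keywords are invented for illustration:

```python
# Score each known intent by how many of its keywords appear in the
# user's words. A toy substitute for learned intent recognition.
INTENTS = {
    "change_flight": {"change", "flight", "reschedule", "rebook"},
    "track_package": {"track", "package", "delivery", "where"},
    "check_calendar": {"calendar", "schedule", "today", "meetings"},
}

def guess_intent(utterance: str) -> str:
    words = set(utterance.lower().replace("?", "").split())
    scores = {name: len(words & keywords) for name, keywords in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(guess_intent("I want to change my flight"))  # change_flight
print(guess_intent("Where is my package?"))        # track_package
```

Word-overlap scoring falls apart as soon as the wording varies (“move my booking” matches nothing here), which is why production systems learn these mappings from data rather than hand-written keyword lists.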

At its core, even stripping away neural networks, artificial intelligence and other buzzwords, the reason we’re able to build convincing conversational interfaces in 2016 is because computers are simply powerful enough now — they’re a product of Moore’s Law.

“The computational resources required for a single Siri query is in excess of 100 times more than that of traditional web search.”

Even better, we’re only at the tip of the iceberg. We’ve just figured out that people want fewer, less confusing apps, not more, and even though the technology isn’t quite perfect yet, every dead end is actually another data point to learn from. If the bot gets stuck, that’s recorded and learnt from.

Conversation, for better or worse

The question, with all this in mind, is: when is a conversational interface actually appropriate? The answer is, unfortunately, a cop-out: it depends.

A bad fit for a conversational interface is a complex application that’s hard to master, or one that requires intricate tasks to be completed. One of the primary problems with conversational interfaces is that it’s really hard to tell the user what’s actually possible.

In theory, your app can do everything! But there’s an issue: it’s only as good as your natural language processing, and even then, you’re going to reach dead ends frequently, as the Google Home team has discovered:

“Google Assistant will perform to expectations if you ask it to book a table at a Mexican restaurant near you. But if you ask it for a table at “one of my usual places,” you’re taking a Thelma and Louise drive into the Flummoxed Valley.”

These types of questions are natural to us, but for computers that’s an infinitely complex question: Where are your usual places? Is that near the office or home? What time of day is it? Are we talking about dinner or lunch?

It’s easy to expose the shallowness of conversational interfaces right now, but they’re going to improve faster than any of us expect. It just takes time.

Another major issue is that people don’t actually tend to listen, whether they’re talking to a bot or a human. Talla, a conversational bot company, observed that if a person asked a question to start a task, but then decided to say something else instead, the computer would get confused, stuck expecting more information about the first task rather than handling the second:

User: Add a task due next Friday
Talla: What is the name of the task?
User: Oh actually, it should be due Saturday
Talla: I’ve added the task “Oh actually, it should be due Saturday”

People don’t tend to talk in a logical, straightforward structure all of the time. We get distracted, change tack and sometimes just want to talk about something else. Talla said these types of problems are “the non-conversational UX equivalent of observing a user click random buttons in a UI.”
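One way around this is a dialog manager that checks for corrections before treating a reply as the answer to its last question. The sketch below is hypothetical, not Talla’s implementation; the correction pattern and task fields are invented for illustration:

```python
import re

class TaskDialog:
    """Sketch of a dialog manager that tolerates mid-task corrections."""

    def __init__(self):
        self.task = {"name": None, "due": None}
        self.awaiting = None  # which field we asked the user for last

    def handle(self, utterance: str) -> str:
        due = re.search(r"due (?:next )?(\w+)", utterance, re.I)
        # Treat "actually/instead ... due X" as revising the date,
        # not as the answer to whatever we asked last.
        if re.search(r"\b(actually|instead)\b", utterance, re.I) and due:
            self.task["due"] = due.group(1)
            return f"OK, due {due.group(1)} instead. {self._next()}"
        if self.awaiting:
            self.task[self.awaiting] = utterance.strip()
            self.awaiting = None
            return self._next()
        if due:
            self.task["due"] = due.group(1)
        return self._next()

    def _next(self) -> str:
        if self.task["name"] is None:
            self.awaiting = "name"
            return "What is the name of the task?"
        return f'Added "{self.task["name"]}", due {self.task["due"]}.'
```

With that check in place, “Oh actually, it should be due Saturday” updates the date and re-asks for the name, instead of becoming the task’s title.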

Slack, however, is a fantastic example of a smart, nuanced conversational interface that not only helps you learn about the app you’re about to use, but gathers information from you without it seeming cumbersome or confusing:

“Hi, Slackbot here! To make things easier for your team, I can set up some details for you. What’s your first name?”
“OK, Owen. Got it. Would you like to display your phone number on your profile? If so, enter it now; otherwise just say no.”
“Congratulations! Your profile is ready.”

That’s easier than trying to teach the user where the ‘edit profile’ button is, showing them where to fill in the information, then hoping they’ll save it. In just a few seconds the user knows exactly what to do: because they’re used to just talking, rather than clicking around.

Facebook also showed the world its idea of a conversational interface in 2015 with the launch of the Messenger Platform. Companies can build bots that ask questions, but present the user with example responses in the form of buttons, right in the thread. That way, if they don’t know what to type — or just have no idea what to do — at least something is right in front of them.

A List Apart wrote in early 2016 that the key to making a conversational interface actually click with your users is discoverability, and ensuring they’re never left to their own devices:

“But when talking to a robot, you’re just staring into a void. It’s the robot’s job to seize every opportunity to suggest the next step and highlight less-familiar features.”

As you’re building a conversational bot, it’s important to think about how it’s able to gently nudge the user along. Are they able to do something else at this point? Make sure to suggest it to them — otherwise you’re going to end up leaving them stumped, and the conversation is over.

The opportunity is so immense that Google, the world’s largest online search company, has pivoted to focus on conversational interfaces more than any other medium. Conventional search is dead, and Google wants you to interact with it directly on a more personal level:

“We are evolving search to be much more assistive [and] want users to have a two-way ongoing dialogue with Google to help get things done in the real world. We think of this as building each user their own individual Google.”
 — Sundar Pichai, MIT Technology Review

At the core of conversational interfaces is this: how can you take an overwhelmingly complex task, and boil it down to a simple chat? It’s possible for almost any product or service, but it needs to be implemented in the right way.

Conversational interfaces offer your users immediacy, ubiquity, authenticity and a feeling of connection, unlike anything that’s been available in the past, which is why everyone from search giants to small startups is clamoring to build their own.

Ten years ago, you could make a hundred toolbars with tiny buttons to get a simple job done; a conversational interface just starts with a question: “What do you want to get done today?”

This article is a part of the “Do you speak human?” lab — enabled by SPACE10 to explore conversational interfaces and AI. Make sure you dive into the entire publication.