The Conversational Interface and the Rise of Dumb AI

With the enormous amount of hype around chat bots, and the launches of bot frameworks and APIs from Microsoft, Facebook, Slack, and Telegram, the rush to be an early mover has placed being involved before having a reason to be involved. Much like the websites of 1996, Facebook pages of 2007, and iPhone apps of 2009, this has created an enormous proliferation of experiences, some of which will delight users, and some of which will cause frustration and yelling at one’s phone.

Dan Grover of WeChat illustrates the difference between these experiences more eloquently, with more expertise, and in far greater detail than I ever could, but I’ll do so with more brevity and animated gifs. Plus, I’ll also coin Feldman’s law — all communication systems expand until they include shopping — in this post, which, had anyone ever heard of me, would make it very important.

It’s about ease of use, not chat

The purpose of conversational interfaces is to lower transaction barriers. That’s their sole function. The fact that such interfaces first appeared in chat clients is less a function of the chat clients themselves, and more of the mobile-first nature of many Asian markets where they broke out, the near universality of specific chat platforms across those markets, and in some cases the lack of an open web equivalent to what exists today in the US and Europe. But to understand why those were successful, and why many others will not be, it’s necessary to dive into what a conversational interface actually is.


Many interfaces are conversational, but most don’t involve conversations

At a high level, we can break down CIs into three broad categories. They are:

1. Direct — let’s chat with our friend the bot.

When done right, these provide a streamlined interface to a 3rd party system from within another client. Dan Grover’s piece goes into this in detail, but the important takeaway is that they effectively provide app-like functionality or perform as intelligent agents. Put another way, they deliver a fairly robust in-chat-stream experience, or they use contextual information to make smart guesses about intent.

Some examples:

“@expedia book a flight to sfo”

Expedia doesn’t know much about what you want, and there are a lot of options.

Good response: [a mini-app experience, where you can select flight times, airline preferences, prices, etc., with SFO pre-selected as the destination. After you determine what you want, payment is automatically deducted because the chat client already has your credit card.]
Bad response: “Hi! I’m Expedia. Can you tell me what dates you’d like to book the flight on?”
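
A minimal sketch of the good-response pattern, with everything hypothetical (the airport table, the widget payload): parse what the message already tells you, pre-fill it, and leave the remaining choices to a mini-app rather than a round of questions.

```python
import re

# Hypothetical lookup table; a real bot would query an airport database.
AIRPORTS = {"sfo": "San Francisco (SFO)", "jfk": "New York (JFK)"}

def handle_flight_command(message: str) -> dict:
    """Turn '@expedia book a flight to sfo' into a pre-filled search widget."""
    match = re.search(r"book a flight to (\w+)", message.lower())
    destination = AIRPORTS.get(match.group(1)) if match else None
    return {
        "widget": "flight_search",   # mini-app rendered in the chat stream
        "destination": destination,  # pre-selected from the message
        "dates": None,               # user picks these in the widget
        "payment": "platform_card",  # chat client already holds the card
    }

print(handle_flight_command("@expedia book a flight to sfo"))
```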

or transferring money to repay a friend

@square pay Ben Brown $10

Square knows exactly what I want, and can use a number of different aspects of the platform to make my experience easier.

Good response: [Ben is a friend on this chat client. We’ve looked up that user’s account, matched it to a user in our database, and have transferred the money] “Done. $10 credited to Ben’s account”
Bad response: “Hi! I’d love to help with that. If you tell me the email address associated with Ben’s Square account we can get started.”
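
The good response here works because of a context lookup, not intelligence. A rough sketch, assuming hypothetical platform and bot data stores: the chat client maps a friend’s display name to its own user ID, and the bot maps that ID to a payment account, only falling back to questions when the context genuinely runs out.

```python
CHAT_FRIENDS = {"Ben Brown": "chat-user-42"}          # platform's social graph (hypothetical)
PAYMENT_ACCOUNTS = {"chat-user-42": "square-acct-7"}  # bot's own user records (hypothetical)

def pay(command: str) -> str:
    """Handle '@square pay Ben Brown $10' using platform context."""
    name, amount = command.removeprefix("@square pay ").rsplit(" $", 1)
    chat_id = CHAT_FRIENDS.get(name)
    account = PAYMENT_ACCOUNTS.get(chat_id)
    if account is None:
        # Only start asking questions when context can't resolve the payee.
        return f"I couldn't find {name}. What email is on their account?"
    return f"Done. ${amount} credited to {name}'s account ({account})."

print(pay("@square pay Ben Brown $10"))
```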

Howdy, coincidentally built by said Ben Brown, is another example, as it behaves as an agent and does work for you. You say:

@howdy run a checkin

Howdy then asks all of your team members for their status, and collates the responses for both you and everyone else to see.

Howdy is one of the better implementations of the virtual assistant / agent category, and similar concepts exist across multiple chat systems, including traditional email, where one can include an AI on the CC line and ask it to do things like schedule meetings and coordinate activities. However, even the best human-augmented virtual assistants can end up generating more email / message traffic for the non-originating party, and thus must walk a fine line between providing service and annoying people.
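
The agent pattern Howdy follows can be sketched in a few lines (an illustration, not Howdy’s actual code): fan the question out to the team, collect the answers, and post one collated summary so the check-in adds a single message rather than N of them.

```python
def run_checkin(team: list[str], ask, post) -> None:
    """`ask` DMs a member and returns their reply; `post` writes to the channel."""
    answers = {member: ask(member, "What's your status?") for member in team}
    summary = "\n".join(f"- {member}: {answer}" for member, answer in answers.items())
    post(f"Daily check-in:\n{summary}")

# Stubbed transport, just to show the flow end to end.
run_checkin(
    team=["alice", "bob"],
    ask=lambda member, question: f"({member}'s answer to '{question}')",
    post=print,
)
```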

2. General intelligence — the universal AI

The holy grail of AI, general intelligence bots also take context into account (due to their complexity and computational requirements, GI bots are only fielded by large companies with deep pockets and a whole lot of data about you, so they almost always have some contextual information). Siri, Cortana, Google Now, and Amazon’s Echo fall, to one degree or another, into this category.

Many wouldn’t even consider these conversational interfaces, but they are; search can even be thought of as one. The only reason they’re typically not mentioned in the same frame is that they sit outside the chat / conversational ecosystem where people are already communicating, but Microsoft, Google, and others are bringing them into those realms. In many cases these technologies are incredibly useful, but typically the more conversational they’re made, the less helpful they become.

Where can I get a burrito now?
I’m sorry, I didn’t quite catch that. Can you repeat what you said?
Siri, where can I get a burrito now?
Okay. Here’s what I found: [many of which don’t serve burritos, all of which are closed]

To compensate for the lack of true GI, particularly at smaller companies with fewer resources, much of what will be presented as general AI will really be mechanical turks, or more accurately, human / AI hybrids. Many intelligent virtual assistants already do this, and openly. In almost all cases the human element is intended to be a stopgap: it supplements the AI while also providing a corpus of training data to continually improve it, gradually automating the humans away. How well this works depends greatly on the specificity of the task.
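
The hybrid pattern is straightforward to sketch: answer automatically above a confidence threshold, escalate to a human below it, and log each escalation as a labeled training example. The classifier below is a stub standing in for a real intent model.

```python
TRAINING_LOG = []  # escalated messages become future training data

def classify(message: str) -> tuple[str, float]:
    """Stand-in for a real intent model: returns (intent, confidence)."""
    return ("book_flight", 0.42) if "flight" in message else ("unknown", 0.1)

def respond(message: str, confidence_floor: float = 0.8) -> str:
    intent, confidence = classify(message)
    if confidence >= confidence_floor:
        return f"[bot handles intent: {intent}]"
    # Below the floor, a human answers; the pair becomes a labeled example.
    human_answer = "[human operator replies]"
    TRAINING_LOG.append({"message": message, "answer": human_answer})
    return human_answer

print(respond("can you book me a flight?"))  # low confidence -> human
```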

This is not to say that GI won’t evolve rapidly over the coming years and provide ever greater levels of functionality and accuracy, just that understanding nuance and implication vs. statement is far more complex than is often assumed, and contextual suggestions will likely continue to be more helpful than direct chat for some time*.

3. Augmentation — we do what you tell us

Augmentative interfaces are agents or predetermined routines, and typically don’t involve the veneer of a persona. They monitor your existing conversation, and only do stuff when you use a special phrase or command. They typically have a great deal of context to draw from and clear intent, so they can appear smart while being relatively unintelligent. The @ tag in Slack or Medium is an example.

@mblinder you should read this

and that person is notified without my having to specifically go and send them a message. Similarly, typing:

/emmerge-task demo task creation via slash command

will create a task in Emmerge, and provide visibility into that task to everyone in the Slack channel. Both of these take a user’s intent from within the existing conversation stream, use context to make the actions smarter, and then perform a number of programmatic actions behind the scenes to accomplish what the user wants.
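
The underlying mechanic is simple enough to sketch; the dispatch table and payload shape below are illustrative, not Slack’s actual API. A trigger phrase plus ambient context (who typed it, in which channel) is all the “intelligence” required.

```python
def create_task(text: str, user: str, channel: str) -> str:
    # Hypothetical task-creation backend call.
    return f"Task '{text}' created by {user}, visible to #{channel}"

COMMANDS = {"/emmerge-task": create_task}

def on_message(text: str, user: str, channel: str) -> str | None:
    command, _, rest = text.partition(" ")
    handler = COMMANDS.get(command)
    # No trigger phrase, no action: the agent stays out of the conversation.
    return handler(rest, user, channel) if handler else None

print(on_message("/emmerge-task demo task creation via slash command",
                 user="mblinder", channel="general"))
```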


It’s an app ecosystem, but in an app ecosystem

appception

Far from being the death of apps, conversational interfaces are, for most intents and purposes, apps within apps. They are in many ways another death-knell for the open web (almost all chat traffic in the world is, or will soon be, controlled by five or six companies), but in that sense they are nearly indistinguishable from iTunes or Google Play. By shifting the gatekeeper, those who own the conversation can add shopping and more shopping and more shopping (thus fulfilling Feldman’s law), which, like current app stores, they’ll tax, and thus everyone else will go from paying one platform kingpin to paying another.

AI doesn’t have to be all that I

Who we pay aside, the common theme in the good examples above is the minimal use of true AI. It’s dumb AI that acts smart, using a lot of contextual data and clear intent to augment behavior and provide relevant information. Machine learning can be added on top of that context, but the machine learning itself need not be overly complex.

That’s the difference between Clippy, which used the minimal context of your opening a program, and interrupted you, and Google Now, which looks at your calendar to see where you’re going, knows from your phone that you’re in the car, uses that information to determine you’re driving, and suggests the best route for you (eventually learning that you prefer backroads to highways). Both are relatively simple rules (open document vs. person in vehicle plus upcoming appointment in a different location) which attempt to understand the intent of your actions, but the chance Google Now gets your intention correct is far higher thanks to much deeper contextual information. In AI, accuracy and simplicity are the difference between smart AI that appears dumb and dumb AI that appears smart.
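
Reduced to code, the Google Now rule really is that simple; the signal sources and route suggester here are stand-ins for the real sensors and services.

```python
from datetime import datetime, timedelta

def suggest_route(now: datetime, in_vehicle: bool, next_event: dict) -> str | None:
    starts_soon = next_event["start"] - now < timedelta(hours=1)
    # The whole rule: person in vehicle + an appointment coming up elsewhere.
    if in_vehicle and starts_soon:
        return f"Fastest route to {next_event['location']} (backroads preferred)"
    return None

event = {"start": datetime(2016, 4, 1, 9, 30), "location": "downtown office"}
print(suggest_route(datetime(2016, 4, 1, 9, 0), in_vehicle=True, next_event=event))
```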

Long live our minimally intelligent robot overlords.

*Much of this post is a reaction to the current hype over AI. In the longer term, we’ll continue to see dramatic growth in the power and breadth of artificial intelligence. Gates wrote in The Road Ahead, “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten,” which is certainly applicable to AI, and as pointed out eloquently in The Second Machine Age, the exponential expansion of computational power will have dramatic effects on the ability of machines to meet and eventually exceed human intelligence. But as The Terminator pointed out, once that happens they’ll probably just nuke us all.