We need to talk about Conversational Interfaces

Taxonomical matrix of CUI

Conversational User Interfaces. The term seems so new, yet familiar in nature. From the frustratingly hard-of-hearing automated support hotline repeating "Sorry. I did not get that. Your options are…" for the nth time, to the snazzy new chatbots making star appearances on your favorite social medium.

Yet, if you were asked to make a short, 5-minute presentation on any such solution, how exactly would you describe it? Could you?

We're really excited about Conversational User Interfaces (quite a mouthful, so let's call them CUIs from now on), both in and outside our work at a digital agency in Denmark. But we too have been stumped when trying to paint a picture of a CUI solution to a prospective customer, or to each other.

So we went looking. We went looking for ways to approach and describe CUIs, anything from whitepapers to attractive infographics. But we found none (no bot available for that errand…). So we thought, we talked and we thought some more, and we decided to invent a classification and taxonomy of CUIs. Simply put, we've identified four axes of differentiation between the different solution types, and so far it has enabled us to talk and think about CUIs more efficiently, with less boilerplate explanation in the process. Reducing a long-winded explanation of exactly what you mean to a single word means other people, and you yourself, won't get lost in your train of thought.

Once we'd gotten our talented designers going on an awesome single-page graphic, we discovered that we had much more to say on the subject, so we decided to elaborate and express our thoughts and perspectives in this document. For the TL;DR, check out our infographic. We promise, it's great.

A taxonomical matrix of CUI, DIS/PLAY

The four axes

Our taxonomical matrix has four dimensions or axes. First off, the cognitive complexity of the solution. This describes to what degree natural language processing and comprehension are incorporated into the CUI, and, consequently, how autonomously the CUI is able to reason and generate responses based on user input.

Second, the way interaction happens with the CUI. CUIs encompass text, speech and physical, gestural action, as well as both pre-determined and arbitrary input. As with any UI, it's important to be able to describe exactly how you interact with the logic underneath; no less so with CUIs.

Third, the hardware medium the CUI is designed to be deployed on. At the beginning of the current millennium, the thought of having, 17 years later, a computer strapped around your wrist that outperformed all contemporary high-performance computers seemed crazy. Now it doesn't. Android Wear and the Apple Watch are both orders of magnitude faster than the computers of January 1, 2000, and vastly smaller, while lending themselves to completely different ways of interaction than desktop computers or smart, connected coffee makers do. CUIs span this entire gamut, from tiny wearables to high-powered desktops to appliances.

Finally, the use case. This is the raison d'être behind every CUI application. Do you want to automate human processes, or do you want to "sweeten" a long, boring process that would otherwise require a lot of buttons and text fields, and would drive prospective users away no matter how great the light at the end of the funnel?

With this in mind, let’s start by taking a look at the types of cognitive sophistication available in CUIs.

Cognitive: Script
These are based on completely predetermined user inputs and responses. These are your 1980s text-adventure computer games, or a simple troubleshooter. Pre-determined events trigger transitions between the CUI's different states, each state having its own input/response options.
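To make this concrete, here is a minimal sketch of a script-based CUI as a hand-written state machine, in the spirit of the simple troubleshooter mentioned above. The states, prompts and options are all invented for illustration:

```python
# A minimal script-based CUI: a hand-written state machine.
# All states, prompts and options are invented for illustration.
SCRIPT = {
    "start": {
        "prompt": "Is the printer plugged in? (yes/no)",
        "next": {"yes": "check_paper", "no": "plug_in"},
    },
    "plug_in": {
        "prompt": "Please plug it in, then type 'done'.",
        "next": {"done": "check_paper"},
    },
    "check_paper": {
        "prompt": "Is there paper in the tray? (yes/no)",
        "next": {"yes": "solved", "no": "add_paper"},
    },
    "add_paper": {
        "prompt": "Add some paper, then type 'done'.",
        "next": {"done": "solved"},
    },
    "solved": {"prompt": "Great, you should be printing now!", "next": {}},
}

def run(script, state="start"):
    # Walk the graph until we reach a state with no outgoing transitions.
    while script[state]["next"]:
        answer = input(script[state]["prompt"] + " ").strip().lower()
        if answer in script[state]["next"]:
            state = script[state]["next"][answer]
        else:
            # Anything outside the predetermined options is rejected.
            print("Sorry. I did not get that. Your options are: "
                  + ", ".join(script[state]["next"]))
    print(script[state]["prompt"])

run(SCRIPT)
```

Everything the user can say is enumerated up front, and anything else is rejected. That is exactly what makes this level so rigid.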

Cognitive: Script + synonyms
To make the simple script-based user interfaces more forgiving in user interaction, these essentially bolt a thesaurus onto the script engine. That way, simple single-word substitutions can be understood anyway; for instance, both "Goodbye" and "Farewell" would probably cause the CUI to terminate.
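A sketch of how that thesaurus layer might bolt on: normalize each word to a canonical form before matching it against the scripted options. The synonym table below is a toy example, not a real thesaurus:

```python
# Map every word to a canonical form before matching it against
# the scripted options. The synonym table is a toy example.
SYNONYMS = {"farewell": "goodbye", "bye": "goodbye",
            "yep": "yes", "sure": "yes", "nope": "no"}

def normalize(text):
    return " ".join(SYNONYMS.get(word, word)
                    for word in text.strip().lower().split())

# normalize("Farewell") and normalize("Bye") both yield "goodbye",
# so either one triggers the same scripted transition.
```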

Cognitive: Script + semantics
This is the current best-practice level of cognitive capability; instead of merely allowing for single-word substitutions, your Script + Semantics CUI actually tries to understand the intent behind the input. For instance, "Have a good one" and "bye" are extremely dissimilar in vocabulary and grammar, but from the semantic point of view they mean exactly the same thing. This is an active area of research, and immense amounts of computational power are being applied to it.
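As a rough illustration of the idea, here is a sketch that scores user input against known intents by vector similarity. In a real system, the embed() function would be a pre-trained sentence-embedding model; the toy bag-of-words stand-in below only exists to keep the example runnable, and will not actually capture that "Have a good one" and "bye" mean the same thing:

```python
import math

# Sketch: score user input against known intents by vector similarity.
# NOTE: embed() is a toy bag-of-words stand-in. A real system would use
# a pre-trained sentence-embedding model here; the toy version cannot
# tell that "Have a good one" means "bye".
def embed(text, dims=64):
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[hash(word) % dims] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical example intents and canned responses.
INTENTS = {
    "say goodbye": "Bye! Have a good one.",
    "ask for help": "Sure, what do you need help with?",
}

def respond(user_input, threshold=0.5):
    score, best = max((cosine(embed(user_input), embed(intent)), intent)
                      for intent in INTENTS)
    # Below the threshold, fall back to a clarifying question.
    return INTENTS[best] if score >= threshold else "Sorry, I did not get that."
```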

Cognitive: Intelligence and Autonomy
This is the fully evolved, William Gibson-style artificial intelligence. Not only can it transparently understand what you write, it can also absorb new knowledge from its interactions. These are, not surprisingly, not exactly on the market, and there's still a debate going on about whether we will ever give electronic birth to an autonomous intelligence, as well as whether we should.

The way the user interacts with the CUI also has four divisions.

Interaction: Pre-Defined Options
Check boxes or dialog buttons. Everything's predefined. It's not really a conversation, but it may give the illusion of one. It's easy for the user to browse and select their options at a given point during the conversation, but the effective flexibility is nil.

Interaction: Free Text
A text field and a button. Enter your thoughts, and get the response back in text form, too. This is the most common interaction form, as experienced by tens of thousands of Facebook chatbot users every day. It's effective and private, but may also be a hassle if large amounts of text entry are required.

Interaction: Voice
Amazon Echo. Amazon Echo Show (an Echo with a screen). Apple Siri. Increasingly, the best-of-breed commercial conversational user interfaces use the human voice as both input and response. Both speech recognition and voice synthesis have reached a level of technological sophistication where they are definitely ready for the end user. With voice interaction, you're going to base yourself on another company's product; it's not feasible for most organizations to develop their own speech recognition or synthesis, but thankfully it is possible to put the industry leaders' efforts to good use. Privacy is an obvious issue, so expect it to turn some users off.

Interaction: Gestural / Facial
Humans express themselves using body language, and conversational user interfaces can use this aspect, too. For instance, imagine a support system that adapted its responses to the frustration level of the customer, based on his or her facial expressions. With off-the-shelf gestural cameras such as the Kinect, and high-resolution machine learning-based solutions for sussing out the complexities of the human face, it’s definitely possible to use these interaction types.

As opposed to the interaction type, the medium is the physical device the interface is mediated by.

Medium: Wearable
Siri on your Apple Watch. The conversational interface is no farther away than lifting your wrist to your face and saying "Hey Siri". Useful for single-user interactions of a simple nature. Wearable devices typically do not have enough computational power to really handle complexity, so consider using cloud-based solutions for this medium.

Medium: Device & Desktop
Encompassing everything from smartphones to large desktop computers, this medium is where most people are. It is also the medium where the end user can handle the largest amount of complexity, and where the user/device interaction time can be the longest.

Medium: Appliance (IoT)
Here, your conversational user interface must be function-oriented. Listening to a washer-dryer read poems might be quirky fun, but it gets old really quickly. The user will be focused on setting the appliance's parameters, monitoring its status while it's running, and shutting it down when it's done. Your CUI should reflect this.

Medium: Robot (industrial or anthropomorphic)
Essentially, a mix of all the above categories, completely dependent on the robot's purpose. Appliance-type robots should behave like appliances, whereas anthropomorphic robots such as the SoftBank Pepper should be more approachable, conversational and eloquent.

Finally, the use case of the conversational user interface is, essentially, what you seek to replace, supplement or create with the CUI.

Use case: Human substitute
Whether you want to reduce the load on your support call center or handle phone-system routing without receptionists, your ultimate goal is replacing human activity. This, generally, is where CUIs perform the worst. Our core mantra for conversational interfaces is supplement, don't supplant. Users who expect a human and get a bot will be disappointed, whether for political or user-experience reasons.

Use case: Main Feature
These are your news aggregation and filtering phone apps, for instance. The conversational interface itself, not the functionality it wraps, is the star of the show, and it may also be a bit of a gimmick to generate buzz.

Use case: Conversation-as-a-service (CAAS)
The flip side of making the conversation itself the primary feature: conversation-as-a-service is the CUI applied on top of solid functionality. For instance, an automated, conversational support system, or an onboarding flow in an app that carefully explains each step and feature to the user.

Use case: Tough to Easy (human enhancement)
Instead of filling out a 35-page tax return form in a PDF file, why not have a chat with a conversational interface that extracts the relevant information and fills it in? That's the core idea of the Tough to Easy use case. Reducing bother and complexity for the end user is a gain for both the user and your organization.
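A minimal sketch of the pattern: a slot-filling loop that asks for one field at a time instead of presenting the whole form at once. The fields and prompts are invented for illustration:

```python
# "Tough to Easy" as a slot-filling loop: instead of one 35-page form,
# ask for a single field at a time. Fields and prompts are invented.
FIELDS = [
    ("name", "First things first: what is your full name?"),
    ("income", "Roughly what was your total income last year?"),
    ("deductions", "Any deductions you want to claim? (or 'none')"),
]

def fill_form():
    form = {}
    for field, prompt in FIELDS:
        # Each answer could be validated, or clarified with a follow-up,
        # before moving on to the next field.
        form[field] = input(prompt + " ").strip()
    return form  # Ready to be mapped onto the official form.
```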

Every conversational user interface can be analyzed with this taxonomy. Sometimes a product won't fit neatly in a single category, but it will usually fit two, indicating a product that spans several user interface types.

For instance, Apple’s Siri can be described as being:

Cognitive: Script + semantics. It can recognize, to some degree, the meaning and intent of your input.

Interaction: Voice. For obvious reasons.

Medium: Wearable, and Device/Desktop. It's available on Apple Watch, iPhone/iPad and Mac.

Use case: Main Feature, and Conversation-as-a-service. Siri is a bit of a gimmick, but its integrations with services like Wolfram Alpha make it a virtual go-getter of interesting data you wouldn't have found otherwise.
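If you want to work with the taxonomy programmatically, a classification along the four axes might be expressed as a small data structure. This is our own sketch, not a standard; the Siri entry simply encodes the classification above:

```python
from dataclasses import dataclass

# Our own sketch of the four-axis taxonomy as a data structure; the
# axis values mirror the categories described above.
@dataclass
class CUIClassification:
    cognitive: str                # one cognitive level per solution
    interaction: list[str]        # a product may span several
    medium: list[str]
    use_case: list[str]

siri = CUIClassification(
    cognitive="Script + semantics",
    interaction=["Voice"],
    medium=["Wearable", "Device & Desktop"],
    use_case=["Main Feature", "Conversation-as-a-service"],
)
```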

Drop a line … jhv@dis-play.dk


By David Christensen & Jacob Hvam of DIS/PLAY A/S