The Conversational User Interface is a Minefield
The path to an acceptable Conversational User Interface is heavily mined and booby-trapped. Let’s not fall into the IVR trap, but instead create a humanized and personalized user experience. [IVR: Interactive voice response is a technology that allows a computer to interact with humans through voice and DTMF tones entered via a keypad. Think about calling your insurance company: ‘To file a claim, press 2’]
While they can benefit greatly from each other, there is no need to create a dependency between a Conversational User Interface (CUI) and Machine Learning (ML). It is not hard to imagine how a CUI can be put to good use on top of an existing service infrastructure. In fact, I think it is a good idea to apply the principle of “separation of concerns” and not merge the already difficult task of creating a CUI with new or untested ideas for services and solutions.
Accessing and interacting with a complicated system in a simple and imprecise way has the potential to create an emotional connection, even more so if the conversational user interface responds empathetically, showing the ability to understand and share the feelings a user might have while interacting with such a system. However, a bunch of cruddy IVR bots is the shortest path to nuking this nascent opportunity. The community will have to rally to identify those use cases that provide customer benefit and delight, and that truly work in a conversational UI. At the same time, we had better show restraint so that we do not re-create the much-hated IVR experience, for instance by reaching for the lowest-hanging fruit and implementing solutions for use cases that are unfavorable for CUIs.
We are strong
No one can tell us we’re wrong
Searching our hearts for so long
Both of us knowing
Love is a battlefield
Love is a battlefield, written by Neil Giraldo and Patricia Benatar • Copyright © Chrysalis Music Group Inc.
With Virtual Reality User Interfaces, we leave two-dimensional interaction (the mouse pad) behind. At the same time, traditional UI widgets, like checkboxes and radio-button lists, find their way into ChatBot experiences, creating a simplified, one-dimensional user experience that is more sequential and resistant to context switching.
The Voice User Interface, on the other hand, provides nothing that would allow for the reuse of the old models and metaphors: no buttons, sliders, or switches to click or touch. Here we need to say very directly what we mean or want.
From Text to Graphical to Conversational and Voice User Interfaces
The last user interface revolution led us from the text to the graphical user interface and was enabled by new hardware (high-resolution graphic displays, computer mice) and a new software paradigm (object-oriented programming). This time, the breakthroughs are noise-cancelling array microphones, the streaming of voice audio into the recognizer while the user is still speaking, and a declarative approach to natural language understanding (NLU). Several companies (including Facebook, Amazon, and Microsoft) have recently handled natural language understanding nicely, interestingly in the very same declarative way (here again applying separation of concerns), but only for the narrow task of identifying user intent, which seems to work well when dealing with structured data inside small domains.
Synthesized speech is still easily distinguished from that of a human speaker. Even so, speech synthesis has become more than good enough, i.e., it does not annoy listeners when it is used for just a few short paragraphs. Moreover, the next leap forward is well within reach: synthesizing speech from text that also conveys an emotion or sentiment.
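To make the declarative idea concrete, here is a minimal sketch in Python. It is not how any of the services mentioned above work internally; the intent names, the sample utterances, and the word-overlap scoring are all assumptions chosen for illustration. The point is only that intents are declared as data (a few example phrases each), while the matching logic stays generic and separate.

```python
import re

# Hypothetical intent declarations: each intent is described by data
# (sample utterances), not by hand-written parsing logic.
INTENTS = {
    "check_balance": [
        "what is my account balance",
        "how much money do I have",
    ],
    "file_claim": [
        "I want to file a claim",
        "report an accident",
    ],
}


def tokenize(text):
    """Lowercase the text and return its words as a set."""
    return set(re.findall(r"[a-z']+", text.lower()))


def identify_intent(utterance, intents=INTENTS):
    """Return the intent whose samples share the most words with the utterance.

    Toy stand-in for a trained model: scores by Jaccard word overlap.
    Returns None when no sample shares any word with the utterance.
    """
    words = tokenize(utterance)
    best_intent, best_score = None, 0.0
    for name, samples in intents.items():
        for sample in samples:
            sample_words = tokenize(sample)
            union = words | sample_words
            score = len(words & sample_words) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = name, score
    return best_intent
```

With these declarations, `identify_intent("I need to file a claim")` resolves to `file_claim`. Adding a new intent means adding an entry to the dictionary, not touching the matcher, which is the separation of concerns in miniature.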
Attributes of good Voice and Conversational User Interfaces
A good CUI maintains context, for instance by carrying over entities from one request to the next. It is adaptable, for instance by providing less guidance to a user who has very recently mastered a similar intent.
A good CUI shares control of the conversation, for instance by not only answering questions but also making reasonable suggestions of its own. It is personal and personable, for instance by addressing the user by name and using language that, if run through sentiment analysis, would show that it matches the emotional state of the user.
A good CUI shows appropriate personalized empathy, by knowing how the information it just presented made the user feel. It acts with some randomness, surprises, and has (almost) complete coverage of its domain.
By performing tasks beyond answering questions, it encourages frequent (daily) usage. Talking to a bot is unlike riding a bicycle, i.e., it is forgotten quickly.
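The first two attributes, carrying entities over between requests and giving less guidance to a practiced user, can be sketched in a few lines of Python. The class and method names below are hypothetical, and the sketch simplifies on purpose: it counts any earlier success with the same intent and ignores how recent it was.

```python
class ConversationContext:
    """Remembers entities across turns and how often each intent succeeded."""

    def __init__(self):
        self.entities = {}   # e.g. {"city": "Boston", "date": "today"}
        self.successes = {}  # intent name -> number of completed requests

    def resolve(self, new_entities):
        """Merge the current request's entities over the remembered ones.

        Entities the user did not repeat are carried over from earlier turns;
        entities the user did mention replace the old values.
        """
        self.entities = {**self.entities, **new_entities}
        return dict(self.entities)

    def record_success(self, intent):
        """Note that the user completed a request for this intent."""
        self.successes[intent] = self.successes.get(intent, 0) + 1

    def prompt(self, intent, short, guided):
        """Use the short prompt once the user has completed this intent before."""
        return short if self.successes.get(intent, 0) > 0 else guided
```

If the user first asks about the weather in Boston today and then only says “and tomorrow?”, calling `resolve({"date": "tomorrow"})` keeps the city from the previous turn and replaces only the date.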
Attributes of good Voice and Conversational Interface Use Cases
A voice or conversational user interface doesn’t provide the visual cues users have come to rely on when using GUIs. Infrequent use of a CUI won’t lead to efficient and productive interactions; instead, users will have to re-learn how best to communicate (feeling out boundaries, establishing trust, etc.). Therefore, use cases that are performed frequently, ideally daily, are favorable candidates.
For multiple reasons (including the number of entities, complexity, response time, and error rate), favorable use cases require little verbal input. At least for the near term, a possible rule of thumb could be:
number of words out > number of words in
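As a rough illustration, the rule of thumb can be checked for a single exchange as follows. The function names are hypothetical, and counting whitespace-separated words is an assumption about what “words” means here.

```python
def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


def satisfies_rule_of_thumb(words_in, words_out):
    """True when the system says more words than the user had to say."""
    return word_count(words_out) > word_count(words_in)
```

A four-word question such as “what’s the weather tomorrow” answered with a ten-word forecast passes the check; a long dictated request answered with “ok” does not.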
Favorable use cases do more than answer questions or volunteer insights and advice. They perform tangible tasks for the user, like augmenting, sending, or ordering, thereby creating a sense of accomplishment.
Favorable use cases for a CUI are optimized in some specific way. They can be performed faster, or more conveniently, or with less effort, or more directly, or performed simultaneously with other tasks, when compared to a more “traditional” execution.
I hope to have identified some of the attributes that a friendly CUI should have and also provided some ideas for how to identify use cases that are well suited to a CUI.
According to Gartner, “Natural Language Q&A” is already in the trough of disillusionment and will be hitting bottom soon. It is up to us now to make it out of there.