Why we don’t believe in natural language input (yet)
One of the conscious decisions the Accenture Labs team made in building the Find My Fit proof of concept for healthcare was to ditch text fields (à la Facebook Messenger and many of the other bot platforms available today).
Why use a conversational user interface without free text input? We call it “the menu problem”.
There are plenty of advantages to conversational interfaces that have been discussed elsewhere, but nothing is more terrifying than a blank page. A user faced with an empty text box has no prompt and no idea what to type to make the system respond. This leads to awkward interactions like this one from TechCrunch, on the left:
The TechCrunch bot shown has no idea what is meant by a simple greeting, instead offering articles about “hi”, which is definitely not what I meant.
Quartz, which we’ve made no secret of being inspired by, handles this differently, offering a limited range of witticisms and emojis that move the reader through articles in a natural way.
TechCrunch requires the user to type “Menu” to see a list of options. This is not an intuitive interaction. Find My Fit, by contrast, uses graphical user interface components that supplement the conversational flow. When a customer selects an option, it “interrupts” the current conversation and starts a new one corresponding to that menu option.
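To make the "interrupt" idea concrete, here is a minimal sketch of how a menu selection might replace the active conversation. It is not the Find My Fit implementation: the `Conversation` and `ChatSession` names, the idea of a conversation as a flat list of prompts, and the sample menu entries are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    """Hypothetical model: a scripted conversation is an ordered list of bot prompts."""
    name: str
    prompts: list[str]
    position: int = 0

    def next_prompt(self) -> Optional[str]:
        # Return the next scripted prompt, or None once the script is exhausted.
        if self.position >= len(self.prompts):
            return None
        prompt = self.prompts[self.position]
        self.position += 1
        return prompt


class ChatSession:
    """Hypothetical session: the menu is always available next to the chat."""

    def __init__(self, menu: dict[str, list[str]]):
        self.menu = menu                      # option label -> scripted prompts
        self.transcript: list[str] = []       # everything shown so far stays inline
        self.active: Optional[Conversation] = None

    def menu_options(self) -> list[str]:
        # The options a GUI would render as buttons, so the user never has to guess.
        return list(self.menu)

    def select(self, option: str) -> Optional[str]:
        # Selecting an option interrupts whatever is in progress and starts
        # the conversation that corresponds to that option.
        if option not in self.menu:
            raise KeyError(f"unknown menu option: {option}")
        self.transcript.append(f"[user selected: {option}]")
        self.active = Conversation(option, self.menu[option])
        return self.advance()

    def advance(self) -> Optional[str]:
        # Move the active conversation forward by one prompt.
        if self.active is None:
            return None
        prompt = self.active.next_prompt()
        if prompt is None:
            self.active = None
            return None
        self.transcript.append(prompt)
        return prompt
```

Calling `select` in the middle of one conversation simply swaps in the new one, while the transcript keeps the abandoned exchange visible above it.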
Just because Natural Language Processing (NLP) is tantalisingly close to “good enough”, it does not mean it is time to throw away 30 years of research and practice on the graphical user interface.
Where is NLP?
I mentioned NLP, so let’s tackle the elephant in the room. At this point, we are closer to computers understanding natural language than ever before. Artificial Intelligence-powered parsers like Google’s SyntaxNet are fuel for Twitter pundits and tech journalists, who proclaim that Artificial Intelligence will change the world tomorrow.
In fact, this is a realm not perfectly understood even by the developers and academics who spend their days waist-deep in this stuff, let alone by marketing departments, public relations representatives and those with loud voices on Twitter and Medium (too real?).
The English language is complex, and that’s putting aside all the other ones. Parsing only gets us so far — someone then has to program or train a system to interpret these parse trees and determine meaning (increasingly this is being referred to as “intent” by the platforms in question).
Facebook DeepText goes a step further by taking text and directly mapping it to intents through neural networks, but this requires a lot of training data. Neural networks are built on imprecision and probability. At present they can only be successfully trained for a relatively small number of intents without risking noise and overfitting that can ruin the user experience.
In brief: Facebook, one of the world’s leading technology companies, has created a neural network that determines, with lackluster real-world precision, whether someone needs an Uber. That news then propagates through a chain of understandably excited pundits who rapturously declare: chatbots are here and user interfaces are dead.
The reports of the demise of GUIs are greatly exaggerated
Back to our demo for a second.
Given the above, we believe that at present the most effective way to use conversational interfaces is to embrace graphical user interfaces as part of the conversation. If a chatbot requests a location, I can specify it with an address and a map instead of clumsily describing directions in text (let’s face it, human-to-human we still have trouble with this).
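As a rough sketch of the location example, a bot message can carry a hint about which input component to render, and the reply can come back as structured data rather than prose. Everything here is hypothetical: the `BotMessage` and `Location` types, the `"map_picker"` widget name and the payload keys are assumptions for illustration, not any platform's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    address: str
    latitude: float
    longitude: float


@dataclass(frozen=True)
class BotMessage:
    text: str
    # Hypothetical widget hint for the client; None means a plain text bubble.
    widget: Optional[str] = None


def request_location() -> BotMessage:
    # The bot asks its question and tells the client to show a map picker
    # in place of a free-text field.
    return BotMessage(text="Where would you like to be seen?", widget="map_picker")


def handle_map_reply(payload: dict) -> Location:
    # The map picker returns structured data, so there is nothing to parse
    # or guess: the reply only needs a range check.
    latitude = float(payload["latitude"])
    longitude = float(payload["longitude"])
    if not (-90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0):
        raise ValueError("coordinates out of range")
    return Location(str(payload["address"]), latitude, longitude)
```

The design point is that the reply arrives already typed, which sidesteps the intent-recognition problem entirely for this step of the conversation.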
Bots like TechCrunch’s are hampered by Facebook Messenger’s API. Hopefully Facebook and the other bot platforms get better at this.
Conversational interfaces are still great
Even without free-text input, there is still a tonne of advantages to the conversational interface paradigm:
- Personalisation: Text can be human and personalised. Variable substitutions are simple and conversation trees, while eventually unwieldy, are at least intuitive to design. Meta tree tools can be used to simplify and abstract conversation trees and promote re-use. We did a lot of work around this for Find My Fit to minimise the amount of new conversation we had to create.
- History: Messaging conversations never lose context. Everything that has been said is there. There’s no wondering how to get back to a screen: the inputs remain inline. In addition to the infinitely scrollable, persistent conversation we added a history tab to Find My Fit that logged important interactions. This augmented the contextual nature of the conversation by providing fast access to appointment dates, check-up results and prescriptions.
- Human-to-human: An increasingly common trend is services that are driven partly by chatbots and partly by people. In the medical field, the value of this is the ability to transfer from a chatbot to a human expert without breaking out of a cohesive experience: no change of mode, and the complete history intact. It goes without saying that for human-to-human interactions we would break out the free-text fields that we hide for the chatbot. In Financial Services I’m waiting for American Express to do this for travel concierges. I might actually use them if they did.
- Personal Assistants: I make no secret of the fact that I am a fan of artificially intelligent Personal Assistants. Although they suffer from many of the drawbacks of free-text (or speech) input, I am excited for the day when Siri will be useful for something more productive.
- Lightweight Interactions: As the chatbot platforms improve we will have a powerful substitute for mobile applications, without the overhead of home-screen icons, splash screens and poorly designed user interfaces. I’ve written more extensively about this benefit before.
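The personalisation point above, with its variable substitution and reusable conversation trees, can be sketched in a few lines. This is an illustration only, not the tooling built for Find My Fit: the `Node` type, the `confirm_subtree` helper, the `walk` function and the sample wording are all assumed names and content.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Node:
    """Hypothetical tree node: bot text plus the button choices that follow it."""
    text: str                                            # may contain $placeholders
    choices: dict[str, "Node"] = field(default_factory=dict)

    def render(self, variables: dict[str, str]) -> str:
        # Variable substitution: fill $placeholders with per-user values.
        return Template(self.text).substitute(variables)


def confirm_subtree(action: str) -> Node:
    # A reusable subtree: the same yes/no confirmation pattern, parameterised
    # by the action, so it is written once and attached in several places.
    return Node(
        text=f"Shall I {action}, $name?",
        choices={
            "Yes": Node("Done. Anything else, $name?"),
            "No": Node("No problem, $name."),
        },
    )


# Sample tree for illustration; the wording is invented.
root = Node(
    "Hi $name, how can I help today?",
    {
        "Book a check-up": confirm_subtree("book a check-up"),
        "Renew a prescription": confirm_subtree("renew your prescription"),
    },
)


def walk(start: Node, selections: list[str], variables: dict[str, str]) -> list[str]:
    # Follow the user's button selections and collect the rendered bot lines.
    node = start
    lines = [node.render(variables)]
    for selection in selections:
        node = node.choices[selection]
        lines.append(node.render(variables))
    return lines
```

For example, `walk(root, ["Book a check-up", "Yes"], {"name": "Sam"})` follows one path through the tree and returns the three personalised bot lines along it.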
The team at Accenture Labs is currently working hard on our visions for the next few years, and defining our research agenda going forward. At this point we’re confident that chatbots, conversational interfaces and personal assistants will play a part in that agenda and we expect to have a lot more to say as we learn more.
As usual we welcome feedback and would love to hear what you think.