Conversational User Interfaces

Published in Mission.org · 11 min read · Apr 6, 2016

The ubiquity of conversational experiences reminds us how easy it has become to interact through computers. However, if you’re younger than 30 it might surprise you to learn that many computers used to look like a scene from the movie Hackers:

The user interface that you’re familiar with today is called a Graphical User Interface (GUI) and was popularized by Xerox, Apple and Microsoft during the 1980s. Early computers were exclusively text-based, which meant that your primary interaction with a machine was a cryptic, highly-coded experience that involved typing basic commands onto a screen like the one pictured above.

GUI helped computing become more intuitive, friendly, and accessible for the average person, and it’s presently the standard across all of our personal devices.

In a broad sense, GUIs have evolved to become increasingly user-friendly and visually appealing, but they are not without their shortcomings. If you’ve ever had the displeasure of booking a flight or restaurant reservation online, you’ve experienced a myriad of inconsistently designed text boxes, drop-downs and poorly placed buttons.

Combine this with our mass odyssey to mobile devices, which lack the real estate to showcase a page’s necessary elements, and we’re left with a 20-click, 8-screen experience that makes us long for the days of land lines and Yellow Pages. A poorly designed website or mobile application can transform a 2-minute dinner reservation into a 20-minute ordeal.

Now consider a possible alternative:

How nice would it be to type “Reservation for five people @ Momofuku 7pm” into your phone?

This is the power of an emerging development known as the Conversational User Interface (CUI), and it’s helped us realize that sometimes it can be easier to have a conversation with a computer than it is to tap, swipe and bungle our way through a poorly executed user experience.
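To make the idea concrete, here is a minimal sketch of how software might turn that one-line message into structured booking data. Everything in it is an assumption for illustration: the `Reservation` type, the `parse_reservation` function, and the rigid message pattern are hypothetical, and a real conversational interface would need far more flexible language understanding.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup so "five" and "5" are both understood.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Assumed message shape: "Reservation for <size> people @ <venue> <time>".
PATTERN = re.compile(
    r"reservation for (?P<size>\w+) (?:people|person) @ "
    r"(?P<venue>.+?) (?P<time>\d{1,2}(?::\d{2})?\s?(?:am|pm))\s*$",
    re.IGNORECASE,
)


@dataclass
class Reservation:
    party_size: int
    venue: str
    time: str


def parse_reservation(message: str) -> Optional[Reservation]:
    """Return a Reservation, or None if the message does not fit the pattern."""
    match = PATTERN.search(message.strip())
    if match is None:
        return None
    raw_size = match.group("size").lower()
    size = int(raw_size) if raw_size.isdigit() else NUMBER_WORDS.get(raw_size)
    if size is None:
        return None
    return Reservation(size, match.group("venue"), match.group("time").lower())


print(parse_reservation("Reservation for five people @ Momofuku 7pm"))
```

A single typed sentence carries the same three fields (party size, venue, time) that a booking form spreads across several drop-downs and screens.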

If you live in the Eastern Hemisphere, you’ve probably already experienced this. The popular texting app WeChat is so heavily intertwined with Chinese businesses that the apps we’re accustomed to downloading in North America exist organically within the messenger itself. In fact, many businesses that would otherwise have native apps or mobile sites bypass them completely and opt for WeChat accounts. Chances are, if you start a new business in China, it’s based out of a chat app, which means your customers can converse with you directly in real time.

Here’s a translated mock-up:

Messaging platforms are beginning to take advantage of a wide variety of media, pushing the boundaries of the interface. It’s becoming a space where tiny applications can be built and incorporated to adopt a host of useful features, including media players, games, in-message payments, and photography.

Companies like Operator are leading the North American space, providing a well-rounded experience that relies on text but supplements conversation with visual media or basic commands.

CUI’s most attractive feature is front and centre: it’s conversational. And although most of the CUI conversations we’re presently having are person to person, artificial intelligence will slowly replace us as it develops. Ideally, we’ll be able to have engaging discussions about our utmost curiosities instead of just the products we’re interested in. Siri is a good start, but as it stands we’re still talking at her, not with her.

Until our technology is able to pick up on the subtle aspects of communication, including facial expression, tone of voice, and body language, text is the perfect vehicle for AI. It may not be as emotionally alluring as the conversations Joaquin Phoenix had with his OS in the film Her, but it’s a baby step in that direction.

You’re Reading Design for Humanity

Design for Humanity is an interactive essay exploring the past, present, and future of anthropomorphic design. You’re currently reading part 3 of 7.

Artificial Intelligence

Alan Turing knew early on that machines would have difficulty picking up and replicating the subtle human gestures inherent in conversation. For this reason he gave machines some leeway in his initial version of the Imitation Game: an experiment devised to determine whether a machine could imitate a human.

Now referred to as the Turing Test, the Imitation Game originally involved three players: Player A (computer), Player B (human) and Player C (human). Through a series of written questions and answers, Player C must determine which player is a human, and which is a computer.

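Turing predicted that by the year 2000 a computer would play the game well enough that an average interrogator would have no more than a 70% chance of making the right identification after five minutes of questioning. In other words, if Player C is fooled by the computer about 30% of the time, we can begin to argue that the computer is intelligent.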

Of course, when the test was developed in 1950, computers were decades away from having sophisticated vocal capabilities, so early tests were (and still are) text-based. The test itself has been toyed with and altered several times over the last 65 years, but the same question pulses at its core: can a computer convince us, through conversation, that it is human?

One of the earliest attempts was Eliza, designed by MIT professor Joseph Weizenbaum in 1966. Cleverly named after George Bernard Shaw’s Eliza Doolittle, the program let users chat with it through an electric typewriter.

Weizenbaum based the program on the techniques of Rogerian Psychotherapy because its empathetic therapists tend to ask passive, leading questions that don’t require much dialogue or knowledge on the topic being discussed. After the user types a sentence, Eliza scans for keywords like “mother” or “headache” or “family” and responds with an appropriate sentence from a large database. If it cannot identify a keyword, it will simply generate a generic reply to keep the conversation afloat.
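The keyword-and-fallback behaviour described above can be sketched in a few lines. This is a simplified illustration, not Weizenbaum’s actual program: the `KEYWORD_REPLIES` table, the `GENERIC_REPLIES` list, and the `reply` function are hypothetical names, and the canned sentences are invented for the example.

```python
import re

# Hypothetical keyword -> canned reply table (invented for illustration).
KEYWORD_REPLIES = {
    "mother": "Tell me more about your mother.",
    "family": "How do you feel about your family?",
    "headache": "How long have you had this headache?",
}

# Generic replies used when no keyword is found, to keep the conversation afloat.
GENERIC_REPLIES = ["Please go on.", "I see.", "Can you elaborate on that?"]


def reply(sentence: str, turn: int = 0) -> str:
    """Scan the sentence for a known keyword; fall back to a generic reply."""
    words = re.findall(r"[a-z']+", sentence.lower())
    for word in words:
        if word in KEYWORD_REPLIES:
            return KEYWORD_REPLIES[word]
    # No keyword matched: cycle through the generic replies by turn number.
    return GENERIC_REPLIES[turn % len(GENERIC_REPLIES)]


print(reply("My mother never listens to me"))
print(reply("The weather is nice today"))
```

Nothing here models what the user means. The program only matches words against a table, which is why the illusion of understanding breaks down after a few exchanges.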

Eliza’s shallow conversational depth nearly always exposes itself after a few moments, but its easy-going, passive nature charmed the public upon its introduction. In fact, some of Weizenbaum’s colleagues and students exhibited very strong emotional connections to the program.

“My secretary, who had watched me work on the program for many months and therefore surely knew it to be merely a computer program, started conversing with it. After only a few interchanges with it, she asked me to leave the room.”

As the preceding passage suggests, Weizenbaum discovered something disturbing: people approached the bot as if it were a real, intelligent entity who was genuinely interested in their condition — even when they knew that it was a computer program. This condition is now referred to as the “Eliza Effect”, and although the behaviour of its sufferers isn’t always as extreme as the case above (alone time with robots is not consistently requested), it propelled Weizenbaum to become a staunch critic of AI.

The Eliza Effect is a prime example of how an object, in this case a typewriter, can form a visceral connection with just a few rudimentary human elements. All it took to elevate a keyboard and screen to a human level was a basic conversational program.

Although Eliza had a powerful effect on the people who interacted with it, there was something missing: namely, an emotional and aesthetic intelligence. If we rewind to the conception of Turing’s Imitation Game, we learn that it was an offshoot of a party game where a man had to convince another player that he was a woman.

At the core of this game lies empathy. Of course, the best way to convince another player that you are of the opposite sex is to put yourself into their shoes, to empathize with their experience and thus imitate them as best you can.

While present-day versions of Eliza have vastly improved (they’re able to maintain a conversation for several minutes before exposing their limitations), they are still running on advanced algorithms that don’t seem to “understand” the content that they’re receiving or delivering, a limitation best illustrated by the Chinese Room thought experiment. It posits that a man who doesn’t speak Chinese could be placed in an isolated room and consult a guide to formulate answers to questions fed through a slot. Despite his ability to engage in a technically sound exchange, he would still have no understanding of the conversation.

And so, it appears that one of the final hurdles for AI is not, as Turing suggested, the ability to fool its human counterparts; it’s the ability to comprehend the information it is receiving and eventually empathize with others. If we take a look at the video below, we can see that the featured AI can engage in a conversation, but its final answer suggests that it has little understanding of the conversation it is having:

Humanizing Technology

While there is an entire branch of computer science dedicated to this field, brands are leveraging present-day developments and incorporating them into practical, everyday products.

Amy is an application whose job is simple enough: she helps you schedule and organize your daily meetings. Her most charming attribute is how noticeably human she is. To activate Amy, you simply cc her in your email thread, as you would a human office assistant, and she will spring into action. Amy has access to your calendar, so she knows exactly when you’re available. She knows all of your favourite meeting locations, because any time you discover a new spot, you can send her a quick email. But Amy’s most human feature is her ability to take care of your remaining correspondence if you’d like her to.

As you can see, Amy is more than a chatterbot. She is able to retain information about previous meetings and make suggestions based on that knowledge. She is independent to some degree. She can make “conclusions” that only humans could make a few years ago. As conversational interfaces progress, and begin to adapt to our habits, these interactions will become even more nuanced. Companies like Google, Facebook and Amazon already know how we behave online and in the marketplace, but our CUI will get to know us like a friend or colleague.

The engine propelling this technological development is a process known as deep learning. At its core, deep learning teaches software to recognize complex patterns in sounds, images, and other data by feeding it large amounts of material and refining how it responds to the input. One of the most publicized showcases of machine learning has been IBM’s Watson, which beat two of Jeopardy!’s most successful champions in 2011 and is now being used to help doctors diagnose and prescribe treatment for their patients.
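The loop of feeding in examples and refining the response can be shown at toy scale. The sketch below trains a single artificial neuron (a perceptron) to reproduce the logical AND of two inputs. It is far simpler than deep learning, which stacks many layers of such units, and the names `predict`, `train`, and `AND_EXAMPLES` are assumptions made for this illustration.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[int, int], int]

# Toy training data: the output should be 1 only when both inputs are 1.
AND_EXAMPLES: List[Example] = [
    ((0, 0), 0),
    ((0, 1), 0),
    ((1, 0), 0),
    ((1, 1), 1),
]


def predict(weights: Sequence[int], bias: int, inputs: Sequence[int]) -> int:
    """Fire (return 1) when the weighted sum of the inputs is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(examples: List[Example], epochs: int = 20, rate: int = 1):
    """Show the examples repeatedly, nudging the weights after each mistake."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # A wrong answer shifts the weights toward the correct response.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


weights, bias = train(AND_EXAMPLES)
print([predict(weights, bias, inputs) for inputs, _ in AND_EXAMPLES])
```

The program is never told the rule for AND. It starts out answering incorrectly and adjusts itself each time an example exposes a mistake, which is the "refining how it responds" step in miniature.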

CUI Appreciates Your Patience…

So far we’ve discussed CUI’s potential to change the face of human-computer interaction, but we’ve yet to touch on its shortcomings, especially in its present day form.

In the case above, the promise of an effortless, engaging conversation with our computer feels more like the automated priority queue that you’re forced to sift through when you call your cable company: a hair-pulling process that always feels sterile and far more time consuming than the 5 seconds it would take to explain your needs to an operator.

In fact, the company above has thrown a very subtle curveball at us: it’s forced us to interact with a conversational interface in the same way we would with a graphical interface. Punching numbers into a phone is a far cry from an actual conversation.

Even Siri, today’s most technically advanced CUI, has its shortcomings. Adept at restaurant recommendations, recipes and directions, Siri has difficulty picking up on the nuances of sensitive subject matter. It famously gave a potential suicide victim directions to a gun shop, and once told a worried mother that her daughter’s sexual abuse issues were “not a problem.”

Text-based communication is clearly in its early stages, and vocal recognition may evolve to read our tone of voice and deal with delicate subject matter, but neither accounts for body language or facial expression, elements that future interfaces may rely on as heavily as our words. Although CUI has vast potential, it is artificial intelligence that will propel our machines to the next level.

Ideally, as AI progresses, we will interact with our machines in a variety of different ways, be it text, voice, GUI, body temperature, or other subtle gestures dictated by the circumstance and preference of the user. Perhaps our machines will know us so well that they’ll be interacting with us invisibly.

For example, I would prefer not to pull out my phone, enter a password and text my home to let it know that I’ve arrived. Instead, my future home will sense that I’m near, identify my face as I approach, and unlock a split second before I reach for the handle. Perhaps, if it doesn’t make me feel too self-important, it will open as I approach.

Present-day pop culture has picked up on these possibilities as well. Think of the OS featured in the movie Her, which draws split-second conclusions about Joaquin Phoenix’s relationship with his mother based on a sigh. Its ability to decipher the implications of a robust relationship from a simple pause in conversation hints that our devices might eventually surpass our ability to pick up on tiny emotional cues.

It’s still too early to accurately predict exactly where these interfaces will manifest themselves, and how they will look or feel, but research and implementation of emotional data has begun. Through passive sensors that detect information about a user’s physical state and behaviour, computers are slowly learning to amass and interpret data in ways analogous to humans.

For example, video cameras are able to capture our facial expressions, posture and gestures, while microphones and other audio devices capture the subtleties of speech. Meanwhile, medical devices have long been able to monitor physiological data like temperature and heart rate. Again, it’s the AI’s ability to piece this data together and make independent conclusions that will allow us to interact with our technology with more depth.

Thanks for Reading

This is an interactive + evolving essay. Please get in touch if you have thoughts regarding new content, modifications to current content, or anything else!

Hi, I’m Daniel. I’ve founded a few companies including Piccsy (acq. 2014) and EveryGuyed (acq. 2011). I am currently open to new career and consulting opportunities. Get in touch via email.

This article was co-authored by Shaun Roncken.