The Rise of Conversational Interfaces and the Great Shift in How Brands Must Communicate

Conversational interfaces will unseat the graphical ones and become the dominant way for human-computer interaction. Why? What changes will this bring for brands and designers? What will design look like in the conversational era?

Samuel Stenberg
language+brands/design
37 min read · Aug 29, 2017


If you’ve passed your 25th birthday, you can say you were there when the Aliens landed.

It started happening in the late 80’s. The alien race called Computers descended and began moving into homes all over the world. The buzz about their arrival was deafening, the speculations of what they would mean for our society never ending. The Computers seemed friendly, but at the same time foreign. Powerful, yet hard to tame.

Ever since they arrived, we humans and the computers have been working on tackling one enormous challenge: understanding each other.

You see, the biggest problem in the relationship between us and them was that we didn’t speak the same language. English, Swedish, Japanese, Spanish, Chinese — they would have none of it. Instead, the first computers had us learning different prompts and commands. The giant grey box in our living room was stubborn, “type the right command or I won’t understand you.”

So we began learning. We so desperately wanted to interact with these creatures that we were willing to write commands with / and : and www’s and .com’s. We learned to click and scroll. Our vocabulary expanded with words to describe and understand the computers: URL, IP, LAN, hard drive, RAM.

The interfaces for our communication with the computers have gotten better. Heck, now we can even execute commands without the need for a mouse by simply touching their screens. Still, that progress is just a baby step compared to the revolution that is around the corner. The Aliens are about to learn how to speak our language, and that will change everything.

Welcome to a world of conversational interfaces.

In this story, I’ll go through how and why we at Uppfatta believe brands will need to change how they communicate when the graphical interfaces we’ve known for 25 years start being complemented or replaced by conversational interfaces. I’ll cover:

1. What’s going on — the technology and the trends out there
2. The Change — how conversational interfaces will change brands and user experiences
3. The New Design Field — the skills and methods needed to design conversation
4. Bringing Conversations to Life — how to turn assistants into characters
5. Closing Thoughts

If you like the story, hate it, or feel that it raises questions, feel free to comment or reach out to us on Twitter or Facebook.

Let’s go!

ONE | WHAT’S GOING ON?

WE NOW HAVE THE TECHNOLOGY TO CONVERSE WITH COMPUTERS

If you’ve ever seen a superhero movie and secretly wished for a super power, you’re in luck. Because when you were born a human, you were gifted with the greatest super power of all: language.

For hundreds of thousands of years, humans have used the power of language to plan, issue orders, socialize and establish our dominance as a species on this planet. And for a vast majority of that time we’ve not been able to write stuff down, so we have used the spoken word to carry our thoughts across to others. We’ve used conversations. Talking to others — and seasoning it with gestures, grimaces and such — is our natural way of communicating.

Looking at society today, this is not immediately apparent. Take a walk through your local mall and you’ll probably see more people interacting with their screen than people talking to each other. The screens are everywhere, they are the one thing you can be a hundred percent sure everyone in that mall carries on them, beyond clothes.

Why? Because the screens are portals. Portals into the world run by those aliens. And because the aliens don’t speak our tongue, the screens are needed. We’ve become so dependent on their world, so addicted to it, that those screens are our most precious belongings. If we were to peek into people’s homes, we would surely find that screen time vastly outstrips time spent in real-life conversation.

A huge part of our screen time actually consists of conversation.

This is not quite the full story, however. Yes, live conversation has taken a step back because of screens, but a huge part of that screen time actually consists of conversation. These past years we have seen the rise of messaging apps as the dominant force in the mobile space. A milestone was reached in 2015, when we spent more time in messaging apps (WhatsApp, Snapchat, WeChat etc.) than on social media networks (Facebook, Twitter). And the trend seems set to continue. According to the Meeker 2017 Internet Trends report, it’s also seeping into B2C communications, where customer service is rapidly shifting to chat instead of phone and e-mail, as users demand faster response times.

When the aliens arrived, our primary idea was to use them to distribute and receive information. Heck, we named the whole field of study information technology, or IT. The last 5–10 years clearly show us that the name was misguided. Sharing information is nice, sure. But the Internet is, just like the rest of the world, about people.

And people talk.

Chatbots — the first wave of conversational

The platform holders are rising to meet our need for conversation and brands are scrambling to keep up. Today we see constant innovations connected to the messaging experience. This has led us to the rise of the chatbot. For brands, the chatbot comes as a saviour after a couple of years of head-scratching. The term dark social arose for the social stuff we do where brands can’t reach users or follow and adapt to their behaviour. With chatbots, brands have left the dark. Interest in chatbots exploded beginning in early 2016, according to Google Trends, and we see no signs of it abating (yet):

Google Trends data for search term “chatbot”, for August 2012–2017

What is a chatbot? The term generally seems to be used for bots that users can chat with through common chat interfaces such as Facebook Messenger, WhatsApp and Slack. We’re only in the infant stages of this technology, but so far chatbots seem to give the user the ability to use the more basic parts of a service directly through the chat interface, simply by chatting with it. A couple of examples:

CNN and Wall Street Journal — chat with their bots to get the latest news directly in Messenger, instead of hopping over to their webpages.

KFC and Pizza Hut — order from these fast food giants via a conversation with their bots.

KLM and Icelandair — book via their bots and receive info concerning your flight from them.

Uber — get a ride directly from the Messenger interface.

The list could go on but, basically: Chatbots are getting huge. And still these are only the very first uses we are seeing. Microsoft CEO Satya Nadella has named chatbots “the new apps” and using that comparison we are now where apps were around 2008/2009. Which is to say: nowhere near where we are going.

Chatbots certainly are the first wave of conversational interfaces. They speak to our need for simple, direct communication while they also show how service providers and product sellers alike can provide parts of their offer via a messaging interface. But with a chatbot that interface still needs to be graphical. Conversational is going to go beyond that and leave our need for screens behind.

Which is where the revolution truly begins.

The chatbot interface from Domino’s Pizza. Photo: Domino’s.

Where conversational truly gets interesting: the virtual assistants

Looking for new friends? May I suggest befriending:

Cortana?
Siri?
Alexa?
Bixby?

These are the names of the virtual assistants offered by Microsoft, Apple, Amazon and Samsung. Google have the Google Assistant, which is more of a platform for developers to use for their different conversational interfaces. Many more virtual assistants are going to be born with the Google Assistant platform.

All the tech giants are betting heavily on voice-controlled conversational interfaces. Amazon has for quite a while been selling their Amazon Echo device, which is powered by their conversational interface Alexa. And just this summer, Apple announced HomePod, their Echo competitor powered by Siri.

Why are the tech-giants betting on voice and virtual assistants? Well, for a plethora of reasons. Simon Stefanoff, Digital Director at Deloitte, said the following at a lecture in Sydney earlier in 2017 regarding the arguments for voice:

“Firstly, if we can reduce friction on any transaction or a goal that we want users to hit, we know that equals more conversions generally. If you use voice properly, you can lead people through these scenarios and transactions that you want to convert, and I think that can be really powerful.

“Secondly, it’s a natural interface — it’s the natural interface. Voice as input, at least, is very basic — you’re not exerting much cognitive load or mechanical load right now as you listen to me talk, and the same is true for voice.”

Again, the word natural is important. Up until now we have relied on two types of interfaces to make the aliens understand us. Command Line Interfaces:

and the aforementioned GUIs:

The Android GUI — instantly recognisable even for an Apple evangelist like myself.

A conversational interface falls into the category of what at least Microsoft calls a NUI — natural user interface. Here’s a chap called Bill Gates writing about NUIs on his blog:

“Until now, we have always had to adapt to the limits of technology and conform the way we work with computers to a set of arbitrary conventions and procedures. With NUI, computing devices will adapt to our needs and preferences for the first time and humans will begin to use technology in whatever way is most comfortable and natural for us.”

I agree with Stefanoff and Gates on the arguments for natural user interfaces, voice specifically. I believe people severely underestimate the impact it’s going to have when a natural user interface becomes as good as or better than a GUI. We’ve become so used to the idea of having to use different graphical interfaces (of having to click, scroll, touch, type on a keyboard) that we’ve forgotten it’s a practice we invented just because we wanted to communicate with the aliens.

What if I could just tell the computer where, when and how I want to travel, instead of navigating through the cluttered interface of a travel site? Why type a search for a bread recipe, then scroll, sort and read, when I can just tell Alexa “find me a popular recipe for breakfast bread and read it back to me”? What if a visual designer could speak to Photoshop and describe exactly the edits she wanted to the picture, instead of having to sit there and adjust it manually? (IBM have in fact already had a marketing manager create a campaign entirely by talking to their supercomputer Watson.) This level of functionality in a conversational interface would make us forget we ever accepted scrolling, typing and touching.

The technology is already great — and getting better

To get to what I’m talking about, technology still has some way to go. Progress is rapid, however. Accuracy for computer voice recognition is now at 95% (which is human-level recognition), up a whopping 20% since 2013.

Of course, that 95% represents accuracy in understanding literal meaning, but there is much more to language: understanding emotions, drawing inferences, reading between the lines. And in those areas, the aliens still have some learning to do. As long as we’re forced to “talk in commands”, so to speak, we will compare the experience to using a familiar GUI to issue those same commands. And then voice falls a bit short (even if it’s still very convenient). Alexa can talk and be talked to, sure, but Alexa doesn’t feel human when she speaks. It’s still about commands going back and forth, not a true conversation.

But make no mistake: the computers are learning this as well. IBM’s cognitive computer cloud platform Watson has a text tone analysis that constantly gets better. Empath (emotional well-being app) and Cogito (real-time emotional intelligence solution) both offer voice analysis that can interpret our emotional state from our voice. And Microsoft has a technology that can read our emotional state from facial expressions. Google very recently (end of August 2017) open-sourced the speech recognition dataset used for the Google Assistant, meaning its development is going to be heavily accelerated.

Innovations like these will strive towards teaching the virtual assistants both to understand us and talk back in a more human way, since all the tech giants know that in order to make conversation with these assistants truly appealing, they need to feel real.

AR & VR will accelerate the need for conversational

It’s also worth mentioning augmented reality (AR) and virtual reality (VR), two technologies which are getting a lot of buzz (even though Pokémon Go still sort of remains the only mainstream example of how one of these (AR) will be attractive to users). But while we still lack great examples of widespread AR/VR usage (because the technology is still too expensive), there seems to be no doubt that we are headed toward a future where these technologies play a large role. And both of them underscore the need for conversational interfaces.

Because in AR and VR, typing isn’t attractive. Pointing, sure, but typing? AR and VR will be about talking, otherwise it defeats the purpose. How real is a virtual “reality” if you can’t talk to the people you meet? When those people are actually real people, your friends hanging out in the same virtual Facebook lounge or whatever, the interface is no problem. You just talk into a mic and they can hear you and talk back.

VR makes for some immersive experiences. Photo: Hammer & Tusk on Unsplash.

When you’re in VR not interacting with real people but with computer characters, either in a game or in the virtual representation of a service, we’re talking about a conversational interface.

Let’s imagine the VR version of an H&M store. Instead of sitting down in front of their flat, boring web page, you can enter their VR store and walk around, browse clothes in 3D, even actually try them on by having a digital version of a dress or jacket appear on your virtual avatar. If you want some kind of help in that virtual experience, how would you like to receive it? By having a box of text appear next to the clothes for you to read? Or by simply asking a virtual H&M employee (an AI, not an actual person, silly) the thing you want to know?

Yeah, VR and AR are going to be all about conversations.

Finally, we understand each other

What’s interesting is that even if the technology and the software coupled with it are in their infancy, our behaviour is already starting to shift towards spoken conversations. The Meeker 2017 Internet Trends report showed that 20 percent of mobile searches were made with voice in 2016. And the same report also noted that Amazon Echo is “exploding in popularity”. What do you think is going to happen to the market now that Apple enters it?

So the shift toward conversational is happening, right now. Machine learning and natural language processing are exponentially speeding up the learning process of the aliens (some Facebook chatbots have already been observed talking between themselves in a language they invented). And the AI field is booming; every day new companies enter the market to teach computers to do stuff previously reserved for humans. The more computers learn to do, the more we will have to work on our way of communicating with them. On top of that, increased adoption of technologies like AR and VR will only further the need for conversational.


We don’t know exactly how it will all play out, but this is the general direction. The web changed everything for brands, the smartphone did it again, and while this story is just exploring what impact conversational might have, hopefully you’ve begun to realize that it’s going to be a big one no matter which exact form it takes.

So we can start counting down towards the day when the aliens finally will have learned to communicate via human-level conversation. Conversations will be the computer interface of the future — just as conversations have always been the primary interface of human-to-human communication.

What happens then? Well, I work with several different brands on a daily basis, and they are using graphical interfaces in all shapes and forms in order to reach their audience or deliver their service. So I can’t see anything other than a seismic shift coming our way when it comes to how brands will design communication.

Let’s dive into the implications.

Photo by Tycho Atsma on Unsplash

TWO || THE IMPACT

HOW CONVERSATIONAL INTERFACES WILL CHANGE EXPERIENCES & BRANDS

I believe brands will be impacted by conversational not only in the usual “you need to consider how to use this technology”–way. That’s way too narrow-minded. It isn’t just about how brands can use conversational. It’s about how user behaviour is going to change, how expectations are going to change and how entire organizations will need to change in order to meet those expectations.

To begin exploring the impact, we must look at how we as consumers typically interact with brands today.

Oftentimes interaction occurs via one of those screens, as with Spotify or Facebook. If it’s a product we’re consuming, such as a Nike shoe, then we still interact with the brand in many digital ways even though the product is physical. So almost every brand today is, entirely or at least in part, about a digital service. In these services function is important, but probably equally important, if not more so, is the “feel”, for lack of a better word. You can list bullet points of what your app can do, but if it doesn’t “feel” good to use, don’t bother me with the bullet points at all.

What is that “feel” made up of? A complex question of course, but the easy answer is that the structure and features of the experience, its colors, shapes and forms, and the type and tone of its content (both read and viewed) all contribute to a user experience that hopefully is characteristic of the brand and the service. Somehow when you’re on Spotify’s web page you feel that you are interacting with the same Spotify that you interact with when you’re using the mobile app. And it feels good. That’s thanks to all of the above.

Structure. Color. Shape. Form. Read and viewed content. It’s all about seeing, isn’t it? So what happens to the brand experience if we’re not talking about a graphical user interface, but a conversational one? A lot. When I’m instead simply talking to your service, it suddenly won’t matter nearly as much how clean, user-friendly or eye-pleasing your GUI is, because I might never even see it.

The Apple HomePod. Photo: Apple.

Booking with AirBnB becomes a chat with Columbus

Let’s think about planning a weekend trip to Prague, using AirBnB to find a place to stay. In a couple of years, AirBnB will let me do this simply by talking to their virtual assistant. For the purposes of this story, let’s call him Columbus (you know they will name him something related to journeys and discovery, so why not name him after the discoverer).

When it’s time to start planning, I won’t even need a screen. I’ll instead go into the kitchen, start preparing dinner and say “Hey Siri, go to AirBnB” to my HomePod. I chop onions and cry, while Columbus lists my options. He’ll probably say:

“Samuel, I found 92 available places, what’s your budget?”

From there we’ll narrow it down. I’m cooking, chatting with him (“Does it have a balcony?” “No” “Okay then remove it from my list”), and then when my partner and I sit down to eat we take a look at the iPad (“Columbus, send it to my iPad screen”) to see which of the six remaining options look best. (We just tell Columbus to show us pictures from the different places; no need to touch the screen with our potentially greasy fingers!) And when it’s time to book, we simply tell Columbus to do it for us.

NUI > GUI

In the above example, I’ve just booked a place to stay almost without using the GUI of AirBnB (I’ll probably look at the screen to confirm the details or something, at least my first time through to see that he actually got it right). My primary mode of interaction was through the conversational interface of Columbus.

In this future, Columbus obviously becomes a hugely important part of the AirBnB service. The single most important part, probably. For times when I can’t talk, the GUI will still be there. Maybe I’ll swipe through my options on my commute to town, only to get back to the conversation when I’m free to speak. And let’s be honest. Today you can still send a physical letter to the Department of X with your information, but with the same service available online, why would you? We’ll see the same situation with GUIs becoming unattractive as we simply want to talk to our services. Or at the very least we’ll combine talking and navigating a GUI.

I see the following argument here and there: “But we won’t use voice when we’re around other people; talking to a computer in public is stupid.”

It doesn’t hold up for two main reasons. First: People won’t notice you’re talking to a computer; you’ll appear as if you’re on the phone since language will be so natural. Second: Everyone around the coffee table sitting and staring down into a small screen is stupid. But that happens, right? Tech can alter our behaviour and our expectations of what’s normal and what’s not.

The smartphone. Our window to the world. Photo by Annie Spratt on Unsplash.

Adjust your offering for conversational

At the top of part 2 I wrote that conversational brings about fundamental changes to organizations. This entire story is dedicated to how teams will need to evolve their skills and how they look at design, but let’s also not forget that your offering in and of itself may very well have to change. You will need to start thinking about which parts of your service you might transform into conversational, or what you might add to it to make it conversational. Are there simple tasks in your service that you could offload to a chatbot on Facebook Messenger or Slack and let the user perform by messaging with the bot? (Akin to ordering a pizza with Domino’s or hailing an Uber ride.) Is there stuff you could add to your service that would fit conversational? Can you start experimenting on your website today with a more conversational structure, to begin learning and gathering data for the shift to come?

(In developing your offering you’ll hopefully work with smart business developers and service designers, among others, who’ll of course stress this: Don’t add anything that doesn’t add value for your user. Just … don’t.)

The one question that becomes super important

Just as service providers today spend millions and millions to perfect the user flows of their web services and apps, millions will be spent tomorrow to ensure that the conversational experience is exactly where it needs to be. To keep users engaged, to drive them toward the finish line and to keep them coming back. Engagement, conversion, retention — the goals and metrics will be the same, the methods very different.

With this backdrop, we arrive at the question that intrigues me the most:

What is design when the interface is conversation?

Photo by Ben White on Unsplash

THREE ||| THE NEW DESIGN FIELD

CONVERSATION DESIGN AND THE NEW (BUT OLD) SKILLS

With the rise of conversational, the design field will undergo a huge shift. Think of the huge teams set up today to design web pages and apps. Think of their skills and processes. What will brands need to change or add to their teams to design conversations?

Oren Jacob (an ex-Pixar guy) is the CEO of Pullstring (makers of the Pullstring Platform, a ‘professional authoring environment’ for computer conversations), and he had an interesting take on conversational design in his talk at the Google developers conference I/O 2017:

“A way to thinking of designing conversations is to think of it as interactive screenwriting. But the trick is that we [developers] are responsible for lines 1, 3, 5 and 7, but not 2, 4, 6 and 8. Lines 2, 4, 6 and 8 are the ones sent back to us by the user. We have no control over those.”

After that, he compares this to GUI design, where designers and developers are in complete control of what inputs they enable. There are X places to click, Y pages to visit, a handful of actions to complete. And that’s it.

The possible inputs from spoken language, however, are almost infinite. Just think of all the ways in which you can say “I would like to order a pizza”. Then think of all the possible things you can come up with that you may want to ask the pizza joint’s virtual assistant. Aside from all the technical challenges that this scenario brings, it raises the question of how we go about designing these conversational experiences.
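To make the “almost infinite inputs” problem a little more concrete, here is a minimal sketch in Python of how many different phrasings might be collapsed into a handful of intents. Everything in it is an assumption made for illustration: the intent names, the keyword lists and the `detect_intent` function are invented, and a real assistant would rely on a trained language-understanding model rather than keyword overlap.

```python
import re

# Hypothetical intents for an imaginary pizza assistant. The keyword lists
# are illustrative assumptions, not taken from any real product.
INTENT_KEYWORDS = {
    "order_pizza": {"pizza", "margherita", "pepperoni"},
    "opening_hours": {"open", "hours", "close"},
}


def detect_intent(utterance: str) -> str:
    """Return the intent whose keywords overlap most with the utterance."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_intent, best_score = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(words & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


if __name__ == "__main__":
    for line in (
        "I would like to order a pizza",
        "Can I get a large pepperoni, please?",
        "When do you close?",
        "Tell me a joke",
    ):
        print(line, "->", detect_intent(line))
```

The interesting design question is the last case: whatever falls outside the known intents comes back as `unknown`, and somebody has to decide what the assistant says then.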

To design an interface that appears to converse as well as any human, we need to pluck ideas from many different fields.

Grice’s 4 maxims of conversation

Coming from an academic background in language and linguistics, I’d like to fall back on the work of one of the most influential thinkers in the area of conversation analysis, Paul Grice (1913–1988). He is best known for having proposed the cooperative principle, a norm governing all cooperative interaction between humans. And as principles specifically for conversation, he laid out the accompanying four maxims of conversation. The maxims are not so much rules as guidelines to what we as humans expect from a partner when in conversation. The maxims are:

  1. Maxim of Quantity: Say no more, but also no less, than necessary.
  2. Maxim of Quality: Be truthful! Do not say what you believe to be false.
  3. Maxim of Relation: Stay on point, be relevant.
  4. Maxim of Manner: Be clear, avoid ambiguity or obscurity.

Users will definitely expect these same things even when they’re talking to an interface. If anything, demands when it comes to clarity (maxim 4) and quality (maxim 2) will be even higher. There actually may be a danger in letting users believe that they are talking to a supreme, all-knowing being. There will be errors, there will be misunderstandings, and the conversational interface will need to be able to handle it. To design an interface that appears to handle these things as well as any human, I believe we need to start plucking ideas from other fields.

Games and stories show the way

Beyond the maxims, the fields of game design and storytelling may turn out to be great inspirations for how to design conversations. They are already to some extent shaping service design.

At I/O 2017 Brad Abrams, group product manager for the Google Assistant Platform, mentioned those two as fields that Google look at and learn from now that they start designing conversational experiences. From the field of narrative design, we find inspiration for storytelling, dialogue and characters. And from interactive design (such as games, mobile/web app design), we can learn about engagement and retention. These fields are interesting since conversational will be about keeping the user engaged in an experience where they are free to act in many different ways (where we can learn from games and interactive experiences). And, it will be about turning these virtual assistants into characters that we love to interact with (where we can learn from narrative design).

Games truly possess the power to engage us. Photo by Yanni Panesa on Unsplash.

The challenge of an interface where you can do anything

Let’s start with the simple and intuitive part. Potentially, this part could be easy to solve if we can design interfaces that are natural. If I can expect a virtual assistant to converse according to Grice’s maxims, I will find the interface easy to use since I’m going by rules I’ve been trained in every day since I was born. On the other hand, this part has the potential to become very tricky, since the user’s options in a conversational interface will appear almost unlimited. I mean, what can’t you do or say in a conversation? And how will the interface be programmed when anything can be said to it?

It might not be as simple as following Grice’s maxims. The maxims work because they are about humans and humans follow the implicit social rules. We’ll train the virtual assistants to do so as well (Siri will never ask when you last had sex or something face-threatening), but what about the user’s behaviour when he is talking to a computer? How will it change? There will be no social context to adhere to. Siri will never be offended unless we program her to be.

So, how do we limit the user’s options, but make them feel free to say anything in any way they want? How do we make the users feel the structure of the conversation, without having it visualized?

This is why we need to look at games. Games basically have this same problem; in an open-world game the options will initially appear unlimited. But such games manage to masquerade as open playgrounds where you feel you can do “anything”, even though there’s almost nothing you can do besides the limited actions the developers have put in front of you.

Zelda: Breath of the Wild — one of those free-roaming game experiences. Image: Nintendo.

The trick in many game designs is to make all those boundaries invisible, make the player feel she is going about the challenge in her way — even if it secretly is exactly the way the game wants her to go. A great game presents a sandbox filled with those rules and boundaries, but most players aren’t bothered by them since their attention is drawn towards the thing the game designers want the player’s attention drawn to. And when the player does run into a boundary, most often it is masquerading as something else. Say, a guard in front of the castle gates keeping anyone from entering, or a colleague reminding you to look for clues inside the building as soon as you try to leave it. Developers usually find a way to keep the boundaries in context.

Thinking of conversational design in the same way, we need Columbus to appear to be ready for me to say anything to him, while he is actually nudging me in the right direction. Think of it as the spoken equivalent of CTAs and menus. So instead of saying:

“Hi Samuel, I’m Columbus. What would you like to do?”

… he might go something like:

“Hi Samuel, I’m Columbus. Where are you looking to travel?”

If I then say “I don’t know” or just remain silent, he should give me a rundown of what’s new, what’s hot, what matches the data he has about my behaviour and previous use of AirBnB. His suggestions will constitute the “conversational homepage”.
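As a rough sketch of that behaviour in Python: the assistant opens with a focused question and, if I give it nothing to work with, falls back to a short list of suggestions. Columbus, the phrases in `UNSURE_REPLIES`, the function names and the suggestion list are all hypothetical, and treating every other reply as a destination is a deliberate oversimplification.

```python
from typing import List, Optional

# Replies that count as "no direction given". Purely illustrative.
UNSURE_REPLIES = {"", "i don't know", "not sure", "no idea"}


def opening_prompt(name: str) -> str:
    """Ask a focused question instead of an open-ended one."""
    return f"Hi {name}, I'm Columbus. Where are you looking to travel?"


def respond(user_reply: Optional[str], suggestions: List[str]) -> str:
    """Fall back to a spoken 'homepage' when the user gives no direction."""
    # Silence (None) is treated the same way as an empty reply.
    normalized = (user_reply or "").strip().lower().rstrip(".!?")
    if normalized in UNSURE_REPLIES:
        top = ", ".join(suggestions[:3])
        return f"No problem. Here are a few ideas: {top}. Does any of them sound good?"
    # Simplification: any other reply is assumed to name a destination.
    return f"Great, let me look for places in {user_reply.strip()}."


if __name__ == "__main__":
    ideas = ["Lisbon", "Prague", "Oslo", "Rome"]
    print(opening_prompt("Samuel"))
    print(respond(None, ideas))
    print(respond("Prague", ideas))
```

Capping the spoken list at three items is a nod to Grice’s maxim of quantity: a screen can show twenty options at once, a voice cannot.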

The Narrative Bumper Pool — a tool for visualizing conversations?

An interesting concept here, one that I’ll certainly want to look at more, is the narrative bumper pool, an idea of how to keep a free-form narrative on track. This awesome term and concept comes from Tracy Hickman, a game designer (among other things) who’s also into tabletop roleplaying games and has written the guide “Extreme Dungeon Mastery” where he explains this concept.

A roleplaying game is seemingly without boundaries, as I spoke of above. The players are free to explore and do whatever, but the different narrative pieces they come upon ‘bump’ them back towards the different goals the dungeon master (i.e. the designer) has in mind. The A at the top is the starting touch point of the narrative, a conversation in our case. From there it branches out to different options and the path to the end can look wildly different between players/users. Some pieces of narrative that the player comes upon open up the story and give great freedom of choice, but some pieces (the ones at the edges) bump the player back toward the intended path.

My (sloppy) interpretation of Hickman’s bumper pool.
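
For the technically inclined, here is one way the pool could be sketched in Python. This is only my illustration, under loud assumptions: the piece names, the moves and the `POOL` structure are all hypothetical and do not come from Hickman's book. Each narrative piece lists the user moves it anticipates, and anything unanticipated is treated as straying and sent to a bumper, which pushes the user back toward the main path.

```python
# A toy narrative bumper pool. Every name here is made up for illustration.
# Each piece maps an anticipated user move to the next piece.
POOL = {
    "A_greeting":  {"names_city": "pick_dates",
                    "unsure": "suggestions",
                    "off_topic": "bumper"},
    "suggestions": {"names_city": "pick_dates",
                    "off_topic": "bumper"},
    "bumper":      {"any": "suggestions"},   # edge piece: always bumps back
    "pick_dates":  {"gives_dates": "END_booking",
                    "off_topic": "bumper"},
    "END_booking": {},                       # an end-point that should feel good
}


def next_piece(piece, move):
    """Return the piece the conversation lands on after the user's move."""
    options = POOL[piece]
    if move in options:
        return options[move]
    # Anything we did not anticipate counts as straying off topic.
    # Bumpers accept any move; end-points simply stay where they are.
    return options.get("off_topic", options.get("any", piece))


def walk(moves, start="A_greeting"):
    """Follow a list of user moves through the pool and return the path."""
    path = [start]
    for move in moves:
        path.append(next_piece(path[-1], move))
    return path


straight = walk(["names_city", "gives_dates"])
wandering = walk(["why do I have a headache?", "hmm", "names_city", "gives_dates"])
```

Both walks end at the same end-point, but the wandering one passes through the bumper on the way. The design choice worth noticing is that the main path needs only a handful of entries, while the bumper is a single piece that has to cope with everything else.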

Carrie Patel, narrative designer at game company Obsidian Entertainment, spoke further on the topic of writing for non-linear situations such as this, in an episode of the writing podcast Writing Excuses:

“We look for ways to branch [the narrative], but to branch smartly. Intelligently, to funnel. We try to get the player to a number of end-points that will feel good. We anticipate the reasonable things they might want to do without just letting it go all over the place.”

How does this differ from designing an app or a web-page with a GUI? A GUI also allows for the user to choose his or her journey through the experience, even though there are a couple of end-goals that the design is funneling towards. Well, in a GUI all possible options are always visible. And the shiny checkout button is always there at the top, waiting for you to heed the call to action.

The conversational interface exists in the moment, and the journey is linear. There are no back buttons, no CTA that the virtual assistant keeps repeating with every dialogue line. And, again, in a GUI we as designers are in control of every possible input. In a conversation, where we can only anticipate the reasonable replies, as Patel spoke of above, the bumper pool becomes a much more useful illustration of how we should think.

Here’s what I think is really interesting: The bumpers will be the most interesting pieces to design and write for — not the pieces in the main path. If the user stays in the predictable central flow of the interface, then we as designers have an easy time. When the user strays and starts asking questions all over the place, perhaps touching the edges of what the assistant is capable of, that’s where design gets interesting. The bumpers at the edges determine whether we can keep the user engaged or if she runs into boundaries so frustrating that the illusion breaks and she gives up.

That box … it’s … IMPOSSIBLE! Dropbox’s 404 is simple but smart.

So when I ask “Columbus, why do I have a headache?” or some other random question he can’t answer, I’ll be at the very edge. I’ll have run into the “conversational 404” so to speak. Only, we won’t accept running into the same error message over and over again, as we do on a web page. Instead of 404s we’ll be looking at “contextual 404s”, more of a standstill in the conversation. I keep asking for something Columbus can’t deliver; he wants to push me back on track. Maybe Columbus’ reply will be something like:

“You know I’m no doctor, Samuel. Do you want to switch over to a health assistant, or should we keep looking for a place to stay in Prague?”

It’ll be about bumping and nudging without steering too much. In a live conversation with a customer service representative over the phone, we run into these 404s all the time, but the representative hopefully never replies “I don’t know” and stops there (certainly not several times over). Instead he’d say “I don’t know” and give suggestions in order to push the conversation back on track. Virtual assistants need to be designed to do that as well.
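
Here is a minimal sketch in Python of how such a contextual 404 might behave. Everything in it is an assumption for the sake of illustration: the `Conversation` class, the `KNOWN_INTENTS` set and the wording of the replies are hypothetical, not how any real assistant is built. The point is only that the fallback remembers what we were doing and changes its reply when I keep hitting the same edge.

```python
from dataclasses import dataclass

# Hypothetical: the only things this toy Columbus can actually help with.
KNOWN_INTENTS = {"find_stay", "change_dates", "show_suggestions"}


@dataclass
class Conversation:
    user_name: str
    current_goal: str          # what we were doing before the user strayed
    misses_in_a_row: int = 0   # how many times in a row we hit the edge

    def reply(self, intent: str) -> str:
        """Return Columbus' line for a recognised or unrecognised intent."""
        if intent in KNOWN_INTENTS:
            self.misses_in_a_row = 0
            return f"On it, {self.user_name}."

        # We are at the edge of what Columbus can do: a contextual 404.
        self.misses_in_a_row += 1
        if self.misses_in_a_row == 1:
            # First miss: admit the limit, then nudge back toward the goal.
            return (f"I can't help with that, {self.user_name}. "
                    f"Should we {self.current_goal}?")
        # Repeated miss: never repeat the same line; offer a way out too.
        return (f"That's still outside what I know, {self.user_name}. "
                f"I can hand you over to another assistant, "
                f"or we can {self.current_goal}.")


chat = Conversation("Samuel", "keep looking for a place to stay in Prague")
first = chat.reply("diagnose_headache")
second = chat.reply("diagnose_headache")
```

The two replies differ, and both point back to Prague. A real assistant would need far more nuance, but even this toy version avoids the dead end of repeating “I don’t know”.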

(Additionally, I believe the edges of the bumper pool will also be where the best character moments happen, but we’ll get to that later.)

So while we’ll need a new term for these contextual 404s, the standstills, the concept will be familiar. Once I started to think about it, though, I began to realize how many of the tactics designers and marketers have developed over the past 20 years will come into play in conversational, only in new ways.

What’s a pop-up in a conversational interface, for instance? Maybe Columbus will just plug in a short commercial message while I’m idle in the conversation. And what will ads be like? Will we tolerate listening to a 30-second commercial every time we talk to Columbus, or will that completely kill user engagement? Will users tolerate being sold to by a virtual assistant, or will we be as resistant to pushy sales tactics as we are when talking to real people?

Here, again, I believe human to be the key word. If Columbus was an actual person, a friend of mine even, he wouldn’t be afraid to recommend a product to me if he knew it was relevant. Friends promote stuff to you all the time because they know you’re interested, know they’re being relevant. So ads might thrive — as long as they’re hyper-relevant.

More interactions — but easier to use. And that’s the key.

So you might look at the pool above and realise that for many interfaces, the pool will be huge. And the user will almost never follow the main path; he’ll stray, go back and forth. Doesn’t this mean a conversational interface equals a whole lot of talking?

Yes, a conversational interface will require a higher number of commands via voice as compared to taps in an app. Some say that is a problem, since we as users want fewer interactions, which signals ease of use. I would argue it is not. If the interactions come naturally to us, with no learning curve, that is ease of use.

The other day I tried to teach the interface of the Sonos app to my girlfriend. I find the app easy to use — she does not. She’s completely uninterested in tech and just wants it to work. And for someone like that, there are far too many menus, options and sliders in the Sonos interface. So I stay the Sonos DJ of our household.

With conversational, she’ll join me as DJ. She’ll just say “Sonos, play some Lana Del Rey” and it’s done. She doesn’t need to know how to navigate the different menus, and there are no other options to distract her from the task she wants completed. That’s because she can use an input where she’s just as skilled as I am: talking.

This example shows exactly how profound the changes are that conversational brings about. It completely levels the playing field and asks anyone who can talk to join in. Des Traynor, co-founder and chief strategy officer at Intercom, wrote on this exact topic in his excellent piece “Your product is already obsolete”:

“The most common criticism you’ll hear along these lines is, ‘Well, look at this. Through conversational UI it takes 73 taps to order a pizza.’ It does. That is correct. And yes, you can order a Domino’s pizza within five taps using their app. I would 100% agree. But that’s not the right way to think about usability.”

“It’s not, can everything be done in a button? You should be asking, can everything be learned and discovered quickly by people who want to do it? Not everything is worth learning. Your product can either use a UI, which your users don’t need to learn because they already know it, or you can ask them to learn a new one.”

My girlfriend does not want to learn a new one for Sonos and that keeps her from becoming a user. How many more services are today keeping people out because they don’t want to learn a new interface? 73 interactions instead of 5 taps doesn’t matter if you don’t want to learn the GUI to find the 5 right taps to play Lana Del Rey. With conversational available, every single GUI needs to be questioned: Is it really needed? What can be done via conversation instead? Sure, keep on with your classic app with a beautiful, multi-faceted interface. The day a competitor offers the functionality through conversational, you’re in a bad spot. Most GUIs exist to help the user accomplish something, not because using them has inherent value (in that case it’s more of a game).

Once functionality is there, we need to add the delight. And that’s where conversational becomes truly interesting.

Photo by Joseph Chan on Unsplash

FOUR |||| BRINGING CONVERSATIONS TO LIFE

FROM ALIEN COMPUTER TO LIKEABLE CHARACTER

I’ve several times over underscored how conversational interfaces need to feel human. And that necessitates character. If there’s no personality behind the voice talking to us, it won’t be human. Sure, it’ll be functional, but simply not delightful to use. To make the assistants feel human, they will need opinions, wants, quirks, their own style and tone of voice.

Memory is one such human trait. The memories of a virtual assistant consist of — besides the entire Internet, of course — user data, of the kind that Amazon or Facebook already use to tailor their flows to each and every one of us. I prefer to speak of this data as memories because it emphasises the human aspect. With data-based memories, Columbus will read your mind like no salesperson has ever done before.

And we will probably expect him to; people remember stuff, right? And this is no stupid website or app, he’s Columbus, I spoke to him just yesterday! The apparent human-ness of these interfaces will make us demand such things. Chris Messina, Developer Experience Lead at Uber, spoke in an interview with Intercom about us ‘living in threads’, where we expect brands to remember what occurred last in our ‘thread’, regardless of interface:

“With the move to the conversational paradigm for interfaces, we start to live in threads. […] Being able to move between channels and persist my preferences is going to become incredibly important in differentiating. Again, the privacy questions are super interesting and completely unanswered, but people are going to increasingly desire that, because that’s what we desire of our friends. Every time I talk to a friend of mine, I’m not telling them my life story over and over again; they actually have persistence of memory. Why don’t brands have the same persistence of memory so they can serve me better when I’m interacting with them?”

Again, Messina touches on the basic maxims I’ve gone over: We’ll expect of a conversational interface what we expect of other people. If Siri can start conversing like a real-life friend would — she understands my way of speaking, she remembers what I know, she has a clear purpose with what she wants to say but she isn’t afraid to go off-topic for a bit — then it will also be intuitive to talk to her.

Can that be achieved? It’ll be a challenge, but I strongly believe it’s where we’re headed with the technology developments I went over in part 1. The combination of the continued hyper-personalization of services and conversational interfaces paves the way for unprecedented customer experiences. Being sold to will almost be fun since it’ll be more like a friend recommending you stuff he or she knows you’d love (oh how marketers of the world would love for me to be right on this one).

Storytelling is the new black — again

The skills of storytelling have long been held in high regard by great brands and the agencies that work for them. And the need for virtual assistants to feel human will not change that. Rather, people skilled with narratives will be in even higher demand. Google have begun hiring writers with backgrounds in playwriting, screenwriting or interactive fiction for their Google Assistant team. And Microsoft have a novelist and a poet on their Cortana writing team.

Why? Because great storytellers know what makes great characters. And the assistants will need to be great characters in order for them to engage users.

First, let’s stop at why that is true. Why does Columbus need to feel like a character, rather than a talking webpage, receiving commands? Well, first off because it’s called a conversational interface, and going back to Grice’s maxims, we have expectations of conversations, the most basic one being that the other party is a human. A person, with beliefs, opinions, emotions. Again: When in conversation with a computer, we will expect those same things. If they aren’t there, then we’ll project them onto the interface, just as people today say that Siri is “awkward” or “stiff”. She’s just without much of a personality — and less inspiring to use because of it. No personality is also a personality, so to speak.

So if we talk to the computer, and the computer talks back in an intelligent way but completely devoid of any trace of humanity, it simply won’t be engaging to use the interface. It’ll be the equivalent of using a completely functional but ugly GUI. Think of a virtual assistant without character as this:

…while one with personality and attitude should feel like this:

Sure, the first one might fly … for a while. Just like some eye-murdering web-pages were around in 1999. A couple of years into the era of web-pages, however, we learned to craft websites that were functional and pleasant to use (but we’re still getting better, of course). And as soon as conversational interfaces that fall into that second category start emerging, what happens to the just functional ones? Yeah, it’s goodbye.

Oren Jacobs again, speaking at the Google developer conference I/O 2017:

“To disregard character in your conversational design is equivalent to saying that you will let your users choose the colors and button locations of your graphical interface.”

We actually already have evidence of character being measurably important. The aforementioned Brad Abrams of Google revealed that the conversational bots with the best user retention were the ones with the strongest personas.

So, if Columbus needs to be a great character, we need to pose the question that storytellers have been yapping about for centuries: What actually makes a great character?

I will not even presume to be able to teach something new in this field and instead just refer to hundreds of years of great storytelling in literature, film, tv, comics and games. From the people who’ve told all those stories, we’ve learned, among a plethora of things, that great characters:

  • have flaws
  • show emotions
  • have dreams, wants, and opinions
  • are capable but not all-powerful and all-knowing
  • speak in a distinct voice and have a personal style

The flaws and the not all-powerful parts I believe are of special importance. When we talk about AI in its different shapes and forms we tend to depict them as perfect and all-knowing. These virtual assistants I’m talking about will for a long time be far from perfect, so how they’re designed to handle their limits and failures will be key to how we accept those same limits and failures. Because we will run into trouble sooner or later, even if it’s just from us pushing the limits of what the virtual assistant can do.

“If the same character insists on telling the truth when a lie would save his life, then we sense that honesty is at the core of his nature.” — Robert McKee

Characters are made when they’re under pressure

Here we get back to the narrative bumper pool and its edges. That’s where the critical interactions happen, and also where the characters will ultimately be made. This is a fundamental storytelling idea, expressed very well by storytelling guru Robert McKee in his book “Story” (and referenced by Oren Jacobs):

“Choices made when nothing is at risk mean little. If a character chooses to tell the truth in a situation where telling a lie would gain him nothing, the choice is trivial, the moment expresses nothing.”

With a virtual assistant, this would be where I give a standard command — “Spotify, play Adele’s 25” — and Spotify simply does it. There is nothing special about the moment, there is no tension. It’s just routine and it doesn’t leave room for character.

Back to McKee:

“But if the same character insists on telling the truth when a lie would save his life, then we sense that honesty is at the core of his nature.”

The time to display character will be at these moments where the interface is under pressure, pushed there by the user intentionally or not. Back to Oren Jacobs in his I/O 2017 talk:

“The choice you [as designer] make there in particular, in the areas that are out of scope for the action that you’re designing, are probably the most important point in design that communicates the tone, mood and style of what you’re trying to build and how you expect users in your audience to feel about the thing that you’re building.”

I find this very fascinating, and this reasoning actually reflects my reasoning when writing for a GUI today (which I wrote about here): the moments where a brand voice really comes through preferably shouldn’t be in the central flows that users go through often and by routine. They’re best placed in the less common flows, at the edges of the pool.

Hire people who can craft characters and tell stories

As I stated previously, conversational brings about changes that go deeper than changing processes or training your developers to work with new tools. Here, completely new skills will be required. It’s not fair to ask a visual designer to become a writer of great characters.

You will need to do that which I mentioned Google and Microsoft have begun doing: hire great writers. Look for linguists, who can work with your developers to write the rules for your conversational interface. Look for storytellers in any form, who can craft your character and make your brand truly come alive. Look for game designers, who can inspire your current design team to think in new ways when it comes to engaging and guiding the playe… sorry, the user through your experience.

And these people that you hire need to learn about branding as well. At least about your brand, why it is the way it is and what they’ll help craft when they work for you. Their input on your brand and on your design process will probably also be hugely inspirational and challenge the way you work, which in the end can only be a positive.

FIVE ||||| CLOSING THOUGHTS

TALKING ALIENS ALLOW US TO BE MORE HUMAN AGAIN

No matter how long it takes until conversational has its breakthrough moment, and no matter the exact form it’ll take, I believe the following to be 100% true already: if you’re somehow involved with developing or communicating brands, you need to craft brands that you can be.

We live in the age of live. Of on-demand. Of raw and unedited. In this age, too many are still applying the same thinking to branding and business overall as they did 10–15 years ago. Back then, you could get away with carefully crafting a brand. It was done by a few skilled people behind closed doors, documented in a brand book.

To this day, this view of “branding” persists.

Problem is, put those 2005-style brand books in the hands of a communicator today, and when she sits down to act as that brand on social media, it doesn’t work. It’s paralyzing rather than instructing and inspiring. Think of all your employees as actors that need to be able to step into this character’s shoes and simply become the brand. Do you think Woody Allen or Ingmar Bergman handed their actors a 43-page brand book for each role?

Today, you need a brand that lives and breathes. That you can truly be. You must leave the careful crafting and planning behind, and do. The arrival of true conversational interfaces won’t change this; it only highlights the need. Conversations are here and now. They are uncensored. Sometimes they go off the rails, other times they’re focused. They are more about initiating and nurturing social relationships than they are about performing functions or reaching goals.

The conversational era of interfaces needs brands to become real. Or rather, they need to feel real. Because a brand is still an abstract construct, a figment of someone’s imagination. Then, as the brand becomes known, a figment of our collective imaginations.

Tyrion Lannister, one of the most beloved AND flawed characters of recent years. Photo: Duncan Hull on Flickr

The great characters of storytelling aren’t real. The Hermiones, Tyrion Lannisters, Claire Underwoods and Tony Sopranos. But they make us happy and sad, they engage us and make us think. So they feel real. And there is a philosophical argument to be made that if great stories and characters make us feel that way, then they are more real to us than people we’ve never heard of in places we’ve never been, even if those people and places exist physically on this planet of ours.

So brands must become real. Nothing else is going to cut it in the years to come. That means more attention to the core of who you want to be and why, and thinking about how that may show through doing stuff. Writing, posting videos, personal contacts. And then just going out there and DOING it. Fewer rules, guidelines and meticulous crafting, more of feeling that something is right because it is coming from a place which is true for the brand.

Yes, this is the final headline

The aliens changed everything about our society.

They changed nothing about humans.

Conversations are our natural way of communicating, whether the purpose of the conversation is to start a war or make peace; to break up or to propose; to get stuff done or just socialize; to educate or to simply entertain. But since the alien arrival, we’ve sort of forgotten that. We’ve been mesmerized by the world that the screens have invited us to. And that world will continue to mesmerize us. Only, the screen won’t be in the center of it any more.

Conversational interfaces, more than any previous type of interface, center on the needs of the users. Smartphones have made us addicted to small screens, to asocial behaviours and notifications. Conversational will partially relieve us of that need. We’ll be able to straighten our necks, look at each other, re-focus on the real world, on what’s in front of us. On each other.

So let’s talk.

Samuel Stenberg
language+brands/design

Designer and UX Writer @ Uppfatta, a branding and communications agency in Sweden.