Thoughts on Conversational UIs

Published in Design Warp, Aug 10, 2017

After a number of interesting conversations with various ThoughtWorks colleagues, here are the topics we discussed and some of my own thoughts on conversational user interfaces. (Just to be clear, even though it’s probably a misuse of the term CUI, this article refers to voice interfaces such as Alexa, Siri etc.)

What exactly is a conversational user interface?

First, let’s take a look at some of the recent wide-reaching releases:

October 2011: Apple introduced the iPhone 4S along with their beta version of Siri.
July 2012: Ok Google was first introduced in Android Jelly Bean and was first supported on the Galaxy Nexus smartphone.
April 2014: Microsoft demonstrated Cortana for Windows Phone at its Build developer conference.
November 2014: Amazon announced Alexa along with the Echo hardware.

These are intelligent digital personal assistants and they operate in the form of a conversational user interface (CUI); the user speaks to them and the assistant replies with some information.

Are they useful at the moment?

Users speaking to devices and expecting some form of digital feedback goes further back than you might expect. However, even now in 2017, 81 years after the first recorded implementation of the synthesised ‘voice’, we’re really not getting much use out of the technology on a daily basis.

It’s 2017 and we’re still not in a place where CUIs are really that useful to us other than booking simple trips, playing music, telling us the weather or regurgitating some other basic information we could easily find another way.

Artificial intelligence can beat us at chess and Go, which is impressive, but I really can’t help but wonder when we’ll get to the level of conversation seen in the film ‘Her’. How quickly will we get to that level? What real customer-facing, market-changing advances have been made since the introduction of Siri in 2011? Why has it not progressed as quickly as we expected it to?

In a fantastic article Daniel Eckler writes: “Until our technology is able to pick up on the subtle aspects of communication — including facial expression, tone of voice, and body language, text is the perfect vehicle for AI. It may not be as emotionally alluring as the conversations Joaquin Phoenix had with his OS in the film Her, but it’s a baby step in that direction.”

Some people reading this will try to explain that the current crop of CUIs are still proofs of concept and the public just aren’t ready for that level of AI in their lives. I’d argue the public has been ready since the first episode of Star Trek in 1966 and, equally, scared since 1968 and Arthur C. Clarke’s ‘2001: A Space Odyssey’.

It seems we’re still taking those baby steps Daniel mentions in his article and there are a number of reasons why.

Where’s the emotional attachment?

Now I know the statistics show that app downloads per month are decreasing; even on my own phone I can’t remember the last time I downloaded an app that I didn’t specifically search for based on a particular need.

Even though that may be the case I actually have, and I don’t think it’s much of a stretch to assume others also have, a weird emotional bond to particular apps. Maybe they were once really useful, maybe there was a cost to download or maybe it’s simply the visual design and interaction of the app that’s enjoyable.

The example for me would be Clear App for note taking and list making. In my case that particular app has been surpassed by Gmail’s Inbox ‘reminders’ function in most cases and in others by the more traditional pen and paper.

Nonetheless I really do like that app; I like its interaction, colour themes, touch feedback, audio feedback and crucially… the icon design. The latter so much that it sits proudly on my homescreen, unopened and unused but still loved.

I have no more use for this app and yet I have a bond with it. With CUIs there’s no ‘physical’ app to grow attached to; there are no visuals or branding to be drawn to and no interaction to play with. I’m not saying this is a problem, and maybe it’s better to not grow oddly attached to an app, but a CUI is invisible and for humans that means a relationship and bond will struggle to form and be difficult to maintain.

Also consider how businesses, startups and independent designers/developers will even be able to market their app to the public. How will they entice users to initially engage with the app, ensure there is continued use and make sure users don’t forget about the service it’s providing? I’m not saying it’s impossible; I’m saying it’s something that needs to be thoughtfully considered to ensure a CUI service is successful.

There is another side to this; the way we interact with services could become even more natural. Once users are comfortable with a personal assistant being present and in conversation with them, they could even expect it to be there when it’s not, because it has become ubiquitous in their lives. This has its own set of positive and negative considerations.

A box in a room

Now it could be said that there’s already an emotional attachment to something physical; the products out there right now such as Amazon Echo/Google Home/Apple Whatever are physical items expertly designed. They sit proudly next to all those other crafted aluminium devices in a customer’s household (and even that cheap plastic kettle bought at Argos).

Although these physical products are satisfying to observe and touch, it’s the invisible CUIs that need to work hard to be truly successful. They have to become part of our lives. Imagine the Star Trek scenarios: it has to be in our TVs, integrated into our homes, cars, workplaces, our fridges, kettles and toasters. It’s no good to just have it as a standalone box sat in the corner of the room.

Now, I know the people building these devices know that; they have to start somewhere and get people used to interacting in this way. Very quickly the reality will be that, once proven as a concept, the CUI will be literally everywhere and that will bring its own ethical, moral and regulatory complexities.

All of this will be very dependent on the product design and development teams to both understand and manage. The role of the designers, analysts and developers on these teams is going to change dramatically.

Edge cases are multiplied

On the more ‘traditional’ interfaces such as desktops, mobiles and tablets there are core user experiences that are more easily designed for and those slightly more interesting/different/surprising edge cases that occur due to the quirks of human life.

With CUIs we are talking about humans and their conversations; edge cases are not going to be on the edges anymore, they’re going to be upfront and centre. Humans are weird; their conversations, and the way they have them, are even weirder.

The way people talk to each other, not just their own language and pronunciation but the actual experience of conversational dialogue, is vast and varied. Those conversations are now going to be happening with robots, directly or indirectly, and if it’s not engaging then the whole experience will feel unnatural and, pun intended, robotic.

In some cases users may even find themselves restructuring their natural spoken sentences to ones that the CUI understands, deviating away from normal human conversational language even further. That is not what customers want to do and it’s not going to make a CUI service successful.

People are quirky, very quirky

On the subject of quirky edge cases, there are still going to be a number of random interactions forced on a CUI by the user. On a physical device those interactions are personal and quickly executed; they might be to satisfy some need or requirement or, in a lot of cases, for no particular reason except to satisfy the user.

An example would be the way my mum consumes the current week’s weather. She’s just moved abroad and not only will she check her current location’s weather, but her hometown weather, her favourite holiday destination weather, my local weather and the weather of another memorable holiday destination. All of these are in different countries, let alone cities.

That is certainly a bit of an edge case, one that can be very quickly executed on her mobile phone and in private. The information gathered can be used later for conversations about the weather (she’s British…). Now imagine doing that using a CUI on a device; it’s awkward, time consuming, frustrating and quite possibly embarrassing.

What about the Finnish metal bands?

How would you ask Spotify to play Roi Alekpehanhou’s album Sato Na Hangna? Or maybe your favourite song — Kwang Noi Chaolay? How about a Finnish metal band, maybe Teräsbetoni or Rytmihäiriö?

Now unless the speaker were from Benin, Thailand or Finland respectively, Spotify is really going to struggle to understand what’s being asked for. Accents are one thing, pronunciation is another.

User: “Simon, play … ‘Saatoo-nah-hang-nah by Roy-a-lec-pea-hann-who’…”
Simon: “Ok, now playing ‘The Who’!”
User: “…”

CUIs will get better at understanding the words that are being said, but if users just have no clue how to pronounce these words or names then they’ll have to fall back to just playing a genre of music or not using the CUI at all.

This could force us to live in a homogenised society lacking in not only musical diversity but diversity of all kinds. If users can only ask questions or request information in the language they can pronounce effectively, the result is less Thai music in the household but plenty of Ed Sheeran.

Context is king

The key to a successful CUI service is the context that it can understand and continue to learn. Using the weather example above, and the way my mother uses it, the CUI needs to know that she wants the weather for more than just her current location. Some setup may be involved but it has to be efficient; if it’s not then she’ll revert back to her mobile, which provides visual clues and prompts to help her.

If the CUI can’t learn about its user then what is the point of the service at all? Let me be clear, I’m not talking about just Alexa, or Siri, I’m talking about the apps built to make those platforms richer. It’s those apps that need to learn what users need and want, based only on the data they’ve managed to aggregate from the user’s profile and what the user has said in past conversations.

It can’t just be the user pulling information from the device; the device also has to push to us. My personal assistant should wish me good morning, probably quietly, until I respond and then it should provide me with some further information. I shouldn’t have to ask what the weather is or what’s in my calendar. It should know that I want to know, before I even need to ask.

Cyril: “Good morning Chris”
User: “Hey”
Cyril: “You’re heading to Manchester today, it’s going to rain you know!”
User: “Uh, ok ok, remind me to take the umbrella before I leave.”
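To make the ‘push’ idea a little more concrete, here is a minimal sketch in Python of how a morning briefing could be assembled from what the assistant already knows. Everything in it is an assumption for illustration: the `UserContext` fields, the `morning_briefing` function, the saved locations and the hard-coded forecast are all made up, and a real assistant would pull this from calendar and weather services rather than a dictionary.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class UserContext:
    """What the assistant has learnt about one user (all fields hypothetical)."""
    name: str
    # Places the user regularly asks about, in the order they care about them.
    weather_locations: list[str] = field(default_factory=list)
    # Calendar entries keyed by day: where the user is travelling to.
    trips: dict[date, str] = field(default_factory=dict)


def morning_briefing(ctx: UserContext, today: date, forecast: dict[str, str]) -> list[str]:
    """Build the lines the assistant volunteers without being asked."""
    lines = [f"Good morning {ctx.name}"]
    destination = ctx.trips.get(today)
    if destination is not None:
        # A trip today outranks the usual list of locations.
        outlook = forecast.get(destination, "unknown")
        lines.append(f"You're heading to {destination} today, the forecast is {outlook}.")
    else:
        # Otherwise read out every saved location we have a forecast for.
        for place in ctx.weather_locations:
            if place in forecast:
                lines.append(f"{place}: {forecast[place]}")
    return lines


ctx = UserContext(
    name="Chris",
    weather_locations=["Cologne", "Manchester"],  # made-up saved locations
    trips={date(2017, 8, 10): "Manchester"},
)
forecast = {"Manchester": "rain", "Cologne": "sunny"}  # stand-in for a weather service

for line in morning_briefing(ctx, date(2017, 8, 10), forecast):
    print(line)
```

The point of the sketch is only that the user never asks a question: the calendar entry and the saved locations decide what gets said. The same list of saved locations would cover my mum’s five-country weather check without her having to name each place every morning.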

The success or failure of the hardware and its operating system (OS) will be down to the third parties that supply intelligent apps and useful services to the platform. There also needs to be consideration given to the implications for privacy and data protection in this case.

Are they actually any faster?

I’ve heard and read that interacting with a CUI is much faster than looking for the information by physically using the service via a traditional digital device.

Maybe it’s faster to ask a personal assistant the weather than to find my phone, unlock it, access the App Store, search for a weather app, choose a weather app, enter the password, download, install, figure out how to use, find my city, save my city and view the weather.

However, I need only do that once and if I’m happy with the native weather app most of that interaction is not needed. The key part of the interaction is how quickly I can consume the information and with a quick glance at a visual interface I know not only the weather for the rest of the day but the average weather for the rest of the week.

A CUI would have to take time to read out the forecast hour by hour and day by day; that does not seem faster to me.

The key advantage is that I can continue with whatever else I’m doing whilst asking for a basic weather forecast. It’s not making it faster but it does allow for less disruption in my daily life while I get enough of the information I need to satisfy me.

CUIs will cover most of the ‘standard’ use cases efficiently: getting an account balance, reserving a regularly visited hotel or booking a train ticket. Users want digital efficiency, and effective digitisation is still a core challenge for businesses; successful CUIs could play a large part in that.

Invisible discoverability

CUIs are invisible; I won’t know what I want until I need it, and when I do it should be very easy to get. Equally, I don’t want to have to use a mobile app to install anything; the personal assistant should handle all that.

How is it going to be able to read through even the top three apps and give me relevant information for me to make a decision in the same time it would take me to choose a mobile app? The latter I can do by recognising branding and viewing screens of what the experience may be to help me decide quickly.

An interesting way to approach this would be if the personal assistant knew, based on my calendar entries or other data, that I was going to Italy next week. Now I could download an app myself but it would be so much more immersive and quite impressive if the device would suggest that I download a Rome city guide.

Valerie: “So you’re going to Rome next week, shall I tell you some interesting things to do?”
User: “Sure ok.”
Valerie: “Which guide would be best, Lonely Planet or Trip Advisor?”
User: “The first one.”
Valerie: “Ok great! So looks like the Colosseum is interesting, although it will be busy from 11am onwards so best to get there early. Thursday would be a good day as the temperature won’t be too hot.”
User: “Sounds good, add it to my calendar.”

I’d fully expect the Rome city guide, a language learning app and maybe a transport guide to be auto-downloaded and progressively disclosed until the date of my return, then deleted and removed from the device until the next similar calendar entry.
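As a rough sketch of that lifecycle, the Python below decides which trip-related services should be available on a given day, based only on calendar entries. The `Trip` type, the `TRIP_SKILLS` catalogue, the one-week lead time and the dates are all assumptions of mine, not how any real assistant platform works, and the ‘progressively disclosed’ part is not modelled at all.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trip:
    """A calendar entry for time spent away (hypothetical shape)."""
    destination: str
    start: date
    end: date


# Hypothetical catalogue: the kinds of service worth learning for any trip.
TRIP_SKILLS = ("city guide", "language lessons", "transport guide")


def active_skills(trips: list[Trip], today: date, lead_days: int = 7) -> set[str]:
    """Return the skills that should be available today.

    A skill appears lead_days before a trip starts and is dropped
    the day after the trip ends.
    """
    skills: set[str] = set()
    for trip in trips:
        days_until_start = (trip.start - today).days
        if days_until_start <= lead_days and today <= trip.end:
            skills.update(f"{kind} ({trip.destination})" for kind in TRIP_SKILLS)
    return skills


rome = Trip("Rome", start=date(2017, 8, 17), end=date(2017, 8, 21))

print(sorted(active_skills([rome], date(2017, 8, 10))))  # a week before: skills present
print(sorted(active_skills([rome], date(2017, 8, 22))))  # after returning: nothing left
```

Nothing here is installed or uninstalled by the user; the set of available services is simply recomputed from the calendar each day.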

Maybe I’ll be going to Rome again in the future. If that’s the case then it should remember my last visit, where I went, what I thought of those places and what other people also liked. We should be having a conversation about them so it knows what to suggest next.

Valerie: “Ah you’re going to Rome again! Did you enjoy the Colosseum?”
User: “Yeah was great!”
Valerie: “This time you could visit the Roman Forum, did you see it already?”
User: “I did actually, it was good.”
Valerie: “Ok cool, how about the Vatican Museums?”

The point is that I shouldn’t need to be involved in the decision making here. A service like this does not have customer-facing apps to search for, download and install; it just asks me if it is allowed to ‘learn’ something and then gives me back what it’s learnt.

Now we’re moving into the complex field of machine learning and that ability is critical because what is a personal assistant without the ability to learn and change based on the user’s input?

What I want is my personal assistant to push services to me, allowing me to then decide to download and use that app or move to the next one in the list. Which one comes top of the list? Well… I’m sure Amazon and the others have a monetised advertising plan for that.

Minimum viable interactions

It’s not going to be immediately obvious what each app does; there is no description to read, no visual on-boarding, no tours or examples. The learning curve of new apps and interactions will be really difficult to handle without the support of a graphical interface. There’s going to have to be some buy-in from users to explore the app without any visual aid or cumbersome audio tour.

There has to be a basic interaction and it has to be worthwhile and useful. It just has to work; like a digital MVP, it has to provide enough value to the user right out of the box for them to continue using the service.

As mentioned above, if we move away from ‘apps to be installed’ to ‘things a CUI can learn’, the minimum viable interaction of a service is going to be hidden. It can get soaked up into the capabilities of the OS so customers will never notice the ongoing updates and optimisations of an app; they should just generally feel the service provided has improved and has learnt more about them.

Tone of voice is key

An important question for all designers/marketers/writers is going to be: ‘How can a company bring its brand to life through conversation?’ This is no longer a one-way dialogue with just voice, such as radio, nor visual and voice, such as TV, nor the mostly one-way communication tool that is the web. Equally, it’s also not the two-way dialogue of a chatbot or customer service live chat.

Tone of voice strategies, guidelines and patterns have always been important for successful marketing and customer interaction, but now it’s very different. This is something new, and content writers, experience designers and developers are going to have to work very closely together to make the ‘conversational experience’ seamless.

Writing will take more time than design and development

Designing and building a basic CUI isn’t really that difficult; much of the language interface comes out of the box. Of course, machine learning and using any kind of large-scale aggregated data that allows the CUI to understand the context of the conversation adds much more technical complexity.

However, I think it’s fair to say that writing for a well thought out and immersive CUI experience will take at least as long as designing and building it, and most likely double. Conversations need to be fluid and have the right personality, and the right amount of it; they should say just enough to be useful without saying so much that they become cumbersome.

Success for the user is no branding whatsoever

When really considering what the best experience for users is, to me at least, it seems that little to no branding would be the most favourable. In contradiction to my earlier point about tone of voice, do users really want a different tone of voice for every different service being provided? There’s much more opportunity for the trust of users to grow when one holistic personality is used rather than several.

If a user purchases a personal assistant then they want that personality, just with an array of deep ‘intelligence’ provided by third party services. As mentioned previously in this article, it would be better for the personal assistant to learn what a user wants information about rather than the user having to preinstall another app with a different personality and invisible interface to learn.

This is going to cause a real headache for businesses and again could raise ethical and moral questions. For example, The Guardian and the Daily Mail are both news platforms, however they have very different standpoints and views on world events. Somehow the CUI is going to have to suggest which of these to install, but even that seems cumbersome. I just want the news, the kind of news I like already.

With no branding and a holistic personality, how can users really tell the difference other than the actual content? Did I hear fake news? A user’s critical thinking is going to become more important than ever, and it’s something that has been lacking in this new age of social media and alternative facts.

How about multiple personalities?

There is also an opportunity here to give users the ability to create more ‘characters’ to help them relate to various services and therefore businesses. The character may be the touch point; added to the contacts list in a messenger or mailbox and called upon to perform certain specific tasks.

User: “Steve can you book a train ticket to Munich tomorrow morning for me.”
User: “Narinda what’s going on in the news today?”
User: “Cyril, what am I doing tonight? Is it going to rain?”

When using multiple personalities in this way users can have them as a virtual circle of ‘friends’ in their contacts list, each of them being called upon for specific jobs. Of course, they would have to be available on all relevant platforms on the OS to be truly useful and engaging and that’s where full integration of the third party apps becomes important.
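A minimal sketch of that idea in Python: the first word of what the user says picks the character, and anything that isn’t addressed to a known character falls back to the OS-level assistant. The persona names come from the examples above; the `route` function and the placeholder handlers are hypothetical, and a real service would obviously do more than echo the request back.

```python
from typing import Callable

# A handler takes the rest of the request and returns the spoken reply.
Handler = Callable[[str], str]

# Each character is the name the user says plus the service behind it.
# The handlers are placeholders that just label the request.
PERSONAS: dict[str, Handler] = {
    "steve": lambda request: f"[travel service] {request}",
    "narinda": lambda request: f"[news service] {request}",
    "cyril": lambda request: f"[calendar and weather] {request}",
}


def route(utterance: str) -> str:
    """Send an utterance to the character named in its first word."""
    first_word, _, rest = utterance.strip().partition(" ")
    name = first_word.strip(",.!?").lower()
    handler = PERSONAS.get(name)
    if handler is None:
        # Nobody was addressed by name: fall back to the OS-level assistant.
        return f"[default assistant] {utterance.strip()}"
    return handler(rest.strip())


print(route("Steve can you book a train ticket to Munich tomorrow morning for me."))
print(route("Cyril, what am I doing tonight? Is it going to rain?"))
print(route("What's the time?"))
```

The fallback matters as much as the routing: a request with no name attached still needs one consistent personality to answer it.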

It’s like traditional branding but in a new and different way. CUI branding is about personalities, whether that’s one holistic OS level personality or multiple ‘friends’ that complete specific tasks.

Final thoughts

There needs to be a lot of consideration given to whether the personal assistants and CUIs that we design should have female names, voices or personalities. Always defaulting to a female personification can fuel the fire of institutional sexism that we are trying to put out; give users the option to change the persona during onboarding.

It would seem that the companies supplying the base OS, like Amazon, Google and Apple, have all the power at the moment, even though they will require third parties to provide the great experiences that allow their platforms to learn, be useful and ultimately be successful.

The ironic thing is that these platforms are reliant on more apps and services being built for them, and the more that are built the more powerful they will become as the third parties’ services are soaked up into the overall intelligence of the platform.

It’s an exciting, yet scary, place to be.

Thanks
The following people were all involved in various conversations on this topic and contributed in various ways to this article so I’d like to thank them, in the order of the conversations we had:

Anthony Scatchell, ThoughtWorks San Francisco
Axel Knauf, ThoughtWorks Cologne
Liam Hutchinson, ThoughtWorks Manchester

I’m a Product Enablement & Ops Consultant with over a decade working in technology organisations, enabling product leadership to maximise their performance and impact.

Visit chriscompston.com to find out more.