Potential Pitfalls for Voice Interface Adoption

Vaishnavi Chityala

Published in Alter potions · Nov 5, 2023

How do designers model voice interactions to address these pitfalls in the age of AI?

“We need to stop optimizing for the machine age and start designing for the human age.” — Jared M. Spool

In the ever-evolving world of technology, voice interfaces have brought us closer to a more natural, human-like interaction with machines. Voice assistants have proven transformative, and it is fascinating to see how quickly they have become an integral part of our daily lives.

However, as our reliance on voice interfaces grows, the need for user-centric design is greater than ever, just as Spool's quote suggests. In this article, I discuss the pitfalls that can affect the user experience of voice interfaces and the VUI design approaches formulated to address them in commercial settings.

What are Voice/Conversational User Interfaces?

In a voice or conversational user interface (VUI/CUI), the user interacts with a smart device whose primary input and output method is voice, with little to no visual support. The goal is to mimic human-to-human interaction through one of several conversation styles (system-centric, content-centric, conversation-centric, and visual-centric), each serving a different purpose in a specific context.

Elaborating on these conversation styles is a discussion for another blog…

At a high level, the major concerns and limitations of VUIs/CUIs fall under these themes:

Privacy and Trust
Access and Accessibility
Ethical Considerations (for instance, will introducing a CUI do more harm than good?)
Potential Impact on Users

For the scope of this article, I will dive into research-study insights that shed light on privacy and trust.

User Perception/behavior

Non-users are unaware until they become users

Hands-free voice control offers users the convenience of simply saying a “wake-up” word. More often than not, however, users feel they are constantly being listened to because of the “always-on” microphones. Some users are willing to trade their privacy for convenience, but there is more to it: awareness, and the trust an early adopter has already built with the company, push users toward adoption. At the same time, users draw parallels with widely known data-security concerns around free software, and these comparisons shape their decision for or against adopting smart devices. Factors at play include complicated trust relationships with device companies, an incomplete understanding of privacy risks (or inaccessible information about them), and a lack of alignment with user needs, such as how useful the device is in interpersonal spaces like the office or home.

To give a personal example, I have used both Siri and Alexa for more than a couple of years. Based on my primary and desk research, Apple's attention to privacy and security is evident to many of its users, and I personally never came across ad targeting or profiling in its ecosystem. With Alexa, by contrast, I noticed suggestions and recommendations surfacing in third-party apps, which led me to stop using the device [3].

What does human-bot interaction mean to most of the users?

Users tend to perceive these tools as serving a specific purpose or need rather than as social companions. Even though the bot is personified (anthropomorphized), users' behavior toward these devices varies widely. This variation may be associated with how long they have used the device, where the voice assistant is located and how present it feels, and the moment of interaction itself.

For instance, a user with a clear understanding of the assistant's purpose omits polite phrases and treats it as a machine, sticking to specific commands. In contrast, older adults might say “please” and “thank you,” displaying some level of emotional attachment to the bot.

Conversation analysts, security researchers, and many others have proposed guidelines for the issues above. These considerations could be the best bet for niche domains such as healthcare and fintech, and for businesses whose broad user base includes children, older adults, and users with special needs. The solution themes below draw on conversation analysis of the interaction and on user-centricity itself.

Conversation-sensitive design guidelines

There are industry-developed guidelines for designing the “talking”, each focusing on a distinct set of principles for designers. Google's Conversation Design guidelines encourage designers to “craft conversations that are natural and intuitive for users” and state that “conversation design is about the flow of the conversation and its underlying logic”. Amazon and IBM have formulated their own principles in the Alexa Design Guide and the IBM Conversational UX guidelines.

Specifically, IBM stipulates three principles of VUI design: recipient design (design for this user by creating adaptive scripts that react to the conversation so far), minimization (design for minimal use of talk), and repair (design to let people easily fix interactional troubles, which recur in human-human talk too).

Ensuring progressivity

In line with IBM's guidelines, one of the principled approaches suggested within the CUI community is “Response Design”. Before getting to it, let's look at the terminology and anatomy of a conversation in this context, since it is not the same as human-human conversation.

In a “speech exchange” between two parties, the exchange is organized around both the topic and the structure of the conversation. Any modeled conversation has three components:

Turn-constructional units (TCUs): units signifying that the current speaker's utterance is complete and a speaker transition is possible
Sequences: pairs of actions (adjacency pairs)
Activities: series of related sequences


Conversations vary in the number of turns, from two-turn exchanges to multi-turn models, depending on the conversation style (content-centric, conversation-centric, and so forth). As with task completion in a GUI, ensuring progressivity toward the completion of interactional sequences is crucial.
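To make the three-level anatomy concrete, here is a minimal Python sketch that models it as nested data: each turn is one complete utterance (a TCU), a sequence pairs a first and a second pair part, and an activity groups related sequences. The class names and the weather example are illustrative assumptions, not part of any published guideline, and the sketch ignores insert sequences and other expansions that real adjacency pairs can have.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """One turn-constructional unit: a complete utterance after which the floor can pass."""
    speaker: str  # "user" or "vui" (illustrative labels)
    utterance: str


@dataclass
class Sequence:
    """An adjacency pair: a first pair part and the second pair part that answers it."""
    first: Turn
    second: Turn


@dataclass
class Activity:
    """A series of related sequences that together accomplish one task."""
    name: str
    sequences: List[Sequence] = field(default_factory=list)

    def turns(self) -> List[Turn]:
        # Flatten the activity back into the order in which the turns were spoken.
        return [turn for seq in self.sequences for turn in (seq.first, seq.second)]


# Hypothetical multi-turn activity made of two sequences (four turns).
weather = Activity("checking the weather", [
    Sequence(Turn("user", "What's the weather today?"),
             Turn("vui", "Sunny, with a high of 22 degrees.")),
    Sequence(Turn("user", "Do I need an umbrella?"),
             Turn("vui", "No, no rain is expected today.")),
])

for turn in weather.turns():
    print(f"{turn.speaker}: {turn.utterance}")
```

Here a single activity spans four turns; a two-turn model would be an activity with just one sequence.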

Now, what is Response Design? In a conversation, the first responses after the greeting sequence are crucial for moving the conversation forward. It usually takes at least three turns for a user to assess the response and continue, and the device's response in the previous turn shapes the user's next action. Designing responses that help users accomplish their tasks is therefore a fundamental consideration. In simpler terms, response design can either aid or impede the user's progress.

Response design suggests approaching every conversation model with the following five questions, which designers can ask of their responses to reflect on how the design takes progressivity into account:

Could this response be delivered minimally, allowing users to progress to their next move earlier?
Does this response support or impede the user to be sure of the VUI’s ‘understanding’ of spoken talk?
Could other resources provide users something to move on with help, e.g. accounts of what went wrong?
How could the user provide more information to this response than expected?
How do users themselves work to support, and halt, the progress of a sequence in response to the VUI, either in overlap with or following the VUI’s response?

Each of these guidelines reinforces the importance of user-centricity and shows that Jakob Nielsen's usability heuristics apply beyond GUIs and user engagement. Among IBM's three principles, “repair”, meaning supporting users in fixing interactional trouble, is arguably the most promising way to build a relationship with the user.

Make the “talking” natural and non-intrusive

In my opinion, a natural human-bot interaction is one where the CUI not only handles the conversational flow but also recovers from errors seamlessly. It annoys me when Siri does not understand my query and then dismisses the whole conversation flow, with no visual or aural feedback.
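The repair principle can be illustrated with a small sketch. The Python below is a hypothetical toy, not how Siri or any real assistant works: a keyword matcher stands in for speech understanding, and the names `CAPABILITIES`, `DialogueState`, `respond`, and the two-attempt limit are assumptions chosen for illustration. Instead of dismissing the conversation when it fails to understand, the handler keeps the context, gives an account of what went wrong along with options to move on, and ends with explicit feedback only after repeated failures.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical capabilities: keyword -> minimal confirmation the VUI speaks.
CAPABILITIES = {
    "timer": "Timer started.",
    "weather": "Here's today's forecast.",
    "music": "Playing music.",
}
MAX_REPAIR_ATTEMPTS = 2  # assumption: offer repair twice before closing


@dataclass
class DialogueState:
    history: List[str] = field(default_factory=list)  # context survives failed turns
    failed_attempts: int = 0


def recognize(utterance: str) -> Optional[str]:
    """Toy keyword matcher standing in for a real speech-understanding service."""
    for keyword in CAPABILITIES:
        if keyword in utterance.lower():
            return keyword
    return None


def respond(state: DialogueState, utterance: str) -> str:
    state.history.append(utterance)
    keyword = recognize(utterance)
    if keyword is not None:
        state.failed_attempts = 0
        return CAPABILITIES[keyword]  # minimal response so the user can move on
    state.failed_attempts += 1
    if state.failed_attempts <= MAX_REPAIR_ATTEMPTS:
        # Repair: explain what went wrong and offer a next move instead of dismissing.
        options = ", ".join(sorted(CAPABILITIES))
        return f"Sorry, I didn't catch that. I can help with {options}. What would you like?"
    # After repeated failures, close with explicit feedback rather than silence.
    state.failed_attempts = 0
    return "I still couldn't understand. You can find examples in Help, or try again."


state = DialogueState()
for said in ["Set a timer", "mumble", "mumble again", "still mumbling"]:
    print(f"user: {said}\nvui:  {respond(state, said)}")
```

The design choice worth noting is that a failed turn never discards `history`; only the failure counter resets once the exchange closes.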

Second, making help and documentation discoverable, and giving users control to change any default options, improves their experience and their sense of freedom.

Interestingly, Siri wakes up not only to the “Hey Siri” utterance but also to utterances containing sensitive keywords such as “Emergency,” “Suicide,” “Killed,” and so on, offering to connect with 911 or local emergency services. Unaware of this, we were all startled when Siri activated in the middle of a deep conversation. That led us to browse the settings to find out why it was activating seemingly at random. It turned out that a “Sensitive Content Warning” feature was turned on at the time. Features like this, with options to control them, add real value for the user. Still, I doubt it's just me: no one finds Siri useful when it interrupts them mid-conversation.

Conclusion

As we witness the AI-driven transformation of industries such as healthcare and fintech, it is crucial to recognize the pivotal role of user-centric design in enhancing the user experience of voice interfaces.

Designers and developers must embrace empathetic design principles and adopt progressive disclosure to make voice interfaces more human-like. By doing so, we can break down barriers, connect with users on a deeper level, and ultimately create technology that not only serves our needs but also understands and respects our emotions.

Alright! That is all.

I hope you find this useful even if you already know all this. If you see room for improvement in the content, I am open to listening. Feel free to share your thoughts below; I'm always open to a healthy discussion.

Thanks for reading!

If you feel like talking, connecting, or just want to see what I'm up to, you can follow me on LinkedIn, Twitter, and Instagram, and check out my case studies.

References:
[1] Proceedings of the ACM on Human-Computer Interaction, Volume 3, Issue CSCW, Article No. 214, pp. 1–21. https://doi.org/10.1145/3359316

[2] “Emotional Design” by Don Norman

[3] https://www.theverge.com/2022/4/28/23047026/amazon-alexa-voice-data-targeted-ads-research-report

[4] CUI ’19: Proceedings of the 1st International Conference on Conversational User Interfaces, August 2019, Article No. 26, pp. 1–8. https://doi.org/10.1145/3342775.3342788