Conversations for Everyone: Designing Truly Accessible Voice User Interfaces

Just when you thought you’d nailed accessibility, a socially, culturally, and linguistically laden interface came into the mix.

When we talk about accessibility, most of us think about designing for physical differences — things like eyesight, hearing, or, in VUI design, how different people’s speech patterns will or will not be understood. Certainly, ensuring that people’s utterances are correctly parsed by the underlying technology and designing for the physical differences in users’ voices are key components of accessibility in VUI design. (Bo Campbell has some great tips on how to design for these physical differences and Cathy Pearl has a whole book that covers a lot on this.)

But because of the cultural, linguistic, and social elements embedded in the very concept of a conversation, the accessibility issues to consider are correspondingly broader.

Cultural considerations

Photo by Samuel Zeller on Unsplash

Accented voices

To start with, you might need to think about the accents of your users, particularly if you have a national product or a population with a mix of accents in the language for which you’re designing. This is both a physical accessibility issue and a conversational accessibility issue. The physical parsing of how the system understands a “yes” in a heavy Mississippi Southern accent versus how it parses it in a Midwestern accent determines who moves forward easily and who gets an error response to a question they appropriately answered. As a very low-stakes example, a colleague and I just requested Bon Iver on our smart speaker the other day. I pronounced it the way the artist is commonly pronounced, with a French accent, “bun ee-VAIR.” The system response was that it could not be found. My colleague thought for a moment, then cleverly asked for “bahn EYE-ver.” Lo and behold — it worked! But this brought up a great accessibility question around non-English words and names: who determines an acceptable pronunciation? The assumption the smart speaker designers made (which may indeed suit the majority of their users) doesn’t accommodate other accents or pronunciations — and there aren’t any hints or help in the conversation for those who use a different accent.

It’s also possible that an accented answer may move someone forward in the conversation easily, but the accent gets misinterpreted and the person is sent down a path they didn’t choose. For these potentially higher-stakes situations, you might choose to design hidden grammars (or paths) that, while not explicit in the conversation as options, exist if someone says something like “I didn’t say that!” which allows them to return to the previous prompt in the conversation. You could also design error handling messages that operate more like clarifications than errors. For example, you might say something like “You said yes, right?” in lieu of the usual “I’m sorry I didn’t understand. Could you repeat that?” to prevent the speaker from feeling a failure on their part when it was the system that failed to parse the accent correctly.
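Both ideas — a hidden escape path and a clarification-style confirmation — can be sketched in a few lines. Everything here (the escape phrases, the confidence threshold, the `next_prompt` function) is illustrative, not a real VUI framework.

```python
# Hypothetical sketch of a dialog step that combines a hidden grammar
# with a clarification-style low-confidence prompt.

# Escape phrases that are never advertised in the prompt, but always
# route the user back to the previous step.
HIDDEN_ESCAPES = {"i didn't say that", "that's not what i said", "go back"}

def next_prompt(utterance: str, parsed_intent: str, confidence: float) -> str:
    """Choose the system's next line given what was heard."""
    text = utterance.lower().strip(" !.?")
    if text in HIDDEN_ESCAPES:
        # Hidden path: quietly return to the previous prompt.
        return "Okay, let's try again."
    if confidence < 0.7:
        # Clarify rather than blame the speaker with "I didn't understand."
        return f"You said {parsed_intent}, right?"
    return f"Got it: {parsed_intent}."
```

The clarification branch keeps the conversation moving while signaling that any mismatch is the system’s to fix, not the speaker’s.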

Cultural references

Along with accents, it’s important to think about the cultural references you’re making. Are the references more pertinent to one socioeconomic group than another? I recently heard a speaker at a conference make the analogy “it’s like when your wife complains you forgot to charge the Tesla battery.” There’s a lot to unpack in that sentence, but the most glaring assumption is that everyone in the audience knew what it was like to own a luxury car. (We did not.)

I’ve also made this mistake myself: when encouraging exercise in a health app, I gave low-impact examples like swimming and yoga. Both require money in some capacity and carry cultural biases, which I learned through user testing. The more accessible option was to encourage walking and at-home indoor exercises.

Mistakes like that can cause your user to feel like they are not included in the conversation you want to have with them. This can lead to feelings of alienation and mistrust. Particularly when your whole interface is the conversation, it’s important to consider how you’re presenting your information and interactions to make it welcoming for all of your users. (For a broader — and more GUI-inclusive — perspective, check out Caio Braga’s article on designing for inclusion.)

Linguistic considerations

Photo by Anna Vander Stel on Unsplash

There’s also the issue of access to the conversation through the words themselves. Since conversation is so heavily embedded with our own personal biases and perspectives, it’s important to be aware of those when crafting dialogs for the system and make sure they’re accessible to your audience. One way these biases come out is in colloquialisms.

Colloquialisms

Who doesn’t like a fun and friendly persona with their voice bot? (I do. I’m not a monster.) But the problem with these chatty, down-to-earth personas is that they sometimes use colloquialisms in their language, which, sorry guys, is not accessible to everyone. For example, a phrase like “take that off your plate” can be very confusing to someone whose first language is not English. (Leave my plate alone! I’m hungry!) Folksy words like “hoopla” or “doozy” can also be confusing to many people for whom those aren’t in common use in their communities.

You’re better off phrasing things in clearer ways, and achieving tone and personality through other means. “I can take care of that for you” will leave people much more relieved that their food is not being stolen and able to focus on the rest of your bot’s sentence.
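A lightweight way to catch this during writing is a lint pass over your prompt copy. This is a toy sketch — the idiom list is illustrative, and a real one would come from review with your actual user communities, not from a hardcoded set.

```python
# Toy lint pass: flag colloquialisms in prompt copy before it ships.
IDIOMS = {"take that off your plate", "hoopla", "doozy"}

def flag_idioms(prompt: str) -> list[str]:
    """Return a sorted list of flagged idioms found in a prompt string."""
    lowered = prompt.lower()
    return sorted(idiom for idiom in IDIOMS if idiom in lowered)
```

A check like this won’t catch everything, but it institutionalizes the review step so clearer phrasing is the default rather than an afterthought.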

Social considerations

Photo by Kevin Bhagat on Unsplash

While we’re going into the deeper ways people feel things are accessible, the very idea of voice user interfaces is not necessarily accessible to everyone. There are communities of Americans who have been marginalized or mistreated by the government or society who may justifiably be very uncomfortable with the idea of a technology that constantly listens to them. Additionally, some people with conditions like schizophrenia might find a disembodied voice and implied surveillance terrifying. We, as product creators, cannot assume that everyone will be comfortable using a technology that requires constant listening.

While a voice system itself may never be able to accommodate these kinds of variance, it’s important to examine the wider system to ensure it’s accessible to everyone. For example, if you are creating a health service voice app that wants to include patients on Medicaid and private insurance, you need to consider how access to a voice device, or lack thereof, might affect quality of care across the spectrum. You also need to consider whether simply providing a voice device to those who don’t have one is going to solve your issue. Accessibility in this case may mean creating more than one UI to accommodate all your users.

Ask the Questions

While this is certainly not a comprehensive list of accessibility considerations, hopefully it points you in the right direction as you design. Some questions you can ask yourself and your team as you move through the design process are:

· Would someone who has never used a VUI product understand how to use this?

· What are the different ways people input and receive information from our system? How might that block people who can’t input or receive in that way?

· Would people in lower, middle, and upper financial brackets find this useful and easy?

· How might someone who doesn’t speak the same way I do interpret this sentence?

· What assumptions am I making about my listener in this prompt or design?

Asking yourselves questions like these can help make sure your designs and products are available and open to all of your users, not just some of them.