Protecting Privacy in Voice-User Interfaces

Gandalf said it best: “Keep it secret, keep it safe.”

Diana Deibel
grandstudio
Jul 30, 2018


One of the key components of any voice-user interface is, clearly, speech — as in, speaking aloud. So, naturally, a major issue that comes up with a widely-audible interface is privacy and how to protect it. Here are six factors to consider when thinking about privacy.

1. Asking for sensitive information

It might seem obvious to think about who’s in earshot when designing for VUIs, but context can be more surprising than you’d expect. Depending on the kind of VUI you’re dealing with, the user could be initiating the conversation (in which case they are likely — though not always — in a setting where they can hear the system, and the system, them) or the device might be initiating the conversation (as in the case of outbound IVR phone calls). If you are asking for sensitive information, for example in a survey about citizenship status, health, or finances, you would want to create a way for users to get to a private place, even if they were the ones to initiate the conversation. One easy way to do this is to alert them to the private turn the conversation is about to take and design in a waiting period, so they can get somewhere more private without having to stop the conversation. They can then let the device know when they’re ready to continue.
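As a rough sketch, that waiting-period pattern might look like the following dialog logic. The topic tags, wording, and function name here are my own illustration, not taken from any real product:

```python
# Hypothetical topic tags we choose to treat as sensitive (illustrative only).
SENSITIVE_TOPICS = {"citizenship", "health", "finances"}

def next_prompt(topic, user_ready):
    """Return the system's next utterance for a survey turn.

    Before a sensitive topic, warn the user about the private turn the
    conversation is taking, then hold until they say they're ready.
    """
    if topic in SENSITIVE_TOPICS and not user_ready:
        return ("The next few questions are personal. Feel free to move "
                "somewhere private, and say 'ready' when you'd like to continue.")
    return f"Okay, let's talk about {topic}."
```

The conversation never has to stop; the system simply holds its turn until the user signals that they’re somewhere private.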

Photo by Sai De Silva on Unsplash

2. Volume of the conversation

You will also want to think about the volume of the device and how far its voice projects. For example, in a party scenario, when a smart speaker is playing music at an elevated volume and someone interrupts to ask a question they don’t want the rest of the party to hear, the music stops and the device responds at a volume similar to the music it was playing. Imagine getting close enough to the speaker at said party to quietly ask something like “where can I get condoms around here?” only to have it stop the music and yell out to the now-jarringly-quiet room, “OK. Condoms can be found at the following locations: CVS on Main Street, Duane Reade on 5th Street, and Bob’s Corner on Willow Street. Would you like me to send that to your phone?” You can imagine the embarrassment, and the lack of desire to ever use that device again.
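One way to avoid that failure, sketched below, is to pick the reply volume from how loudly the user spoke rather than from the music’s previous level. The decibel thresholds here are assumptions for illustration, not measured values:

```python
def response_volume(music_db, user_db):
    """Choose a playback level (illustrative dB scale) for a spoken reply.

    Match the quieter of the interrupted music and the user's own voice,
    clamped so the reply stays softly audible but never shouts.
    """
    FLOOR_DB, CEILING_DB = 40, 70  # assumed comfortable bounds
    target = min(music_db, user_db)
    return max(FLOOR_DB, min(CEILING_DB, target))
```

A question whispered at a loud party (music at 85, voice at 45) would come back at 45 rather than 85: a quiet question gets a quiet answer.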

3. Always on, always listening

Additionally, it’s important to think about how the device’s listening affects the idea of privacy. Most devices with wake words need to listen constantly in order to hear the wake word and respond in a timely manner. How much privacy, then, are we allowing the user in their daily life if the device is always listening in? Of course, the companies running these devices have policies not to store audio that isn’t accompanied by the wake word, and of course, the audio sent up to the cloud to check for the wake word should not be going anywhere else. However, these policies aren’t always explicit or intuitive to users, and there are frequent reports of issues, like the recent one in which Amazon inadvertently shared conversations of some users.
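The on-device side of “always listening” is often a short rolling buffer: audio is held only long enough to check for the wake word, and everything else ages out without ever leaving the device. A minimal sketch, where the window size and detector interface are my assumptions:

```python
from collections import deque

class WakeWordGate:
    """Retain only a short rolling window of audio frames on-device.

    Old frames fall out of the window as new ones arrive; audio is
    handed off (e.g. to the cloud) only when the detector fires.
    """

    def __init__(self, detector, window_frames=50):  # ~1s at 20ms frames (assumed)
        self.detector = detector  # callable: list of frames -> bool
        self.buffer = deque(maxlen=window_frames)

    def feed(self, frame):
        self.buffer.append(frame)
        if self.detector(list(self.buffer)):
            captured = list(self.buffer)
            self.buffer.clear()  # stop retaining once handed off
            return captured      # only now does audio leave the device
        return None              # everything else is silently discarded
```

Whether the shipped devices actually behave this conservatively is exactly the trust question users are asking.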

So the question remains: how much do users trust that their privacy is protected and that no security breach or hack will allow someone to listen in on their home? Whatever the case made for a secure system, building that trust among users is paramount to an engaging relationship with their devices. More importantly, there is an expectation of privacy within a user’s home or vehicle that we, as creators of these products, need to respect.

Sadly, not what cloud storage *actually* looks like

4. Conversation storage

Speaking of sending information to the cloud for interpretation and triggering responses, many users wonder what happens with that information. Conversation is typically ephemeral: once we utter our words, they are gone, never to reappear. But that’s the thing about VUI products — developers and designers need the recordings in order to ensure they’ve built the product appropriately. Often, conversations with devices are, in fact, recorded and shared with the developers for the betterment of the product and conversation. And although one might think everyone would know this, it is not always made clear or explicit to users. Whether to explicitly state that and request permission from the user is an ethical consideration, one that should be intentionally discussed and decided upon. (By the way, I’m a strong vote for yes: be explicit.)
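If a product does decide to be explicit, the opt-in can be a single question whose answer drives a default-deny retention rule. The prompt wording and policy names below are purely illustrative:

```python
# Hypothetical opt-in prompt, spoken once during setup.
OPT_IN_PROMPT = ("To make me better, we'd like to keep recordings of our "
                 "conversations for the team to review. Is that okay? "
                 "Everything still works if you say no.")

def retention_policy(consent):
    """Map the user's opt-in answer to a storage decision.

    consent: True (opted in), False (declined), or None (never asked).
    Treating 'never asked' the same as 'declined' keeps consent
    explicit rather than assumed.
    """
    return "store_for_review" if consent is True else "discard_after_response"
```

The design choice worth noticing is the default: absent an explicit yes, nothing is kept.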

5. Legal access to conversations

A recent question regarding these transcripts concerns the law and government. There have been legal cases where lawyers have requested access to the recordings or transcripts of home-based smart speakers, essentially to use them as a third witness in he-said/she-said disputes. It is not yet legally decided whether turning over this information violates a person’s privacy or whether, by installing the device in their home, they have reasonably agreed to the possibility of a lack of privacy. Again, we are not necessarily informing users of this possibility explicitly. But we should: as product creators and developers, it is on us to create those explicit opt-ins and disclosures.

6. Employee access to conversations

Lastly, something we all hope is not happening but have to be aware of: aside from skill or product developers, there are employees at voice-enabled device companies who have access to the recordings and transcripts. As someone who has worked with said recordings and transcripts, I can say that at least in certain companies (and I would imagine most, if not all) there are policies and trainings in place to ensure that only those who absolutely require access to the information have it, and only for the purposes of creating or improving a product. That said, the access exists. So it’s within the realm of possibility that someone with access could use the information for their own purposes — or help someone without access do so.

So, should we embrace this technology or the tin foil hat?

Bringing up these points is not meant to fear-monger or scare anyone off from using or creating voice-first devices. There is so much good they can offer in connection and convenience. What’s important is that, as creators of this technology, we are aware of our users’ expectations of privacy and where we are potentially falling short of them. Voice should be an enhancement to modern life, not a detriment. So let’s start the conversation: what else can we do to preserve the privacy that non-cloud-connected, non-voice-enabled products provide?



Diana Deibel

CDO at Grand Studio in Chicago, co-author of Conversations with Things