‘Alexa, stop!’ The trials and tribulations of testing effectively for voice

The British Red Cross wants as many people as possible to feel confident helping someone in a first aid emergency. As part of this, we have explored innovative ways to provide first aid education to the public. One of these is an Amazon Alexa skill: anyone with access to an Amazon Echo can learn about certain first aid topics by voice, for free. Users can say, for example, “Alexa, ask First Aid about how to treat a burn”, and Alexa will respond with key information on recognising and treating the condition. The tool is a few years old, and user feedback has told us that the skill is valuable but has areas ripe for improvement.

What we did:

Ahead of the testing, the areas we wanted to learn about and improve were:

  • ‘Invocations’: this is Amazon-speak for the variety of phrases people use to get their request across, e.g. ‘how do I treat a burn’, ‘what do I do if someone is burnt’, ‘how do I help someone with a burn’, etc. We wanted to know how people would naturally phrase their questions about first aid, and how well Alexa understood the range of phrasings (a sketch of how these are configured follows this list).
  • How well Alexa’s answers improved users’ confidence to act in each first aid scenario.
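For context on how this works behind the scenes: an Alexa skill declares an interaction model in which each intent lists the sample utterances it should match. Below is a minimal sketch of what that could look like for the burns topic; the intent name TreatBurnIntent and the exact utterances are illustrative, not taken from our production skill.

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "first aid",
      "intents": [
        {
          "name": "TreatBurnIntent",
          "samples": [
            "how to treat a burn",
            "how do I treat a burn",
            "what do I do if someone is burnt",
            "how do I help someone with a burn"
          ]
        }
      ]
    }
  }
}
```

Alexa only reliably matches phrasings close to the listed samples, which is exactly why we wanted to hear the variety of invocations real users reach for.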

Knowing we needed to learn more about how users interacted with the tool and understand its issues was one thing; figuring out how to do so was another. When you run usability tests on a website, you are broadly testing two things: the design and the content. When you test for voice, you only have the content to work with. So is testing for voice simpler? Not so fast… Yes, there are fewer variables, but there is also a lot less to work with! With a purely voice tool you have fewer reference points and less to question and probe the participant on.

Since I wanted to see how participants would speak to Alexa naturally, I needed non-verbal prompts. A traditional script was a non-starter, so I decided to use images instead: one image for each first aid scenario, shown to the participant so that they could ask the first aid skill about the situation as they might in normal circumstances.

The prompt for participants to ask Alexa a version of: ‘Hey Alexa, ask First Aid how to treat a burn’

I also asked participants how confident they would feel dealing with each first aid scenario before and after hearing Alexa’s response. I asked what they thought of each answer, and observed them as they listened, looking out for when participants’ attention waned.

What we learnt about our skill:

  1. It’s really important that we allow for the different ways people phrase their requests. During the testing, Alexa often misunderstood requests; users found the frequent errors irritating and demotivating, and the errors made them worry they were ‘doing something wrong’. This tool should inspire confidence, and users should feel comfortable and at ease while using it (one standard mitigation is sketched after this list).
  2. When seeking information on how to provide first aid, people aren’t interested in having the medical condition described to them. Answers with lengthy descriptions were liked least, and users admitted their minds wandered. Unless a description helps them recognise the condition in order to treat it, people don’t need it; they want the emphasis to be on what they should do. If they wanted more detailed information on the condition, they said they would likely look it up separately, and they didn’t see voice as the most suitable medium for lengthy descriptions.
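On the first point: Alexa offers a built-in AMAZON.FallbackIntent that catches utterances matching none of a skill’s intents. Adding it to the intents array of the interaction model sketched earlier (this is the standard built-in, not anything specific to our skill) lets the skill answer an unrecognised request with a gentle reprompt such as ‘Sorry, I didn’t catch that. You can ask me how to treat a burn’, rather than an error that leaves users feeling at fault.

```json
{
  "name": "AMAZON.FallbackIntent",
  "samples": []
}
```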

What we learnt about testing for voice:

  1. Many different voices will confuse it. One of the benefits of Alexa is that it learns from your voice and accent and becomes more accurate over time. If the Echo is owned by, say, our Digital team, it hasn’t had the chance to adjust to everyone’s voice, and it ends up struggling with different accents. There isn’t much that can be done about this, but it’s valuable to be aware of, so you can reassure participants that if Alexa can’t understand them, it isn’t anything they’ve done.
  2. If you are using visual prompts, ensure your images are as clear as possible. I tried to do this, but some images did not quite reflect the first aid scenario I aimed to capture. When an image wasn’t clear, I had to guide participants to the right word, which slightly undermined the whole process. For example, the image of a burn, above, prompted many users to ask Alexa about ‘hand burns’, which confused Alexa and returned errors.
  3. Be patient, and don’t worry if you get it wrong a few times. Testing for voice feels a few paradigm shifts away from testing something visual, so be willing to try a few approaches.
  4. And finally, remember how to get the skill to shut up: if it mishears you, it will begin telling you what it can offer. This is unhelpful and grating, so a quick ‘Alexa, stop!’ is effective.
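‘Alexa, stop!’ works because every custom skill is expected to handle Amazon’s built-in stop and cancel intents; they sit in the same intents array as the skill’s own intents. Again, a sketch of the standard built-ins rather than our production model:

```json
[
  { "name": "AMAZON.CancelIntent", "samples": [] },
  { "name": "AMAZON.StopIntent", "samples": [] }
]
```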
