Three things that could make Alexa interactions more natural

Here are 3 ideas that could help make interactions with Alexa feel more natural.

  1. Postpone alarms
  2. Voice formatting
  3. Pre-timer announcement

Alexa, postpone the alarm by 30 minutes

Since I brought an Echo home, timers and alarms have been its most common use at my place, other than playing music.

One morning the interaction with the alarms didn’t go very smoothly.

Alexa, wake up Annie at 7am.

Annie looks at me funny… Right, you are not working today, are you?

Alexa, postpone alarm by 30 minutes

Alexa does not reply, just plays the termination tone. Time to think of another voice command to recover.

Alexa, cancel alarm
Alexa, set alarm for 7:30pm

Annie looks at me funny again… I said pm, didn’t I? Now time to recover from that other user error:

Alexa, cancel alarm
Alexa, wake me up at 7:30am

The first time I tried postponing an alarm, Alexa set another timer instead. That was even harder to recover from. It shows that while natural language interfaces are a good time saver, when one request goes wrong, recovery can be a pain and the user experience collapses.

Letting users postpone an alarm would make timer and alarm management easier.
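As a rough illustration of what I mean by postponing, here is a minimal Python sketch. The `Alarm` class and `postpone` function are hypothetical names of my own, not part of any Alexa API. The point is only that the request moves the existing alarm instead of creating a second one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alarm:
    """Hypothetical in-memory alarm, for illustration only."""
    label: str
    rings_at: datetime


def postpone(alarm: Alarm, minutes: int) -> Alarm:
    """Move the existing alarm later instead of creating a new one."""
    alarm.rings_at += timedelta(minutes=minutes)
    return alarm


# "Alexa, wake up Annie at 7am" followed by
# "Alexa, postpone the alarm by 30 minutes"
alarm = Alarm(label="Annie", rings_at=datetime(2024, 1, 1, 7, 0))
postpone(alarm, 30)
print(alarm.rings_at.strftime("%H:%M"))
```

With a request like this, there is no cancel-and-recreate step, so there is no chance to say pm instead of am along the way.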

Voice formatting: Make it less exact to make it more natural

As I’ve mentioned before, voice-specific considerations are needed when controlling how Alexa says some things. For example, controlling pauses when saying a phone number, or the pronunciation of special words or brand names.

Another case is saying numbers. For example, asking Alexa for 250/3 gets this answer:

250 divided by 3 is 83.3333333333

Spoken as:

Two hundred and fifty divided by three is eighty three dot three three three three three three three three three three

I doubt that anyone would say a number like that. Perhaps something like this would be better:

eighty three dot three repeating

Alternatively, saying only one or two decimals would have been plenty.
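Both options can be sketched in a few lines of Python. This is only my own illustration, not how Alexa works: `spoken_decimal` is a hypothetical helper that uses long division to detect a single repeating digit, and otherwise falls back to a couple of decimals. It assumes a non-negative numerator and a positive denominator.

```python
def spoken_decimal(numerator: int, denominator: int, max_decimals: int = 2) -> str:
    """Return a short, speakable form of numerator / denominator.

    Assumes a non-negative numerator and a positive denominator.
    """
    whole, remainder = divmod(numerator, denominator)
    digits = []  # decimal digits found by long division
    seen = {}    # remainder -> position at which it first appeared
    while remainder and remainder not in seen:
        seen[remainder] = len(digits)
        remainder *= 10
        digits.append(str(remainder // denominator))
        remainder %= denominator

    if not digits:
        # The division is exact: no decimals to say.
        return str(whole)
    if remainder == 0 and len(digits) <= max_decimals:
        # Short terminating decimal: say it as is.
        return f"{whole}.{''.join(digits)}"
    if remainder and seen[remainder] == 0 and len(digits) == 1:
        # A single digit repeating forever, like 83.333...
        return f"{whole}.{digits[0]} repeating"
    # Anything else: a couple of decimals is plenty.
    return f"{numerator / denominator:.{max_decimals}f}"


print(spoken_decimal(250, 3))  # 83.3 repeating
print(spoken_decimal(10, 4))   # 2.5
print(spoken_decimal(1, 7))    # 0.14
```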

As another example, when asking Alexa the time, you often get overly precise answers like this one:

The time is eleven eleven am.

Perhaps “ten past eleven” would have done just fine. I usually know whether it is am or pm… While you do expect the exact time from a clock display, when interacting with a voice assistant I think a human way of expressing the answer matters more than precision.
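Here is a small Python sketch of that idea, under my own assumptions: round to the nearest five minutes, drop am/pm, and use "past" and "to" phrasing. The `casual_time` function and its word tables are hypothetical, not anything Alexa exposes.

```python
from datetime import time

HOURS = ["twelve", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten", "eleven"]
MINUTES = {5: "five", 10: "ten", 15: "quarter",
           20: "twenty", 25: "twenty-five", 30: "half"}


def casual_time(t: time) -> str:
    """Say the time the way a person would, to the nearest five minutes."""
    hour = t.hour
    minute = (t.minute + 2) // 5 * 5  # round to the nearest multiple of 5
    if minute == 60:
        hour, minute = hour + 1, 0
    if minute == 0:
        return f"{HOURS[hour % 12]} o'clock"
    if minute <= 30:
        return f"{MINUTES[minute]} past {HOURS[hour % 12]}"
    return f"{MINUTES[60 - minute]} to {HOURS[(hour + 1) % 12]}"


print(casual_time(time(11, 11)))  # ten past eleven
print(casual_time(time(6, 44)))   # quarter to seven
```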

Alexa, it is bed time in 15 minutes

As a parent, I quickly found that getting a kid to do something is way easier with advance notice and reminders. Asking a kid to do something immediately always turns out to take longer and require more effort than giving her a 15-minute notice, with a couple of reminders along the way.

One way the Alexa timers could be better is to perform those reminders. (Obviously, enabling that would require voice reminders, which is another enhancement I’ve mentioned before.) The interaction could be triggered by sentences such as “it is bed time in 15 minutes”. The same could apply to taking a shower, going to school, doing homework, and so on, with slightly different wordings.

Then Alexa could select two times to remind along the way, perhaps 5 and 2 minutes before. Again, precision is not important in this case. Saying “it’s bed time in 5 minutes” while there are actually 7 minutes left does not really matter to the kid, nor does it make it any harder to get her to bed. It is just a matter of showing progress towards the deadline. With clear expectations, there is always less resistance.
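To make the idea concrete, here is a minimal Python sketch of how such a schedule could be computed. The `reminder_schedule` function is a hypothetical helper of mine, and the 5 and 2 minute defaults are just the values suggested above. It only computes when to speak and what to say; it does not talk to any Alexa API.

```python
def reminder_schedule(activity: str, total_minutes: int, remind_before=(5, 2)):
    """Return (minutes_from_now, message) pairs for a pre-timer announcement."""
    schedule = []
    for minutes_left in sorted(remind_before, reverse=True):
        # Skip reminders that would fall before the request was even made.
        if minutes_left < total_minutes:
            schedule.append((total_minutes - minutes_left,
                             f"It's {activity} in {minutes_left} minutes"))
    schedule.append((total_minutes, f"It's {activity} now"))
    return schedule


# "Alexa, it is bed time in 15 minutes"
for delay, message in reminder_schedule("bed time", 15):
    print(f"after {delay} min: {message}")
```

For the 15 minute request, this gives announcements after 10, 13, and 15 minutes.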

What hurts these interactions further is that, in scenarios that are not yet supported, Alexa is inconsistent when it cannot handle a request. It either:

  • Does not answer at all
  • Answers “I was not able to answer the question I heard”
  • Or does something unexpected, such as setting a second timer instead of postponing the first one.

As is to be expected, the number of such interactions is so huge that it will take a while for Amazon to cover them all. It would have been nice if the Alexa APIs had allowed third parties to handle the requests that Alexa does not know what to do with.

As Alexa’s language model expands, the interactions will become more and more natural. Along the way, keeping in mind that less precision can be more useful will also help. With those and more capabilities, richer experiences will be possible, which will bring us closer to seamless ambient computing.