“Alexa, I’m Sorry” Case Against Wake Words

Razeeb Mahmood
An Attempt at Writing
2 min read · Jun 29, 2018

Currently, smart assistants from Amazon, Google and Apple require a wake word, a trigger, followed by a command, request or question before they respond. The devices that host these assistants are constantly listening for the wake word before they start working, and they detect it with a great deal of accuracy.

Now what if your voice was the trigger itself? That should make the process much simpler.

When I say a wake word and ask Alexa a question, I have noticed that my voice changes compared to how I speak normally with others, and I have observed the same in other people: we code-switch. There are specific changes to my pitch, cadence and tone. These are the same voice characteristics that are used to create speaker models and map voiceprints for voice verification. Many smart assistants can already tell different users apart using these characteristics.

So instead of saying “Alexa, turn off lights” I can just say “turn off lights”. The smart assistant should be able to detect from my voice (without a wake word) that my words are directed toward it and not someone else.

There are, however, many problems that will surface from something like this.

  1. False triggers, lots of them. This is why smart assistants would need to be properly configured to a user’s voice and speaking style before use. The assistant also has to know what is being said before and after a command, and be smart about using that context to filter out possible false triggers.
  2. Best suited for simple, commonly used commands, requests and questions that are a few words long, like “turn off lights” or “what’s the weather outside?”. Most commands are simple.
  3. Bigger privacy concern. Devices will hear people a lot more than they do right now. We need to make sure smart assistants record only successful triggers, for a short period of time, and not everything continuously.
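
To make the idea a bit more concrete, here is a rough Python sketch of how a voice-as-trigger check might work. It is only an illustration under my own assumptions, not how Alexa or any real assistant works: the `VoiceFeatures` fields, the two per-user baselines ("normal" voice and "talking to the assistant" voice), the 0.7 threshold and the 6-word limit are all made up, and a real system would have to extract these features from audio.

```python
from dataclasses import dataclass


@dataclass
class VoiceFeatures:
    """Hypothetical per-utterance features (assumed, not from any real API)."""
    pitch_hz: float       # average pitch of the utterance
    words_per_sec: float  # speaking rate, a stand-in for cadence
    word_count: int       # number of words in the utterance


def _position(value: float, start: float, end: float) -> float:
    """Where `value` sits between two baselines, clamped to the range 0..1."""
    if start == end:
        return 0.0
    return min(1.0, max(0.0, (value - start) / (end - start)))


def directed_score(utt: VoiceFeatures, normal: VoiceFeatures,
                   device: VoiceFeatures) -> float:
    """Score from 0 (sounds like normal conversation) to 1 (sounds like
    the user's "talking to the assistant" voice)."""
    pitch = _position(utt.pitch_hz, normal.pitch_hz, device.pitch_hz)
    rate = _position(utt.words_per_sec, normal.words_per_sec,
                     device.words_per_sec)
    return (pitch + rate) / 2


def should_trigger(utt: VoiceFeatures, normal: VoiceFeatures,
                   device: VoiceFeatures, threshold: float = 0.7,
                   max_words: int = 6) -> bool:
    """Trigger only on short utterances whose voice shifts toward the
    assistant-directed baseline (threshold and max_words are assumptions)."""
    if utt.word_count > max_words:
        # Long utterances are treated as conversation to limit false triggers.
        return False
    return directed_score(utt, normal, device) >= threshold


# Baselines learned during setup (made-up numbers for illustration).
normal_voice = VoiceFeatures(pitch_hz=120.0, words_per_sec=3.0, word_count=0)
device_voice = VoiceFeatures(pitch_hz=150.0, words_per_sec=2.0, word_count=0)

command = VoiceFeatures(pitch_hz=148.0, words_per_sec=2.1, word_count=3)
chatter = VoiceFeatures(pitch_hz=122.0, words_per_sec=2.9, word_count=3)
```

With these made-up numbers, `should_trigger(command, normal_voice, device_voice)` is true and the same call with `chatter` is false. The word limit reflects point 2 above, and the per-user baselines reflect the setup step in point 1. What the sketch leaves out is the context before and after a command, which is probably the harder part.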

Using the voice as the trigger just makes the wake word unnecessary, for the most part anyway. I believe we will continue to ask these smart assistants questions that require us to get their attention first, very specifically. It is like being in a crowd: before we ask a friend a question, we say their name first to get their attention. But if the friend is the only person in the room, there is no need to call their name for most things. Like “buy me an Echo”.
