How Many Voice Assistants Do We Really Need?

Todd Mozer's Desk
Sensory Perspectives on AI
2 min read · Jun 18, 2024
Virtual assistant concept image generated by DALL·E (OpenAI) edited by A.Adeboje

That’s easy… just one. One that knows all the current knowledge of the internet, doesn’t hallucinate, knows everything about me, can carry out actions on my behalf, answers to my voice with my wake word, and runs on device so there’s no risk of my private data leaking out. I think that’s roughly what Apple is striving for with Siri, on-device LLMs, and Apple Intelligence, but they aren’t there yet.

Nobody is there yet. In the next 5 or so years we will likely need multiple assistants. The assistants can have different domains of expertise, and by staying focused, each one can be smaller and more intelligent within its domain. By being smaller it can run on device, so the most private voice assistants will have these more targeted capabilities. For example, a medical assistant can know all my personal health information and history and be a health expert, so it can intelligently keep up with all the latest studies and provide great targeted advice. I could have a separate car assistant that runs without internet, can route to “Grandpa’s house”, and knows that when I look for “nearby restaurants” I’m looking for good-tasting vegan food. It knows me!

Each assistant can have its own name. In fact, Sensory can run multiple wake words in parallel. We can listen for Google, Alexa, Cortana, Siri, or others all at the same time. We can personalize these by making them respond only to the right user through the power of speaker verification. Sensory calls this an Enrolled Fixed Trigger (wake word).
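To make the idea concrete, here is a minimal sketch of how parallel wake word results might be combined with a speaker verification check. It is not Sensory's API or implementation: the `Detection` class, the `pick_assistant` function, and both threshold values are hypothetical, and the scores are assumed to come from whatever wake word and speaker verification engines are in use.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Detection:
    wake_word: str        # which wake word model fired
    wake_score: float     # hypothetical detector confidence in [0, 1]
    speaker_score: float  # hypothetical speaker-match score in [0, 1]

# Illustrative thresholds only; real systems tune these per model.
WAKE_THRESHOLD = 0.6
SPEAKER_THRESHOLD = 0.7

def pick_assistant(detections, enrolled_only):
    """Return the wake word to trigger, or None if nothing qualifies.

    Wake words listed in `enrolled_only` respond only when the speaker
    score also clears the threshold (the enrolled-user case).
    """
    best = None
    for d in detections:
        if d.wake_score < WAKE_THRESHOLD:
            continue  # detector not confident enough
        if d.wake_word in enrolled_only and d.speaker_score < SPEAKER_THRESHOLD:
            continue  # right phrase, wrong voice
        if best is None or d.wake_score > best.wake_score:
            best = d
    return best.wake_word if best else None

# Two wake word models fire on the same audio frame.
frame = [Detection("alexa", 0.9, 0.2), Detection("siri", 0.8, 0.9)]
```

With this frame, `pick_assistant(frame, {"alexa"})` skips "alexa" because the voice does not match the enrolled user, and falls through to "siri".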

Most importantly, the technology to name your own assistant is here today! Sensory calls it “User-Defined Triggers”. You say your wake-up phrase a few times, and the Sensory tech stack learns how you say it and enables tiny, low-power recognition of your voice on dozens of different chips and platforms, in any language, all in an ultra-tiny, low-heat, low-MIPS package!

In the short run, we will probably be calling out to a few different voice assistants. Maybe we’ll soon have a master assistant that you get to name and that performs cognitive arbitration: figuring out which domain-specific models it needs to engage to answer questions as accurately as possible, with the privacy levels and connectivity you want!
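As a rough illustration of what that arbitration step could look like, here is a toy sketch that routes a query to the domain assistant whose keywords overlap it most, optionally restricted to assistants that run on device. Everything in it is an assumption made for illustration: the `Assistant` class, the `route` function, and the keyword lists are hypothetical, and a real arbiter would likely use a learned classifier or a small language model rather than keyword overlap.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assistant:
    name: str
    keywords: frozenset  # hypothetical domain vocabulary
    on_device: bool      # True if it runs without a network connection

def route(query, assistants, require_on_device=False):
    """Return the name of the best-matching assistant, or None."""
    # Naive tokenization: lowercase and split on whitespace only.
    words = set(query.lower().split())
    best, best_overlap = None, 0
    for a in assistants:
        if require_on_device and not a.on_device:
            continue  # honor the user's privacy/connectivity preference
        overlap = len(words & a.keywords)
        if overlap > best_overlap:
            best, best_overlap = a, overlap
    return best.name if best else None

ASSISTANTS = [
    Assistant("health", frozenset({"medication", "symptoms", "doctor"}), True),
    Assistant("car", frozenset({"route", "restaurants", "nearby", "drive"}), True),
    Assistant("web", frozenset({"news", "weather", "restaurants"}), False),
]
```

Here `route("find nearby restaurants", ASSISTANTS)` picks the car assistant, while `route("what is the weather", ASSISTANTS, require_on_device=True)` returns `None` because the only assistant that matches needs a connection.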
