EfficientWord-Net: An Open Source hotword detector

Aman Rangapur
Published in ANT BRaiN · 4 min read · Nov 10, 2021

A hotword detection engine is a tiny algorithm that monitors a stream of audio for a special hotword and activates your voice assistant when it detects it. Every time you say “Alexa”, “Hey Siri”, or “Ok Google”, you are using one. The rest of this post explains how such engines work and why we thought creating a new one in 2021 was a good idea.
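The monitoring loop described above can be sketched as a sliding window over incoming audio frames. This is a minimal illustration, not EfficientWord-Net's actual code: `detect_hotword`, `toy_score`, the window size and the threshold are all hypothetical, and each "frame" is reduced to a single number where a real engine would process chunks of audio samples with a trained model.

```python
from collections import deque


def detect_hotword(frames, score_window, window_size=3, threshold=0.5):
    """Yield the index of each frame at which the window score reaches the threshold.

    `score_window` is any callable that maps a list of frames to a score;
    in a real engine it would be a trained model (assumption for this sketch).
    """
    window = deque(maxlen=window_size)  # keeps only the most recent frames
    for i, frame in enumerate(frames):
        window.append(frame)
        if len(window) < window_size:
            continue  # not enough audio buffered yet
        if score_window(list(window)) >= threshold:
            yield i
            window.clear()  # avoid firing repeatedly on the same utterance


def toy_score(window):
    # Hypothetical stand-in for a model: mean absolute amplitude of the window.
    return sum(abs(x) for x in window) / len(window)


# Toy "audio": mostly silence with one loud burst in the middle.
frames = [0.0, 0.1, 0.0, 0.9, 0.8, 0.9, 0.0, 0.0, 0.1]
detections = list(detect_hotword(frames, toy_score))
print(detections)
```

Clearing the window after a detection is one simple way to keep a single utterance from triggering the assistant several times in a row.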

The video below shows EfficientWord-Net running on a Raspberry Pi 4.

Check out our GitHub repository for installation instructions, to try the engine yourself, and to read our preprint.

Things you should know about hotword engines

Waveform of a speech signal.

1- It must run on the edge (not in the cloud) for multiple reasons:

  • Privacy.
  • Power efficiency. To stay always on, it must be extremely power-efficient on mobile and wearable devices.
  • Convenience. The complexity involved in creating a new hotword.

2- Big companies (think Amazon, Google, Microsoft) have teams of scientists and engineers to build their own wake-word engines.

3- There are a handful of competent wake-word engine vendors in the market, including Sensory, KITT.AI (acquired by Baidu), Snips, and Porcupine, plus a freely available engine called PocketSphinx (CMU).

Why another hotword detection engine?

Integrating a wake-word engine is expensive, involves upfront cost, and takes time. Although affordable for big companies, this can be a show-stopper for startups that want to join the voice-enabled revolution. Furthermore, it can implicitly discourage customization and personalization even for bigger players.

This is due to how wake-word engines are trained today. To train an engine for “Hey Robot”, the vendor needs to collect recordings of hundreds of people saying “Hey Robot” and train a model just for that. The model does one thing really well: it detects “Hey Robot”. A lot of money and time is spent on data collection and custom model training.

There are a few good hotword detection engines, such as Porcupine, that do an excellent job with very few training samples, but they are closed source. Engines like PocketSphinx require no training samples to detect a hotword, but their error rates are too high for practical applications.

Growing up in a software and hardware industry fuelled by open source, we decided to give something back to the open source community by developing EfficientWord-Net.

Introducing EfficientWord-Net

EfficientWord-Net is the wake-word engine we built at ANT-BRaiN.

  • It is a hotword detection engine based on one-shot learning, so it requires very few (4 to 6) training samples to create a new hotword.
  • It is highly accurate (more on this below).
  • It is cross-platform (Raspberry Pi, Linux, Mac and Windows; it could run on smartphones, but the software for that isn’t there yet) and lightweight (7% CPU usage on Raspberry Pi 3).
  • It is scalable in the sense that it can detect multiple hotwords concurrently without additional CPU/memory footprint.
  • It is open-sourced on GitHub. You can create new models for Linux/Mac and use a handful of existing models on all supported platforms.

We chose one-shot learning to solve the problem of training new hotwords because hotword detection resembles face recognition: in both cases the number of classes is never fixed in advance. Moreover, one-shot learning allows new accents to be covered with relatively few training samples.
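As a rough sketch of the face-recognition analogy: a one-shot system typically turns each audio clip into an embedding vector and compares it with a few stored reference embeddings per hotword, so adding a hotword means storing a few more vectors rather than retraining. The code below is an illustration under that assumption, not the engine's implementation. The embedding network is not shown; the 3-dimensional vectors, the `best_match` helper and the 0.85 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, references, threshold=0.85):
    """Return (hotword, score) for the closest hotword, or (None, score) if below threshold."""
    best_word, best_score = None, -1.0
    for word, samples in references.items():
        # A hotword's score is its best similarity over its few reference samples.
        score = max(cosine_similarity(query, s) for s in samples)
        if score > best_score:
            best_word, best_score = word, score
    if best_score < threshold:
        return None, best_score
    return best_word, best_score


# Toy reference embeddings: a handful of samples per hotword (made-up values).
references = {
    "alexa": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "computer": [[0.0, 0.9, 0.3], [0.1, 0.8, 0.4]],
}

word, score = best_match([0.85, 0.15, 0.05], references)
print(word, round(score, 3))
```

Under this reading, the audio would be embedded once per window and the same embedding compared against every hotword's references, which is one plausible way several hotwords can be checked concurrently at little extra cost.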

How much better is EfficientWord-Net?

Here we are going to compare EfficientWord-Net with:

  • Snowboy (KITT.AI-Baidu): It is part of Alexa Voice Service SDK.
  • Pocketsphinx (CMU): Open source and freely available.
  • Porcupine: Closed source, robust hotword detector.

We used four different hotwords for this test (Alexa, Computer, Restaurant, People) to benchmark the performance of the hotword detectors listed above.

Results:

Our hotword detector’s performance is notably lower than Porcupine’s. We have been considering better neural network architectures for the engine and hope to outperform Porcupine.

This has been our undergraduate project, so your support and encouragement will motivate us to keep developing the engine. If you loved this project, recommend it to your peers, give us a 🌟 on GitHub and a clap 👏 on Medium.

Authors:

The Serious Programmer : } and Aman Rangapur
