Voice-enabling an Angular App with Wake Words

The Web Speech API contains powerful functionality that allows you to transcribe speech in the Chrome browser. However, it is not a suitable choice for always-listening functionality. In this tutorial, we’re going to make a quick Angular app that uses a wake word to trigger the Speech API for a completely hands-free voice interaction loop.

Technology stack logos for the demo

The Web Speech API, by its nature, is not always-listening (nor would that be practical: when it is active, it sends a continuous stream of microphone data to Google). To trigger this API with voice, we need a separate wake word detector (also known as a trigger word, hotword, or wake-up word). We can use Porcupine to listen continuously for “Okay Google” to start the transcription.

The Web Speech API unfortunately works only in the Chrome browser (not even Chromium or Electron, sadly). Porcupine, however, works across all modern browsers. The wake word audio processing occurs inside the browser, providing intrinsic privacy and reliability for that part of the application; microphone data is sent out only while you’re actively using Google’s cloud-powered speech processing.

Install the packages

Set up a new Angular project and install the following packages using yarn or npm (this example uses English, but German, French, and Spanish are also available):

yarn add @picovoice/porcupine-web-angular @picovoice/porcupine-web-en-worker @picovoice/web-voice-processor

Briefly summarized:

  • web-voice-processor accesses the microphone (including asking the user for permission) and continuously downsamples the audio stream to the 16 kHz, 16-bit PCM format expected by most speech recognition engines.
  • porcupine-web-en-worker provides an all-in-one Web Worker with the entire Porcupine engine (English-language model) embedded in WebAssembly, allowing Porcupine to run in the browser, off the main JavaScript thread.
  • porcupine-web-angular provides the Angular PorcupineService that glues the above two packages together and abstracts some of the details.

Start the wake word engine

We have three main tasks:

  1. Initialize PorcupineService to listen for a built-in wake word: “Okay Google”.
  2. Subscribe to the keyword$ event on PorcupineService and take appropriate action when we detect “Okay Google”.
  3. Start SpeechRecognition and record the results of the transcription into a variable that can be displayed on the page.
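Steps 2 and 3 can be sketched in plain TypeScript (framework wiring omitted; `startTranscription` and `joinTranscript` are illustrative names, and in the real app `startTranscription` would be called from the `keyword$` subscription when “Okay Google” is detected):

```typescript
// Pure helper: join transcript segments from a SpeechRecognition-style
// results array (each entry is a list of alternatives; we take the first).
export function joinTranscript(results: { transcript: string }[][]): string {
  return results
    .map((alternatives) => alternatives[0]?.transcript ?? "")
    .join(" ")
    .trim();
}

// Browser-only wiring. In the Angular app this runs inside the keyword$
// subscription once the wake word fires.
export function startTranscription(onText: (text: string) => void): boolean {
  const SR = (globalThis as any).webkitSpeechRecognition;
  if (!SR) {
    // Not Chrome: the wake word still works, but transcription does not.
    return false;
  }
  const recognition = new SR();
  recognition.interimResults = true;
  recognition.onresult = (event: any) => {
    const results = Array.from(event.results) as { transcript: string }[][];
    onText(joinTranscript(results));
  };
  recognition.start(); // halts on its own after a period of silence
  return true;
}
```

Keeping `joinTranscript` free of browser APIs makes it easy to unit test, while the `webkitSpeechRecognition` guard mirrors the Chrome-only check described below.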

Here’s a project on CodeSandbox with the completed result. It will prompt for microphone permission upon loading (unfortunately, we can’t embed it directly into this article, as embedding appears not to prompt correctly for microphone access).

Google’s Speech API outperforming monkeys with typewriters

Our Angular app will now be able to start transcription when it hears “Okay Google.” We can see the results on the screen in the text box. The SpeechRecognition will halt after some period of silence, at which point we’ll resume listening for the wake word. We check for the presence of webkitSpeechRecognition, in case the user is running the application outside of Chrome (the “Okay Google” wake word will function regardless).

Try changing the wake word in the code. Where it says “Okay Google”, substitute the name of another free built-in English wake word such as “Alexa”, “Bumblebee”, or “Porcupine”.

Using transcription for voice commands?

Sometimes, we aren’t looking for a transcription per se, but rather wish to detect structured voice commands. For example:

“Set a timer for 10 minutes”

This utterance could signal a setTimer intent with a value of 10 minutes for duration. It’s tempting to record a transcription and then run regular expressions over the result; it even feels natural to snap these two pieces together like Lego® blocks (or Unix pipes). But there is a better way.
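For illustration, here is a minimal sketch of that regex approach (`parseTimerCommand` and the intent shape are hypothetical names, not part of any Picovoice package). Note how easily it misses natural variations of the same command:

```typescript
// Hypothetical intent shape for the "set a timer" command.
interface TimerIntent {
  intent: "setTimer";
  durationMinutes: number;
}

// Parse a transcript with a regular expression. Brittle by nature:
// it only matches one phrasing of the command.
export function parseTimerCommand(transcript: string): TimerIntent | null {
  const match = /^set a timer for (\d+) minutes?$/i.exec(transcript.trim());
  if (!match) {
    // "start a ten-minute timer" is missed, as is any utterance where
    // the recognizer transcribes the number as a word instead of a digit.
    return null;
  }
  return { intent: "setTimer", durationMinutes: Number(match[1]) };
}
```

Every new phrasing means another regex, and transcription errors upstream silently break them all.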

For complex voice commands, you can use the Rhino Speech-to-Intent engine instead. This approach skips the intermediate text representation and uses a custom acoustic model. Instead of attempting to transcribe any possible spoken phrase in a language and then running regexes over the result, you build a bespoke context focused on the task at hand (say, a stopwatch) and directly receive a JavaScript object with all of the data of interest. The resulting gains in accuracy and efficiency mean better results, and, like Porcupine, Rhino runs privately across all modern browsers.
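As a sketch, the object returned for the timer utterance above might look like the following. The `isUnderstood`/`intent`/`slots` structure is modeled on Rhino’s inference result, but treat it as an assumption and check the SDK documentation; the specific intent and slot names depend entirely on the custom context you design:

```typescript
// Sketch of a Speech-to-Intent result for "Set a timer for 10 minutes".
// The intent and slot names below come from a hypothetical stopwatch
// context, not a real one.
interface Inference {
  isUnderstood: boolean;
  intent?: string;
  slots?: Record<string, string>;
}

const inference: Inference = {
  isUnderstood: true,
  intent: "setTimer",
  slots: { minutes: "10" },
};
```

No regexes, no brittle string matching: the engine hands your application structured data directly.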


