Voice-enabling an Angular App with Wake Words

David Bartle
Picovoice
Apr 22, 2021

The Web Speech API contains powerful functionality that allows you to transcribe speech in the Chrome browser. However, it is not a suitable choice for always-listening functionality. In this tutorial, we’re going to make a quick Angular app that uses a wake word to trigger the Speech API for a completely hands-free voice interaction loop.

Technology stack logos for the demo

The Web Speech API, by its nature, is not always-listening (nor would that be practical: when it is active, it sends a continuous stream of microphone data to Google). To trigger this API with voice, we need a separate wake word detector (also known as a trigger word, hotword, or wake-up word). We can use Porcupine to listen continuously for “Okay Google” to start the transcription.

The Web Speech API unfortunately only works in the Chrome browser (not even in Chromium or Electron). Porcupine, however, works across all modern browsers. The wake word audio processing occurs entirely inside the browser, providing intrinsic privacy and reliability for that part of the application; microphone data is only sent out while you’re actively using Google’s cloud-powered speech processing.

Install the packages

Set up a new Angular project and install the following packages using yarn or npm (this example uses English, but other languages are also available):

yarn add @picovoice/porcupine-angular @picovoice/web-voice-processor

Briefly summarized:

  • web-voice-processor accesses the microphone (including asking for the user’s permission) and continuously converts the audio stream into the de facto speech recognition format (single-channel, 16-bit, 16 kHz linear PCM).
  • porcupine-angular provides the Angular PorcupineService, which combines web-voice-processor with the Porcupine engine and abstracts away some of the details.

Start the wake word engine

We have three main tasks:

  1. Initialize PorcupineService to listen for a built-in wake word: “Okay Google”.
  2. Subscribe to the keywordDetection$ event on PorcupineService and take appropriate action when we detect “Okay Google”.
  3. Start SpeechRecognition and record the results of the transcription into a variable that can be displayed on the page.
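
Here’s a minimal sketch of what the component can look like. PorcupineService and keywordDetection$ are the names used in this tutorial; the exact init() arguments (AccessKey string, keyword object, model path) and the start()/release() lifecycle calls are assumptions that vary by SDK version, so treat them as placeholders and consult the porcupine-angular README for the signature that matches your install.

```typescript
// app.component.ts — a minimal sketch, not the demo's exact code.
import { Component, OnDestroy, OnInit } from '@angular/core';
import { Subscription } from 'rxjs';
import { PorcupineService } from '@picovoice/porcupine-angular';

@Component({
  selector: 'app-root',
  template: `<textarea readonly [value]="transcript"></textarea>`,
  providers: [PorcupineService],
})
export class AppComponent implements OnInit, OnDestroy {
  transcript = '';
  private detectionSubscription?: Subscription;

  constructor(private porcupineService: PorcupineService) {}

  async ngOnInit(): Promise<void> {
    // 1. Initialize Porcupine with the built-in "Okay Google" keyword.
    //    The AccessKey comes from Picovoice Console; the keyword and model
    //    argument shapes below are illustrative, not exact.
    await this.porcupineService.init(
      'YOUR_ACCESS_KEY',
      { builtin: 'Okay Google' },
      { publicPath: 'assets/porcupine_params.pv' }
    );

    // 2. Take action whenever the wake word is detected.
    this.detectionSubscription = this.porcupineService.keywordDetection$.subscribe(
      () => this.startTranscription()
    );

    await this.porcupineService.start();
  }

  private startTranscription(): void {
    // 3. Hand off to the Web Speech API and record the transcription
    //    (filled in below, in the section on Google's Speech API).
  }

  ngOnDestroy(): void {
    this.detectionSubscription?.unsubscribe();
    this.porcupineService.release();
  }
}
```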

Here’s a project on GitHub with the completed result. Run the following to start the demo:

  1. git clone https://github.com/Picovoice/porcupine.git
  2. cd porcupine/demo/angular-stt
  3. yarn
  4. yarn start

This will start a dev server at http://localhost:4200. Upon loading, the page will prompt for microphone permission. You will also need to enter your AccessKey, which can be obtained from the Picovoice Console.

Google’s Speech API outperforming monkeys with typewriters

Our Angular app will now be able to start transcription when it hears “Okay Google.” We can see the results on the screen in the text box. The SpeechRecognition will halt after some period of silence, at which point we’ll resume listening for the wake word. We check for the presence of webkitSpeechRecognition, in case the user is running the application outside of Chrome (the “Okay Google” wake word will function regardless).
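
As a rough sketch, the hand-off could fill in the startTranscription() stub from the component above. The webkitSpeechRecognition constructor and its onresult/onend handlers are standard Chrome behavior; the wiring into the component is illustrative.

```typescript
// Inside the same component as above (illustrative wiring).
private startTranscription(): void {
  // Feature-detect: webkitSpeechRecognition only exists in Chrome.
  const SpeechRecognitionImpl = (window as any).webkitSpeechRecognition;
  if (SpeechRecognitionImpl === undefined) {
    // Outside Chrome, the wake word still fires; there is just nothing to hand off to.
    return;
  }

  const recognition = new SpeechRecognitionImpl();
  recognition.interimResults = true; // show partial results as the user speaks

  recognition.onresult = (event: any) => {
    // Concatenate the best alternative of each result into the displayed transcript.
    this.transcript = Array.from(event.results as ArrayLike<any>)
      .map((result: any) => result[0].transcript)
      .join('');
  };

  recognition.onend = () => {
    // SpeechRecognition halts after a period of silence; Porcupine keeps
    // listening for the next "Okay Google".
  };

  recognition.start();
}
```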

Try changing the wake word in the code. Where it says “Okay Google”, substitute the name of another free built-in English wake word such as “Alexa”, “Bumblebee”, or “Porcupine”.

Using transcription for voice commands?

Sometimes, we aren’t looking for a transcription per se, but rather wish to detect structured voice commands. For example:

“Set a timer for 10 minutes”

This utterance could signal a setTimer intent with a value of 10 minutes for duration. It’s tempting to record a transcription and then use regular expressions on the results. It is even intuitive to connect these two things together like Lego® blocks (or Unix pipes), but there is a better way.

For complex voice commands, you can use the Rhino Speech-to-Intent engine instead. This approach skips the intermediate text representation and builds a custom acoustic model. Instead of attempting to transcribe any possible spoken phrase in a language and then regexing the result, you can build a bespoke context that focuses on the task at hand (say, a stopwatch) and directly returns a JavaScript object with all of the data of interest. The resulting gains in accuracy and efficiency mean you get better results and, like Porcupine, Rhino runs privately and across all modern browsers.
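
For illustration, a Rhino inference for that utterance could come back as an object roughly like the one below. The isUnderstood/intent/slots fields mirror the shape of Rhino’s inference result, while the setTimer intent and minutes slot are hypothetical names that would be defined by your own context.

```typescript
// Illustrative shape of a Rhino inference for "Set a timer for 10 minutes".
// The intent and slot names come from whatever context you design;
// "setTimer" and "minutes" are made up for this example.
const inference = {
  isUnderstood: true, // the utterance matched the context
  intent: 'setTimer',
  slots: {
    minutes: '10',
  },
};
```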
