The Web Speech API contains powerful functionality that allows you to transcribe speech in the Chrome browser. However, it is not a suitable choice for always-listening functionality. In this tutorial, we’re going to make a quick Angular app that uses a wake word to trigger the Speech API for a completely hands-free voice interaction loop.
The Web Speech API, by its nature, is not always-listening (nor would that be practical: when it is active, it sends a continuous stream of microphone data to Google). To trigger this API with voice, we need a separate wake word detector (also known as a trigger word, hotword, or wake-up word). We can use Porcupine to listen continuously for “Okay Google” to start the transcription.
The Web Speech API unfortunately only works on the Chrome browser (not even Chromium or Electron, sadly). Porcupine, however, works across all modern browsers. The wake word audio processing occurs inside the browser, providing intrinsic privacy and reliability for that part of the application; microphone data is only being sent out when you’re actively using Google’s Cloud-powered speech processing.
Install the packages
Set up a new Angular project and install the following packages with yarn, or with the equivalent npm install command (this example uses English, but German, French, and Spanish are also available):
yarn add @picovoice/porcupine-web-angular @picovoice/porcupine-web-en-worker @picovoice/web-voice-processor
- porcupine-web-en-worker provides the English-language Porcupine wake word engine, which runs in a Web Worker.
- web-voice-processor accesses the microphone (including asking the user for permission) and continuously converts the stream of audio data into the de facto speech recognition format.
- porcupine-web-angular provides the Angular PorcupineService that glues the above two packages together and abstracts some of the details.
Start the wake word engine
We have three main tasks:
- Set up PorcupineService to listen for a built-in wake word: “Okay Google”.
- Subscribe to the PorcupineService and take appropriate action when we detect “Okay Google”.
- Use SpeechRecognition and record the results of the transcription into a variable that can be displayed on the page.
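The three tasks above can be sketched as a small state machine. This is only an illustration of the control flow, not the code from the CodeSandbox project: `WakeWordSource`, `Transcriber`, and `VoiceLoop` are hypothetical names invented here, and they deliberately avoid guessing at the real PorcupineService or SpeechRecognition signatures.

```typescript
// Hypothetical stand-ins for the two engines. These are NOT the real
// PorcupineService or SpeechRecognition interfaces.
interface WakeWordSource {
  onKeyword(handler: (label: string) => void): void;
  pause(): void;
  resume(): void;
}

interface Transcriber {
  // Calls onText with the latest transcript and onEnd once recognition halts.
  start(onText: (text: string) => void, onEnd: () => void): void;
}

type LoopState = "waiting-for-wake-word" | "transcribing";

class VoiceLoop {
  state: LoopState = "waiting-for-wake-word";
  transcript = ""; // Task 3: the variable a template could display.
  private wake: WakeWordSource;
  private transcriber: Transcriber;
  private wakeWord: string;

  constructor(wake: WakeWordSource, transcriber: Transcriber, wakeWord = "Okay Google") {
    this.wake = wake;
    this.transcriber = transcriber;
    this.wakeWord = wakeWord;
    // Task 2: subscribe to detections.
    this.wake.onKeyword((label) => this.handleKeyword(label));
  }

  private handleKeyword(label: string): void {
    // Ignore other keywords, and ignore detections while already transcribing.
    if (label !== this.wakeWord || this.state === "transcribing") {
      return;
    }
    this.state = "transcribing";
    this.transcript = "";
    this.wake.pause();
    this.transcriber.start(
      (text) => {
        this.transcript = text;
      },
      () => {
        // Transcription halted (for example after silence): listen again.
        this.state = "waiting-for-wake-word";
        this.wake.resume();
      }
    );
  }
}
```

In an Angular component, the two interfaces would be thin adapters around the real services, and `transcript` would be bound in the template.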
Here’s a project on CodeSandbox with the completed result. It will prompt for microphone permission upon loading (unfortunately, we can’t directly embed it into this article since that appears to not prompt correctly for microphone access).
Our Angular app will now be able to start transcription when it hears “Okay Google.” We can see the results on the screen in the text box. SpeechRecognition will halt after some period of silence, at which point we’ll resume listening for the wake word. We check for the presence of webkitSpeechRecognition, in case the user is running the application outside of Chrome (the “Okay Google” wake word will function regardless).
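A minimal sketch of that check, together with a helper that flattens recognition results into one string for display, might look like the following. The function names `findSpeechRecognition` and `joinTranscripts` are made up for this example, and the structural types are simplified assumptions so the snippet compiles without DOM typings. The only browser details it relies on are the `webkitSpeechRecognition` global and the `results[i][0].transcript` shape of a result event.

```typescript
// Simplified structural types (assumptions, not the full DOM definitions).
interface RecognitionLike {
  start(): void;
}
type RecognitionCtor = new () => RecognitionLike;

// Returns the constructor if the given global scope exposes it, else undefined.
// In the browser, the scope to pass would be `window`.
function findSpeechRecognition(scope: Record<string, unknown>): RecognitionCtor | undefined {
  const candidate = scope["webkitSpeechRecognition"];
  return typeof candidate === "function" ? (candidate as RecognitionCtor) : undefined;
}

// Concatenates the top alternative of every result into a single transcript.
function joinTranscripts(results: ArrayLike<ArrayLike<{ transcript: string }>>): string {
  let text = "";
  for (let i = 0; i < results.length; i++) {
    const best = results[i]?.[0];
    if (best) {
      text += best.transcript;
    }
  }
  return text;
}
```

When `findSpeechRecognition` returns `undefined`, the app can skip the transcription step and still react to the wake word.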
Try changing the wake word in the code. Where it says “Okay Google”, substitute the name of another free built-in English wake word such as “Alexa”, “Bumblebee”, or “Porcupine”.
Using transcription for voice commands?
Sometimes, we aren’t looking for a transcription per se, but rather wish to detect structured voice commands. For example:
“Set a timer for 10 minutes”
This utterance could signal a setTimer intent with a value of 10 minutes for duration. It’s tempting to record a transcription and then use regular expressions on the results. It is even intuitive to connect these two things together like Lego® blocks (or Unix pipes), but there is a better way.
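To make the tempting approach concrete, here is a rough sketch of pulling a setTimer intent out of a transcript with a regular expression. The `SetTimerIntent` shape, the `parseSetTimer` name, and the pattern itself are all assumptions made up for illustration. It is shown to illustrate the fragility, not as a recommendation.

```typescript
// Hypothetical intent shape for this example only.
interface SetTimerIntent {
  intent: "setTimer";
  duration: { value: number; unit: string };
}

// Matches phrases such as "Set a timer for 10 minutes".
// Only digits are recognized, so this depends on how the
// transcription engine happens to render numbers.
function parseSetTimer(transcript: string): SetTimerIntent | undefined {
  const pattern = /^set (?:a |the )?timer for (\d+) (second|minute|hour)s?$/i;
  const match = pattern.exec(transcript.trim());
  if (!match) {
    return undefined;
  }
  const value = match[1];
  const unit = match[2];
  if (value === undefined || unit === undefined) {
    return undefined;
  }
  return {
    intent: "setTimer",
    duration: { value: Number(value), unit: unit.toLowerCase() },
  };
}
```

A transcript of “Set a timer for 10 minutes” parses, but “Set a timer for ten minutes” or a transcript with trailing punctuation does not, and every such variation means another pattern to maintain.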