“Computer! Tea, Earl Grey, Hot”: Offline Voice on NodeJS

David Bartle
Published in Picovoice
Feb 11, 2021

Articles on speech recognition have no shortage of Star Trek references. Indeed, in 2017 Amazon added the famous “Computer” wake word to Echo devices as an alias for “Alexa”, in a nod to the legendary television and film series. In 2021, it’s possible to recreate this experience on commodity hardware, processing voice privately and entirely offline. Let’s build the replicator from the scene where Captain Picard orders his usual beverage, using NodeJS.

Private Voice AI understanding the captain’s beverage order, in NodeJS

The first step is the “Computer” wake word, or hotword: a phrase that an always-listening device detects in order to do something, such as listening for a subsequent (and typically more complex) naturally spoken command. We’ll need to complete the following steps to get this up and running:

  1. Get a Picovoice AccessKey
  2. Create a new NodeJS project and add dependencies
  3. Run the script to listen for “Computer” and output detection events

1. Get a Picovoice AccessKey

Sign up or log in to Picovoice Console to get your AccessKey. Picovoice’s SDK needs it for initialization, so keep it at hand for the next steps.

2. Create a new NodeJS project using npm (or yarn)

Create a new folder called “replicator”, initialize a new npm project, and install the dependencies:

  • Porcupine wake word engine
  • PvRecorder, Picovoice’s cross-platform audio recorder

mkdir replicator && cd replicator
npm init -y
npm install @picovoice/porcupine-node @picovoice/pvrecorder-node
touch index.js

Open index.js and paste in the following script:

NodeJS script with Porcupine that detects the “Computer” hotword.

Replace ${ACCESS_KEY} with your AccessKey.
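Hard-coding the AccessKey is fine for a quick demo, but it is easy to commit by accident. A minimal sketch of an alternative, assuming you export the key in an environment variable before running the script; the variable name PICOVOICE_ACCESS_KEY and the helper getAccessKey are hypothetical choices for this example, not part of Picovoice’s SDK:

```javascript
// Hypothetical helper: read the AccessKey from an environment variable
// (PICOVOICE_ACCESS_KEY is our own naming choice, not an SDK convention)
// instead of hard-coding it in index.js.
function getAccessKey(env = process.env) {
  const key = env.PICOVOICE_ACCESS_KEY;
  // Fail early with a clear message rather than passing an empty key on
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error(
      "Set PICOVOICE_ACCESS_KEY to the AccessKey from Picovoice Console"
    );
  }
  return key.trim();
}
```

The script would then pass getAccessKey() to the Porcupine constructor in place of "${ACCESS_KEY}" and be started with PICOVOICE_ACCESS_KEY=<your AccessKey> node index.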

3. Running the demo

Run the script and say “Computer”:

$ node index
Listening for 'COMPUTER'...
Press ctrl+c to exit.
Detected 'COMPUTER'

Voilà! Our NodeJS script is recognizing the “Computer” wake word.

Understanding the code

The essence of the code is:

  1. Continuously read microphone audio via PvRecorder, which returns frames of 16-bit linear PCM sampled at 16 kHz, the de facto industry standard input format for speech processing libraries.
  2. Pass each frame to Porcupine and receive what is essentially a yes/no answer: was “Computer” just spoken?
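Those two steps form a simple read-then-process loop. The sketch below is written against duck-typed objects so it runs without a microphone. It assumes, as we understand the porcupine-node and pvrecorder-node APIs, that the engine’s process(frame) returns the index of the detected keyword or -1, and that the recorder offers start(), stop() and an async read() resolving to an Int16Array; check the documentation for the versions you install. The function name listenForWakeWord is hypothetical.

```javascript
// Minimal sketch of the detection loop, using duck-typed dependencies:
//   engine.process(frame) -> index of the detected keyword, or -1
//   recorder.start(), recorder.stop(), recorder.read() -> Promise<Int16Array>
async function listenForWakeWord(engine, recorder, keywordNames, onDetected, shouldStop) {
  recorder.start();
  try {
    while (!shouldStop()) {
      // Step 1: read one frame of 16-bit, 16 kHz PCM audio
      const frame = await recorder.read();
      // Step 2: ask the engine whether a wake word was detected in this frame
      const index = engine.process(frame);
      if (index >= 0) {
        onDetected(keywordNames[index]);
      }
    }
  } finally {
    // Stop recording even if reading or processing throws
    recorder.stop();
  }
}
```

With the porcupine and recorder objects set up in the snippet that follows, index.js could call listenForWakeWord(porcupine, recorder, ["COMPUTER"], (name) => console.log("Detected '" + name + "'"), () => false) and call porcupine.release() and recorder.release() on exit.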

We set up a Porcupine instance that listens for “Computer” at a sensitivity of 0.5. Sensitivity is a value between 0 and 1 that trades misses against false alarms: a higher sensitivity misses the wake word less often but fires on unrelated speech and sounds more often. Increase or decrease it to suit your scenario.

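const {
  Porcupine,
  BuiltinKeyword,
} = require("@picovoice/porcupine-node");
const PvRecorder = require("@picovoice/pvrecorder-node");

// Listen for the built-in "Computer" keyword at sensitivity 0.5
const porcupine = new Porcupine(
  "${ACCESS_KEY}", // replace with your AccessKey
  [BuiltinKeyword.COMPUTER],
  [0.5]
);

// Record from the default microphone (-1), one Porcupine frame per read
const recorder = new PvRecorder(-1, porcupine.frameLength);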
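PvRecorder is created with porcupine.frameLength so that every read returns exactly one frame of the size Porcupine expects. If audio came from somewhere else, such as a raw PCM file, that framing would be up to you. A minimal sketch, assuming 16-bit little-endian mono samples at 16 kHz and a 512-sample frame (the value we understand Porcupine to report; read porcupine.frameLength rather than hard-coding it). The helper name toFrames is hypothetical.

```javascript
// Hypothetical helper: split raw 16-bit little-endian mono PCM bytes (a Node
// Buffer) into Int16Array frames of exactly `frameLength` samples. A trailing
// partial frame is dropped, since the engine expects full frames.
function toFrames(pcmBytes, frameLength) {
  const bytesPerSample = 2;
  const sampleCount = Math.floor(pcmBytes.length / bytesPerSample);
  const frameCount = Math.floor(sampleCount / frameLength);
  const frames = [];
  for (let f = 0; f < frameCount; f++) {
    const frame = new Int16Array(frameLength);
    for (let i = 0; i < frameLength; i++) {
      const offset = (f * frameLength + i) * bytesPerSample;
      frame[i] = pcmBytes.readInt16LE(offset);
    }
    frames.push(frame);
  }
  return frames;
}

// At 16,000 samples per second, one 512-sample frame covers 32 ms of audio.
const frameMs = (512 * 1000) / 16000;
```

Small frames like this keep detection latency low, since the engine gets a chance to fire every few tens of milliseconds.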

Note that the keywords and sensitivities arguments are arrays, because Porcupine can listen for multiple wake words simultaneously.
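The two arrays are parallel: sensitivities[i] applies to keywords[i], and the value process returns (as we understand it, the index of the detected keyword, or -1) indexes back into the keywords array. A minimal sketch of that bookkeeping, using plain strings as stand-ins for BuiltinKeyword values so it runs on its own; the helper names sensitivitiesFor and detectedKeyword are hypothetical.

```javascript
// Hypothetical helpers for listening to several wake words at once.
// Build a sensitivities array parallel to `keywords`, validating the range.
function sensitivitiesFor(keywords, sensitivity = 0.5) {
  if (!(sensitivity >= 0 && sensitivity <= 1)) {
    throw new RangeError("sensitivity must be between 0 and 1");
  }
  return keywords.map(() => sensitivity);
}

// Map the engine's return value (keyword index, or -1) back to a name.
function detectedKeyword(keywords, index) {
  return index >= 0 && index < keywords.length ? keywords[index] : null;
}

const keywords = ["COMPUTER", "JARVIS"]; // stand-ins for BuiltinKeyword values
const sensitivities = sensitivitiesFor(keywords, 0.6);
```

Deriving the sensitivities from the keywords array keeps the two lengths in sync when wake words are added or removed.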

Now that we have the wake word, the next step is to understand the follow-on command: “tea, earl grey, hot”. Continued in Part II.
