How I Made an App to Recognize Speech

Tutorial for Node.js app using Google Speech-to-Text API

Karen McClellan
Apr 26 · 5 min read
Critical Making (APRD 5019-001)
Project: Sous Chef

Step 1: Set up project on Google Cloud Platform

On Google Cloud Platform, set up a new project and enable the Google Speech-to-Text API for it. Then create a service account and download its private key as a JSON file; you’ll need that file in step 3. (There’s a 12-month free trial for the account, and the Speech-to-Text API is free for up to 60 minutes of audio per month.)

Step 2: Set up Node.js app environment

In Terminal, create a new local directory for your test project (for example, mkdir testApp). Run cd testApp, then npm init. Follow the prompts to set up your package.json file.

Step 3: Install and initialize the Cloud SDK and API client library

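Install the API client library in your project directory with npm install --save @google-cloud/speech. Then, in Terminal, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the JSON file with the private key that you downloaded in step 1 (for example, export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-key.json", substituting your own path).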
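If the app later fails with an authentication error, the first thing to check is whether that environment variable actually reached Node. The snippet below is a minimal sketch of such a check, not part of the original tutorial: checkCredentials is a hypothetical helper name, and it assumes the standard GOOGLE_APPLICATION_CREDENTIALS variable that Google's client libraries read.

```javascript
const fs = require('fs');

// Hypothetical helper: returns a short problem description,
// or null when the credentials file looks usable.
// `env` defaults to process.env; passing it in keeps the check easy to test.
function checkCredentials(env = process.env) {
  const keyPath = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!keyPath) {
    return 'GOOGLE_APPLICATION_CREDENTIALS is not set';
  }
  if (!fs.existsSync(keyPath)) {
    return `No file found at ${keyPath}`;
  }
  try {
    // The service account key is a JSON document, so it should parse cleanly
    JSON.parse(fs.readFileSync(keyPath, 'utf8'));
  } catch (err) {
    return `Could not read key file as JSON: ${err.message}`;
  }
  return null;
}

const problem = checkCredentials();
console.log(problem ? `Credentials problem: ${problem}` : 'Credentials look OK.');
```

Remember that export only affects the current Terminal session, so run the check from the same window you plan to launch the app from.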

Step 4: Install SoX and node record package

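Since we want to stream live microphone audio to the API, we need two more pieces: SoX, a command-line audio tool that does the actual recording (on a Mac with Homebrew, for example, brew install sox), and the node-record-lpcm16 package, which lets Node.js read that recording as a stream (npm install node-record-lpcm16).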
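Before moving on, it can help to confirm that the recording program is reachable from Node, since a missing SoX install is an easy thing to overlook. This is a rough sketch rather than anything from the original tutorial: isCommandAvailable is a hypothetical helper, and it assumes the program accepts a --version flag (SoX does).

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: returns true if `command` can be launched from the current PATH.
// Assumption: the program accepts a --version flag, as SoX does.
function isCommandAvailable(command, args = ['--version']) {
  const result = spawnSync(command, args, { stdio: 'ignore' });
  // spawnSync sets `error` (for example ENOENT) when the program cannot be started
  return !result.error;
}

// 'rec' and 'sox' are the SoX commands the recording package can call
for (const program of ['rec', 'sox']) {
  console.log(`${program}: ${isCommandAvailable(program) ? 'found' : 'not found'}`);
}
```

If both come back as not found, reinstall SoX or open a fresh Terminal window so the updated PATH is picked up.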

Step 5: Set up your main.js file

Here’s the code I used from a tutorial on Google Cloud:

const record = require('node-record-lpcm16');

// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';

const request = {
  config: {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  },
  interimResults: false,
};

// Create a recognize stream
const recognizeStream = client
  .streamingRecognize(request)
  .on('error', console.error)
  .on('data', data =>
    process.stdout.write(
      data.results[0] && data.results[0].alternatives[0]
        ? `Transcription: ${data.results[0].alternatives[0].transcript}\n`
        : `\n\nReached transcription time limit, press Ctrl+C\n`
    )
  );

// Start recording and send the microphone input to the Speech API
// Note: record.start() is the API of node-record-lpcm16 releases before 1.0;
// newer releases use record.record(...).stream() instead
record
  .start({
    sampleRateHertz: sampleRateHertz,
    threshold: 0,
    verbose: false,
    recordProgram: 'rec', // Try also "arecord" or "sox"
    silence: '10.0',
  })
  .on('error', console.error)
  .pipe(recognizeStream);

console.log('Listening, press Ctrl+C to stop.');

Step 6: Run a test in Terminal

Run the app in Terminal with node main.js. You should see 'Listening, press Ctrl+C to stop.' (from the last line of your main.js file). Speak into your microphone, then press Ctrl+C to stop. Your speech should appear in Terminal as a line beginning with "Transcription:".

Next: How to grab the transcription text string

Next, you’ll want to be able to grab the transcription string so you can work with it. I did a little digging to figure out how the data objects are structured. To start, I amended recognizeStream to log each object as it arrives:

// Create a recognize stream and log each full response object
const recognizeStream = client
  .streamingRecognize(request)
  .on('error', console.error)
  .on('data', data =>
    console.log(data)
  );

Then, to look only at the first result in each response:

// Create a recognize stream and log only the first result
const recognizeStream = client
  .streamingRecognize(request)
  .on('error', console.error)
  .on('data', data =>
    console.log(data.results[0])
  );
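From there, the transcript sits at the same path the original handler in step 5 reads: data.results[0].alternatives[0].transcript. Here is a minimal sketch of pulling it out as a plain string. getTranscript is a hypothetical helper name, and the sample object is hand-written to mimic that shape rather than copied from real API output.

```javascript
// Hypothetical helper: returns the top transcript string from one
// streaming response, or null if the response carries no result.
function getTranscript(data) {
  const result = data && data.results && data.results[0];
  const alternative = result && result.alternatives && result.alternatives[0];
  return alternative ? alternative.transcript : null;
}

// Hand-written sample shaped like a response object (not real API output)
const sample = {
  results: [{ alternatives: [{ transcript: 'preheat the oven' }] }],
};

console.log(getTranscript(sample));          // prints: preheat the oven
console.log(getTranscript({ results: [] })); // prints: null
```

In main.js, the 'data' handler could call getTranscript(data) and hand the string to the rest of the app whenever it is not null. Returning null instead of throwing keeps the handler safe for the empty response the original code treats as the time-limit case.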

KMakes

design theories and musings

Written by Karen McClellan, Masters Student, UX/Product Design | CMCI Studio, CU Boulder
