How to Control Your React App With Your Voice

Build voice-activated interfaces using the Chrome Speech Recognition API

Juan Cruz Martinez
Nov 25, 2020 · 6 min read
Photo by Gift Habeshaw on Unsplash

A few months ago, I wrote an article on web speech recognition using TensorflowJS. Even though it was interesting to implement, it was cumbersome for many of you to extend. The reason was simple: it required training a deep learning model if you wanted to detect more words than the fairly basic one I provided.

For those of you who needed a more practical approach, that article wasn’t enough. Following your requests, I’m writing today about how you can bring full speech recognition to your web applications using the Web Speech API.

But before we address the actual implementation, let’s understand some scenarios where this functionality may be helpful:

  • Building an application for situations where it is not possible to use a keyboard or touch devices. For example, people working in the field or with special gloves may find interactions with input devices hard.
  • To support people with disabilities.
  • Because it’s awesome!

What’s the Secret to Powering Web Apps With Speech Recognition?

Drum roll… the secret is the Chrome (or Chromium) Web Speech API. This API, which works in Chromium-based browsers, is fantastic and does all the heavy lifting for us, leaving us free to focus on building better interfaces with voice.

However incredible this API is, as of November 2020 it is not widely supported, which can be an issue depending on your requirements. You can check the current support status on Can I Use. Additionally, it only works while online, so you will need a different setup for offline use.

Naturally, this API is available through JavaScript, and it is not unique or restricted to React. Nonetheless, there’s a great React library that simplifies the API even more, and it’s what we are going to use today.

Feel free to read the documentation of the Speech Recognition API on MDN docs if you want to do your implementation on vanilla JS or any other framework.

Hello World, I’m Transcribing

We will start with the basics. We will build a Hello World app that will transcribe in real time what the user is saying. Before doing all the good stuff, we need a good working base, so let’s start by setting up our project. For simplicity, we will use create-react-app to set up our project.

npx create-react-app voice-command

Next, install the react-speech-recognition library with:

npm i react-speech-recognition

Next, we will work on the file App.js. CRA (create-react-app) creates a good starting point for us. Just kidding, we won’t need any of it, so start with a blank App.js file and code with me.

Before we can do anything, we need the imports:

import { useEffect } from 'react';
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition';

Next, we will create a simple function component with a few buttons to interact with the speech engine.
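A minimal sketch of what that component could look like (the markup and button labels are my own, matching the demo described below; the Hook and methods come from react-speech-recognition):

function App() {
  // transcript holds the text recognized so far; resetTranscript clears it
  const { transcript, resetTranscript } = useSpeechRecognition();

  return (
    <div>
      <h1>Hello World</h1>
      <p>Start listening for transcript: {transcript}</p>
      <button onClick={() => SpeechRecognition.startListening()}>Start</button>
      <button onClick={() => SpeechRecognition.stopListening()}>Stop</button>
      <button onClick={() => resetTranscript()}>Reset</button>
    </div>
  );
}

export default App;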

Pretty easy, right? Let’s see in detail what we are doing, starting with the useSpeechRecognition Hook.

This Hook is responsible for capturing the results of the speech recognition process. It’s our gateway to producing the desired results. In its simplest form, we can extract the transcript of what the user is saying while the microphone is enabled, as we do here:

const { transcript } = useSpeechRecognition();

Activating the Hook doesn’t start the listening by itself; for that, we need to interact with the SpeechRecognition object we imported at the beginning. This object exposes a series of methods that help us control the speech recognition API: starting to listen on the microphone, stopping, changing languages, and more.
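For example (the Spanish locale here is just an illustration):

// Start transcribing; optionally pick a language by its BCP 47 tag
SpeechRecognition.startListening({ language: 'es-ES' });

// Stop listening, keeping the transcript collected so far
SpeechRecognition.stopListening();

// Abort immediately, discarding any pending results
SpeechRecognition.abortListening();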

Our interface simply exposes two buttons for controlling the microphone status; if you copied the provided code, your interface should look and behave like this:

Screen that shows the messages “Hello World” and “start listening for transcript” with buttons to start, stop, and reset
Try the live demo at Live Code Stream

If you tried the demo application, you might have noticed that words go missing when you pause while speaking. That’s because the library stops listening after a pause by default, but you can change this behavior by setting the continuous parameter on the startListening method, like this:

SpeechRecognition.startListening({ continuous: true });

Compatibility Detection

Our app is nice! But what happens if your browser is not supported? Can we have a fallback behavior for those scenarios? Yes, we can. If you need to change your app’s behavior based on whether the speech recognition API is supported or not, react-speech-recognition has a method for exactly this purpose. Here is an example:

useEffect(() => {
  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    alert('Oops, your browser is not supported!');
  }
}, []);

Detecting Commands

So far, we covered how to convert voice into text, but now we will take it one step further by recognizing predefined commands in our app. Building this functionality will make it possible for us to build apps that can fully function by voice.

Building a command parser ourselves could be a lot of work, but thankfully, react-speech-recognition already ships with built-in command recognition.

As described in the react-speech-recognition documentation:

“To respond when the user says a particular phrase, you can pass in a list of commands to the useSpeechRecognition hook. Each command is an object with the following properties:

command: This is a string or RegExp representing the phrase you want to listen for.

callback: The function that is executed when the command is spoken. The last argument that this function receives will always be an object containing the following properties:

resetTranscript: A function that sets the transcript to an empty string

matchInterim: Boolean that determines whether “interim” results should be matched against the command. This will make your component respond faster to commands, but also makes false positives more likely - i.e. the command may be detected when it is not spoken. This is false by default and should only be set for simple commands.

isFuzzyMatch: Boolean that determines whether the comparison between speech and command is based on similarity rather than an exact match. Fuzzy matching is useful for commands that are easy to mispronounce or be misinterpreted by the Speech Recognition engine (e.g. names of places, sports teams, restaurant menu items). It is intended for commands that are string literals without special characters. If command is a string with special characters or a RegExp, it will be converted to a string without special characters when fuzzy matching. The similarity that is needed to match the command can be configured with fuzzyMatchingThreshold. isFuzzyMatch is false by default.

fuzzyMatchingThreshold: If the similarity of speech to command is higher than this value when isFuzzyMatch is turned on, the callback will be invoked. You should set this only if isFuzzyMatch is true. It takes values between 0 (will match anything) and 1 (needs an exact match). The default value is 0.8.”

Here is an example of how to predefine commands for your application:
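A sketch based on the documentation quoted above (the phrases and callbacks are made up for illustration):

const commands = [
  {
    // Exact phrase; the callback's last argument exposes resetTranscript
    command: 'clear',
    callback: ({ resetTranscript }) => resetTranscript()
  },
  {
    // Fuzzy matching helps with phrases the engine tends to mishear
    command: 'Buenos Aires',
    callback: () => console.log('Matched the city name'),
    isFuzzyMatch: true,
    fuzzyMatchingThreshold: 0.7
  }
];

const { transcript } = useSpeechRecognition({ commands });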

You can also have dynamic commands like the following:
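Dynamic commands capture part of the spoken phrase with named parameters (:name) or wildcards (*) and pass the captured text to the callback. The specific phrases below are my own examples:

const commands = [
  {
    // ':color' is captured and passed to the callback as an argument
    command: 'change background to :color',
    callback: (color) => { document.body.style.background = color; }
  },
  {
    // '*' captures any trailing phrase
    command: 'say *',
    callback: (phrase) => console.log(`You asked me to say: ${phrase}`)
  }
];

const { transcript } = useSpeechRecognition({ commands });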

Conclusion

Thanks to Chrome’s speech recognition APIs, building voice-activated apps has never been easier or more fun. Hopefully, in the near future, we’ll see this API supported by more browsers and with offline capabilities. Then it will become a very powerful API that may change the way we build the web.

Thanks for reading!
