A basic tutorial on how to set up Speech Recognition with React

Amanda Hussey
6 min read · Jan 30, 2018


I recently created a dream journal app that uses the voice-to-text Speech Recognition feature of JavaScript’s Web Speech API to record and save a user’s dreams (instead of requiring the user to type out the full dreams themselves). The voice-to-text technology is surprisingly accurate. Some downsides though…it’s only fully supported by Chrome at this time, and it only listens for so long (up to around five minutes or so) until it simply loses interest (and stops listening). For the purposes of many apps, five minutes is more than enough, so it’s worth checking out!

I had a lot of fun building this app, and I wanted to share what I did to incorporate this speech recognition technology. Specifically, I’d like to share how I was able to wrap the functionality into a React component. By the end of this tutorial, you will be able to

  • start/stop speech recognition (voice-to-text) on the click of a button, and
  • stop speech recognition using voice commands.

Below is an example of the tutorial’s end product. The blue button starts and stops speech recognition, and the interim/final transcripts appear in gray/black, respectively.

“how’s it going” is still in interim processing. “testing testing” is final.
“testing testing how’s it going” is the full final transcript.

Let’s jump in!

Set up a new instance of SpeechRecognition.

I don’t want to spend too much time discussing the initial setup of the SpeechRecognition instance, since that can be found in the docs: https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition

However, I want to note that we set recognition.interimResults = true, since it is set to false by default. By changing it to true, we can see the interim results as well as the final for the purposes of this tutorial.
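For reference, here is a minimal sketch of what that setup might look like. The helper name createRecognition is my own invention for this example (it is not part of the Web Speech API), and I am assuming the constructor may only be available under Chrome’s webkit prefix.

```javascript
// Sketch only: `createRecognition` is a hypothetical helper, not a Web Speech API call.
// `scope` is the global object to look in (`window` in the browser).
function createRecognition(scope) {
  // Chrome exposes the constructor under a webkit prefix.
  const SpeechRecognition =
    scope.SpeechRecognition || scope.webkitSpeechRecognition;

  // Bail out gracefully in browsers that do not support the API.
  if (!SpeechRecognition) return null;

  const recognition = new SpeechRecognition();
  recognition.interimResults = true; // false by default
  return recognition;
}

// In the browser you would call it once, outside the component:
// const recognition = createRecognition(window);
```

Returning null for unsupported browsers lets the component decide whether to render the button at all.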

A note about the difference between interim and final transcripts:

The interim transcripts are simply the words that your speech recognition sifts through as it attempts to find the best match. For example, if you said “waffles”, then your speech recognition’s interim processing might first hear “awful”. Then, a second or so later, it might correct itself to “waffles” as it finds the better match. My point is: If you require accuracy, use final transcripts, not interim.

We’re incorporating the interim transcripts in this tutorial simply to demonstrate how speech recognition works. If you’re not interested in the interim data, then you can leave out the recognition.interimResults = true line when setting up your instance of speech recognition.

Now we can dive into our first goal!

Start/stop speech recognition on the click of a button

Before we write any code, we should outline our approach.

We need to first create a button element. Then, to program the functionality of the button, we need to write an onClick event handler, which will take care of each click as follows:

  • When we click our button for the first time, we want our instance of speech recognition to begin listening.
  • When we click for the second time, it should stop listening.
  • As we continue clicking, this start/stop cycle should repeat.

Looking at the above bullets, it becomes clear that we need to keep track of this listening “state” (hint, hint) somehow; that is, we need to keep track of when speech recognition needs to start/stop. How can we do this? Ah yes, we can use our React component’s local state, as shown below. By default, our component will not be listening, so the initial state of listening will be false.

this.state = { listening: false }

We now need a way to turn speech recognition on and off. That is, we need a way to toggle our state of listening between true and false. For that, we can write the simple method, toggleListen, as shown below.

toggleListen() {
  // setState is a method call, not an assignment
  this.setState({
    listening: !this.state.listening
  })
}

We can now write our onClick handler. This is the flow we want:

→ Click button

→ Toggle listening (i.e., invoke toggleListen)

→ Start/Stop speech recognition if this.state.listening = true/false, respectively

[ → Do anything else dependent on state, e.g., change color of button while this.state.listening = true ]

We will create a separate method, called handleListen, to handle all of the speech recognition logic. At first, it seems to make sense to define our onClick handler like so:

onClick={() => {
  this.toggleListen()
  this.handleListen()
}}

However, if you set up the onClick handler in this way, you will very quickly realize that it will not always start when you click! You might have to click several times before it starts listening. Why is this? Well, React’s setState method is not guaranteed to be synchronous. In the background, React decides when it is best to change the state. Sometimes it is immediate, sometimes it is not. That means handleListen can run while this.state.listening still holds its old value. Hence our problem.

To solve this issue, we will invoke our handleListen method in the callback of setState, as shown below.

toggleListen() {
  this.setState({
    listening: !this.state.listening
  }, this.handleListen)
}

Now, we can simply set our onClick handler to equal this.toggleListen. Our desired flow (click → toggle listen → listen) is now guaranteed!

onClick={this.toggleListen}
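If the timing issue still feels abstract, here is a toy model of it. To be clear, this is not React and not how React is implemented. ToyComponent and flush are made-up names, and the class just queues updates so we can see what a read right after setState observes versus what the callback observes.

```javascript
// Toy model only: this is NOT React, just a stand-in that defers state updates.
class ToyComponent {
  constructor() {
    this.state = { listening: false };
    this.pending = [];
  }

  // Queue the update instead of applying it immediately.
  setState(partial, callback) {
    this.pending.push({ partial, callback });
  }

  // Apply every queued update, then run the callbacks against the new state.
  flush() {
    const queued = this.pending;
    this.pending = [];
    for (const { partial } of queued) Object.assign(this.state, partial);
    for (const { callback } of queued) if (callback) callback.call(this);
  }
}

const seen = [];
const toy = new ToyComponent();

toy.setState({ listening: true }, function () {
  seen.push(`callback sees ${this.state.listening}`);
});
// Reading state here is like calling handleListen right after toggleListen.
seen.push(`right after setState sees ${toy.state.listening}`);

toy.flush();
console.log(seen);
```

The read that happens right after setState still sees the old value, while the callback runs only once the update has landed. That is the reason for passing this.handleListen as the second argument.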

The rest of this tutorial is dedicated to fleshing out our handleListen method. Here is a gist of what we have so far (including some CSS). Don’t forget to bind those methods!
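About that binding reminder: here is a small plain-JavaScript sketch of why it matters. Recorder is a hypothetical stand-in for the React component (no React involved), and it updates state by direct assignment only to keep the example runnable on its own.

```javascript
// Sketch: a plain class standing in for the component, to show the effect of bind.
class Recorder {
  constructor({ bind }) {
    this.state = { listening: false };
    // Binding pins `this` to the instance, even when the method is passed around.
    if (bind) this.toggleListen = this.toggleListen.bind(this);
  }

  toggleListen() {
    this.state = { listening: !this.state.listening };
  }
}

// Passing the method as a handler detaches it from its instance,
// which is what happens with onClick={this.toggleListen}.
const bound = new Recorder({ bind: true });
const boundHandler = bound.toggleListen;
boundHandler(); // works: `this` is still the Recorder instance

let unboundError = null;
try {
  const unboundHandler = new Recorder({ bind: false }).toggleListen;
  unboundHandler(); // `this` is undefined inside a detached class method
} catch (err) {
  unboundError = err;
}
```

In the component, the equivalent lines would go in the constructor for both toggleListen and handleListen.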

The handleListen method:

We start off handleListen with the below code, which tells our speech recognition to start listening when this.state.listening = true.

handleListen() {
  if (this.state.listening) recognition.start()
}

To collect the interim and final transcripts, we use speech recognition’s built-in event handler called onresult, as shown in the gist below. The code in the for loop specifically comes from the docs.
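Here is a rough sketch of that loop, pulled out into a standalone function so it can be tried without a microphone. The name collectTranscripts and the finalSoFar parameter are my own for this example. The event properties it reads (resultIndex, results, isFinal, transcript) are the ones the Web Speech API provides on a result event.

```javascript
// Sketch: `collectTranscripts` is a hypothetical helper name.
// `event` is the object passed to recognition.onresult;
// `finalSoFar` is whatever final text we have accumulated already.
function collectTranscripts(event, finalSoFar = "") {
  let interimTranscript = "";
  let finalTranscript = finalSoFar;

  // Start at resultIndex so we only process results that changed.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    // Each result holds one or more alternatives; [0] is the best match.
    const transcript = event.results[i][0].transcript;
    if (event.results[i].isFinal) finalTranscript += transcript + " ";
    else interimTranscript += transcript;
  }

  return { interimTranscript, finalTranscript };
}
```

Inside handleListen, you would assign a function to recognition.onresult, call this helper with the event, and write the two strings into whatever elements display them.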

At this point, if you click on the button, you should be able to see the interim and final transcripts populating those divs as you speak!

If you play around with it for a bit, you’ll notice that the speech recognition will actually end on its own during any decent pause in speech. This is not what we want. What if the user needs a few seconds to think?

We can trick speech recognition into “continuous” listening by playing with other built-in event listeners. Specifically, we can call recognition.start again within recognition.onend to restart listening if it decides to end on its own.

Finally, to stop speech recognition, we simply add the else statement, which calls recognition.stop when this.state.listening = false.
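The start, restart, and stop logic can be sketched roughly like this. I have written it as a standalone function so it is easy to try with a fake object. wireListening and isListening are hypothetical names, and in the component isListening would just read this.state.listening.

```javascript
// Sketch: `wireListening` and `isListening` are hypothetical names.
// `recognition` is a SpeechRecognition instance (or anything with start/stop/onend).
function wireListening(recognition, isListening) {
  if (isListening()) {
    recognition.start();
    // onend fires whenever recognition stops, including after a pause in speech.
    recognition.onend = () => {
      // Only restart if the user has not asked us to stop in the meantime.
      if (isListening()) recognition.start();
    };
  } else {
    // Clear the restart handler first so stopping really stops.
    recognition.onend = () => {};
    recognition.stop();
  }
}
```

Checking isListening() again inside onend is a small safety net: it keeps a late end event from restarting recognition after the user has clicked stop.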

With the above code, if this.state.listening = true, but speech recognition decides to stop listening, it is manipulated into listening once again (muahaha!). Try it out! The darn thing will keep listening until you click that button again… for the most part. Unfortunately, it will eventually time out after about 5 minutes or so. If you truly need longer than 5 minutes, you may be able to get around this issue by playing with the event listeners and adding other controlled data to local state.

Stop speech recognition using voice commands

What if you want to stop speech recognition using a voice command instead of a click? Let’s say you want it to stop listening after saying the words “stop” and “listening” one after another. You simply need to split the final transcript into an array of words, and if the last two words of that array are “stop” && “listening”, then call recognition.stop. You can then remove the words “stop” and “listening” from the array to produce a final text that does not contain the phrase “stop listening”.
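As a rough sketch of that array manipulation, here is one way it could look. applyStopCommand is a name I made up for this example, and lowercasing the words before comparing is my own assumption, so that “Stop Listening” also counts.

```javascript
// Sketch: `applyStopCommand` is a hypothetical helper name.
// Returns whether the transcript ends with "stop listening",
// plus the transcript text with that phrase removed.
function applyStopCommand(finalTranscript) {
  const words = finalTranscript.trim().split(/\s+/);

  // Compare case-insensitively (an assumption, not required by the API).
  const lastTwo = words.slice(-2).map((word) => word.toLowerCase());
  const shouldStop = lastTwo[0] === "stop" && lastTwo[1] === "listening";

  // Drop the command words so they do not end up in the saved text.
  const kept = shouldStop ? words.slice(0, -2) : words;
  return { shouldStop, text: kept.join(" ") };
}
```

When shouldStop comes back true, you would call recognition.stop and keep text as the final transcript.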

Overall, it’s simply a game of array manipulation once you have the final transcript. See the “ — COMMANDS — ” section in the final gist below for more details on this specific voice command example.

Note on the final gist: I added the below console.log statements to help keep track of the speech recognition activity.

  • “Listening!” will be logged when you click the button and it starts listening.
  • “…continue listening…” will be logged when speech recognition is manipulated into restarting after stopping on its own.
  • “Stopped per click” will be logged when you stop speech recognition by using a click.
  • “Stopped per command” will be logged when you stop speech recognition by using the voice command.

That’s about it for this tutorial! There’s so much more you can do with this combo (SpeechRecognition + React), such as toggling the color of your button, or rendering some other component while listening.

Whatever you do, have fun with it!
