Say That Again!
Exploring the Web Speech API
While developing my final project for Flatiron School, I was investigating different ways of transcribing speech to text and discovered some pretty cool APIs.
External API Options
The biggest problem I encountered was that both of the external services I tried required me to upload or stream an audio file to them. My project required the user to be able to speak for an unlimited amount of time, and I began to run into problems dealing with audio files that were too large.
Web Speech API
The Web Speech API has two functions: it can recognize speech and return text, and it can synthesize speech from text. It works by accessing your computer’s microphone and running any speech it receives through a speech recognition service; in Chrome, the Google Speech API is used.
The first thing we need to do is create a new instance of the SpeechRecognition object and assign it some properties, like what language to listen for, whether to listen for continuous speech, and how many alternative guesses we want the service to return to us.
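As a rough sketch, the setup might look like this (Chrome exposes the constructor with a webkit prefix, so we feature-detect both names; the property values here are just examples):

```javascript
// Feature-detect the constructor: Chrome exposes it with a webkit prefix.
const SpeechRecognitionCtor =
  globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;

function createRecognizer() {
  if (!SpeechRecognitionCtor) return null; // not in a supporting browser
  const recognition = new SpeechRecognitionCtor();
  recognition.lang = 'en-US';      // language to listen for
  recognition.continuous = true;   // keep listening across pauses
  recognition.maxAlternatives = 1; // how many guesses to return
  return recognition;
}
```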
The speech recognition service has three useful methods for controlling when it is and isn’t listening.
.start() starts the speech recognition service.
.abort() stops the service from listening without returning a result.
.stop() stops the service from listening and attempts to return a SpeechRecognitionResult from the audio captured so far.
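A small sketch of how those methods might be wired up to start/stop controls (the button elements here are hypothetical; any object with start() and stop() methods works):

```javascript
// Wire hypothetical start/stop buttons to the recognizer's lifecycle
// methods.
function bindControls(recognition, startButton, stopButton) {
  startButton.addEventListener('click', () => recognition.start());
  stopButton.addEventListener('click', () => recognition.stop());
}
```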
Now that we have created our instance and set our parameters, how do we interact with it? The SpeechRecognition object comes with a bunch of events that we can use to create the functionality we need.
For this example, we will add an event listener for the result event, which is triggered when the speech-to-text service returns a result. After adding our event listener, we can call .start() on our object to begin listening.
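Sketched out, the wiring might look like this (the callback name is my own):

```javascript
// Listen for the result event and hand the newest transcript to a callback.
function listen(recognition, onTranscript) {
  recognition.addEventListener('result', (event) => {
    // The result list grows as speech comes in; take the latest
    // result's top alternative.
    const latest = event.results[event.results.length - 1];
    onTranscript(latest[0].transcript);
  });
  recognition.start();
}
```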
When our event listener is triggered, it receives a SpeechRecognitionResultList containing SpeechRecognitionResult objects. Each of these holds SpeechRecognitionAlternative objects with two properties: transcript, whose value is a string of the converted text, and confidence, a decimal value that represents the speech recognition service’s prediction of accuracy.
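For instance, here is a small helper that flattens such a result list into a single string, dropping low-confidence guesses (the threshold parameter is my own addition, not part of the API):

```javascript
// Join the top alternative of each result, skipping anything the
// service was not confident about.
function collectTranscripts(results, minConfidence = 0) {
  const pieces = [];
  for (const result of results) {
    const { transcript, confidence } = result[0];
    if (confidence >= minConfidence) pieces.push(transcript.trim());
  }
  return pieces.join(' ');
}
```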
This example is set up to log the speech that is heard by the computer, but there is much more functionality than that built in.
You can use SpeechGrammarList to give the service a list of words to listen for and then create different behavior based on what is said (think "Alexa!").
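As a rough sketch, a grammar is written in JSGF format and added to a SpeechGrammarList (the word list here is made up, and Chrome prefixes this constructor too):

```javascript
// Feature-detect the grammar list constructor.
const GrammarListCtor =
  globalThis.SpeechGrammarList || globalThis.webkitSpeechGrammarList;

// A JSGF grammar listing the words we want the service to favor.
const commands = ['play', 'pause', 'stop'];
const grammar =
  `#JSGF V1.0; grammar commands; public <command> = ${commands.join(' | ')};`;

function attachGrammar(recognition) {
  if (!GrammarListCtor) return false; // not in a supporting browser
  const list = new GrammarListCtor();
  list.addFromString(grammar, 1); // weight between 0 and 1
  recognition.grammars = list;
  return true;
}
```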
Browser support seems to be quite limited for the Web Speech API, with Google Chrome and Firefox having the best implementation.
It’s also probably good to note that sometimes the service will stop listening on its own if it thinks the user has stopped talking. This was very problematic for me, because I needed the service to listen continuously until told to stop. To get around it, I ended up using setInterval() to restart the speech recognition object a few seconds after it turns itself off.
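A sketch of that workaround, with the polling check factored out so the restart logic is visible (the function names are my own):

```javascript
// Track whether the recognizer is currently listening via its start/end
// events, and return a poll function that restarts it when it has gone
// quiet. Schedule the poll with setInterval(poll, 3000).
function makeKeepAlive(recognition) {
  let isListening = false;
  recognition.addEventListener('start', () => { isListening = true; });
  recognition.addEventListener('end', () => { isListening = false; });
  return function poll() {
    if (!isListening) recognition.start();
  };
}
```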