Using SpeechRecognition and SpeechSynthesis from the Web Speech API

Alexandra Williams
Jul 24, 2017

I don’t remember how I came across it, but a few weeks ago I learned that the JavaScript Web Speech API lets your browser recognize speech and speak to you (mind blown). It works much like Siri on an iPhone and many other voice assistants. The API has two main parts, SpeechRecognition and SpeechSynthesis, and both can be integrated into a web application with just a few lines of code. However, the full API is currently only supported in Google Chrome. It’s still in its experimental phase, so it’s not perfect: though it picks up individual words fairly well, it still has some work to do when it comes to constructing sentences properly. It also doesn’t add punctuation; it just produces a continuous string of words. Nevertheless, it has a lot of potential, and I’m excited to see what will be built with it in the future.
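Because support varies by browser, a page can check for each half of the API before using it. The sketch below assumes recognition is exposed either under the unprefixed name SpeechRecognition or under Chrome's webkitSpeechRecognition prefix. detectSpeechSupport is a hypothetical helper name, and it takes the global object as a parameter so it can also be exercised outside a browser.

```javascript
// Hypothetical helper: reports which halves of the Web Speech API are available.
// `globalObj` is normally `window`; passing it in keeps the function testable.
function detectSpeechSupport(globalObj) {
  // Chrome exposes recognition under a vendor prefix, so check both names,
  // preferring the unprefixed one if it exists.
  const Recognition =
    globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
  return {
    Recognition, // the constructor, or null when recognition is unsupported
    recognition: Recognition !== null,
    synthesis: "speechSynthesis" in globalObj,
  };
}

// Only runs in a browser, where `window` exists.
if (typeof window !== "undefined") {
  const support = detectSpeechSupport(window);
  if (!support.recognition) {
    console.warn("SpeechRecognition is not supported in this browser");
  }
}
```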

SpeechRecognition

SpeechRecognition converts speech to text in the browser. If you’re using Google Chrome, paste the following code into your console, press Enter, and start speaking (Chrome will first ask for permission to use your microphone). Watch your speech print out in the console:

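const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
const recognizer = new SpeechRecognition()
recognizer.interimResults = true // also deliver provisional (non-final) results
recognizer.continuous = true // keep returning results instead of stopping after one
recognizer.lang = 'en-US' // language to listen for and transcribe into
recognizer.onresult = (event) => {
  // event.resultIndex is the first result that changed in this event
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i]
    if (result.isFinal) {
      console.log('You said: ' + result[0].transcript)
    }
  }
}
recognizer.start()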

Here’s how it works:

  • We create a new instance of SpeechRecognition called recognizer (or whatever you want to call it).
  • SpeechRecognition has a few different properties. Here, we use interimResults (boolean), which, when true, returns provisional text results before they are final (that is, before the recognizer has settled on its best word match).
  • We also set continuous (boolean) to true, so the recognizer keeps returning results as you keep speaking instead of stopping after a single result.
  • recognizer.lang sets the language the recognizer listens for and transcribes into, as a language tag such as 'en-US'.
  • Then we assign recognizer.onresult, which is invoked every time the browser recognizes speech. It receives a SpeechRecognitionEvent, whose most useful properties are results, the list of results so far, and resultIndex, the position of the first result that changed. Each result has an isFinal property, which is true once the recognizer has settled on its best word match and false otherwise, and holds one or more alternatives; result[0].transcript is the text of the top alternative.
  • Finally, to start it up, we just call recognizer.start(). (There is also recognizer.stop(), but after a few idle seconds the recognizer stops on its own anyway.)
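With interimResults turned on, a single result event can mix provisional and final text. The sketch below assumes the standard event shape described above (event.results, event.resultIndex, result.isFinal, result[0].transcript) and separates the two kinds of text. A page could then show the provisional words while only committing the final ones. splitTranscripts is a hypothetical helper name.

```javascript
// Hypothetical helper: splits the results delivered by one `result` event
// into final text and still-changing interim text.
// Only results from `event.resultIndex` onward are new or updated.
function splitTranscripts(event) {
  let finalText = "";
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    // result[0] is the recognizer's top alternative for this result.
    const transcript = result[0].transcript;
    if (result.isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText, interimText };
}

// Browser usage, assuming `recognizer` was created as in the code above:
// let committed = "";
// recognizer.onresult = (event) => {
//   const { finalText, interimText } = splitTranscripts(event);
//   committed += finalText;
//   console.log(committed + interimText);
// };
```

Keeping the parsing in a plain function, separate from the recognizer, means it can be checked with a hand-built event object instead of a live microphone.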

And that’s it. Here’s a fun example of SpeechRecognition in use:

SpeechSynthesis

SpeechSynthesis is even easier to get started with. Just turn your volume up, paste this into your console, and press Enter:

speechSynthesis.speak(
new SpeechSynthesisUtterance('Welcome to my blog')
)
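The utterance passed to speak() can also be configured before it is spoken. The sketch below picks the first voice whose lang matches a requested language tag and sets a few properties and event handlers on the utterance. pickVoice and speakWith are hypothetical helper names. In Chrome, speechSynthesis.getVoices() can return an empty list until the voiceschanged event has fired, so the sketch falls back to the browser's default voice when nothing matches.

```javascript
// Hypothetical helper: returns the first voice for `lang` (e.g. "en-GB"),
// or null so the browser's default voice is used instead.
function pickVoice(voices, lang) {
  return voices.find((voice) => voice.lang === lang) || null;
}

// Hypothetical helper: speaks `text` in `lang` with a few custom settings.
// Only meaningful where the browser provides speechSynthesis.
function speakWith(text, lang) {
  const utterance = new SpeechSynthesisUtterance(text);
  const voice = pickVoice(speechSynthesis.getVoices(), lang);
  if (voice) utterance.voice = voice;
  utterance.lang = lang;
  utterance.rate = 1.1; // 1 is the default speed
  utterance.pitch = 1; // 1 is the default pitch
  utterance.onstart = () => console.log("Started speaking");
  utterance.onend = () => console.log("Finished speaking");
  utterance.onerror = (event) => console.error("Speech error:", event.error);
  speechSynthesis.speak(utterance);
}

// Only runs in a browser that supports speech synthesis.
if (typeof speechSynthesis !== "undefined") {
  speakWith("Welcome to my blog", "en-US");
}
```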

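It also lets you change the voice (chosen from speechSynthesis.getVoices()) along with the speaking rate, pitch, and volume, and each SpeechSynthesisUtterance fires events much like SpeechRecognition does, such as onstart, onend, and onerror. Here’s an example of SpeechSynthesis in use: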

Overall, there is still a lot of work left to be done on this API, but it opens the door to new opportunities for web applications.

Written by Alexandra Williams, NYC | Software Engineer
