Published in Spider

Introduction to Speech Recognition with Web Speech API

Speech to text

Have you ever wondered how digital assistants like Alexa, Cortana, Google Assistant, and Siri interpret what we’re saying to respond to our question or command? Do you want to create an application that listens to your commands?

Well, in this article, we’ll cover the basics of speech recognition and build a simple speech recognition app. Let’s get started with…

What is Speech Recognition?

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT).

Speech recognition is a capability that enables a program to process human speech into a written format. It involves receiving speech through a device’s microphone, which is then checked by a speech recognition service against a grammar (a list of words or phrases the service expects). When a word or phrase is successfully recognized, it is returned as a text string, and further actions can be initiated as a result.

How does it work?

In simple words, the speech recognition software works by breaking down the audio of a speech recording into individual sounds, analyzing each sound, using algorithms to find the most probable word fit in that language, and transcribing those sounds into text.

There are various algorithms and computational techniques, such as natural language processing (NLP) and neural networks, used to recognize speech as text and improve the accuracy of the transcription.

Fact: Speech recognition and voice recognition are not the same, though the terms are often used interchangeably. Voice recognition identifies the voice of the speaker, while speech recognition identifies the words being said.

Let’s try to build our own Speech recognition application. Here, our aim is to make a simple block mover game that will recognize the direction given by the user and move the block accordingly using Web Speech API.

Web Speech API

It enables developers to incorporate voice data into web apps. The Web Speech API has two parts: Speech Synthesis (text-to-speech) and Speech Recognition (asynchronous speech recognition).

Creating a Speech Recognition Application using Web Speech API

We will start with creating a basic HTML file named index.html.

Here, we use the canvas tag to draw graphics on the web page.

index.html
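The original markup is not reproduced in the post; a minimal index.html along these lines would work (the element ids canvas and output, the page title, and the canvas dimensions are assumptions):

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Block Mover</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <h1>Speech-controlled Block Mover</h1>
  <!-- The canvas on which the block is drawn -->
  <canvas id="canvas" width="400" height="400"></canvas>
  <!-- Diagnostic messages from the recognizer go here -->
  <div id="output"></div>
  <script src="script.js"></script>
</body>
</html>
```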

Now, let’s add some basic styling to our application using CSS. Create a new file named style.css and add the code.

style.css
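The original stylesheet is also not shown; a simple sketch of the kind of styling meant here (colors and fonts are assumptions) might be:

```css
/* Center the page content and give it a neutral background */
body {
  font-family: sans-serif;
  text-align: center;
  background-color: #f0f0f0;
}

/* Make the canvas boundary visible so the block's limits are clear */
canvas {
  border: 2px solid #333;
  background-color: #fff;
}
```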

Next, create a JavaScript file named script.js to hold our code.

Let’s start by getting a reference to the HTML canvas tag using Document.getElementById(), getting the element’s drawing context using HTMLCanvasElement.getContext(), and defining the block that will move in the canvas.
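In script.js, that setup might look like the following (the canvas id, block size, and fill color are assumptions, since the original snippet is not shown):

```javascript
// Grab the canvas and its 2D drawing context
const canvas = document.getElementById('canvas');
const ctx = canvas.getContext('2d');

// The block that will move around the canvas
const block = { x: 0, y: 0, size: 50 };

// Draw the block at its current position
function drawBlock() {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.fillStyle = 'steelblue';
  ctx.fillRect(block.x, block.y, block.size, block.size);
}

drawBlock();
```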

After completing the HTML and CSS and adding some JavaScript, our application will look like this. But wait, it is not performing any function yet, so let’s add some.

Now comes the time to add Speech Recognition functionality into our application.

First, we need to add Chrome support to our application.

Support for Web Speech API speech recognition is currently limited to Chrome for Desktop and Android — Chrome has supported it since around version 33, but with prefixed interfaces, so we need to include prefixed versions as shown in the code below.
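The prefix shim referred to here is commonly written like this, falling back to the webkit-prefixed constructors where the unprefixed ones are unavailable:

```javascript
// Use the unprefixed interfaces where available,
// otherwise fall back to the webkit-prefixed versions (older Chrome)
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const SpeechGrammarList =
  window.SpeechGrammarList || window.webkitSpeechGrammarList;
const SpeechRecognitionEvent =
  window.SpeechRecognitionEvent || window.webkitSpeechRecognitionEvent;
```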

Next, we will define the grammar we want our app to recognize. In our case, it is the four directions (right, left, up, down). The following variable is defined to hold our grammar.
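Grammars for the Web Speech API are written in the JSGF (JSpeech Grammar Format); a variable holding a grammar for our four directions could look like this (the grammar name directions is an assumption):

```javascript
// JSGF grammar covering the four directions we want to recognize
const directions = ['up', 'down', 'left', 'right'];
const grammar =
  '#JSGF V1.0; grammar directions; public <direction> = ' +
  directions.join(' | ') + ' ;';
```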

The next thing to do is define a speech recognition instance to control the recognition for our application. This is done using the SpeechRecognition() constructor. We also create a new speech grammar list (a list of words or patterns of words that we want the recognition service to recognize) to contain our grammar, using the SpeechGrammarList() constructor, and set a few other properties.
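A sketch of that setup, assuming the grammar string from the previous step (the specific property values shown are typical choices, not taken from the original code):

```javascript
const recognition = new SpeechRecognition();
const speechRecognitionList = new SpeechGrammarList();

// Register our grammar with the highest weight (1)
speechRecognitionList.addFromString(grammar, 1);
recognition.grammars = speechRecognitionList;

recognition.continuous = false;      // stop after a single result
recognition.lang = 'en-US';          // language to recognize
recognition.interimResults = false;  // only report final results
recognition.maxAlternatives = 1;     // one best guess per result
```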

Now get a reference to the output <div> to display diagnostic messages, and implement an onclick handler to start the speech recognition service whenever the screen is tapped or clicked. This is achieved by calling the SpeechRecognition.start() method.
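Assuming the diagnostic <div> has the id output, that handler could be sketched as:

```javascript
const output = document.getElementById('output');

// Start listening whenever the user taps or clicks anywhere on the page
document.body.onclick = () => {
  recognition.start();
  output.textContent = 'Listening... say up, down, left, or right.';
};
```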

Now, let’s add some event handlers by assigning an event listener to the oneventname property of this interface.

  • onresult is fired when the speech recognition service returns a result, i.e. a word or phrase has been positively recognized.
  • onerror is fired when there is an actual error with the recognition.
  • onnomatch is fired when the speech recognition service returns a final result with no significant recognition.
  • onspeechend is fired when speech stops being detected by the speech recognition service; we use it to stop the service by calling the SpeechRecognition.stop() method.
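Wired together, those four handlers might look like this (moveBlock is the function the article defines next; the exact messages are assumptions):

```javascript
recognition.onresult = (event) => {
  // The recognized word is the first alternative of the first result
  const direction = event.results[0][0].transcript.trim().toLowerCase();
  output.textContent = 'Heard: ' + direction;
  moveBlock(direction);
};

recognition.onerror = (event) => {
  output.textContent = 'Recognition error: ' + event.error;
};

recognition.onnomatch = () => {
  output.textContent = "Sorry, I didn't catch a direction.";
};

recognition.onspeechend = () => {
  // Stop the service once the user has finished speaking
  recognition.stop();
};
```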

Let’s add a moveBlock function with the help of which the block will move right, left, up, or down in the canvas. Here we have added constraints to ensure the block does not move out of the canvas.
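A sketch of such a function, assuming a 400x400 canvas, a 50 px block, and a 50 px step per command (these sizes are assumptions, not values from the original code; the block object is repeated here so the snippet is self-contained):

```javascript
const CANVAS_WIDTH = 400;
const CANVAS_HEIGHT = 400;
const STEP = 50;
const block = { x: 0, y: 0, size: 50 };

function moveBlock(direction) {
  // Clamp every move so the block never leaves the canvas
  if (direction === 'right') {
    block.x = Math.min(block.x + STEP, CANVAS_WIDTH - block.size);
  } else if (direction === 'left') {
    block.x = Math.max(block.x - STEP, 0);
  } else if (direction === 'down') {
    block.y = Math.min(block.y + STEP, CANVAS_HEIGHT - block.size);
  } else if (direction === 'up') {
    block.y = Math.max(block.y - STEP, 0);
  }
}
```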

Finally, let’s update the screen using requestAnimationFrame to show the movement of the block.
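That update loop can be sketched as follows, reusing the canvas, ctx, and block defined earlier:

```javascript
function draw() {
  // Redraw the block at its (possibly updated) position every frame
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.fillStyle = 'steelblue';
  ctx.fillRect(block.x, block.y, block.size, block.size);
  requestAnimationFrame(draw);
}

requestAnimationFrame(draw);
```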

Yes, that’s how we can create a simple speech recognition application using Web Speech API. You can check out the demo of the application here.

Conclusion

We have discussed some of the properties, methods, and events of speech recognition so far. I hope this article gave you a basic understanding of Web Speech API. However, this is just an introduction, and there’s still a lot more. Feel free to explore…

Happy Learning!

This article is published as a part of ‘JavaScripted’ under Spider Research and Development Club, NIT Trichy on a Web Wednesday!
