Google’s Speech Recognition in 6 Steps: Python

Speech-To-Text Tutorial for Beginners

Here it is: Google’s very own online speech recognition, used from Python through the SpeechRecognition package. Yes, it’s all “online”, so you need an internet connection. If you wish to take it offline, I recommend PocketSphinx.

This article assumes that Python is already installed on your system. If not, then go to this link, and also add ‘pip’ and ‘python’ to the PATH environment variable (saves time).

For the most part, this guide is OS-independent, but just so you know, we’re doing it on Windows 10.

Now, with those of you who stayed, let’s go on:

1. Installing the package:

(In CMD)

pip install SpeechRecognition

2. Importing Package:

(In Python Command Line or Any Python IDE)

import speech_recognition as sr

3. Creating instance:

Now, we’ll create an instance of the Recognizer class. Among others, the Recognizer class provides these recognition methods:

  • recognize_bing() — online
  • recognize_google() — online
  • recognize_sphinx() — offline

By default, each of the above methods returns the recognized text as a string.

r = sr.Recognizer()

4. AudioFile:

Now, the ‘AudioFile’ class wraps an audio file so that the recognizer can read from it. It can read various audio types including:

.wav, .flac, .aiff

In this case, I named the audio file ‘speek.wav’ and placed it in the same directory as the program being written.

read=sr.AudioFile('speek.wav')
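Before handing a file to AudioFile, it can help to confirm that it is one of the types listed above and, for a WAV file, to see how long it is. Here is a minimal sketch that uses only Python's standard library. The helper names looks_supported and describe_wav are made up for this illustration and are not part of SpeechRecognition, and the standard wave module only reads uncompressed (PCM) WAV files.

```python
import wave
from pathlib import Path

# The audio types named in this article.
SUPPORTED_EXTENSIONS = {".wav", ".flac", ".aiff"}


def looks_supported(path):
    """Return True if the file extension is one of the types listed above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        # Number of frames divided by frames per second gives the length.
        duration = wav_file.getnframes() / sample_rate
    return channels, sample_rate, duration
```

For example, describe_wav('speek.wav') tells you how many seconds of audio there are to work with in the next step.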

Long programs get written just to clear audio files of background noise. Since we are not going there, I recommend using files that have almost no background noise.

And please do not use any songs; it won’t work. 😆

5. Converting the ‘read’ object for the recognize_google function:

with read as source:
    # r.adjust_for_ambient_noise(source)
    file = r.record(source)

For more control, the record call inside the ‘with’ block can also take an offset and a duration:

    file = r.record(source, offset=0, duration=100)

Offset: The starting point (in seconds | 0 by default)

Duration: How long the audio is read, starting from the offset (in seconds | until the end of the file, by default)
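One use for offset and duration is reading a long file in smaller pieces. The sketch below is only an illustration under a few assumptions: chunk_windows and record_windows are hypothetical helper names, you already know the total length of the file in seconds, and the file is reopened for every piece so that each offset is counted from the start of the file.

```python
def chunk_windows(total_seconds, chunk_seconds):
    """Split [0, total_seconds) into consecutive (offset, duration) pairs."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # The last window is shortened so it does not run past the end.
        duration = min(chunk_seconds, total_seconds - offset)
        windows.append((offset, duration))
        offset += chunk_seconds
    return windows


def record_windows(path, total_seconds, chunk_seconds):
    """Record each window of the file as a separate piece of audio data."""
    # Imported here so the helper above can be used without the package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    clips = []
    for offset, duration in chunk_windows(total_seconds, chunk_seconds):
        # Reopen the file each time so the offset starts from the beginning.
        with sr.AudioFile(path) as source:
            clips.append(recognizer.record(source, offset=offset, duration=duration))
    return clips
```

For a 100 second file and 40 second pieces, chunk_windows(100, 40) gives three windows starting at 0, 40 and 80 seconds, the last one being 20 seconds long.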

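The 2nd statement, which I have commented out, has been highly unpredictable for me, mostly giving me errors. It seems to work for some and not at all for others, so try it out if you like. Note that it does not remove noise from the audio. It listens to the beginning of the source (1 second by default) to calibrate the recognizer’s energy threshold to the background noise level. With an audio file, that means the part it listened to is consumed, and record() starts reading after it.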
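If you do want to try the commented-out statement, the following is one way it could be arranged. It is a sketch, not a guaranteed fix: record_after_calibration is a hypothetical helper name, and the 0.5 second calibration time is an arbitrary choice for illustration.

```python
def record_after_calibration(path, calibration_seconds=0.5):
    """Calibrate on the start of the file, then record what is left."""
    # Imported here so this sketch loads even before the package is installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # This call has to happen inside the 'with' block, while the source
        # is open. It reads the first calibration_seconds of the file.
        recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
        # Recording continues from where the calibration stopped, so the
        # calibrated part is not included in the returned audio.
        audio = recognizer.record(source)
    return recognizer, audio
```

Keeping the calibration time short matters when the speech starts right at the beginning of the file, because whatever is used for calibration is not transcribed.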

6. Recognizing:

text = r.recognize_google(file)
print(text)

For more control,

text = r.recognize_google(file, language='en-IN')

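Language: This keyword argument tells Google which language or dialect to expect (‘en-US’ if you leave it out). Setting it to match the speaker helps avoid an ‘UnknownValueError’, which is raised when the speech cannot be understood. Even British and American English seem profoundly different to Google’s algorithm (which is mostly a good thing).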

Example:

  • en-CA — Canadian English,
  • en-US — American English,
  • en-GB — British English,
  • fr-FR — French,
  • en-IN — Indian English, etc.
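Putting the six steps together, here is a minimal sketch that also catches the two errors you are most likely to meet: UnknownValueError when the speech cannot be understood, and RequestError when Google cannot be reached. The function name transcribe and the choice to return None for unintelligible audio are my own assumptions for this illustration.

```python
def transcribe(path, language="en-US"):
    """Return the recognized text for an audio file, or None if unintelligible."""
    # Imported here so this sketch loads even before the package is installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # Steps 4 and 5: open the file and read all of it into audio data.
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)

    # Step 6: send the audio to Google and handle the common failures.
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # Google received the audio but could not make out any words.
        return None
    except sr.RequestError as error:
        # No internet connection, or the service refused the request.
        raise RuntimeError("Could not reach the recognition service") from error
```

With the file from this article, the call would look like print(transcribe('speek.wav', language='en-IN')).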

And you’re done. That was your first program for speech-to-text conversion. ✅

This tutorial doesn’t make space for AI 🚀, since it’s meant entirely for beginners. But in a true sense, without AI, speech recognition in any form remains an infant that can never grow up.

Politics and Coding. Just to sum it up.