Convert Speech to Text in 5 lines of Code
With the popularity of speech-enabled products like Alexa, Siri, and even fridges that can talk, speech-enabled devices have become an essential part of our lives. Much of this progress can be traced back to open-source software and freely available APIs published by tech giants like Google and Facebook. These companies have huge amounts of data, which they have used to build sophisticated models that perform well in many scenarios.
In this blog post, we will use one such tool, Google's speech recognition service, to convert your speech into text.
Step 1: Installing Libraries
SpeechRecognition: a Python package that makes it easy to capture audio input. It has built-in features that record audio and process it into the right form for text conversion.
We will use SpeechRecognition as a wrapper to call the Google Web Speech API, which does the actual work of converting speech to text.
Installing Speech Recognition
$ pip install SpeechRecognition
Run this command in a shell (for example, Terminal on macOS). The same pip command works on Windows and Linux.
Installing PyAudio
PyAudio: this package gives Python access to your device's microphone. SpeechRecognition needs it in order to record from the microphone.
macOS
First, install Homebrew (skip this step if you already have it):
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install PortAudio with Homebrew (PyAudio depends on it):
$ brew install portaudio
Install PyAudio with pip
$ pip install pyaudio
Windows
$ pip install pyaudio
Step 2: Building the Script
Now that we have all the necessary libraries installed, let’s move on to building the Python script. This program takes voice input from a microphone and uses the Google Web Speech API to convert it into text. It keeps doing this in a loop until the user says one of the keywords that end the loop, here “bye” and “goodbye”. When the user says either keyword, the program breaks out of the loop.
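The control flow described above can be sketched without a microphone or network access by passing in a hypothetical `transcribe` function in place of `recognize_google()` and a list of stand-in audio clips. In this sketch, `run_session`, `fake_transcribe` and `UnrecognizedSpeech` are illustrative names, not part of SpeechRecognition; raising `UnrecognizedSpeech` plays the role of the library's `sr.UnknownValueError`.

```python
from typing import Callable, Iterable, List

STOP_WORDS = ("bye", "goodbye")  # keywords that end the loop, as in the script


class UnrecognizedSpeech(Exception):
    """Hypothetical stand-in for sr.UnknownValueError."""


def run_session(clips: Iterable[str], transcribe: Callable[[str], str]) -> List[str]:
    """Transcribe clips one by one until a stop word is heard.

    Returns the lines that would be printed, so the loop logic can be
    checked without a microphone or network access.
    """
    output = []
    for clip in clips:  # each clip stands in for one call to r.listen(source)
        output.append("Speak Anything :")
        try:
            message = transcribe(clip)
        except UnrecognizedSpeech:
            output.append("Sorry could not recognize your voice")
            continue  # nothing was understood, so keep listening
        output.append("You said : {}".format(message))
        if message.strip().lower() in STOP_WORDS:
            break  # a stop word ends the session
    return output


def fake_transcribe(clip: str) -> str:
    """Hypothetical transcriber: treats an empty clip as unintelligible speech."""
    if not clip:
        raise UnrecognizedSpeech()
    return clip


# The clip after "goodbye" is never processed because the loop has ended.
for line in run_session(["how are you", "", "goodbye", "never reached"], fake_transcribe):
    print(line)
```

Injecting the transcriber this way keeps the loop logic separate from the audio hardware, which makes the keyword handling easy to check before trying it with a real microphone.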
Import
import speech_recognition as sr # importing speech recognition api.
Create an empty string that will hold the converted text.
message = ""
Define a loop that records the user's voice:
while True:
    r = sr.Recognizer()  # initialize recognizer
    with sr.Microphone() as source:
        print("Speak Anything :")
        audio = r.listen(source)  # listen to the source
        try:
            message = r.recognize_google(audio)  # use the recognizer to convert audio to text
            print("You said : {}".format(message))
Here, we initialize a speech recognizer and record audio input from the microphone. recognize_google() sends the recorded audio to the Google Web Speech API and returns the transcribed text, which is then printed as “You said : (text)”.
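r.listen() has to decide when you start and stop speaking. SpeechRecognition does this by comparing the energy of the incoming audio against the recognizer's energy_threshold attribute (300 by default, and adjusted automatically while dynamic_energy_threshold is enabled). The sketch below illustrates the underlying idea: the root-mean-square (RMS) energy of a frame of 16-bit PCM samples, compared against a threshold. It is a simplified illustration that assumes mono, 16-bit little-endian audio, not the library's actual code; `rms_16bit`, `is_speech` and `make_tone` are hypothetical names.

```python
import math
import struct


def rms_16bit(frame: bytes) -> float:
    """Root-mean-square energy of a frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2  # two bytes per sample
    if count == 0:
        return 0.0
    samples = struct.unpack("<{}h".format(count), frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, energy_threshold: float = 300.0) -> bool:
    """Treat a frame as speech when its RMS energy exceeds the threshold."""
    return rms_16bit(frame) > energy_threshold


def make_tone(amplitude: int, n_samples: int = 1600, period: int = 16) -> bytes:
    """Hypothetical test helper: a square wave, whose RMS equals its amplitude."""
    half = period // 2
    samples = [amplitude if (i // half) % 2 == 0 else -amplitude for i in range(n_samples)]
    return struct.pack("<{}h".format(n_samples), *samples)


print(rms_16bit(make_tone(1000)), is_speech(make_tone(1000)))  # loud: counted as speech
print(rms_16bit(make_tone(100)), is_speech(make_tone(100)))    # quiet: counted as silence
```

If the script keeps missing the start of your speech in a noisy room, calling r.adjust_for_ambient_noise(source) inside the with block, before r.listen(source), calibrates energy_threshold to the background noise.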
Loop Termination:
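        except sr.UnknownValueError:  # the speech could not be understood
            print("Sorry could not recognize your voice")
        except sr.RequestError as e:  # the API could not be reached
            print("Could not reach the Google Web Speech API: {}".format(e))
        if message.strip().lower() in ("bye", "goodbye"):  # saying "bye" or "goodbye" ends the program
            break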
If the speech API cannot recognize any words, an except branch runs and an error message is displayed instead. To end the loop, i.e. to stop recording audio, the user says one of the keywords: once the speech API returns one of them, the loop terminates.
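The keyword check compares the whole transcript against the keywords, so a longer phrase such as “okay goodbye” does not end the loop. If you want the loop to stop whenever a keyword appears anywhere in what was said, one option is to split the transcript into words first. A minimal sketch; `exact_match` and `contains_stop_word` are hypothetical helpers, not part of SpeechRecognition.

```python
import re

STOP_WORDS = {"bye", "goodbye"}  # the keywords used in the script


def exact_match(message: str) -> bool:
    """Whole-transcript comparison: the entire message must be a keyword."""
    return message in STOP_WORDS


def contains_stop_word(message: str) -> bool:
    """Word-level alternative: stop if any word in the transcript is a keyword."""
    words = re.findall(r"[a-z']+", message.lower())  # lowercase, then split into words
    return any(word in STOP_WORDS for word in words)


print(exact_match("okay goodbye"), contains_stop_word("okay goodbye"))
```

The trade-off is that word-level matching also ends the session on a sentence like “say bye to Bob”, which may not be what you want.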
Bringing it all together, here is the complete Python script:
import speech_recognition as sr  # importing speech recognition api

message = ""
while True:
    r = sr.Recognizer()  # initialize recognizer
    with sr.Microphone() as source:  # we are using the microphone as the source, but you can use audio files too
        print("Speak Anything :")
        audio = r.listen(source)  # listen to the source
        try:
            message = r.recognize_google(audio)  # use the recognizer to convert audio to text
            print("You said : {}".format(message))
        except sr.UnknownValueError:
            print("Sorry could not recognize your voice")  # in case the voice was not recognized clearly
        except sr.RequestError as e:
            print("Could not reach the Google Web Speech API: {}".format(e))  # network or API problem
        if message.strip().lower() in ("bye", "goodbye"):  # saying "bye" or "goodbye" ends the program
            break
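The comment in the script notes that audio files can be used instead of the microphone. SpeechRecognition's sr.AudioFile reads WAV, AIFF/AIFF-C and FLAC files, and r.record(source) then returns audio that can be passed to recognize_google() just like the output of r.listen(). Before doing that, it can help to check what a WAV file actually contains. The sketch below uses only Python's standard wave module; `write_silence` and `describe_wav` are hypothetical helpers, and sample.wav is a placeholder file name.

```python
import os
import tempfile
import wave


def write_silence(path: str, seconds: float = 1.0, rate: int = 16000) -> None:
    """Hypothetical helper: write a mono, 16-bit PCM WAV file containing silence."""
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit PCM
        wav.setframerate(rate)  # samples per second
        wav.writeframes(bytes(n_frames * 2))  # all-zero samples = silence


def describe_wav(path: str) -> dict:
    """Return the basic format information stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


path = os.path.join(tempfile.gettempdir(), "sample.wav")  # placeholder file name
write_silence(path, seconds=0.5)
print(describe_wav(path))
```

With SpeechRecognition installed, a file like this could be opened with sr.AudioFile(path) in place of sr.Microphone(); since it contains only silence, recognition would most likely end up in the except branch.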
Step 3: Testing and sample outputs
I tried saying “how are you”, “sky is blue” and “goodbye”. Here is what the output from the script looked like:
In this blog post, we ran a Python script that takes in audio input and converts it into text. The program also stops recording when the user says one of the keywords. All of this is done in (almost) 5 lines of code.
Thanks for reading.
GitHub link for the code: https://github.com/CIAI-RnD-Team/Speech_to_text_Google_API