Convert Speech to Text in 5 lines of Code

Published in CustomerInsights.AI, Jan 27, 2021

With the popularity of speech-enabled products like Alexa, Siri, and even fridges that can talk, speech-enabled devices have become a routine part of our lives. Part of this popularity can be traced back to openly available software. Tech giants like Google and Facebook have published many speech APIs and libraries. These companies have huge amounts of data, which they have used to build sophisticated models that perform well in many scenarios.

In this blog, we will use one such speech recognition service, Google's Web Speech API, called through the open-source SpeechRecognition package, to convert your speech into text.

Step 1: Installing Libraries

SpeechRecognition: a package that makes it easy to retrieve audio input. It has built-in features that capture audio and put it in the right form for text conversion.

We will use SpeechRecognition as a wrapper to call the Google Web Speech API. This API is the core component that converts speech to text.

Installing Speech Recognition

$ pip install SpeechRecognition

If you are using macOS, run this command in a shell.

Installing PyAudio

PyAudio: this package lets Python access your device's microphone.

macOS

First, install Homebrew. If you already have it, move on to the next step:

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install PortAudio with Homebrew

$ brew install portaudio

Install PyAudio with pip

$ pip install pyaudio

Windows

$ pip install pyaudio
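Before moving on, it can help to confirm that both packages are importable. The snippet below is a minimal sketch using only the standard library; the helper name find_missing is made up for illustration. Note that the import names (speech_recognition, pyaudio) differ from the pip package names.

```python
import importlib.util

# Import names differ from the pip package names:
# SpeechRecognition -> speech_recognition, PyAudio -> pyaudio.
REQUIRED_MODULES = ("speech_recognition", "pyaudio")


def find_missing(modules=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be located for import."""
    # find_missing is a hypothetical helper name, not part of either package.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If a module is reported missing, rerun the matching pip install command for your platform.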

Step 2: Building the Script

Now that we have all the necessary libraries installed, let’s move on to building the Python script. This program takes voice input from a microphone and uses the Google speech API to convert it into text. It does this in a loop until the user says one of the keywords that end the loop; here the keywords are “bye” and “goodbye”. When the user says either keyword, the program breaks out of the loop.

Import

import speech_recognition as sr # importing speech recognition api.

Create an empty string that takes in the converted text.

message = ""

Define a loop that will start recording the user's voice:

while True:
    r = sr.Recognizer()  # initialize recognizer.
    with sr.Microphone() as source:
        print("Speak Anything :")
        audio = r.listen(source)  # listen to the source.
        try:
            message = r.recognize_google(audio)  # use recognizer to convert audio to text
            print("You said : {}".format(message))

Here, we initialize the speech recognizer to take audio input from the microphone. recognize_google converts the audio to text. Once the text is created, it is printed as “You said : (text)”.
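One possible variation on the listening step: called with no extra arguments, listen waits until it hears speech. The sketch below assumes you would rather give up after a few seconds and calibrate for background noise first. The function name listen_once and the default values are illustrative choices, not part of the original script.

```python
def listen_once(timeout=5, phrase_time_limit=10):
    """Record one utterance from the default microphone and return its text.

    Returns None if no speech starts within `timeout` seconds, or if the
    audio cannot be transcribed. The defaults are illustrative assumptions.
    """
    import speech_recognition as sr  # requires SpeechRecognition and PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly so the energy threshold fits the room.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        try:
            audio = recognizer.listen(
                source, timeout=timeout, phrase_time_limit=phrase_time_limit
            )
        except sr.WaitTimeoutError:
            return None  # nobody spoke before the timeout expired
    try:
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        return None  # speech was unintelligible or the API request failed
```

Returning None instead of printing keeps the function easy to reuse; the caller decides what to show the user.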

Loop Termination:

        except (sr.UnknownValueError, sr.RequestError):
            print("Sorry could not recognize your voice")  # speech was unclear or the API request failed.
    if message == "bye" or message == "goodbye":  # when the user says "bye" or "goodbye" the program terminates.
        break

If the speech API does not recognize any words (or the request fails), the except block runs and the message is displayed. To terminate the loop, i.e. to stop recording audio, keywords are used. Once the speech API detects one of these words, the loop terminates.
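The termination logic can be tried out without a microphone. The sketch below is a simplified stand-in, not the script itself: a list of strings plays the role of successive recognizer results, with None marking an utterance that was not recognized. The names should_stop and run_loop are hypothetical, and the case-insensitive comparison is an added assumption (the original compares the text exactly).

```python
STOP_WORDS = {"bye", "goodbye"}


def should_stop(message, stop_words=STOP_WORDS):
    """Return True when the whole transcript is one of the stop words."""
    # Lowercasing and stripping are assumptions added here for robustness.
    return message.strip().lower() in stop_words


def run_loop(transcripts):
    """Process transcripts in order; return those heard up to the stop word.

    `transcripts` stands in for recognizer output. None simulates a failed
    recognition, which is reported and skipped, as in the except branch.
    """
    heard = []
    for message in transcripts:
        if message is None:
            print("Sorry could not recognize your voice")
            continue
        print("You said : {}".format(message))
        heard.append(message)
        if should_stop(message):
            break  # same effect as the break in the while loop
    return heard


if __name__ == "__main__":
    run_loop(["how are you", None, "sky is blue", "goodbye"])
```

Because the check matches the whole transcript, a sentence that merely contains “bye” does not end the loop.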

Bringing it all together, here is the complete python script:

import speech_recognition as sr  # importing speech recognition api.

message = ""

while True:
    r = sr.Recognizer()  # initialize recognizer.
    with sr.Microphone() as source:  # we are using source as microphone but you can use audio files too.
        print("Speak Anything :")
        audio = r.listen(source)  # listen to the source.
        try:
            message = r.recognize_google(audio)  # use recognizer to convert audio to text part.
            print("You said : {}".format(message))
        except (sr.UnknownValueError, sr.RequestError):
            print("Sorry could not recognize your voice")  # speech was unclear or the API request failed.
    if message == "bye" or message == "goodbye":  # when the user says "bye" or "goodbye" the program terminates.
        break
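The comment in the script mentions that audio files can be used as a source too. A minimal sketch of that variation follows, assuming a WAV, AIFF, or FLAC file on disk; the function name transcribe_file and the file name in the usage comment are placeholders.

```python
def transcribe_file(path):
    """Return the transcript of an audio file, or None if transcription fails.

    `path` is assumed to point to a WAV, AIFF, or FLAC file.
    """
    import speech_recognition as sr  # requires the SpeechRecognition package

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file
    try:
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        return None  # speech was unintelligible or the API request failed


# Example usage (placeholder file name):
# print(transcribe_file("sample.wav"))
```

Unlike the microphone loop, this reads the file once with record, so no PyAudio installation is needed for it.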

Step 3: Testing and sample outputs

I tried saying “how are you”, “sky is blue”, and “goodbye”. For each phrase it recognizes, the script prints the text in the form “You said : (text)”, and it exits after “goodbye”.

In this blog, we ran a Python script that takes audio input and converts it into text. The program also has logic to stop recording when it hears a keyword. All of this is done in (almost) 5 lines of code.

Thanks for reading.

GitHub link for code: https://github.com/CIAI-RnD-Team/Speech_to_text_Google_API