Google Speech-To-Text API Tutorial with Python

Theethat Anuraksoontorn
Published in CodeX
6 min read, Sep 8, 2021

Recently, I had the opportunity to explore one of the great applications of deep learning, speech-to-text, for a company project that transcribes voice recordings and removes sensitive and personal data.

If you are a first-time user of the Google APIs, as I was, I will walk you through the setup step by step. By the end of this article, you should be able to call the Google Speech-to-Text API from your own code.

What is an API? According to Wikipedia:

“An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software.[1] A document or standard that describes how to build such a connection or interface is called an API specification. A computer system that meets this standard is said to implement or expose an API. The term API may refer either to the specification or to the implementation.”

To put it simply, using an API means connecting your computer to someone else's computer and requesting the use of a specific piece of their software. Google lets everyone use its APIs, sometimes for a fee and sometimes for free. If you only want to try the service, you can visit its site and test it for free, but if you want to integrate it with your own program or code, you need to sign up on the Google Cloud Platform (GCP).

First things first, go to this website: cloud.google.com

For first-time cloud platform users: a cloud platform refers to the operating system and hardware of a server in an Internet-based data center. It allows software and hardware products to co-exist remotely and at scale. Think of the cloud platform as a place where you can put your code to work without worrying about servers, maintenance, and so on. In this article, however, we use the cloud to borrow Google's speech-to-text software and the computing power that runs it, calling the Speech-to-Text API from our own computer.

Entering Google Cloud Platform

To enter the Google Cloud Platform, click the “Go to Console” button. You will arrive at the GCP landing page.

This page summarizes your GCP usage, including APIs, billing, and projects. Before using any of the services, you have to set up a credit card on your Google account. You do not need to worry about the price, though: GCP gives you $300 of credit, and it will not automatically charge your credit card when the free credit runs out.

To use an API in GCP, first click APIs and Services in the navigation menu. On the page that opens, click Library to search for the API you are looking for.

Search for the Cloud Speech-to-Text API

At first the page shows a blue Enable button. Clicking it allows your account to connect to the API, and the button then changes to Manage. You are now one step closer to using the Google API.

The last thing you need in order to use GCP APIs is a credential key. Go back to the APIs and Services page, click Credentials, and then click on your service account under Service Accounts.

The following screens contain sensitive details, so I will not show pictures of how to access and create the credential.

How to get the Google service account credentials

After you open the Service Accounts page:

  1. Click on the Keys tab
  2. Click on Add Key and select Create new key
  3. Select JSON to download the key as a JSON file for accessing GCP APIs
  4. Save the JSON file as google_secret_key.json, or under another name if you prefer

Now you are all set and ready to use the API in your code.
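Before moving on, it can help to confirm that the downloaded file really is a service account key. The sketch below is only a sanity check under my own assumptions: the list of expected fields is a hand-picked subset of what such key files normally contain, and the helper names are hypothetical, not part of any Google library.

```python
import json

# Hand-picked subset of fields a service account key file normally contains.
# This is an illustrative assumption, not an official or exhaustive schema.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_key_file(path: str) -> dict:
    """Parse the downloaded JSON key file into a dictionary."""
    with open(path, "r", encoding="utf-8") as key_file:
        return json.load(key_file)


def missing_key_fields(key_data: dict) -> list:
    """Return the expected fields that are absent from the parsed key data."""
    return [field for field in EXPECTED_FIELDS if field not in key_data]
```

For example, `missing_key_fields(load_key_file("google_secret_key.json"))` should return an empty list for a complete key file. Never print or commit the contents of the key itself.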

Cloud Speech-to-Text API in Python

To use the API in Python, first install the Google Cloud library for speech with pip on the command line.

pip install google-cloud-speech
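If you are not sure whether the installation worked in the Python environment you are actually running, a quick check like the following can help. It is a minimal sketch using only the standard library; the helper name `is_module_available` is my own and not part of the Google library.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if the module can be located in the current environment."""
    try:
        # For dotted names, find_spec imports the parent packages first.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (for example `google`) raises
        # ModuleNotFoundError, which is a subclass of ImportError.
        return False


if __name__ == "__main__":
    if is_module_available("google.cloud.speech"):
        print("google-cloud-speech is installed")
    else:
        print("Not found; run: pip install google-cloud-speech")
```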

Now that you have access to the GCP API, let’s write some code. First, we import the minimum required modules for using the API.

from google.cloud import speech
import os
import io

Next, set the Google credentials for the API request and create a client instance for sending requests. The code below tells your OS which file contains the Google credentials. Either put the credential file in the same folder as your code or set the path to that JSON file.

# Set the Google credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "google_secret_key.json"
# Create the client instance
client = speech.SpeechClient()
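A relative path such as the one above only works when you run the script from the folder that contains the key. As a hedged sketch, the hypothetical helper below (my own name, not a Google function) resolves the path, fails early if the file is missing, and then sets the same environment variable.

```python
import os


def set_google_credentials(key_path: str) -> str:
    """Point GOOGLE_APPLICATION_CREDENTIALS at key_path; return the absolute path."""
    absolute_path = os.path.abspath(key_path)
    if not os.path.isfile(absolute_path):
        # Failing here gives a clearer message than a later authentication error.
        raise FileNotFoundError(f"Credential file not found: {absolute_path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = absolute_path
    return absolute_path
```

Call `set_google_credentials("google_secret_key.json")` before creating the client. If you prefer not to touch environment variables, the client class also offers `speech.SpeechClient.from_service_account_file("google_secret_key.json")`, which builds the client directly from the key file.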

Read the audio file. You can try audio formats other than WAV, but to make sure it works I recommend staying with WAV or MP3. If you want to test with the same audio as me, you can use this open source audio file.

# The path of your audio file
file_name = "OSR_us_000_0010_8k.wav"
with io.open(file_name, "rb") as audio_file:
    content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)
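Before sending a WAV file, it is useful to know its channel count, sample rate, and length, because those values affect the request settings and the size limits. The sketch below reads them with Python's standard `wave` module; it assumes an uncompressed PCM WAV file, and `describe_wav` is a hypothetical helper name of my own.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read the WAV header and return the values relevant to a request."""
    with wave.open(path, "rb") as wav_file:
        frame_rate = wav_file.getframerate()
        frame_count = wav_file.getnframes()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hertz": frame_rate,
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "duration_seconds": frame_count / frame_rate,
        }
```

For example, `describe_wav("OSR_us_000_0010_8k.wav")` returns a dictionary you can inspect to decide whether the file is short enough to send directly.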

Before calling the speech-to-text engine, we need to set its configuration. You can look up the available parameters in the API documentation.

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    enable_automatic_punctuation=True,
    # Should match the number of channels in your audio file
    audio_channel_count=2,
    language_code="en-US",
)
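The configuration values should describe the audio you actually send. As a rough sketch, the hypothetical helper below (my own naming) turns values read from a file's header into keyword arguments for `speech.RecognitionConfig`. It assumes LINEAR16 encoding, which expects 16-bit samples, and keeps the language and punctuation settings used in this article.

```python
def build_config_kwargs(channels: int, sample_rate_hertz: int,
                        bits_per_sample: int, language_code: str = "en-US") -> dict:
    """Build keyword arguments for a LINEAR16 RecognitionConfig."""
    if bits_per_sample != 16:
        # LINEAR16 means uncompressed 16-bit signed samples.
        raise ValueError("LINEAR16 expects 16-bit samples")
    if channels < 1 or sample_rate_hertz <= 0:
        raise ValueError("channels and sample_rate_hertz must be positive")
    return {
        "sample_rate_hertz": sample_rate_hertz,
        "audio_channel_count": channels,
        "language_code": language_code,
        "enable_automatic_punctuation": True,
    }
```

You could then write `speech.RecognitionConfig(encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16, **build_config_kwargs(1, 8000, 16))` for a mono 8 kHz recording.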

Now comes the part where we send the request to Google to transcribe the audio for us.

# Send the request to Google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})
# Read the response
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
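If you want to keep the text for later processing rather than only print it, you can collect the top alternative of each result together with its confidence score. This is a minimal sketch: `collect_transcripts` is my own helper name, and the `fake_response` object is a hypothetical stand-in that only mimics the attribute layout of the real response, so the example runs without calling the API.

```python
from types import SimpleNamespace


def collect_transcripts(response) -> list:
    """Return (transcript, confidence) for the top alternative of each result."""
    pairs = []
    for result in response.results:
        if not result.alternatives:
            continue  # skip results that carry no alternatives
        best = result.alternatives[0]
        pairs.append((best.transcript, best.confidence))
    return pairs


# Hypothetical stand-in data shaped like the response used above.
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript="hello world", confidence=0.9),
    ]),
    SimpleNamespace(alternatives=[]),
])
print(collect_transcripts(fake_response))
```

With the real `response` object, `" ".join(text for text, _ in collect_transcripts(response))` gives the whole transcript as one string.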

The output looks like this:

Transcript: The Birch canoes slid on the smooth planks.
Transcript: Glue the sheet to the dark blue background.
Transcript: It is easy to tell the death of a well.
Transcript: These days, a chicken leg is a verb dish.
Transcript: Rice is often served in round bowls.
Transcript: The juice of lemons makes find punch.
Transcript: The box was down beside the park truck.
Transcript: The Hogs of food shop, corn and garbage.
Transcript: 4 hours of study work Facebook.
Transcript: A large size in stockings is hard to sell.

The method I just showed covers only small-scale audio (less than 1 minute of audio or 10 MB), which you can send directly from your local computer. To transcribe longer audio files, you need to upload the audio to Google Cloud Storage, another GCP service for storing data in the cloud, so that the transcription can draw on more computing resources.

Now you are all set to apply speech-to-text in your application and your code. Enjoy!
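For longer recordings, the client library provides `long_running_recognize`, which accepts audio referenced by a Cloud Storage URI. The sketch below is a rough outline under a few assumptions of mine: the limits are taken from the figures in this article (with 10 MB read as 10 * 1024 * 1024 bytes), the helper names are hypothetical, and the audio is assumed to be uploaded to a bucket already.

```python
MAX_SYNC_SECONDS = 60                # limit quoted in this article
MAX_SYNC_BYTES = 10 * 1024 * 1024    # assumption: "10 MB" read as 10 MiB


def pick_method(duration_seconds: float, size_bytes: int) -> str:
    """Choose the client method name based on the audio length and size."""
    if duration_seconds < MAX_SYNC_SECONDS and size_bytes < MAX_SYNC_BYTES:
        return "recognize"
    return "long_running_recognize"


def transcribe_from_gcs(gcs_uri: str, config, timeout_seconds: int = 600):
    """Transcribe audio stored in Cloud Storage and return the response."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("Expected a Cloud Storage URI such as gs://bucket/file.wav")
    # Imported here so that pick_method works even without the library installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    operation = client.long_running_recognize(config=config, audio=audio)
    # Blocks until the transcription finishes or the timeout expires.
    return operation.result(timeout=timeout_seconds)
```

The returned response has the same `results` layout as the short-audio case, so the loop shown earlier for reading transcripts applies unchanged.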

Theethat Anuraksoontorn, writer for CodeX
Applied Economist | Inventor | Data Scientist at Accenture