How to convert speech to text on AWS?

By Dilip Kola. Published in Tensult Blogs, Feb 13, 2019.


Voice-based interaction with computers is becoming increasingly common, and more and more voice-based applications are being developed as we speak. Thanks to AWS for making such services widely available at low cost with a pay-as-you-go model.

Reference: https://bit.ly/2UXdjCg

I ran a small experiment with AWS Transcribe, a managed service for speech recognition. Transcribe supports a limited set of languages as of now, but AWS is continuously adding more. To use the service, we upload an audio file to an S3 bucket, pass that file to Transcribe as input, and provide an output S3 bucket in which to store the text recognized from the speech.
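The upload step can be scripted. The following is a minimal sketch, assuming a boto3 S3 client is passed in; the helper name `upload_audio`, the file name, and the bucket name are hypothetical and not from the original experiment.

```python
from pathlib import Path


def upload_audio(s3_client, local_path, bucket, key=None):
    """Upload a local audio file to S3 and return its s3:// URI.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # Default the object key to the file name when no key is given.
    key = key or Path(local_path).name
    # upload_file(Filename, Bucket, Key) is the boto3 S3 client upload call.
    s3_client.upload_file(str(local_path), bucket, key)
    return f"s3://{bucket}/{key}"
```

With real credentials this might be called as `upload_audio(boto3.client("s3"), "speech.mp3", "my-transcribe-input")`, and the returned URI is what a transcription job takes as its media location.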

Custom Vocabulary

When we speak, we commonly use proper nouns such as company, product, application, team, or people’s names. Because these are not dictionary words, recognizing them is a challenge. To tackle this, Transcribe lets us define a custom vocabulary as a text or CSV file. Custom vocabularies are defined per language and can be used when running transcription jobs.

Create vocabulary

Contents of the vocabulary file:

transcribe
Dilip
Tensult
A.W.S

Once the vocabulary is ready, it can be used in transcription jobs.

Make sure that vocabulary is ready before using it
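The console steps above can also be done programmatically. Here is a rough sketch, assuming a boto3 Transcribe client is passed in. It passes the words inline through the `Phrases` parameter of `create_vocabulary` instead of uploading a vocabulary file, then polls `get_vocabulary` until the state is `READY`. The function name, vocabulary name, and polling intervals are hypothetical choices.

```python
import time


def create_vocabulary_and_wait(transcribe_client, name, language_code, phrases,
                               poll_seconds=10, max_polls=60):
    """Create a custom vocabulary and block until it is READY.

    transcribe_client is expected to be boto3.client("transcribe").
    """
    transcribe_client.create_vocabulary(
        VocabularyName=name,
        LanguageCode=language_code,  # e.g. "en-US" or "en-GB"
        Phrases=phrases,
    )
    for _ in range(max_polls):
        response = transcribe_client.get_vocabulary(VocabularyName=name)
        state = response["VocabularyState"]  # PENDING, READY or FAILED
        if state == "READY":
            return state
        if state == "FAILED":
            raise RuntimeError(f"Vocabulary {name!r} failed to build")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Vocabulary {name!r} was not ready in time")
```

For the vocabulary in this post, the call might look like `create_vocabulary_and_wait(client, "demo-vocab-en-US", "en-US", ["transcribe", "Dilip", "Tensult", "A.W.S"])`. Since vocabularies are per language, the US and UK experiments would each need their own.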

Transcription Jobs

To convert speech to text, we create a transcription job: upload an audio file to an S3 bucket, select the appropriate vocabulary, and create the job.

Create a transcription job
Check the transcription job status
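The same two steps, creating a job and checking its status, could be sketched with a boto3 Transcribe client as follows. The helper names and the default `mp3` media format are assumptions, since the post does not say which audio format was used.

```python
def start_job(transcribe_client, job_name, media_uri, language_code,
              output_bucket, vocabulary_name=None, media_format="mp3"):
    """Start a transcription job and return the request parameters used."""
    params = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,        # e.g. "en-US" or "en-GB"
        "MediaFormat": media_format,          # assumed; match your audio file
        "Media": {"MediaFileUri": media_uri},  # s3:// URI of the uploaded audio
        "OutputBucketName": output_bucket,
    }
    # The custom vocabulary is optional and goes under Settings.
    if vocabulary_name:
        params["Settings"] = {"VocabularyName": vocabulary_name}
    transcribe_client.start_transcription_job(**params)
    return params


def job_status(transcribe_client, job_name):
    """Return QUEUED, IN_PROGRESS, FAILED or COMPLETED for a job."""
    response = transcribe_client.get_transcription_job(
        TranscriptionJobName=job_name
    )
    return response["TranscriptionJob"]["TranscriptionJobStatus"]
```

Calling `job_status` in a loop with a short sleep until it returns `COMPLETED` or `FAILED` mirrors refreshing the job page in the console.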

Experiments

  1. Speech recognition using English (US) with custom vocabulary
  2. Speech recognition using English (US) without custom vocabulary
  3. Speech recognition using English (UK) with custom vocabulary
  4. Speech recognition using English (UK) without custom vocabulary
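The four experiments are simply every combination of two languages and vocabulary on or off. As an illustration only, they could be generated like this. The job names and the vocabulary name prefix are hypothetical, while `en-US` and `en-GB` are the Transcribe language codes for US and UK English.

```python
from itertools import product

LANGUAGES = ["en-US", "en-GB"]


def experiment_configs(vocabulary_prefix="demo-vocab"):
    """Return one config per (language, vocabulary on/off) combination."""
    configs = []
    for language, use_vocabulary in product(LANGUAGES, [True, False]):
        suffix = "with-vocab" if use_vocabulary else "no-vocab"
        configs.append({
            "job_name": f"transcribe-test-{language}-{suffix}",
            "language_code": language,
            # Vocabularies are per language, so each language gets its own.
            "vocabulary_name": (
                f"{vocabulary_prefix}-{language}" if use_vocabulary else None
            ),
        })
    return configs
```

The returned list follows the same order as the numbered experiments above, so each entry can be fed to a job-creation call in turn.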

Output

When the transcription jobs are completed, the output is stored in the specified S3 bucket.

Check transcription job output in the S3 bucket
Output JSON for the job with vocabulary with English (US)
Output JSON for the job without vocabulary with English (US)
Output JSON for the job with vocabulary with English (UK)
Output JSON for the job without vocabulary with English (UK)
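The recognized text sits inside the output JSON under `results.transcripts`. A small sketch for pulling it out is shown below, assuming a boto3 S3 client is passed in and that the job wrote its result as `<job name>.json` at the root of the output bucket, which is the default when only an output bucket is given. The helper names are hypothetical.

```python
import json


def extract_transcript(output_json):
    """Return the recognized text from a Transcribe output JSON document."""
    document = json.loads(output_json)
    transcripts = document["results"]["transcripts"]
    return " ".join(entry["transcript"] for entry in transcripts)


def fetch_transcript(s3_client, bucket, job_name):
    """Download a job's output JSON from S3 and return the recognized text."""
    # Assumption: the output object key is "<job name>.json".
    response = s3_client.get_object(Bucket=bucket, Key=f"{job_name}.json")
    return extract_transcript(response["Body"].read())
```

The same JSON also contains a per-word `items` list with timings and confidence scores, which this sketch ignores.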

Summary of the results from the transcription jobs:

Original text of the speech, spoken by an Indian speaker (me):
Hi. My name is Dilip. I am doing transcribe test from Tensult. This is to see how A.W.S transcribe performs.
Recognized text with custom vocabulary with English (UK):
Hi. My name is Dilip. I am doing transcribe test from Tensult. This is to see how A.W.S transcribe performs.
Recognized text without custom vocabulary with English (UK):
Hi. My name is the leap. I am doing transcribed test from inside. This is to see how a double s transcript performs.
Recognized text with custom vocabulary with English (US):
Hi. My name is Dilip. I am doing transcribe pissed from Tensult. This is to see how A.W.S transcribe performs.
Recognized text without custom vocabulary with English (US):
Hi. My name is Philippe. I am doing transcribed. Pissed from inside. This is to see how a jobless transcript performs.
* Custom words are highlighted in bold.
* Mistakes are highlighted in italics.

Transcribe successfully recognized the custom words from the provided custom vocabulary. Speech recognition accuracy is noticeably better with the vocabulary, so I recommend using this service with a custom vocabulary. Transcribe doesn’t support the various English accents, but in our case I achieved good results with UK English plus a custom vocabulary.
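One rough way to put a number on the comparison is word error rate: the word-level edit distance between the original and recognized text, divided by the number of words in the original. This metric was not part of the original experiment. The sketch below applies it to the four transcripts quoted above, with a simple normalization (lowercasing, stripping surrounding punctuation) chosen for illustration.

```python
ORIGINAL = ("Hi. My name is Dilip. I am doing transcribe test from Tensult. "
            "This is to see how A.W.S transcribe performs.")

RECOGNIZED = {
    "en-GB with vocabulary": ORIGINAL,
    "en-GB without vocabulary": (
        "Hi. My name is the leap. I am doing transcribed test from inside. "
        "This is to see how a double s transcript performs."),
    "en-US with vocabulary": (
        "Hi. My name is Dilip. I am doing transcribe pissed from Tensult. "
        "This is to see how A.W.S transcribe performs."),
    "en-US without vocabulary": (
        "Hi. My name is Philippe. I am doing transcribed. Pissed from inside. "
        "This is to see how a jobless transcript performs."),
}


def normalize(text):
    """Lowercase each word and strip punctuation from its ends."""
    words = (word.strip(".,!?").lower() for word in text.split())
    return [word for word in words if word]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # Dynamic-programming edit distance, one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1] / len(ref)


scores = {label: word_error_rate(ORIGINAL, text)
          for label, text in RECOGNIZED.items()}
for label, score in scores.items():
    print(f"{label}: {score:.2f}")
```

A lower score means fewer word-level mistakes. A single short recording is far too little data for a real benchmark, so the numbers are only a way of summarizing this one test.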

I hope that Indian English will be added in the future to improve speech recognition accuracy in the Indian context, and that AWS will keep improving the machine learning models behind Transcribe to further enhance the service.

Conclusion

I have explained, with an experiment, how to convert speech to text using the AWS Transcribe service. I hope this has helped you understand the concepts behind the service. Please let me know if you have any queries, and don’t forget to follow me for more updates.
