Transcribing Audio Files With Amazon Transcribe, Lambda & S3

Amazon Transcribe is one of AWS's many machine learning services; it is used to convert speech to text. Transcribe combines a deep learning process called Automatic Speech Recognition (ASR) with Natural Language Processing (NLP) to transcribe audio files. Organizations across the globe are leveraging this technology to automate media closed captioning and subtitling. Amazon Transcribe also supports transcription in over 30 languages, including Hebrew, Japanese, Arabic, and German.

In this tutorial, we will be working with Amazon Transcribe to perform automatic speech recognition.

Architecture

A user or an application uploads an audio file to an S3 bucket. The upload triggers a Lambda function, which instructs Transcribe to begin the speech-to-text process. Once the transcription is done, a CloudWatch event fires, which in turn triggers a second Lambda function that parses the transcription result.

1. Create an S3 Bucket: First, we need to create an S3 bucket to serve as a repository for our audio and transcribed files. Navigate to the S3 panel on the AWS console and create a bucket with a globally unique name, or create one from the CLI with the commands below. We also create an input folder in the bucket where the audio files will be stored.
#Create an S3 bucket with the command below after configuring the CLI
$ aws s3 mb s3://bucket-name
#Create the input folder where the audio files will be uploaded
$ aws s3api put-object --bucket bucket-name --key input/

2. Create the First Lambda Function: Next, we create the first Lambda function, which starts the transcription job once an audio file has been uploaded. We will create the function using the Python runtime and call it “Audio_Transcribe”. We also need to attach a policy to the function's execution role that grants it access to the S3 bucket, Amazon Transcribe, and CloudWatch; a minimal example policy is sketched after the screenshot below.

Creating a Lambda Function
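
The exact policy depends on your account, but a minimal execution-role policy might look like the sketch below; the bucket name is a placeholder, and in practice you may want to scope the Transcribe and CloudWatch Logs actions more tightly:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::bucket-name/*"
    },
    {
      "Effect": "Allow",
      "Action": ["transcribe:StartTranscriptionJob", "transcribe:GetTranscriptionJob"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "*"
    }
  ]
}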

Next, we add a trigger, which will be S3 in this case, so that any object uploaded into the input folder of our S3 bucket will trigger the Lambda function.
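
The console can wire this up from the function's “Add trigger” dialog. For reference, the same configuration via the CLI looks roughly like this (the region and account ID below are placeholders):

$ aws lambda add-permission --function-name Audio_Transcribe --statement-id s3invoke --action lambda:InvokeFunction --principal s3.amazonaws.com --source-arn arn:aws:s3:::bucket-name
$ aws s3api put-bucket-notification-configuration --bucket bucket-name --notification-configuration '{
  "LambdaFunctionConfigurations": [{
    "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:Audio_Transcribe",
    "Events": ["s3:ObjectCreated:*"],
    "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": "input/"}]}}
  }]
}'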

Now let's get into writing the Lambda function. First, we import the boto3 library, which is the AWS SDK for Python, and create low-level clients for S3 and Transcribe. Then we have the standard entry point for Lambda functions.

#Import the AWS SDK for Python
import boto3
#Create low level clients for s3 and Transcribe
s3 = boto3.client('s3')
transcribe = boto3.client('transcribe')
def lambda_handler(event, context):

Next, we parse the bucket name out of the event and extract the key, which is the file that was uploaded to S3. Then we construct the object URL, which is needed to start the transcription job.

    #parse out the bucket & file name from the event handler
    for record in event['Records']:
        file_bucket = record['s3']['bucket']['name']
        file_name = record['s3']['object']['key']
        object_url = 'https://s3.amazonaws.com/{0}/{1}'.format(
            file_bucket, file_name)

Next, we start the transcription job using the Transcribe client instantiated above. To start the job, we pass in the job name (the file name in this case), the media URI, the language code, and finally the media format (MP3, MP4, etc.). Other parameters, such as job execution settings and output bucket names, are not required.

        response = transcribe.start_transcription_job(
            TranscriptionJobName=file_name,
            LanguageCode='en-US',
            MediaFormat='mp3',
            Media={
                'MediaFileUri': object_url
            })

Putting the first function all together:

import boto3

#Create low level clients for s3 and Transcribe
s3 = boto3.client('s3')
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):
    #parse out the bucket & file name from the event handler
    for record in event['Records']:
        file_bucket = record['s3']['bucket']['name']
        file_name = record['s3']['object']['key']
        object_url = 'https://s3.amazonaws.com/{0}/{1}'.format(file_bucket, file_name)

        #job names cannot contain '/', so strip it from keys like
        #'input/file.mp3' and truncate to keep the name short
        response = transcribe.start_transcription_job(
            TranscriptionJobName=file_name.replace('/', '')[:10],
            LanguageCode='en-US',
            MediaFormat='mp3',
            Media={
                'MediaFileUri': object_url
            })

        print(response)
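
One caveat worth noting, as an assumption on my part rather than something handled in the original function: S3 delivers object keys URL-encoded, so a file name containing spaces or special characters would produce a broken object URL. Decoding the key first with the standard library guards against this:

from urllib.parse import unquote_plus

#S3 event keys are URL-encoded (e.g. spaces arrive as '+'), so decode first
file_name = unquote_plus(record['s3']['object']['key'])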

3. Create the Second Lambda Function: This function will parse the output of the transcription job and upload it to S3. The trigger for this function will be a CloudWatch rule. We will store the bucket name as an environment variable.

import json
import boto3
import os
import urllib.request

BUCKET_NAME = os.environ['BUCKET_NAME']
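
The environment variable can be set on the function's configuration page in the console, or with the CLI as below (assuming the function is named “parseTranscription”, as in step 4):

$ aws lambda update-function-configuration --function-name parseTranscription --environment "Variables={BUCKET_NAME=bucket-name}"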

Next, we create the S3 and Transcribe clients and parse out the name of the transcription job. Then we use the “get_transcription_job” function to get information about the job by passing in the job name. We then extract the transcript URI, which gives us access to the raw transcription JSON, and print it to CloudWatch for reference.

s3 = boto3.resource('s3')
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):
    job_name = event['detail']['TranscriptionJobName']
    job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
    print(uri)

Next, we make an HTTP request to grab the content of the transcription from the URI.

    content = urllib.request.urlopen(uri).read().decode('UTF-8')
    #write content to cloudwatch logs
    print(json.dumps(content))

    data = json.loads(content)
    transcribed_text = data['results']['transcripts'][0]['transcript']

Then, we create an S3 object, which is a text file, and write the contents of the transcription to it.

    object = s3.Object(BUCKET_NAME, job_name + "_Output.txt")
    object.put(Body=transcribed_text)

Putting it all together.

import json
import boto3
import os
import urllib.request

BUCKET_NAME = os.environ['BUCKET_NAME']

#Create the s3 resource and Transcribe client
s3 = boto3.resource('s3')
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):
    #parse the job name from the CloudWatch event
    job_name = event['detail']['TranscriptionJobName']
    job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
    print(uri)

    #fetch the raw transcription JSON and write it to the CloudWatch logs
    content = urllib.request.urlopen(uri).read().decode('UTF-8')
    print(json.dumps(content))

    data = json.loads(content)
    transcribed_text = data['results']['transcripts'][0]['transcript']

    #write the transcript to a text file in the bucket
    object = s3.Object(BUCKET_NAME, job_name + "_Output.txt")
    object.put(Body=transcribed_text)
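
To sanity-check the handler without waiting on a real CloudWatch event, you can feed it a hand-built event whose shape mirrors the one the rule delivers. The job name below is hypothetical, and this assumes AWS credentials and the BUCKET_NAME variable are available locally:

#a minimal fake CloudWatch event; only the fields the handler reads are needed
test_event = {
    'detail-type': 'Transcribe Job State Change',
    'detail': {
        'TranscriptionJobName': 'inputsampl',
        'TranscriptionJobStatus': 'COMPLETED'
    }
}
lambda_handler(test_event, None)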

4. Create a CloudWatch Rule to Trigger the Second Lambda Function: Finally, we create the CloudWatch rule and set its target to the parseTranscription function.
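
Amazon Transcribe publishes a “Transcribe Job State Change” event to CloudWatch Events when a job changes state, so the rule should match completed jobs and target the function. Sketched with the CLI (the rule name, region, and account ID are placeholders):

$ aws events put-rule --name TranscribeJobComplete --event-pattern '{
  "source": ["aws.transcribe"],
  "detail-type": ["Transcribe Job State Change"],
  "detail": {"TranscriptionJobStatus": ["COMPLETED"]}
}'
$ aws events put-targets --rule TranscribeJobComplete --targets "Id"="1","Arn"="arn:aws:lambda:us-east-1:123456789012:function:parseTranscription"

CloudWatch Events also needs permission to invoke the function; the console grants this automatically when you attach the trigger, or it can be granted with aws lambda add-permission using the events.amazonaws.com principal.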

TESTING THE APPLICATION

To test the application, we upload a sample audio file downloaded from Wikipedia to S3. You can download the file from this link: https://commons.wikimedia.org/wiki/File:Achievements_of_the_Democratic_Party_(Homer_S._Cummings).ogg. Note that the file is in Ogg format, so it needs to be converted to MP3 first, since our function assumes MP3.
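
Assuming the converted file is saved locally as sample.mp3 (a hypothetical name), uploading it into the input folder kicks off the pipeline:

$ aws s3 cp sample.mp3 s3://bucket-name/input/sample.mp3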

Now we can view the CloudWatch logs for both Lambda functions. Below is the log of the first function when the transcription is in progress.

And here is the CloudWatch log of the second function parsing the resulting JSON from the transcription job and writing it to S3.

Below is our transcription text file in S3:

“…the Democratic Party came into power on the fourth day of March 1913. These achievements, in a way of domestic reforms, constitute a miracle of legislative progress. Provision was made for an income tax, thereby relieving our law of the reproach of being unjustly burdensome to the poor. The extravagances and inequities of the tariff system…”

