Transcribing Audio Files With Amazon Transcribe, Lambda & S3

Odunayo Ogundepo
Published in Analytics Vidhya · 6 min read · Sep 15, 2020

Amazon Transcribe is one of AWS's numerous machine learning services; it converts speech to text. Transcribe combines a deep learning process called Automatic Speech Recognition (ASR) with Natural Language Processing (NLP) to transcribe audio files. Organizations across the globe leverage this technology to automate closed captioning and subtitling of media. Amazon Transcribe supports transcription in over 30 languages, including Hebrew, Japanese, Arabic, and German.

In this tutorial, we will be working with Amazon Transcribe to perform automatic speech recognition.

Architecture

A user or an application uploads an audio file to an S3 bucket. The upload triggers a Lambda function, which instructs Transcribe to begin the speech-to-text job. Once the transcription is done, a CloudWatch event is fired, which in turn triggers a second Lambda function that parses the transcription result.

  1. Create an S3 Bucket: First, we need an S3 bucket to serve as a repository for our audio and transcribed files. Navigate to the S3 panel on the AWS console and create a bucket with a globally unique name, or create one from the CLI with the commands below. We also create an input folder in the bucket where the audio files will be stored, then upload an audio file to it.
#Create an s3 bucket with the commands below after configuring the CLI
$ aws s3 mb s3://bucket-name
#Create the input "folder" (a zero-byte object whose key ends in a slash)
$ aws s3api put-object --bucket bucket-name --key input/

2. Create the First Lambda Function: Next, we create the first Lambda function, which starts the transcription job once an audio file has been uploaded. We will create the function using the Python runtime and call it “Audio_Transcribe”. We also need to attach a policy to the function's execution role granting it access to the S3 bucket, Amazon Transcribe, and CloudWatch.

Creating a Lambda Function

Next, we add a trigger, which will be S3 in this case, so that any object uploaded into the input folder of our bucket triggers the Lambda function.
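If you prefer to wire up the trigger from code instead of the console, the S3-to-Lambda notification can be sketched as below. This is a minimal sketch: the function ARN is a placeholder, and the prefix filter mirrors our input folder so the function only fires for uploads there.

```python
import json

# Placeholder ARN -- substitute your own Audio_Transcribe function ARN.
FUNCTION_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:Audio_Transcribe'

# Fire the function only for objects created under the input/ prefix.
notification_config = {
    'LambdaFunctionConfigurations': [{
        'LambdaFunctionArn': FUNCTION_ARN,
        'Events': ['s3:ObjectCreated:*'],
        'Filter': {'Key': {'FilterRules': [
            {'Name': 'prefix', 'Value': 'input/'},
        ]}},
    }]
}

print(json.dumps(notification_config, indent=2))
```

This dictionary would then be passed to boto3's `s3.put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=notification_config)`. Note that S3 must also be granted permission to invoke the function; the console does that automatically when you add the trigger.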

Now let's get into writing the Lambda function. First, we import the boto3 library, which is the AWS Python SDK, and create low-level clients for S3 and Transcribe. Then we define the standard entry point for Lambda functions.

#Import the AWS SDK for python (boto3)
import boto3
#Create low level clients for s3 and Transcribe
s3 = boto3.client('s3')
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):

Next, we parse the bucket name out of the event and extract the key, which is the file that was uploaded to S3. Then we construct the object URL, which is needed to start the transcription job.

#parse out the bucket & file name from the event
for record in event['Records']:
    file_bucket = record['s3']['bucket']['name']
    file_name = record['s3']['object']['key']
    object_url = 'https://s3.amazonaws.com/{0}/{1}'.format(
        file_bucket, file_name)
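One caveat worth guarding against: S3 URL-encodes object keys in event notifications (spaces arrive as `+`, other characters as `%XX` escapes), so keys with spaces or special characters should be decoded before use. A minimal sketch against a hand-built sample event (the bucket and key names are made up for illustration):

```python
from urllib.parse import unquote_plus

# A trimmed-down S3 event record, shaped like what Lambda receives.
sample_event = {
    'Records': [{
        's3': {
            'bucket': {'name': 'bucket-name'},
            'object': {'key': 'input/my+audio+file.mp3'},
        }
    }]
}

for record in sample_event['Records']:
    file_bucket = record['s3']['bucket']['name']
    # Decode the URL-encoded key back into the real object name.
    file_name = unquote_plus(record['s3']['object']['key'])
    print(file_bucket, file_name)  # bucket-name input/my audio file.mp3
```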

Next, we start the transcription job using the Transcribe client instantiated above. To start the job, we pass in a job name (derived from the file name in this case), the media URI, the language code, and the media format (mp3, mp4, etc.). Other parameters, such as job execution settings and output bucket names, are not required.

response = transcribe.start_transcription_job(
    TranscriptionJobName=file_name,
    LanguageCode='en-US',
    MediaFormat='mp3',
    Media={
        'MediaFileUri': object_url
    })

Putting the first function together:

import boto3

#Create low level clients for s3 and Transcribe
s3 = boto3.client('s3')
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):

    #parse out the bucket & file name from the event
    for record in event['Records']:
        file_bucket = record['s3']['bucket']['name']
        file_name = record['s3']['object']['key']
        object_url = 'https://s3.amazonaws.com/{0}/{1}'.format(file_bucket, file_name)

        response = transcribe.start_transcription_job(
            TranscriptionJobName=file_name.replace('/', '')[:10],
            LanguageCode='en-US',
            MediaFormat='mp3',
            Media={
                'MediaFileUri': object_url
            })

        print(response)
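A note on the job name: Transcribe job names must be unique per account and may only contain letters, digits, periods, underscores, and hyphens, so a raw S3 key with slashes or spaces will be rejected. The `replace('/','')[:10]` trick above handles slashes; a slightly more thorough sanitizer might look like this (a sketch — the random suffix is just one way to keep names unique across repeated uploads):

```python
import re
import uuid

def safe_job_name(key, max_len=200):
    """Turn an S3 object key into a valid, unique Transcribe job name."""
    # Replace every character Transcribe does not accept with a hyphen.
    base = re.sub(r'[^0-9a-zA-Z._-]', '-', key)
    # Append a short random suffix so repeated uploads don't collide.
    suffix = '-' + uuid.uuid4().hex[:8]
    return base[:max_len - len(suffix)] + suffix

print(safe_job_name('input/my audio file.mp3'))
```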

3. Create the Second Lambda Function: This function will parse the output of the transcription job and upload the result to S3. The trigger for this function will be a CloudWatch rule, and we will store the bucket name as an environment variable.

import json
import boto3
import os
import urllib.request

BUCKET_NAME = os.environ['BUCKET_NAME']

Next, we create the S3 and Transcribe clients and parse the transcription job name out of the event. We then call “get_transcription_job”, passing in the job name, to get information about the job. From the response we extract the transcript URI, which gives us access to the raw transcription JSON, and print it to CloudWatch for reference.

s3 = boto3.resource('s3')
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):

    job_name = event['detail']['TranscriptionJobName']
    job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
    print(uri)

Next, we make an HTTP request to grab the content of the transcription from the URI.

    content = urllib.request.urlopen(uri).read().decode('UTF-8')
    #write content to cloudwatch logs
    print(json.dumps(content))

    data = json.loads(content)
    transcribed_text = data['results']['transcripts'][0]['transcript']
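The raw JSON that Transcribe writes has a nested shape worth seeing once. The snippet below parses a trimmed-down sample: the transcript text is made up, and real output also carries per-word items with timestamps and confidence scores.

```python
import json

# A trimmed-down example of Transcribe's output JSON.
content = json.dumps({
    'jobName': 'my-job',
    'status': 'COMPLETED',
    'results': {
        'transcripts': [{'transcript': 'hello world'}],
        # Real output also includes an 'items' list with word-level
        # timestamps and confidence scores, omitted here.
    },
})

data = json.loads(content)
transcribed_text = data['results']['transcripts'][0]['transcript']
print(transcribed_text)  # hello world
```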

Then, we create an s3 object which is a text file, and write the contents of the transcription to it.

object = s3.Object(BUCKET_NAME,job_name+"_Output.txt")
object.put(Body=transcribed_text)

Putting it all together.

import json
import boto3
import os
import urllib.request

BUCKET_NAME = os.environ['BUCKET_NAME']

s3 = boto3.resource('s3')
transcribe = boto3.client('transcribe')

def lambda_handler(event, context):

    job_name = event['detail']['TranscriptionJobName']
    job = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
    print(uri)

    content = urllib.request.urlopen(uri).read().decode('UTF-8')
    #write content to cloudwatch logs
    print(json.dumps(content))

    data = json.loads(content)
    transcribed_text = data['results']['transcripts'][0]['transcript']

    object = s3.Object(BUCKET_NAME, job_name + "_Output.txt")
    object.put(Body=transcribed_text)

4. Create a CloudWatch Rule to Trigger the Second Lambda Function: Finally, we create a CloudWatch rule that fires when a transcription job finishes and set its target to the parseTranscription function.
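The rule matches Transcribe's job state change events. A sketch of the event pattern, built here as a Python dict (in the console you would paste the equivalent JSON); matching only COMPLETED avoids invoking the parser for failed jobs:

```python
import json

# CloudWatch Events pattern for finished Transcribe jobs.
event_pattern = {
    'source': ['aws.transcribe'],
    'detail-type': ['Transcribe Job State Change'],
    'detail': {
        'TranscriptionJobStatus': ['COMPLETED'],
    },
}

print(json.dumps(event_pattern, indent=2))
```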

TESTING THE APPLICATION

To test the application, we upload a sample audio file downloaded from Wikipedia into S3. You can download the audio file from this link: https://commons.wikimedia.org/wiki/File:Achievements_of_the_Democratic_Party_(Homer_S._Cummings).ogg.

Now we view the CloudWatch logs for both Lambda functions. Below is the log of the first function while the transcription is in progress.

And here is the CloudWatch log of the second function parsing the resulting JSON from the transcription job and writing it to S3.

Below is our transcription text file in S3:

“…the Democratic Party came into power on the fourth day of March 1913. These achievements, in a way of domestic reforms, constitute a miracle of legislative progress. Provision was made for an income tax, thereby relieving our law of the reproach of being unjustly burdensome to the poor. The extravagances and inequities of the tariff system…”

References:

  1. https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/transcribe.html#TranscribeService.Client.start_transcription_job
  2. https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-awscli.html
  3. https://linuxacademy.com/
