Generating text from audio using Amazon Transcribe and AWS Lambda

Mart Noten
NBTL

--

In this example, we will build an AWS Lambda function in Python that is triggered by audio uploads to an S3 bucket and automatically transcribes them using Amazon Transcribe. The whole application will be built using the AWS CDK.

This post was originally published on the NBTL blog.

Requirements

Make sure that your AWS CLI is correctly set up with the profile that you want to use.

Architecture

Once finished, we will have deployed the following resources to your AWS account:

  • 2 Amazon S3 buckets (billed by storage used)
  • 1 AWS Lambda function (billed per invocation)

Getting started

This is where the actual fun begins: let’s start the project.

Creating the CDK project and setting up the stack

First, we need to set up the CDK project and add all the required resources to the stack.

Initializing new CDK project

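To begin developing our architecture, you will need to set up a new CDK project, for example by running cdk init app with the --language option of your choice in an empty directory.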

Adding the S3 buckets

Next up, we want to add the two S3 buckets to the stack. The default bucket construct with its default settings is enough to achieve this.

Note: You can limit the number of S3 buckets to one and keep both the audio and the transcription in the same bucket. For the sake of the example, we have decided on two.
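As a minimal sketch of this step, assuming the CDK v2 Python bindings (the aws-cdk-lib and constructs packages), the two buckets could be declared as below. The stack name TranscribeStack and the construct IDs AudioBucket and TranscriptBucket are illustrative and not taken from the original project.

```python
from aws_cdk import App, Stack, aws_s3 as s3
from constructs import Construct


class TranscribeStack(Stack):
    """Hypothetical stack holding the two buckets used in this example."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Bucket that receives the audio uploads (illustrative construct ID).
        self.audio_bucket = s3.Bucket(self, "AudioBucket")

        # Bucket that will hold the resulting transcripts (illustrative construct ID).
        self.transcript_bucket = s3.Bucket(self, "TranscriptBucket")


app = App()
TranscribeStack(app, "TranscribeStack")
app.synth()
```

Leaving out an explicit bucket name lets CloudFormation generate a unique one, which avoids name clashes between accounts and environments.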

Creating the AWS Lambda function and adding the S3 trigger

The Lambda function will call Amazon Transcribe through the AWS SDK with our audio file and receive a transcript in return. We’ll do the programming of the Lambda in the next section, so let’s first just create the CDK resource for the function and the S3 trigger.

In this case, we are using the PythonFunction construct so that the requests library is bundled with the function; we will use it to download the transcription result. In case you haven’t used this construct before, you can find more information in this post: ☁️ Using external libraries in your Python AWS Lambda in AWS CDK
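The function and its trigger could look roughly like the sketch below. It assumes the CDK v2 Python bindings plus the separate aws-cdk.aws-lambda-python-alpha package, which provides PythonFunction. The lambda directory, the index.py file, the handler name, the five-minute timeout and the TRANSCRIPT_BUCKET environment variable are all assumptions made for illustration.

```python
from aws_cdk import Duration, Stack, aws_lambda as _lambda, aws_s3 as s3
from aws_cdk.aws_lambda_event_sources import S3EventSource
from aws_cdk.aws_lambda_python_alpha import PythonFunction
from constructs import Construct


class TranscribeFunctionStack(Stack):
    """Hypothetical stack: one PythonFunction triggered by uploads to a bucket."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        audio_bucket = s3.Bucket(self, "AudioBucket")
        transcript_bucket = s3.Bucket(self, "TranscriptBucket")

        transcribe_fn = PythonFunction(
            self,
            "TranscribeFunction",
            # Assumed layout: ./lambda/index.py and ./lambda/requirements.txt
            entry="lambda",
            index="index.py",
            handler="handler",
            runtime=_lambda.Runtime.PYTHON_3_12,
            # Transcription takes a while, so the default 3 seconds is too short.
            timeout=Duration.minutes(5),
            # Assumed variable name, read by the function at runtime.
            environment={"TRANSCRIPT_BUCKET": transcript_bucket.bucket_name},
        )

        # Invoke the function for every object created in the audio bucket.
        transcribe_fn.add_event_source(
            S3EventSource(audio_bucket, events=[s3.EventType.OBJECT_CREATED])
        )
```

PythonFunction installs the dependencies listed in requirements.txt while the stack is synthesized, which normally requires Docker to be running on the machine that deploys.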

Additional IAM permissions

We will need to add some extra permissions to our Lambda function’s IAM role. This is because reading from and writing to S3 buckets is not allowed by default, and neither is calling Transcribe. Let’s create the required permissions and add them to the Lambda function.
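Independent of how they are attached in CDK, the permissions themselves could be expressed as the IAM policy document sketched below. The helper name build_lambda_policy, the statement IDs and the bucket names are hypothetical; the listed actions are the ones this walkthrough relies on (reading the audio, writing the transcript, starting and checking a Transcribe job).

```python
import json


def build_lambda_policy(audio_bucket: str, transcript_bucket: str) -> dict:
    """Return an IAM policy document for the transcription function (sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Transcribe reads the audio with the caller's permissions.
                "Sid": "ReadAudio",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{audio_bucket}/*",
            },
            {
                # The function stores the final text file itself.
                "Sid": "WriteTranscripts",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{transcript_bucket}/*",
            },
            {
                # Starting a job and polling its status.
                "Sid": "UseTranscribe",
                "Effect": "Allow",
                "Action": [
                    "transcribe:StartTranscriptionJob",
                    "transcribe:GetTranscriptionJob",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket names, for illustration only.
    print(json.dumps(build_lambda_policy("my-audio-bucket", "my-transcript-bucket"), indent=2))
```

In a CDK stack, the same effect is usually achieved with the bucket grant helpers (grant_read, grant_write) and a policy statement added to the function's role, rather than by writing the JSON by hand.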

Programming the AWS Lambda function in Python

We now have all the infrastructure required. This means that we can get to actually programming this thing.

Reading the file from S3

First of all, we will need to read the file location from the S3 event that triggered the function. We don’t actually need to download the file, as Transcribe can read it directly from S3. This is great because it means that large audio files never have to fit into the Lambda function’s memory or temporary storage.
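A minimal sketch of reading the location from the S3 event is shown below. The function name extract_s3_location is hypothetical. Object keys in S3 event notifications are URL-encoded, so the key is decoded before it is used.

```python
from urllib.parse import unquote_plus


def extract_s3_location(event: dict) -> tuple[str, str, str]:
    """Return (bucket, key, s3_uri) for the first record of an S3 event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Keys arrive URL-encoded, e.g. spaces are sent as '+'.
    key = unquote_plus(record["s3"]["object"]["key"])
    return bucket, key, f"s3://{bucket}/{key}"


if __name__ == "__main__":
    # Trimmed-down example event with a made-up bucket and key.
    sample_event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "my-audio-bucket"},
                    "object": {"key": "uploads/team+meeting.mp3"},
                }
            }
        ]
    }
    print(extract_s3_location(sample_event))
```

Only the first record is handled here to keep the sketch short; a single notification can in principle carry several records.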

Calling Amazon Transcribe

Next up is the Amazon Transcribe SDK invocation. This tells Amazon Transcribe to start a transcription job. The job runs asynchronously, so we check its status until it has finished; the completed job then contains an overview of the whole job. The most interesting part for us is the Amazon S3 presigned URL that points to the transcription in JSON format.
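The invocation could be sketched as follows, using the boto3 Transcribe operations start_transcription_job and get_transcription_job. The function name transcribe_audio, the generated job name, the en-US default and the polling interval and timeout are assumptions for illustration. The client is passed in as a parameter so that the sketch can be exercised without AWS access; in the Lambda it would be boto3.client("transcribe").

```python
import time
import uuid


def transcribe_audio(
    transcribe_client,
    media_uri: str,
    language_code: str = "en-US",
    poll_seconds: float = 5,
    timeout_seconds: float = 600,
) -> str:
    """Start a transcription job and return the URI of its JSON result."""
    # Job names must be unique within the account, hence the random suffix.
    job_name = f"transcription-{uuid.uuid4()}"
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        LanguageCode=language_code,
    )

    # The job runs asynchronously, so poll until it completes or fails.
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        response = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name
        )
        job = response["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "Transcription job failed"))
        time.sleep(poll_seconds)

    raise TimeoutError(f"Transcription job {job_name} did not finish in time")
```

Polling inside the function keeps the example simple, but the Lambda timeout has to be long enough to cover the whole transcription.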

Downloading the job results

Because we use the experimental PythonFunction construct, the requests library is bundled with the function and we can use it to download the result file.
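A short sketch of the download step is shown below. The function name download_job_result and the 30-second timeout are assumptions. The optional http_get parameter is a hypothetical hook that defaults to requests.get; it exists only so the function can be tried out without network access.

```python
def download_job_result(transcript_uri: str, http_get=None, timeout: float = 30) -> dict:
    """Download the transcription result and return it as parsed JSON."""
    if http_get is None:
        # requests is bundled with the function by PythonFunction.
        import requests

        http_get = requests.get

    response = http_get(transcript_uri, timeout=timeout)
    # Fail loudly on HTTP errors, for example an expired presigned URL.
    response.raise_for_status()
    return response.json()
```

The presigned URL is only valid for a limited time, so it makes sense to download the result right after the job completes.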

Storing the transcript in S3

Last but not least, we want to extract the transcript text from the JSON and store it in its own text file in our S3 bucket. Let’s create a function that will do just that for us.
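Such a function could look like the sketch below. The names extract_transcript_text and store_transcript are hypothetical, as is the choice to reuse the audio file's key with a .txt extension. The S3 client is passed in as a parameter; in the Lambda it would be boto3.client("s3"). The result JSON is assumed to follow the standard Transcribe layout, with the text under results.transcripts.

```python
import posixpath


def extract_transcript_text(job_result: dict) -> str:
    """Join the transcript entries of a Transcribe result into one string."""
    transcripts = job_result["results"]["transcripts"]
    return " ".join(entry["transcript"] for entry in transcripts)


def store_transcript(s3_client, bucket: str, audio_key: str, job_result: dict) -> str:
    """Write the transcript as a text file to S3 and return its key."""
    text = extract_transcript_text(job_result)
    # uploads/talk.mp3 -> uploads/talk.txt (naming scheme is an assumption)
    base, _extension = posixpath.splitext(audio_key)
    transcript_key = f"{base}.txt"
    s3_client.put_object(
        Bucket=bucket,
        Key=transcript_key,
        Body=text.encode("utf-8"),
        ContentType="text/plain; charset=utf-8",
    )
    return transcript_key
```

Keeping the original key and changing only the extension makes it easy to match each transcript to its audio file.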

Conclusion

I hope you’ve been able to follow along and that you’ve just received your first Amazon Transcribe transcription. This example can be found on my personal GitHub, which should be linked above. In case you have any questions, feel free to drop a comment and I’ll get to them when I can.

Lessons learned

Most important resources:

--

Mart Noten
NBTL

AWS Architect from NBTL (https://nbtl.substack.com/), writing technical articles focusing on cloud technologies.