Unlocking Geeky AI Magic: Transforming Video Transcriptions into Smart Assistance with Amazon Transcribe and Amazon Q

Esteban
AWS in Plain English
10 min read · May 5, 2024


Ever been glued to a video, craving a brain boost? Or stuck in a marathon meeting, wishing for a summary guru? Meet Amazon Transcribe and Amazon Q — your dynamic duo for turning video and audio into actionable text and transforming it into smart assistance.

In this article, we’ll geek out on how Amazon Transcribe decodes spoken words into written magic, setting the stage for Amazon Q to swoop in as your personal wizard. Whether you’re geeking out on videos or decoding meeting madness, Amazon Q is your trusty sidekick.

Let’s geek out on the world of Amazon Transcribe and Amazon Q, unleashing a new era of brainpower and productivity. It’s time to harness the AI magic for a smarter, simpler life.

Prerequisites:

Before we embark on our geeky adventure, let’s ensure we have our AWS playground set up.

  • If you haven’t already, create an AWS account and familiarize yourself with the AWS Management Console.
  • Make sure you are familiar with the concepts of IAM roles, S3 bucket storage, Lambda, and CloudWatch. I know this is not the fun part, but it will help a lot if troubleshooting is needed.

Step 1 — Creating the S3 Buckets

Our journey begins with creating two S3 buckets: one to store our precious MP4 videos, and one for the output of the Amazon Transcribe job.

Head over to:

  • S3 Management Console
  • Click on “Create Bucket”
  • Follow the prompts to set up your buckets.

Note: Don’t forget to choose a unique name and select the appropriate region.

I will call my buckets “myvideostoringbucket” and “myvideotranscriptsbucket”.

Step 2 — Create a Lambda Function

Navigate to the Lambda Management Console, the sacred grounds where Lambda functions are forged in the fires of cloud computing.

  • With determination in your heart, click the “Create function” button, ready to embark on your journey into the realm of serverless computing.
  • Bestow upon your creation a name worthy of its destiny, a name that shall echo through AWS history.
  • As you stand at the threshold of innovation, select the Python runtime, the language through which your Lambda function shall channel its magic.

We shall craft a Lambda function primed to respond to an S3 event trigger, specifically targeting the arrival of fresh video/audio files in a designated directory within an S3 bucket. Upon detection of a new file, it shall initiate a workflow to transmute the video/audio content therein into textual form via Amazon Transcribe. The resultant transcript shall be diligently preserved in an alternate bucket, ready for subsequent utilization or scholarly dissection.

I don’t know how long I am going to keep up the geekiness.

Basically, we will create a Lambda function that will be triggered by an S3 event for new video/audio files uploaded to S3. When a new file is detected, it starts a process to convert the video/audio content of the file into text using Amazon Transcribe. The resulting text is then saved to a different bucket for further use or analysis.

import json
import boto3
import urllib.parse
import uuid

def lambda_handler(event, context):
    # Retrieve the S3 bucket and key from the event
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    object_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Check if the object is in the "videos_to_transcript" folder
    if not object_key.startswith('videos_to_transcript/'):
        return {
            'statusCode': 200,
            'body': json.dumps('Object is not in the "videos_to_transcript" folder. Skipping.')
        }

    # Initialize the Amazon Transcribe client
    transcribe = boto3.client('transcribe')

    # Generate a unique job name for the transcription using a UUID
    job_name = str(uuid.uuid4()) + '-transcription'  # Append '-transcription' to the UUID for clarity

    # Define the S3 URI for the video file
    media_uri = 's3://' + bucket_name + '/' + object_key

    # Define the S3 bucket name to save the transcription output
    output_bucket = 'myvideotranscriptsbucket'

    # Start the transcription job
    response = transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': media_uri},
        MediaFormat=object_key.split('.')[-1],  # Extract the file format from the extension
        LanguageCode='en-US',  # Specify the language of the audio
        OutputBucketName=output_bucket,  # Save the transcript to the specified bucket
        Settings={
            'ShowSpeakerLabels': False  # Disable speaker labels
        }
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Transcription job started successfully.')
    }

Let’s break the function down:

  1. Trigger: This Lambda function is triggered when a new file is uploaded to an S3 bucket.
  2. Initialization: When triggered, the function retrieves information about the uploaded file, such as its bucket name and object key.
  3. Check: It verifies that the uploaded file is in the “videos_to_transcript” folder. If it isn’t, the function skips processing the file.
  4. Transcription: If the file is in the correct folder, the function generates a unique job name and starts a transcription job using Amazon Transcribe. This job converts the audio content of the file into text.
  5. Output: If no output bucket were specified, Amazon Transcribe would store the transcript in a service-managed location. Here, the function explicitly saves the transcription output (the text) to a different bucket named “myvideotranscriptsbucket”.
  6. Response: Finally, the function returns a success message indicating that the transcription job has been started successfully.
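
For reference, Amazon Transcribe writes its output to the bucket as a JSON document named after the job. If you want to pull out just the plain text once a job completes, a minimal sketch might look like this (the job name argument is a placeholder for whichever job you started):

import json
import boto3

s3 = boto3.client('s3')

def get_transcript_text(bucket, job_name):
    # Transcribe names the output object "<job_name>.json" in the output bucket
    obj = s3.get_object(Bucket=bucket, Key=job_name + '.json')
    data = json.loads(obj['Body'].read())
    # The full transcript text lives under results.transcripts[0].transcript
    return data['results']['transcripts'][0]['transcript']

print(get_transcript_text('myvideotranscriptsbucket', 'your-job-name-transcription'))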

Note: During my lab testing, I encountered issues with duplicate Amazon Transcribe job names. To ensure uniqueness, I incorporated a unique identifier into the job name (a UUID in the code above; a timestamp or random string works just as well).
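
If you prefer job names that are easier to read in the console than a bare UUID, a timestamped variant is one option. A minimal sketch that drops in where the handler builds job_name (it assumes the file name itself only contains characters valid in a job name: letters, digits, hyphens, periods, and underscores):

from datetime import datetime, timezone

# Build a readable, unique job name like "myvideo-20240505-143501-transcription"
base_name = object_key.split('/')[-1].rsplit('.', 1)[0]
timestamp = datetime.now(timezone.utc).strftime('%Y%m%d-%H%M%S')
job_name = f"{base_name}-{timestamp}-transcription"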

Once the function is deployed, we need to assign the right permissions to its execution role, since it has to be able to access the S3 buckets and start Amazon Transcribe jobs. A policy like the following does the trick:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "transcribe:StartTranscriptionJob",
        "transcribe:GetTranscriptionJob"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::myvideostoringbucket/*",
        "arn:aws:s3:::myvideotranscriptsbucket/*"
      ]
    }
  ]
}
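
You can attach this policy to the function’s execution role through the IAM console, or inline from Python. A quick sketch, where the role name, policy name, and file path are placeholders for your own:

import boto3

iam = boto3.client('iam')

# Read the policy JSON saved above (placeholder file name)
with open('lambda-transcribe-policy.json') as f:
    policy_document = f.read()

# Attach it inline to the Lambda function's execution role (placeholder names)
iam.put_role_policy(
    RoleName='my-transcribe-lambda-role',
    PolicyName='TranscribeAndS3Access',
    PolicyDocument=policy_document
)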

Step 3 — Configuring S3 Event Triggers

With our S3 buckets all set up and the Lambda function created, let’s automate the transcription process by configuring event triggers:

  • Head over to the properties of your bucket named “myvideostoringbucket.”
  • In the “Event Notification” section, simply click on “Create Event Notification.”
  • For the Prefix, enter “videos_to_transcript/” so the trigger matches the folder the Lambda function checks for.
  • And for the Suffix, graciously append “.mp4”
  • For the Event Type, opt for the illustrious “All object create events.”
  • At the Destination, select the venerable Lambda Function, and meticulously specify the Lambda function created in Step 2.
  • With precision akin to a seasoned artisan, click on “Save Changes” to immortalize your configurations. (Prefer to script this step? A sketch follows this list.)
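
For those who’d rather configure the trigger from code than the console, here’s a minimal boto3 sketch (the function ARN is a placeholder for your own):

import boto3

s3 = boto3.client('s3')

# Configure the bucket to invoke the Lambda function for new .mp4 files in the
# "videos_to_transcript/" folder. The console grants S3 permission to invoke
# the function automatically; when scripting, add that yourself via
# lambda add_permission with principal "s3.amazonaws.com".
s3.put_bucket_notification_configuration(
    Bucket='myvideostoringbucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:my-transcribe-function',  # placeholder
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'videos_to_transcript/'},
                {'Name': 'suffix', 'Value': '.mp4'}
            ]}}
        }]
    }
)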

Return to the Lambda Function interface, where you shall find the S3 trigger now listed in the function’s configuration.

Step 4 — Testing and Troubleshooting

We’re still navigating uncharted territory, and our adventure is just beginning. Let’s make sure everything runs smoothly by testing our current setup:

  • Head back to your “myvideostoringbucket” in S3.
  • Upload a test video file ending with “.mp4” to the “videos_to_transcript/” folder.
  • Keep an eye on the Lambda function for any activity, and check the logs in CloudWatch.
  • Check that the Amazon Transcribe job was triggered and is now completed.
  • Check that the video/audio is transcribed into text and saved in the correct bucket.
  • Take a good look at the transcript to make sure it’s accurate and complete.

Each successful test brings us closer to our goal, like sailing a ship on a clear day toward victory.
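
If clicking through the console feels slow, the same check can be run from Python; a small sketch listing recent jobs and where they stand:

import boto3

transcribe = boto3.client('transcribe')

# Print the most recent transcription jobs and their current status
response = transcribe.list_transcription_jobs(MaxResults=10)
for job in response['TranscriptionJobSummaries']:
    print(job['TranscriptionJobName'], '->', job['TranscriptionJobStatus'])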

Step 5 — Enabling IAM Identity Center

Starting April 30, 2024, all new Amazon Q applications will need to use IAM Identity Center for user access management. No new applications can be created using the legacy identity management flow.

IAM Identity Center is needed for Amazon Q for the following reasons:

  • User Access Management: IAM Identity Center serves as the gateway to manage user access to the Amazon Q application. It allows you to centrally manage identities, access, and permissions across AWS accounts and applications.
  • Identity Provider Integration: If you are using the legacy identity management flow for your Amazon Q application, you will need to integrate your web experience with an identity provider (IdP) that is compliant with SAML 2.0. IAM Identity Center can be used to connect your IdP and manage user access.
  • Centralized Identity Management: IAM Identity Center allows you to create users and groups within the service, or connect and synchronize with your own identity source (such as Active Directory) for use across all your AWS accounts and applications.
  • Application Access Configuration: You can configure IAM Identity Center to manage access to your Amazon Q application, including assigning permission sets and groups to users.

Note: To enable IAM Identity Center, refer to the User Guide: https://docs.amazonaws.cn/en_us/singlesignon/latest/userguide/get-set-up-for-idc.html

Once enabled, create a User and a Group.
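
The console is the easiest place to do this, but if you prefer code, the Identity Store API covers it. A hedged sketch, where the Identity Store ID (shown on the IAM Identity Center settings page), names, and email are all placeholders:

import boto3

identitystore = boto3.client('identitystore')
identity_store_id = 'd-xxxxxxxxxx'  # placeholder: your Identity Store ID

# Create a group for Amazon Q users (placeholder name)
group = identitystore.create_group(
    IdentityStoreId=identity_store_id,
    DisplayName='QBusinessUsers'
)

# Create a user (all values are placeholders)
user = identitystore.create_user(
    IdentityStoreId=identity_store_id,
    UserName='esteban',
    DisplayName='Esteban',
    Name={'GivenName': 'Esteban', 'FamilyName': 'Example'},
    Emails=[{'Value': 'esteban@example.com', 'Primary': True}]
)

# Add the user to the group
identitystore.create_group_membership(
    IdentityStoreId=identity_store_id,
    GroupId=group['GroupId'],
    MemberId={'UserId': user['UserId']}
)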

Step 6 — Creating an Amazon Q Application

Our epic journey now leads us to Amazon Q, a remarkable AI-powered assistant offering two distinct subscription bundles:

Amazon Q Developer

  • Engage directly with code within your Integrated Development Environment (IDE).
  • Features include explanation, refactoring, debugging, optimization, and sending highlighted code to the Amazon Q chat panel for further queries.

Amazon Q Business

  • Designed to assist businesses in answering questions, summarizing, and generating content grounded in their own enterprise data.
  • Tailored to meet specific business requirements, encompassing your documents, data, and operations.
  • Connects to enterprise data sources such as Amazon S3 (the one we use here).
  • Accessed through a deployable web experience that users sign in to via IAM Identity Center.

Note: For the most current information regarding pricing, limits, availability, quotas, and other details about these Amazon Q bundles, kindly consult the AWS Documentation.

For our grand expedition, we shall harness the prowess of Amazon Q Business Lite.

Navigate to Amazon Q Business and select “Get Started”. Here, we embark on crafting our AI assistance application, leveraging the transcript provided by Amazon Transcribe as our Source Data.

Click on “Create Application” to commence our journey.

Assign a name to your application and let the default settings stand unaltered for now.

Next, let’s proceed by selecting a Native Retriever.

Then, designate Amazon S3 as our Data Source.

Assign a distinctive name to the Data Source, in my case it was called “MyAIAssistanceDataSource”.

For the IAM Role, opt to “Create a new service role”. Then, at the Sync Scope, meticulously choose the S3 bucket housing the transcripts.

For this project, set the frequency to “Run on Demand”. Then, click on “Add Data Source”.

After connecting the Data Source, proceed to add a group and users, and choose the Group Created in Step 5.

For the Web Experience Service Access, let the application create and use a new service role.

After forging the application, traverse the digital realm to the dominion of Groups and Users. Behold, within lies the sacred tome of Managed Access and Subscriptions, awaiting your command.

Grant the group the esteemed subscription of “Business Lite.” Thus, they shall receive the benefits and access appropriate for their journey.

At the Data Sources, select “MyAIAssistanceDataSource” and click “Sync Now.” This process might take several minutes.

Once the synchronization is complete, our destination draws near. Just a few strides away from our primary objective, our wizard AI assistant eagerly awaits.

Navigate to “Web Experience Settings” and grasp the Deployed URL.

Upon logging in with the user forged within the IAM Identity Center, you’ll gain access to your new AI virtual assistant. From there, you can begin querying it for information gleaned from the transcriptions.
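
The web experience is the intended front door, but Amazon Q Business also exposes a ChatSync API if you want to query the application from code. A rough sketch, where the application ID is a placeholder and your identity configuration must permit programmatic access:

import boto3

qbusiness = boto3.client('qbusiness')

# Ask the application a question about the ingested transcripts
response = qbusiness.chat_sync(
    applicationId='your-application-id',  # placeholder: shown in the Amazon Q console
    userMessage='Summarize the key points from my latest video transcript.'
)
print(response['systemMessage'])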

Huzzah! By harnessing the might of Amazon Transcribe and Amazon Q, you’ve transcended mere mortal limitations, transmuting raw audiovisual data into actionable wisdom. Whether unraveling the mysteries of videos or distilling the essence of meetings, your geeky assistant stands ready to champion your cause. Embrace the AI revolution and revel in the boundless adventures that lie ahead!
