Conversion Of Text-To-Speech & Speech-To-Text Using AWS-Cloud Services in Python

Today I was stuck on a very specific problem: converting Text-to-Speech and Speech-to-Text in a single workflow, and storing the resulting output in S3 buckets.

As you probably already know, Amazon Polly converts text to speech and Amazon Transcribe converts speech to text, and both can write their output to S3 buckets. Let’s build a solution using these AWS services.

AMAZON POLLY: Amazon Polly is a cloud service that converts text into lifelike speech. You can use Amazon Polly to develop applications that increase engagement and accessibility. Amazon Polly supports multiple languages.

Features:

  • High quality
  • Low latency
  • Support for a large portfolio of languages and voices
  • Cost-effective

AMAZON TRANSCRIBE: Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech. You can also send a live audio stream to Amazon Transcribe and receive a stream of transcripts in real time.

Features:

  • Easy-to-Read Transcriptions
  • Timestamp Generation
  • Recognize Multiple Speakers
  • Improving Customer Service

Conversion of Text-to-Speech using Amazon Polly and Speech-to-Text using Amazon Transcribe:

Architecture:

A Lambda function sends text to Amazon Polly, which converts it to speech (an .mp3 file) and stores it in an S3 bucket. Polly returns a task ID and the output URL, and the Lambda pushes both to SQS. A second Lambda pulls the ID and URL from the queue and passes the audio file's S3 location to Amazon Transcribe, which converts the speech to text and stores the transcript (a JSON file) in another S3 bucket.

NOTE: We use a scheduled trigger to check for messages in SQS because the S3 object is not available to Amazon Transcribe the moment Polly accepts the task; the synthesis runs asynchronously. So the first Lambda pushes a message to SQS, and by the time the message is read the S3 object is normally available. A second Lambda runs on a cron schedule and checks the queue for messages.
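The timing issue in the note can also be handled explicitly by asking Polly whether the synthesis task has finished before handing the file to Transcribe. The following is a minimal sketch, not part of the original solution: `is_audio_ready` is a hypothetical helper name, and the client is passed in as a parameter so the function can be tried without AWS credentials. `get_speech_synthesis_task` is the boto3 Polly call that reports a task's status.

```python
def is_audio_ready(task_id, polly_client=None):
    """Return True once the Polly synthesis task has written its audio file."""
    # Hypothetical helper: create a real client only when none is injected.
    if polly_client is None:
        import boto3
        polly_client = boto3.client("polly")

    task = polly_client.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
    status = task["TaskStatus"]  # scheduled | inProgress | completed | failed
    if status == "failed":
        raise RuntimeError(task.get("TaskStatusReason", "Polly task failed"))
    return status == "completed"
```

If this returns False, the second Lambda could simply leave the message in the queue and try again on the next scheduled run.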

Steps for Conversion of Text-to-Speech using Amazon Polly and Speech-to-Text using Amazon Transcribe:

  • Sign in to the AWS console.
Image 1: Sign in to the AWS console with your email ID and password.
  • Go to Services and click on the Lambda service.
Image 2: When the console opens, go to Services and, under Compute, click Lambda.
  • Open Lambda and create a function.
  • Here we provide the text; Amazon Polly converts it to speech and stores the audio in an S3 bucket.
Image 4: Lambda code for providing text and converting it to speech

Lambda code that sends text to Amazon Polly, which converts it to speech and stores it in an S3 bucket:

import json
import boto3

def lambda_handler(event, context):
    # Start an asynchronous Polly task that writes the .mp3 file to S3.
    client = boto3.client('polly')
    polly_response = client.start_speech_synthesis_task(
        OutputFormat='mp3',
        OutputS3BucketName='polly-demo1',
        Text='Hi,How are you',
        VoiceId='Joanna'
    )
    return {
        "statusCode": 200,
        "body": json.dumps('Hello from Lambda!')
    }
  • Click on Amazon Polly.
Image 5: Go to Services and, under Machine Learning, click Amazon Polly.
  • Click on S3 synthesis tasks; the Task ID and S3 URL are shown along with the Status.
Image 6: Amazon Polly shows the Task ID, Status, Number of characters, Requested date, and S3 URL for the synthesis task.
  • Click on the S3 service.
Image 7: Go to Services and, under Storage, click S3.
  • Create a bucket in S3.
Image 8: Once the bucket is created with a particular name, it is displayed like this.
  • The mp3 file is stored in the S3 bucket.
Image 9: The converted audio file is saved in this S3 bucket with the .mp3 extension.
  • Click on the Object URL.
Image 10: When you click on the file, its details are displayed; click on the Object URL.
  • A music player opens automatically and we can listen to the text on our local system.
Image 11: When you click on the URL, a music player opens, and when you click the play button it speaks the text.

Lambda Code for Sending the Amazon Polly messages to SQS:

import json
import boto3

def lambda_handler(event, context):
    # Start the asynchronous Polly task; the .mp3 file is written to the bucket.
    client = boto3.client('polly')
    polly_response = client.start_speech_synthesis_task(
        OutputFormat='mp3',
        OutputS3BucketName='polly-demo1',
        Text='Hi,How are you',
        VoiceId='Joanna'
    )

    # Send the task ID and output URL to SQS for the second Lambda.
    task_id = polly_response['SynthesisTask']['TaskId']
    object_url = polly_response['SynthesisTask']['OutputUri']
    object_data = {'TaskId': task_id, 'OutputUri': object_url}
    sqs = boto3.client('sqs')
    queue_url = 'https://sqs.us-east-1.amazonaws.com/353243667183/speech-text-speech'
    sqs_response = sqs.send_message(
        QueueUrl=queue_url,
        DelaySeconds=0,
        MessageBody=json.dumps(object_data)
    )
    print(sqs_response)

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

Go to Services and click on Simple Queue Service (SQS)

Image 12: Go to Services and, under Application Integration, click SQS.
  • Create a queue.
Image 13: Give the queue a name and create it.
  • Your created queue will be displayed here.
Image 14: The created queue is displayed with its Queue Name, Queue Type, Messages Available, Messages in Flight, and creation time.
  • Go to the Lambda code and test it.

Enable Cron Expression:

This cron expression checks for messages in SQS every minute.
  • When you click the Test button, the text is converted to speech and the message is sent to SQS.
Image 15: Here we send one message to SQS, which is why Messages Available is 1.
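The one-minute schedule can also be created from code instead of the console. The sketch below is an illustration under assumptions: `schedule_every_minute`, the rule name, and the Lambda ARN are hypothetical, while `put_rule` and `put_targets` are the boto3 EventBridge calls for creating a scheduled rule and attaching a target to it.

```python
def schedule_every_minute(rule_name, lambda_arn, events_client=None):
    """Create a rule that fires every minute and point it at a Lambda function."""
    # Hypothetical helper: create a real client only when none is injected.
    if events_client is None:
        import boto3
        events_client = boto3.client("events")

    rule = events_client.put_rule(
        Name=rule_name,
        # Equivalent cron form: cron(* * * * ? *)
        ScheduleExpression="rate(1 minute)",
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]
```

When the rule is created this way, the Lambda function also needs a resource-based permission that lets EventBridge invoke it; the console normally adds that permission for you when you attach the trigger there.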

Lambda code for receiving messages from SQS:

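import json
import boto3

def lambda_handler(event, context):
    # Poll the queue for the message sent by the first Lambda.
    sqs = boto3.client('sqs')
    queue_url = 'https://sqs.us-east-1.amazonaws.com/353243667183/speech-text-speech'
    sqs_response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=3,
    )
    print(sqs_response)
    if 'Messages' not in sqs_response:
        # Nothing queued on this scheduled run.
        print('No messages available')
        return
    sqs_data = json.loads(sqs_response['Messages'][0]['Body'])
    print(sqs_data)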
  • If you go to SQS now, Messages in Flight will be 1, i.e. the Lambda has received the message that was sent when the Polly task was started.
Image 16: Once the message has been received from SQS, Messages in Flight is displayed as 1.

Lambda code for deleting messages from SQS:

# Continues inside lambda_handler, after the message has been read.
receipt_handle = sqs_response['Messages'][0]['ReceiptHandle']
delete_response = sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
print('Received and deleted message: %s' % delete_response)
  • After the message has been received, this code deletes it from the queue.
Image 17: Once the message has been received and deleted, both Messages Available and Messages in Flight are 0.
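The receive call above asks for up to three messages but only the first one is processed, and a scheduled poll may also find the queue empty, in which case boto3 leaves the `Messages` key out of the response. A small sketch of handling both cases follows; `extract_messages` is a hypothetical helper name, not part of the original code.

```python
import json


def extract_messages(sqs_response):
    """Return (body, receipt_handle) pairs; an empty list if nothing was received."""
    pairs = []
    # "Messages" is absent when the queue had nothing to deliver.
    for message in sqs_response.get("Messages", []):
        body = json.loads(message["Body"])
        pairs.append((body, message["ReceiptHandle"]))
    return pairs
```

Each receipt handle in the result can then be passed to `delete_message` once its message has been processed.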

Taking the mp3 file from the S3 bucket, converting it to text using Amazon Transcribe, and storing the result in another S3 bucket.

Lambda code for taking the mp3 file from the S3 bucket, converting it to text using Amazon Transcribe, and storing the result in another S3 bucket:

import json
import boto3

def lambda_handler(event, context):
    # Read the task ID and audio URL that the first Lambda pushed to SQS.
    sqs = boto3.client('sqs')
    queue_url = 'https://sqs.us-east-1.amazonaws.com/353243667183/speech-text-speech'
    sqs_response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=3,
    )
    print(sqs_response)
    if 'Messages' not in sqs_response:
        # Nothing queued on this scheduled run.
        return {'statusCode': 200, 'body': json.dumps('No messages in queue')}
    sqs_data = json.loads(sqs_response['Messages'][0]['Body'])
    print(sqs_data)

    # Start a transcription job for the audio file; the Polly task ID
    # doubles as the job name.
    client = boto3.client('transcribe')
    response = client.start_transcription_job(
        TranscriptionJobName=sqs_data['TaskId'],
        LanguageCode='en-US',  # matches the English sample text and the Joanna voice
        MediaFormat='mp3',
        Media={
            'MediaFileUri': sqs_data['OutputUri']
        },
        OutputBucketName='transcribebucket123'
    )
    print('Transcribe response:')
    print(response)

    # Delete the received message from the queue.
    receipt_handle = sqs_response['Messages'][0]['ReceiptHandle']
    delete_response = sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
    print('Received and deleted message: %s' % delete_response)

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
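The code above passes Polly's `OutputUri` straight to Transcribe as `MediaFileUri`. The Transcribe documentation shows media locations in the `s3://bucket/key` form, so if the HTTPS URL is ever rejected, converting it first is one option. This is a sketch under an assumption: it expects a path-style URL (`https://s3.<region>.amazonaws.com/<bucket>/<key>`), and `to_s3_uri` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def to_s3_uri(output_uri):
    """Convert a path-style HTTPS S3 URL to the s3://bucket/key form."""
    parsed = urlparse(output_uri)
    if parsed.scheme == "s3":
        return output_uri  # already in the target form

    # Assumption: the first path segment is the bucket, the rest is the key.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"Cannot find bucket and key in {output_uri!r}")
    return f"s3://{bucket}/{key}"
```

For example, with a made-up object name, `to_s3_uri("https://s3.us-east-1.amazonaws.com/polly-demo1/example.mp3")` gives `s3://polly-demo1/example.mp3`.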
  • Go to Services and click on Amazon Transcribe.
Image 18: Go to Services and, under Machine Learning, click Amazon Transcribe.
  • When you click the Test button in Lambda and go to Amazon Transcribe, the transcription job is displayed along with its status.
Image 19: The transcription job is displayed with its Name, Output location, Language, and a status of In progress.
  • Once the transcription job is completed, it is displayed like this.
Image 20: The transcription job is completed.
  • Go to the S3 service and create another bucket. The transcript files will be stored in this S3 bucket.
Image 21: Create another S3 bucket for storing the files.
  • Click on the file.
Image 22: The files are displayed in this S3 bucket.
  • Click on the URL.
Image 23: When you click on the file, its details are displayed; click on the Object URL.
  • The resulting text is displayed like this.
Image 24: Here the text is displayed with start time, end time, pronunciation, and confidence.
  • Go to the Properties of this bucket and click on Events.
Image 25: Click on Add notification.
Image 26: Give your event a name and choose the Lambda function that you want to be triggered.
Image 27: This automatically triggers the Lambda when a file is uploaded.
  • Create a Lambda function for extracting and verifying the data.
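The notification configured in the steps above invokes the Lambda with an event that lists the uploaded objects. Object keys in that event are URL-encoded (a space arrives as `+`, for example), so decoding them before calling S3 avoids "key not found" errors for names with special characters. A minimal sketch, with `objects_from_event` as a hypothetical helper name:

```python
from urllib.parse import unquote_plus


def objects_from_event(event):
    """Return (bucket, key) pairs for every record in an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys are URL-encoded in the event payload; decode before use.
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs
```

Reading the bucket name from the event, as done here, is an alternative to supplying it through an environment variable.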

Lambda code for extracting and verifying the data:

import boto3
import json
import os

def lambda_handler(event, context):
    print("Received event:")
    print(event)
    # The bucket name comes from the Lambda's environment variables.
    bucketName = os.environ['bucketname']
    client = boto3.client('s3')

    # Key of the transcript file that triggered this invocation.
    object_name = event['Records'][0]['s3']['object']['key']
    print(object_name)
    responseFromS3 = client.get_object(Bucket=bucketName, Key=object_name)
    file_content = responseFromS3['Body'].read()
    new_data = file_content.decode('utf-8')
    print(new_data)

    # The transcript file is JSON; the full text is under results.transcripts.
    valueData = json.loads(new_data)
    data = valueData['results']['transcripts'][0]['transcript']
    print(data)

    return data
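Besides the full transcript, the same JSON file carries the per-word start time, end time, and confidence seen in the S3 object earlier. The sketch below pulls those out of the `results.items` list; `word_timings` is a hypothetical helper name, and the sample values in the usage note are made up for illustration.

```python
import json


def word_timings(transcript_json):
    """Return one dict per spoken word with its timing and confidence."""
    doc = json.loads(transcript_json)
    words = []
    for item in doc["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation items carry no timestamps
        best = item["alternatives"][0]
        words.append({
            "word": best["content"],
            "start": float(item["start_time"]),
            "end": float(item["end_time"]),
            "confidence": float(best["confidence"]),
        })
    return words
```

Calling `word_timings(new_data)` inside the handler above would give a list such as `[{"word": "Hi", "start": 0.04, "end": 0.31, "confidence": 0.99}, ...]`.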

To verify the extracted data in CloudWatch:

Image 28: Click on Monitoring to verify the data.
Image 29: Click on View logs in CloudWatch.
Image 30: Your text is displayed here in CloudWatch.

Using this solution, we can convert Text-to-Speech and Speech-to-Text in one workflow and store the resulting output in S3 buckets using AWS cloud services. Because the pipeline runs without manual steps once it is set up, it saves time.

Ok, that’s it. I hope you found this case useful! See you next time.