Text-to-speech converter using AWS Lambda, Polly, and API Gateway

Lucas Soldeira Ludicsa
4 min read · Mar 7, 2024

API Gateway, Lambda, and Polly

In today’s digital age, Text-to-Speech (TTS) converters play a vital role in enhancing accessibility, efficiency, and inclusivity across various platforms by transforming written text into spoken words.

To build a TTS converter on AWS, we’ll follow these steps:

  • Create an IAM role that gives your Lambda function permission to use Amazon Polly.
  • Create the Lambda function that will handle the text-to-speech conversion.
  • Create an API Gateway to act as the entry point to the serverless application.
  • Deploy and test the API.

Step 1: Create the IAM Role

  1. Go to the AWS IAM console.
  2. Click on “Roles” in the left sidebar.
  3. Click on “Create role”.
  4. For the service that will use this role, choose Lambda.
  5. Attach the “AmazonPollyFullAccess” policy (or you can create a custom policy with just the permissions needed).
  6. Finish creating the role.
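
For the custom-policy option in step 5, a narrower policy document might look like the sketch below. It assumes the function only ever calls synthesize_speech, so it allows just the polly:SynthesizeSpeech action. The helper name build_polly_policy is made up for illustration, and the printed JSON could be pasted into the IAM policy editor. Writing logs to CloudWatch would need separate permissions, which this sketch does not cover.

```python
import json


def build_polly_policy():
    """Return an IAM policy document that allows only speech synthesis.

    Assumption: the Lambda function needs no other Polly actions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["polly:SynthesizeSpeech"],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be copied into the IAM console
    print(json.dumps(build_polly_policy(), indent=2))
```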

Step 2: Create the Lambda Function

  1. Go to the AWS Lambda console.
  2. Click on “Create function”.
  3. Choose “Author from scratch”.
  4. Fill in the details:
  • Name: TextToSpeechConverter
  • Runtime: Python 3.x (or language of your choice)
  • Role: Choose an existing role and select the role you created earlier.

Now replace the contents of the Lambda code source editor with the following code:

import json
import boto3
import base64

def lambda_handler(event, context):
    # Get the text to convert from the request; queryStringParameters
    # is missing or None when the request has no query string
    params = event.get('queryStringParameters') or {}
    text = params.get('text')
    if not text:
        return {
            'statusCode': 400,
            'body': json.dumps('Missing required query parameter: text')
        }

    try:
        # Initialize Polly client
        polly = boto3.client('polly')

        # Synthesize speech
        response = polly.synthesize_speech(
            Text=text,
            OutputFormat='mp3',
            VoiceId='Joanna'
        )

        # Return audio as base64 encoded string
        audio_stream = response['AudioStream'].read()
        audio_base64 = base64.b64encode(audio_stream).decode('utf-8')

        return {
            'statusCode': 200,
            'headers': {
                'Content-Type': 'audio/mpeg',
            },
            'body': audio_base64,
            'isBase64Encoded': True
        }
    except Exception as e:
        # Report any Polly or runtime error to the caller
        return {
            'statusCode': 500,
            'body': json.dumps(str(e))
        }

Click on Deploy to finish deploying the function.
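
The handler returns the MP3 as a base64 string with isBase64Encoded set to True, and API Gateway decodes that body before sending the bytes to the client. As a rough sketch of that round trip, the hypothetical helper below (decode_body is an illustrative name, not part of any AWS SDK) turns a handler-style response dictionary back into raw bytes. This can be handy when inspecting the function's output from a console test before the API exists.

```python
import base64


def decode_body(response):
    """Return the raw bytes carried by a Lambda proxy-style response dict."""
    body = response.get("body") or ""
    if response.get("isBase64Encoded"):
        # Binary payloads (such as the MP3 audio) arrive base64 encoded
        return base64.b64decode(body)
    # Plain-text payloads (such as the JSON error message) are UTF-8 strings
    return body.encode("utf-8")


if __name__ == "__main__":
    # Stand-in bytes, not real audio; a real response would hold MP3 data
    fake_audio = b"not real audio"
    response = {
        "statusCode": 200,
        "headers": {"Content-Type": "audio/mpeg"},
        "body": base64.b64encode(fake_audio).decode("utf-8"),
        "isBase64Encoded": True,
    }
    print(decode_body(response) == fake_audio)
```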

Step 3: Create API Gateway

  1. Go to the API Gateway console.
  2. Choose “HTTP API”.
  3. Click on “Build”.
  4. Configure your API:
  • Integration type: “Lambda”
  • API name: TextToSpeechConverterAPI
  • Lambda function: Select the Lambda function you created (TextToSpeechConverter)
  5. In “Routes”, set the resource path to /TextToSpeechConverter.
  6. Set the integration target to TextToSpeechConverter.
  7. Enter a stage name. Stages are independently configurable environments that your API can be deployed to. Here we’re going to use test.
  8. Review and create the API Gateway.

Step 4: Test your API

  1. Test the API with the curl command in a Linux terminal.
  2. You’ll need the API ID, which you’ll find in the left section of API Gateway → APIs.
  3. Go to your terminal and send a GET request to your endpoint. Here’s an example:
curl -X GET 'https://dmobfl29v8.execute-api.us-east-1.amazonaws.com/test/TextToSpeechConverter?text=hello,%20this%20is%20a%20tutorial%20test!' --output output.mp3
  • -X GET: Specifies that this is a GET request (curl’s default method).
  • 'https://dmobfl29v8.execute-api.us-east-1.amazonaws.com/test/TextToSpeechConverter?text=hello,%20this%20is%20a%20tutorial%20test!': The API endpoint, which contains the API ID (dmobfl29v8), the region (us-east-1), the stage name (test), and the route (/TextToSpeechConverter). It includes the text query parameter with the value "hello, this is a tutorial test!". The single quotes stop the shell from interpreting characters such as ! in the URL.
  • --output output.mp3: Saves the response body (the audio file) to a file named output.mp3 in the current directory.
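
The same request can be sketched in Python using only the standard library. The function names build_invoke_url and download_speech are illustrative, and the API ID, region, stage, and route are the example values from the curl command above, so replace them with your own. Unlike the hand-written URL, this version percent-encodes the comma and the exclamation mark as well as the spaces.

```python
import urllib.parse
import urllib.request


def build_invoke_url(api_id, region, stage, route, text):
    """Build the invoke URL for an API route with a 'text' query parameter."""
    # quote_via=quote encodes spaces as %20 instead of '+'
    query = urllib.parse.urlencode({"text": text}, quote_via=urllib.parse.quote)
    return (
        f"https://{api_id}.execute-api.{region}.amazonaws.com"
        f"/{stage}/{route.lstrip('/')}?{query}"
    )


def download_speech(url, path="output.mp3"):
    """Send a GET request to the URL and save the response body to a file."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as f:
        f.write(resp.read())
    return path


if __name__ == "__main__":
    url = build_invoke_url(
        "dmobfl29v8", "us-east-1", "test",
        "TextToSpeechConverter", "hello, this is a tutorial test!",
    )
    print(url)
    # Uncomment once your own API is deployed and the values above are updated:
    # download_speech(url)
```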

Now, it’s time to listen to the audio!

That’s it! I hope you liked it, and have a good one!

Lucas Soldeira Ludicsa

Being able to write about my projects and interests means being able to practice and study while showcasing services and possibilities in the cloud.