An Introduction to AWS Polly, S3 and PHP.

Published in Cloud Academy, Aug 31, 2017 (4 min read)

AWS Polly is a service from Amazon that turns text into speech. It's very simple to use and produces high-quality audio files very quickly, which makes building voice-enabled applications easy. It's great for all types of applications, such as games, chatbots and learning tools. At Easyblog.org we use it to convert comments and other text into audio files for children to listen to. This type of accessibility feature can really help young children, especially those with learning or other challenges in their lives.

Let's take a quick look at how you can integrate AWS Polly into a website. We will store the MP3 file in S3 and retrieve it for playback on a website.

We will then take this a step further and integrate it into an HTML web page that submits the text to Polly via jQuery Ajax and plays the result.

Getting Started

The first thing you will need is the AWS SDK for PHP. Download a copy of the SDK using your preferred method and then integrate it into your app or website with something like the following:

require_once 'aws/aws-autoloader.php';

Next you will need an IAM user with access to Polly and S3, along with an Access Key ID and Secret Access Key for that user.

Once your code has access to the AWS SDK and the IAM user is created you are ready to begin.

Your code will need three sections for this to work:

A credentials section for authentication
An AWS Polly section to convert the text to speech
An AWS S3 section to save the file to S3 and return the URL of the file

Credential settings code

Both Polly and S3 require IAM credentials to access the services. Once set, you can use $credentials throughout your code. This also applies to other AWS services accessed through their APIs.
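A minimal sketch of the credentials section, assuming the autoloader path shown earlier and the SDK's Aws\Credentials\Credentials class. The XXXXXX values are placeholders:

```php
<?php
require_once 'aws/aws-autoloader.php';

use Aws\Credentials\Credentials;

// Placeholders: the IAM user's Access Key ID and Secret Access Key
$credentials = new Credentials('XXXXXX', 'XXXXXX');
```

Hard-coding the keys keeps the example short. In a real deployment you may prefer to load them from environment variables or from a file outside the web root.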

Don’t forget to replace XXXXXX with your actual ID and Key.

Polly text to speech code

The next step is to set up Polly. We start by creating a new Polly client.
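As a sketch, assuming the SDK is loaded and $credentials is the object from the credentials section. The region us-east-1 is only a placeholder:

```php
<?php
use Aws\Polly\PollyClient;

// Assumes $credentials was created in the credentials section
$client_polly = new PollyClient([
    'version'     => 'latest',
    'region'      => 'us-east-1', // placeholder: use your own region
    'credentials' => $credentials,
]);
```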

Set the region to your required region and reference the credentials you have already set.

Next we specify the string we want to convert to speech and some details of that conversion.
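A request of this shape might look like the following sketch. The voice "Joanna" and the sample sentence are assumptions chosen for illustration, and $client_polly is the client created in the previous step:

```php
<?php
// Assumes $client_polly is the Polly client created earlier
$result_polly = $client_polly->synthesizeSpeech([
    'OutputFormat' => 'mp3',                        // file type of the audio
    'Text'         => 'Hello, this is AWS Polly.',  // sample text (assumption)
    'TextType'     => 'text',                       // plain text rather than SSML
    'VoiceId'      => 'Joanna',                     // example voice (assumption)
]);
```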

Breaking this down, you can see that we set the file type, text to be converted, text type and the voice. There are many other options available but these are the minimum you need for basic conversion.

In this example the text is hard coded but it is simple enough to pass this in as a variable.

Now we just pass these details to Polly.

# Returned audio data
$resultData_polly = $result_polly->get('AudioStream')->getContents();

The variable “$resultData_polly” now contains the audio data. You can do whatever you want with it; here we will save it to S3 for future use.

Saving the MP3 to S3 and retrieving the URL for playback.

Now that we have the audio of our text, we need to save it for playback and future use.

As with Polly, you will need to create a new client, once again using the credentials you set up before. You will also need to set the region where your bucket is located.
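A sketch of the S3 client, again assuming $credentials from the credentials section. The region is a placeholder, and the $s3bucket and $filename values used by the putObject call are hypothetical examples:

```php
<?php
use Aws\S3\S3Client;

// Assumes $credentials was created in the credentials section
$client_s3 = new S3Client([
    'version'     => 'latest',
    'region'      => 'us-east-1', // placeholder: the region of your bucket
    'credentials' => $credentials,
]);

$s3bucket = 'my-polly-bucket'; // hypothetical bucket name
$filename = 'polly.mp3';       // object key (full path within the bucket)
```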

Next call putObject and pass the audio file to S3 as the body of the S3 object.

It's worth noting at this point that the “Key” is the full path to the file, not just the filename. If you want to save the file into a directory within S3, you will need to include it in the key (for example in/this/directory/polly.mp3).

$result_s3 = $client_s3->putObject([
    'Key'         => $filename,
    'ACL'         => 'public-read', // only works if the bucket allows ACLs
    'Body'        => $resultData_polly,
    'Bucket'      => $s3bucket,
    'ContentType' => 'audio/mpeg',
]);

This returns a result object, readable like an array, containing information on the file saved to S3.

For example, you can use ObjectURL to get the URL of the object you just saved, then echo it out for users to access or save it into a database.

<audio controls>
<source src="<?php echo $result_s3['ObjectURL'] ?>" type="audio/mpeg">
</audio>

Making it a little more useful

Now that we have a functional script, it doesn't take too much work to turn it into something really usable. We just need some HTML and JavaScript.

We will now create a web page that contains a form and some JavaScript.
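A page along these lines could be sketched as follows. The field name ptext and the endpoint pollysubmit.php come from this article; the element IDs, the button label and the jQuery CDN URL are assumptions made for illustration:

```php
<?php /* index.php: no server-side logic, just the form, a player and the script */ ?>
<form id="pollyform">
  <input type="text" name="ptext" placeholder="Text to speak">
  <button type="submit">Speak</button>
</form>
<audio id="pollyplayer" controls></audio>

<script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
<script>
  $('#pollyform').on('submit', function (event) {
    event.preventDefault(); // stay on the page instead of reloading it
    // Post the form fields (ptext) and expect the MP3 URL back as plain text
    $.post('pollysubmit.php', $(this).serialize(), function (url) {
      var player = $('#pollyplayer')[0];
      player.src = url.trim();
      player.play();
    }, 'text');
  });
</script>
```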

The HTML above is a simple form with a single text input for the text you wish to submit to Polly and a button.

The above is some simple JavaScript that will be called when you submit the form. It takes the form's content and passes it to pollysubmit.php for processing. It expects back the URL of the newly created MP3 file so that it can be played.

Now we just need to make a couple of minor changes to our original script.

This line takes the text posted by Ajax and assigns it to a variable for use.

$polly_text = $_POST['ptext'];

Alter $client_polly->synthesizeSpeech to use the variable as the text source.

'Text'         => $polly_text,

Echo out the URL for passing back to the webpage.

echo $result_s3['ObjectURL'];

And that is all there is to it.

Limitations

Remember there are several limitations when using Polly. If you wish to convert large volumes of text, you will need to split it into chunks.

A couple of key limits you should be aware of for websites, at the time of writing, are:

The size of the input text can be up to 1500 billed characters
The output audio stream (synthesis) is limited to 5 minutes, after which any remaining speech is cut off.
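One way to stay under the text limit is to split long input on word boundaries before sending each piece to Polly. The helper below is a hypothetical sketch, not part of the SDK: splitForPolly is an invented name, it assumes the mbstring extension is available, and it counts plain characters as an approximation of billed characters.

```php
<?php
/**
 * Split text into chunks of at most $limit characters, breaking on whitespace.
 * Hypothetical helper; 1500 mirrors the limit quoted above.
 */
function splitForPolly(string $text, int $limit = 1500): array
{
    $chunks  = [];
    $current = '';
    $words   = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        // A single word longer than the limit has to be hard-split
        while (mb_strlen($word) > $limit) {
            if ($current !== '') {
                $chunks[] = $current;
                $current  = '';
            }
            $chunks[] = mb_substr($word, 0, $limit);
            $word     = mb_substr($word, $limit);
        }

        $candidate = $current === '' ? $word : $current . ' ' . $word;
        if (mb_strlen($candidate) > $limit) {
            // Adding this word would overflow, so close the current chunk
            $chunks[] = $current;
            $current  = $word;
        } else {
            $current = $candidate;
        }
    }

    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}
```

Each returned chunk can then be passed as the Text value of its own synthesizeSpeech call, with the resulting audio files saved separately or joined afterwards.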

More limitations can be found here.

Example source code

You can find the full source code to all the examples within my GitHub profile here