Getting Started with Amazon Polly using Node.js

After building a smart pumpkin that speaks, I was hoping someone would release a better Text-to-Speech API. Luckily, at AWS re:Invent 2016, Amazon did just that.

Amazon Polly is a low-cost, easy-to-use Text-to-Speech API with impressive-sounding voices. You get 5 million characters per month free for the first 12 months. That is enough to convert roughly 1,600 average emails to speech each month for free, which works out to about 3,100 characters per email.

We can quickly get started with Polly using Node.js.

Download Sample Project

The AWS Node.js SDK documentation provides a sample project for getting started with the SDK. We are going to use it as the starting point for this example as well. Clone it:

$ git clone https://github.com/awslabs/aws-nodejs-sample.git

Configure AWS Keys

We need an access key ID and a secret access key for our AWS account. If you don’t have them, take a look at this documentation. Once you have them, create a file at ~/.aws/credentials (C:\Users\USER_NAME\.aws\credentials for Windows users) with this content:

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

The aws-sdk reads this credentials file by default to connect to your AWS account. Be careful with these keys: anyone who has them can act on your account, so keep them out of source control.
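
Before making any requests, it can help to confirm the file is where the SDK expects it. The script below is a small sketch of such a check and is not part of the SDK: the file name check-credentials.js and the helper names profileKeys and missingKeys are made up for this example. It only reports which key names are present and never prints their values.

```javascript
// check-credentials.js (hypothetical file name): a rough sanity check only.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Return the key names found under one profile in INI-style text.
function profileKeys(iniText, profile) {
  const keys = [];
  let current = null;
  for (const rawLine of iniText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#') || line.startsWith(';')) continue;
    const section = line.match(/^\[(.+)\]$/);
    if (section) {
      current = section[1].trim();
      continue;
    }
    const eq = line.indexOf('=');
    if (current === profile && eq > 0) keys.push(line.slice(0, eq).trim());
  }
  return keys;
}

// List which of the two required keys are absent from the profile.
function missingKeys(iniText, profile) {
  const found = profileKeys(iniText, profile);
  return ['aws_access_key_id', 'aws_secret_access_key'].filter(
    (key) => !found.includes(key)
  );
}

// os.homedir() resolves to C:\Users\USER_NAME on Windows.
const credentialsPath = path.join(os.homedir(), '.aws', 'credentials');
if (fs.existsSync(credentialsPath)) {
  const missing = missingKeys(fs.readFileSync(credentialsPath, 'utf8'), 'default');
  console.log(
    missing.length === 0
      ? 'Default profile has both keys.'
      : 'Default profile is missing: ' + missing.join(', ')
  );
} else {
  console.log('No credentials file found at ' + credentialsPath);
}
```

Run it with node check-credentials.js once the credentials file is in place.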

Text-to-Speech to an MP3 file

The quickest way to use Text-to-Speech is to send a request to Amazon Polly and write the returned audio to a file.

Now we can move into the project, install its dependencies (including the aws-sdk), and create a new file called amazon-polly-file.js:

cd aws-nodejs-sample && npm install && touch amazon-polly-file.js

Here is the code for amazon-polly-file.js:
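
The original listing is not reproduced here, so what follows is a minimal sketch of what such a file could look like rather than the author's exact code. It assumes the v2 aws-sdk package that the sample project depends on. The helper names buildSpeechParams and synthesizeToFile, the us-east-1 region, and the speech.mp3 output name are all choices made for this example.

```javascript
// amazon-polly-file.js: a minimal sketch, not the author's original listing.
const fs = require('fs');

// Build the request for Polly's SynthesizeSpeech operation.
function buildSpeechParams(text) {
  return {
    Text: text,
    OutputFormat: 'mp3',
    VoiceId: 'Kimberly'
  };
}

// Send the text to Polly and write the returned audio to outPath.
// `polly` is any object exposing synthesizeSpeech(params, callback).
function synthesizeToFile(polly, text, outPath, done) {
  polly.synthesizeSpeech(buildSpeechParams(text), (err, data) => {
    if (err) return done(err);
    // In Node.js the v2 SDK returns the audio as a Buffer.
    if (!Buffer.isBuffer(data.AudioStream)) {
      return done(new Error('Expected AudioStream to be a Buffer'));
    }
    fs.writeFile(outPath, data.AudioStream, done);
  });
}

// Guard the require so a missing install gives a readable message.
let AWS = null;
try {
  AWS = require('aws-sdk');
} catch (e) {
  console.error('aws-sdk not found; run `npm install` in the sample project first.');
}

if (AWS) {
  // The region is an assumption; use one where Polly is available to you.
  const polly = new AWS.Polly({ region: 'us-east-1' });
  synthesizeToFile(polly, 'Hi, my name is @anapfox', './speech.mp3', (err) => {
    if (err) console.error('Polly request failed:', err.message);
    else console.log('Saved speech.mp3');
  });
}
```

Passing the client in as an argument keeps synthesizeToFile easy to try out with a stand-in object before spending any of your free characters.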

You can run it with:

node amazon-polly-file.js

In this example, we take the text ‘Hi, my name is @anapfox’, send it to Polly, and write the returned audio to an MP3 file.

As you can see, we are setting the VoiceId to Kimberly. You can check out all the valid voices here.
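
You can also ask Polly for the list of voices from code. The sketch below uses the describeVoices operation of the v2 aws-sdk. The file name list-voices.js, the helper name listVoiceIds, the en-US language filter, and the us-east-1 region are assumptions made for this example, and it only reads the first page of results.

```javascript
// list-voices.js (hypothetical file name): print the voice IDs for a language.

// Ask Polly for its voices and pass back a sorted list of their IDs.
// `polly` is any object exposing describeVoices(params, callback).
function listVoiceIds(polly, languageCode, done) {
  polly.describeVoices({ LanguageCode: languageCode }, (err, data) => {
    if (err) return done(err);
    done(null, data.Voices.map((voice) => voice.Id).sort());
  });
}

// Guard the require so a missing install gives a readable message.
let AWS = null;
try {
  AWS = require('aws-sdk');
} catch (e) {
  console.error('aws-sdk not found; run `npm install` in the sample project first.');
}

if (AWS) {
  // The region is an assumption; use one where Polly is available to you.
  const polly = new AWS.Polly({ region: 'us-east-1' });
  listVoiceIds(polly, 'en-US', (err, ids) => {
    if (err) console.error('Could not list voices:', err.message);
    else console.log(ids.join(', '));
  });
}
```

Any ID it prints can be used as the VoiceId in the request.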

Text-to-Speech to Speaker

For some applications, we want to convert the text to speech and send the audio directly to a speaker. We can use a Node module called speaker, which is just a writable stream that plays PCM audio through your speakers. Let’s add it to our application:

npm i --save speaker

Now we can create a new file called amazon-polly-speaker.js:

touch amazon-polly-speaker.js

Here is the code for amazon-polly-speaker.js:
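
The original listing is not reproduced here either, so this is a minimal sketch under a few assumptions rather than the author's exact code. It assumes the v2 aws-sdk and the speaker module installed above, requests PCM at 16000 Hz so the audio matches the speaker settings, and uses made-up helper names (buildPcmParams, bufferToStream, speak) and the us-east-1 region.

```javascript
// amazon-polly-speaker.js: a minimal sketch, not the author's original listing.
const { PassThrough } = require('stream');

// Polly returns PCM as 16-bit signed little-endian mono audio. The sample
// rate is requested explicitly so the speaker settings below match it.
const SAMPLE_RATE = 16000;

// Build the request for Polly's SynthesizeSpeech operation.
function buildPcmParams(text) {
  return {
    Text: text,
    OutputFormat: 'pcm',
    SampleRate: String(SAMPLE_RATE), // the API expects a string here
    VoiceId: 'Kimberly'
  };
}

// Wrap an audio Buffer in a stream that can be piped to a writable stream.
function bufferToStream(buffer) {
  const stream = new PassThrough();
  stream.end(buffer);
  return stream;
}

// Send the text to Polly and pipe the returned audio into `output`.
function speak(polly, output, text, done) {
  polly.synthesizeSpeech(buildPcmParams(text), (err, data) => {
    if (err) return done(err);
    if (!Buffer.isBuffer(data.AudioStream)) {
      return done(new Error('Expected AudioStream to be a Buffer'));
    }
    bufferToStream(data.AudioStream).pipe(output);
    done(null);
  });
}

// Guard the requires so a missing install gives a readable message.
let AWS = null;
let Speaker = null;
try {
  AWS = require('aws-sdk');
  Speaker = require('speaker');
} catch (e) {
  console.error('aws-sdk or speaker not found; install them with npm first.');
}

if (AWS && Speaker) {
  // The region is an assumption; use one where Polly is available to you.
  const polly = new AWS.Polly({ region: 'us-east-1' });
  const speaker = new Speaker({ channels: 1, bitDepth: 16, sampleRate: SAMPLE_RATE });
  speak(polly, speaker, 'Hi, my name is @anapfox', (err) => {
    if (err) console.error('Polly request failed:', err.message);
  });
}
```

If the voice sounds too fast or too slow, the sample rate given to Speaker most likely does not match the one requested from Polly.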

You can run it with:

node amazon-polly-speaker.js

In this speaker example, we do the same thing, but we create a stream from the audio Polly returns and pipe that stream to the speaker module.

All Done 🤗

You should now have a working example of a simple Text-to-Speech application. Amazon Polly has much more to play with, such as SSML and lexicons.
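
As a small taste of SSML, the sketch below builds a request that inserts a short pause before the name. Setting TextType to ssml tells Polly to read the text as markup. The helper names escapeXml and buildSsmlParams and the 300 ms pause are made up for this example, and the resulting object would be passed to synthesizeSpeech just like the plain-text request.

```javascript
// Escape characters that have a special meaning in XML, so arbitrary
// text can be placed safely inside SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a SynthesizeSpeech request with a pause between greeting and name.
function buildSsmlParams(greeting, name, pauseMs) {
  const ssml =
    '<speak>' +
    escapeXml(greeting) +
    '<break time="' + pauseMs + 'ms"/>' +
    escapeXml(name) +
    '</speak>';
  return {
    Text: ssml,
    TextType: 'ssml',
    OutputFormat: 'mp3',
    VoiceId: 'Kimberly'
  };
}

console.log(buildSsmlParams('Hi, my name is', '@anapfox', 300).Text);
// prints: <speak>Hi, my name is<break time="300ms"/>@anapfox</speak>
```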

If I missed anything, feel free to reach out to me on Twitter.

