How to Use AWS Polly in R — basic tutorial

Jarabek Gergely
Nov 19, 2018

Amazon Polly is a text-to-speech service that turns text into lifelike speech, letting you create applications that talk and build entirely new categories of speech-enabled products. It uses deep learning to synthesize speech that sounds like a human voice.

Amazon Polly is a great opportunity for application developers to bring their products to life. It is just as useful for business analysts, who can give a voice to their otherwise dry analytics. For them, Polly can be a colorful way to bring a presentation or a Shiny app to life.

For business analysts, being able to use Polly from R is key, since R is one of their most widely used tools.

Polly is available in R through the aws.polly package. It is on CRAN, the main package repository for R, but the latest version can only be accessed via GitHub (see the repository for information on fixes). For an easy install, I suggest using the devtools package to install from GitHub:

install.packages("devtools")
library(devtools)
install_github("cloudyr/aws.polly")
library(aws.polly)

and follow the RStudio instructions from there.
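
Before installing anything, it can help to check which packages are already present. The following is a minimal sketch and not something the aws.polly documentation prescribes; the names `needed`, `installed`, and `missing_pkgs` are my own, and tuneR is included only because it is used later in this tutorial for playback.

```r
# Packages this tutorial relies on (tuneR is used later for playback).
needed <- c("devtools", "aws.polly", "tuneR")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are already installed.")
}
```

Anything reported as missing can then be installed with the commands shown above.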

To use the package, you will need an AWS account and enter your credentials into R. Your keypair can be generated on the IAM Management Console under the heading Access Keys. Note that you only have access to your secret key once. After it is generated, you need to save it in a secure location. New keypairs can be generated at any time if yours has been lost, stolen, or forgotten.

By default, all cloudyr packages look for the access key ID and secret access key in environment variables. You can also use this to specify a default region or a temporary “session token”. For example:

Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
"AWS_SECRET_ACCESS_KEY" = "mysecretkey",
"AWS_DEFAULT_REGION" = "us-east-1",
"AWS_SESSION_TOKEN" = "mytoken")
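
Since a missing or misspelled variable name is an easy mistake to make, here is a small sketch that checks whether the two required variables are set, without printing their values. The helper name `credentials_present` is hypothetical and not part of any cloudyr package.

```r
# Hypothetical helper: report which required variables are set,
# without ever printing the values themselves.
credentials_present <- function(vars = c("AWS_ACCESS_KEY_ID",
                                         "AWS_SECRET_ACCESS_KEY")) {
  values <- Sys.getenv(vars, unset = "")
  setNames(nzchar(values), vars)
}

status <- credentials_present()
if (!all(status)) {
  message("Not set: ", paste(names(status)[!status], collapse = ", "))
}
```

If you would rather not keep keys in a script at all, one common option is to put the same variables in your ~/.Renviron file, which R reads at startup.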

The aws.polly package is built around a handful of functions. The low-level one, pollyHTTP(), executes a full Polly API request:

Usage

pollyHTTP(action, query = list(), headers = list(), body = NULL,
  verb = c("GET", "POST", "PUT", "DELETE"), version = "v1",
  raw_response = if (verb == "POST") TRUE else FALSE,
  verbose = getOption("verbose", FALSE),
  region = Sys.getenv("AWS_DEFAULT_REGION", "us-east-1"), key = NULL,
  secret = NULL, session_token = NULL, ...)

Arguments

action — A character string specifying the API action to take
query — An optional named list containing query string parameters and their character values.
headers — A list of headers to pass to the HTTP request.
body — A request body
verb — A character string specifying the HTTP verb to implement.
version — A character string specifying the API version.
raw_response — A logical indicating whether to return the raw response body.
verbose — A logical indicating whether to be verbose. Default is given by options(“verbose”).
region — A character string specifying an AWS region. See locate_credentials.
key — A character string specifying an AWS Access Key. See locate_credentials.
secret — A character string specifying an AWS Secret Key. See locate_credentials.
session_token — Optionally, a character string specifying an AWS temporary Session Token to use in signing a request. See locate_credentials.
… — Additional arguments passed to GET.

However, the basic use of the package is super simple and revolves around the synthesize() function, which takes a character string and a voice as input. To know what kind of voices are available, use the following function:

# list available voices
list_voices()

that gives the following output:

##   Gender       Id LanguageCode LanguageName     Name
## 1 Female   Joanna        en-US   US English   Joanna
## 2 Female    Salli        en-US   US English    Salli
## 3 Female Kimberly        en-US   US English Kimberly
## 4 Female   Kendra        en-US   US English   Kendra
## 5   Male   Justin        en-US   US English   Justin
## 6   Male     Joey        en-US   US English     Joey
## 7 Female      Ivy        en-US   US English      Ivy
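
To pick a voice programmatically, you can filter this table like any other data frame. The sketch below assumes list_voices() returns a data frame with the columns shown above; so that it runs without AWS credentials, it builds a stand-in data frame by hand from that output rather than calling the service.

```r
# Stand-in for the result of list_voices(), copied from the output above.
voices <- data.frame(
  Gender = c("Female", "Female", "Female", "Female", "Male", "Male", "Female"),
  Id = c("Joanna", "Salli", "Kimberly", "Kendra", "Justin", "Joey", "Ivy"),
  LanguageCode = "en-US",
  LanguageName = "US English",
  Name = c("Joanna", "Salli", "Kimberly", "Kendra", "Justin", "Joey", "Ivy"),
  stringsAsFactors = FALSE
)

# Keep only the male US English voices and pull out their Ids.
male_ids <- voices$Id[voices$Gender == "Male" & voices$LanguageCode == "en-US"]
male_ids
# [1] "Justin" "Joey"
```

With real credentials, replacing the hand-built data frame with `voices <- list_voices()` should give the same kind of result.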

Here is a sample call to the synthesize() function:

# synthesize some text
vec <- synthesize("Hello world!", voice = "Joanna")

The result is a "Wave" object (from the tuneR package), which can be played using play():

library("tuneR")
play(vec)
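
A presentation usually needs more than one sentence, so here is a rough sketch of a wrapper that synthesizes several strings in turn. The helper `synthesize_all` and its `synth` argument are hypothetical, not part of aws.polly. The `synth` argument exists only so that the logic can be tried with a stub function, with no AWS account needed; when it is left as NULL, the real synthesize() is used.

```r
# Hypothetical helper: synthesize each element of `texts` with one voice.
# `synth` can be swapped for a stub to test the logic without credentials.
synthesize_all <- function(texts, voice = "Joanna", synth = NULL) {
  if (is.null(synth)) synth <- aws.polly::synthesize
  texts <- texts[nzchar(trimws(texts))]  # skip empty or blank strings
  lapply(texts, function(txt) synth(txt, voice = voice))
}

# Dry run with a stub that just labels the text instead of calling AWS.
fake_synth <- function(text, voice) paste0("[", voice, "] ", text)
clips <- synthesize_all(
  c("Revenue grew this quarter.", "", "Costs were flat."),
  synth = fake_synth
)
length(clips)
# [1] 2
```

With credentials in place, calling `synthesize_all()` without the `synth` argument should return a list of "Wave" objects, each of which can be passed to play().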

With R, we can also manage Polly Lexicons. Pronunciation lexicons enable you to customize the pronunciation of words. Amazon Polly provides API operations that you can use to store lexicons in an AWS region. Those lexicons are then specific to that particular region. You can use one or more of the lexicons from that region when synthesizing the text by using the SynthesizeSpeech operation. This applies the specified lexicon to the input text before the synthesis begins.

Four functions let us manage Polly lexicons from R instead of the handy AWS UI. They list lexicons, get one, put a new one, and delete an existing one:

list_lexicon()
get_lexicon(lexicon, token, ...)
put_lexicon(lexicon, content, ...)
delete_lexicon(lexicon, ...)

where:

lexicon is a character string specifying the name of a lexicon. If missing, a list of available lexicons is returned.

token is optional, a pagination token.

content is a character string containing the content of the PLS lexicon.

... are additional arguments passed to pollyHTTP.
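
To make the content argument more concrete, here is a sketch that builds a one-entry PLS lexicon as a character string. The helpers `xml_escape` and `make_lexicon`, the example entry, and the lexicon name "kpiterms" are all made up for illustration. The XML layout follows the general shape of the W3C Pronunciation Lexicon Specification, so check the AWS documentation before relying on it.

```r
# Escape the characters that are special in XML text (ampersand first).
xml_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# Hypothetical helper: build a one-entry PLS lexicon that reads
# `grapheme` aloud as `alias`.
make_lexicon <- function(grapheme, alias, lang = "en-US") {
  paste(
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<lexicon version="1.0"',
    '      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"',
    sprintf('      alphabet="ipa" xml:lang="%s">', lang),
    '  <lexeme>',
    sprintf('    <grapheme>%s</grapheme>', xml_escape(grapheme)),
    sprintf('    <alias>%s</alias>', xml_escape(alias)),
    '  </lexeme>',
    '</lexicon>',
    sep = "\n"
  )
}

pls <- make_lexicon("KPI", "key performance indicator")
cat(pls, "\n")

# With credentials configured, the upload would then use the
# put_lexicon() signature listed above ("kpiterms" is a made-up name):
# put_lexicon("kpiterms", pls)
```

The upload call is left commented out so the snippet runs without an AWS account.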

For sources and more information, see the AWS Polly documentation and the aws.polly repository on GitHub.
