Johnny Pi, I am your father — part 3: adding cloud-based speech

Julien Simon
Sep 4, 2017

In the previous post, we learned how to control our robot with a joystick connected to an Arduino Yún.

In this post, I’ll show you how to use Amazon Polly to give the gift of speech to our little friend. If you’re not familiar with this service, don’t worry: it’s dead simple to use.

One API call is all it takes to generate speech from text.

Installing Python dependencies

In order to invoke Polly and play the resulting sound file, we need to install three Python packages on our robot:

  • boto3, the AWS SDK for Python.
  • the AWS CLI, because we’re bound to need it at some point :)
  • pygame, which I found to be the easiest way to play a sound file from Python.
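On the robot, a sketch of the install commands (depending on your Raspbian setup, you may prefer the distribution package for pygame):

```shell
# Install the three Python packages with pip
pip install boto3 awscli pygame
# Alternatively, pygame is also available as a Raspbian package:
# sudo apt-get install python-pygame
```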

Allowing the robot to subscribe to a new MQTT topic

Just like for movement, all commands will be sent through a dedicated MQTT topic, named JohnnyPi/speak. There’s nothing to provision on the AWS IoT gateway, but we have to update the thing’s IoT policy to allow it to subscribe to this new topic.

Just go to the AWS IoT console, locate the policy attached to the thing’s certificate and add the following statement.
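The statement looks something like this (the region and account id are placeholders for your own values; note that subscribing uses a topicfilter ARN while receiving messages uses a topic ARN):

```json
{
  "Effect": "Allow",
  "Action": ["iot:Subscribe", "iot:Receive"],
  "Resource": [
    "arn:aws:iot:eu-west-1:ACCOUNT_ID:topicfilter/JohnnyPi/speak",
    "arn:aws:iot:eu-west-1:ACCOUNT_ID:topic/JohnnyPi/speak"
  ]
}
```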

Allowing the robot to invoke Polly

We’ve just taken care of IoT security, now let’s make sure the robot is also allowed to call the Polly API. To do so, we’re going to create a new IAM user with the corresponding policy and use its credentials to authenticate.

Let’s use the AWS CLI for once. No, it’s not complicated: just run these commands on your own machine, not on the robot ;)
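Along these lines (the user name “johnnypi” is my own choice; the read-only managed policy for Polly is enough to call synthesize_speech):

```shell
# Create a dedicated user for the robot
aws iam create-user --user-name johnnypi
# Allow it to call the Polly API
aws iam attach-user-policy --user-name johnnypi \
  --policy-arn arn:aws:iam::aws:policy/AmazonPollyReadOnlyAccess
# Generate the access key pair to deploy on the robot
aws iam create-access-key --user-name johnnypi
```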

Take note of the AccessKeyId and SecretAccessKey values in the output. As usual, you have two options to deploy them on the robot:

  1. run ‘aws configure’ and fill in the fields
  2. store them in ~/.aws/credentials
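If you go with option 2, the file looks like this (replace the placeholders with your own key pair):

```ini
[default]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_KEY
```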

Ok, that’s enough IAM for a day. Let’s start to write some code.

Invoking the Polly API

We need to do three things:

  1. connect to Polly and return a client. Please check the AWS documentation to find out in which regions Polly is available.
  2. using the client, generate a sound file from a text message: as promised, all it takes is a single API call, synthesize_speech().
  3. play the sound file thanks to the pygame library. Here I’m using the male UK voice (aka ‘Brian’).

Feel free to experiment with the other 47 voices and 23 languages: ‘aws polly describe-voices’ will give you the full list.
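For instance, to list just the voice ids and their languages (the --query expression is one possible filter, not the only one):

```shell
aws polly describe-voices \
  --query 'Voices[].[Id,LanguageName]' --output table
```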

Here’s the corresponding code: nothing clever :) It should also run fine on your local machine.
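A minimal sketch of those three steps (the region, file name and helper names are my own choices, not necessarily the article’s):

```python
import time

def speak(polly, text, voice_id='Brian', filename='speech.mp3'):
    # One API call is all it takes: ask Polly for an MP3 stream...
    response = polly.synthesize_speech(
        Text=text, OutputFormat='mp3', VoiceId=voice_id)
    # ...and save it to a local file
    with open(filename, 'wb') as f:
        f.write(response['AudioStream'].read())
    return filename

def play(filename):
    # pygame is imported lazily so speak() can be used without audio hardware
    import pygame
    pygame.mixer.init()
    pygame.mixer.music.load(filename)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)

if __name__ == '__main__':
    import boto3
    # Pick a region where Polly is available, e.g. eu-west-1
    polly = boto3.client('polly', region_name='eu-west-1')
    play(speak(polly, 'Hello, I am Johnny Pi'))
```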

Processing the MQTT messages

Now that we can invoke Polly, it’s time to work on processing the MQTT messages. Here’s what we need to add to our existing server.py code:

  1. Connect to Polly,
  2. Subscribe to the JohnnyPi/speak topic,
  3. Write a callback to handle messages and invoke Polly. We’ll keep it super simple and just read the payload of the message.
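Steps 2 and 3 can be sketched like this; I’m assuming the AWS IoT Python SDK (AWSIoTPythonSDK) is the MQTT client, as for the movement topics, and the helper names are hypothetical:

```python
SPEAK_TOPIC = 'JohnnyPi/speak'

def make_speak_callback(say):
    # 'say' is any callable turning text into speech
    # (in server.py it would chain Polly and pygame)
    def on_speak(client, userdata, message):
        # Keep it super simple: the payload is the text to read out loud
        say(message.payload.decode('utf-8'))
    return on_speak

# In server.py, after connecting the AWSIoTMQTTClient (hypothetical name):
# mqtt_client.subscribe(SPEAK_TOPIC, 1, make_speak_callback(say_with_polly))
```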

Here’s the updated code snippet.

Testing

We’re done: you can grab the updated code on GitHub.

In order to test all of this, we need to:

  • connect a loudspeaker to the audio output of the Pi,
  • ssh to the robot and start server.py,
  • publish MQTT messages to the JohnnyPi/speak topic, containing the text we’d like the robot to read.
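If you don’t have an MQTT client handy, the AWS CLI can publish the test message too, along these lines:

```shell
# Publish a test message to the speak topic
# (with AWS CLI v2, add --cli-binary-format raw-in-base64-out)
aws iot-data publish --topic JohnnyPi/speak \
  --payload 'Hello, this is Johnny Pi speaking'
```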
Using MQTT.fx to ask the robot for an honest unbiased opinion on the quality of my writing

What’s next

In the next post, I’ll take you through integration with Amazon Rekognition. Of course, we’ll combine this with Polly to let the robot tell us what it sees :)

As always, thank you for reading and for your support.

Soundtrack to this article: the new Paradise Lost album, “Medusa”. Their best in 20+ years.
