Working with TTS on Houndify

Daniel Core
Houndify
Jul 14, 2021


In this tutorial, we’ll be looking at a simple demonstration of how to configure the sample client in Python to send a query and decode the TTS audio from Houndify. Houndify provides TTS audio for all requests that return a spoken audio response across a variety of domains. This tutorial is based on the Python SDK, but the concept is similar in the other SDKs. At the end of this tutorial, you will be able to hear all about the weather at the South Pole as read by Mia from Australia.

Getting Started

If you don’t already have an account, head over to Houndify to get set up. To get started, download the Python SDK from the client SDKs page. Before configuring the SDK, you’ll need to enable a TTS domain on your client. Navigate to the Domains page and select the Voices category from the category dropdown to display all the available TTS domains. If no TTS domains are available, you will need to upgrade to a paid account.

Houndify offers a few choices for TTS domains, including ReadSpeaker, Acapela, and Selvy. The TTS domains support many languages with a variety of voices. In this example, we’re using the “Mia” voice from the ReadSpeaker domain. The different TTS domains all return the audio in the same format, so feel free to use whichever TTS provider works best for you.

After enabling the TTS domain, you will also want to enable a destination domain for your queries. For this tutorial, we’re enabling the Weather domain. After saving your client, you should see the three domains below enabled.

Configuring Python Sample Client

The Python SDK has a few sample clients that demonstrate different features. The sample_text.py client shows how to connect to and interact with Houndify through text. For this tutorial, we’ll use that sample code as a starting point and configure it to return TTS audio. The “ResponseAudioVoice” field selects the voice; the documentation for your chosen TTS domain lists the available voices along with audio samples for trying out the different options. The “ResponseAudioShortOrLong” field sets whether the returned audio is generated from the “SpokenResponse” or the “SpokenResponseLong” field; in this case, we will use the “Short” version. The following fields need to be set for the client to return the “ResponseAudioBytes” field in the response JSON:

requestInfo = {
    'ResponseAudioVoice': 'Mia',
    'ResponseAudioShortOrLong': 'Short',
    'ResponseAudioEncoding': 'WAV'
}
client = houndify.TextHoundClient(CLIENT_ID, CLIENT_KEY, "test_user", requestInfo)

The “ResponseAudioEncoding” field sets which encoding the audio is returned in. Setting it to WAV returns WAV audio, while setting it to Speex returns the audio in the compressed Speex format, which reduces bandwidth usage:

requestInfo = {
    'ResponseAudioVoice': 'Mia',
    'ResponseAudioShortOrLong': 'Short',
    'ResponseAudioEncoding': 'Speex'
}
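Since the three fields are plain strings, a typo in one of them is easy to miss. The sketch below shows one way to build the dictionary with a basic check before handing it to the client. The helper name build_request_info is hypothetical and not part of the Houndify SDK, and the “Long” value is an assumption inferred from the field name; only “Short”, “WAV”, and “Speex” appear in this tutorial.

```python
# Hypothetical helper for illustration; it is not part of the Houndify SDK.
VALID_LENGTHS = {"Short", "Long"}    # "Long" is assumed from the field name
VALID_ENCODINGS = {"WAV", "Speex"}   # the two encodings named in this tutorial


def build_request_info(voice, length="Short", encoding="WAV"):
    """Return a requestInfo dict carrying the three TTS fields."""
    if length not in VALID_LENGTHS:
        raise ValueError(f"length must be one of {sorted(VALID_LENGTHS)}, got {length!r}")
    if encoding not in VALID_ENCODINGS:
        raise ValueError(f"encoding must be one of {sorted(VALID_ENCODINGS)}, got {encoding!r}")
    return {
        "ResponseAudioVoice": voice,
        "ResponseAudioShortOrLong": length,
        "ResponseAudioEncoding": encoding,
    }


# Same settings as the Speex example above.
request_info = build_request_info("Mia", encoding="Speex")
```

The resulting dictionary can be passed to the client in place of the hand-written requestInfo.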

Sending Requests

After configuring the request info, we can send the query to Houndify using the following:

response = client.query("What's the weather at the South Pole")

This handy query lets us know what the current weather is at the South Pole.

Decoding The Audio

TTS audio responses are returned in the response JSON along with the other fields. The audio is base64 encoded in the “ResponseAudioBytes” field. In this example, that field sits inside the first element of the AllResults array, so you’ll need to index into the response JSON as shown below:

import base64

decode_string = base64.b64decode(response['AllResults'][0]['ResponseAudioBytes'])

Once you have the decoded bytes, they can be written out to a file or used directly for playback.
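If the voice or domain isn’t enabled, the audio field may be missing, and indexing straight into the response would raise an error. Here is a minimal sketch of a more defensive lookup. The function name extract_tts_audio and the stand-in response dictionary are made up for illustration; a real Houndify response contains many more fields.

```python
import base64


def extract_tts_audio(response):
    """Return decoded TTS audio bytes from a response dict, or None if absent."""
    results = response.get("AllResults") or []
    if not results:
        return None
    encoded = results[0].get("ResponseAudioBytes")
    if not encoded:
        return None
    return base64.b64decode(encoded)


# Stand-in response for illustration only, so the sketch runs without the SDK.
fake_response = {
    "AllResults": [
        {
            "SpokenResponse": "It is cold.",
            "ResponseAudioBytes": base64.b64encode(b"RIFF-demo-bytes").decode("ascii"),
        }
    ]
}

audio = extract_tts_audio(fake_response)
if audio is None:
    print("No TTS audio in the response; check the request info and enabled domains.")
```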

Playing the Audio

Playsound offers a simple way to play back the file with Python. To install it, use the following:

pip install playsound
pip install AppKit

Note:

  • On macOS, you may need to rename the appkit directory in your install location from lowercase to AppKit.

Add the following import to use playsound in the sample client:

from playsound import playsound

After the decode step, you can write the bytes to a file and play that file with playsound:

# Write the decoded bytes and close the file before playing it
with open("output.wav", "wb") as wav_file:
    wav_file.write(decode_string)
playsound('output.wav')

If successful, you should hear the current weather at the South Pole as read by Mia from Australia.
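If nothing plays, it can help to confirm that the decoded bytes are a readable WAV file before blaming the player. The sketch below uses Python’s standard wave module to report the channel count, sample rate, and duration. It assumes the WAV audio is uncompressed PCM in a standard container, which wave requires; that is an assumption about the returned audio, not something this tutorial confirms. The describe_wav name is hypothetical, and the sketch generates one second of silence in memory so it runs without a Houndify response.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Parse WAV bytes and return basic properties; raises wave.Error if unreadable."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Build one second of 16-bit mono silence as a stand-in for the decoded audio.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

info = describe_wav(buffer.getvalue())
print(info)
```

In the sample client, you would pass decode_string to describe_wav instead of the generated silence.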

This process will get you started on your journey with Houndify and is the same for voice queries. With the flexibility of different SDKs, voices, and domains, you’re sure to find a combination that fits your use case. We look forward to seeing what you can do with the Houndify platform.
