How to integrate Google Cloud Text-to-Speech API into your iOS app

Alejandro Cotilla
Google Cloud - Community
3 min read · May 31, 2018

Google Cloud recently launched a new Text-to-Speech API that features over 30 voices, available in multiple languages and variants. The available WaveNet voices produce an extremely natural and fluent sound, but even the “Basic” alternatives sound surprisingly good. You can read all about it here and you can even try it out.

Why use Google’s Cloud Text-to-Speech service?

If you’re reading this, you probably already know the answer. The reason is superior sound quality!

Apple’s SDK has offered Text-to-Speech since iOS 7, and it is very easy to use, taking just 4 lines of code.
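For reference, a minimal sketch of the built-in approach using AVSpeechSynthesizer. The helper name speakWithSystemVoice and the "en-US" language choice are assumptions made for illustration; the core is the four lines that create the synthesizer, create the utterance, pick a voice, and speak.

```swift
import AVFoundation

// Keep a strong reference: speech stops if the synthesizer is deallocated.
let synthesizer = AVSpeechSynthesizer()

/// Speaks `text` with one of the system voices and returns the utterance.
/// (Hypothetical helper name, not part of Apple's API.)
@discardableResult
func speakWithSystemVoice(_ text: String) -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    // Returns nil if no voice matches; the system default voice is used then.
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
    synthesizer.speak(utterance)
    return utterance
}
```

Calling speakWithSystemVoice("Hello") queues the utterance and returns immediately; playback happens asynchronously.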

But Apple’s sound quality is much worse in comparison: it does not use Siri’s voice, only other robotic-sounding voices. Even the “enhanced” voices, which the user has to download first, still sound pretty bad by today’s standards.

Before getting into the code

As with any Google Cloud API, the API has to be enabled on a project within the Google Cloud Console, and all API calls will be associated with that project. To set up a project in the Google Cloud Console, you can follow all the steps described here, except that this demo app requires an API key instead of a service account key.

Summarized steps:
1. Create a project (or use an existing one) in the Cloud Console.
2. Make sure that billing is enabled for your project.
3. Enable the Text-to-Speech API.
4. Create an API key.
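To show where the API key from step 4 ends up, here is a rough sketch of an authenticated request. It assumes the v1 REST endpoint and the X-Goog-Api-Key header (Google APIs also accept the key as a key query parameter); the function name makeSynthesizeRequest is made up for this example and is not taken from the demo project.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Builds a POST request for the synthesize endpoint, authenticated with an API key.
/// (Hypothetical helper; the endpoint version is an assumption.)
func makeSynthesizeRequest(apiKey: String) -> URLRequest {
    let url = URL(string: "https://texttospeech.googleapis.com/v1/text:synthesize")!
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    // The API key identifies the Cloud project the call is billed to.
    request.setValue(apiKey, forHTTPHeaderField: "X-Goog-Api-Key")
    request.setValue("application/json; charset=utf-8", forHTTPHeaderField: "Content-Type")
    return request
}
```

Because the key ships inside the app binary, it is worth restricting it in the Cloud Console to the Text-to-Speech API only.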

Now onto the fun part…THE CODE 🙌

We will be creating a simple demo app with basic input controls.
Our app will have:
1. Text view — to enter the text that we want to convert to audio.
2. Segmented controls — to switch between the different voice options.
3. Speak button — to start the speech service.

Download the starter project from here, uncompress it, and open the project. After running it, you should see the text view, the voice option controls, and the “Speak” button described above.

Now let’s add the SpeechService class to our project. This class has all you need to communicate with the Google Cloud Text-to-Speech API. Each important piece in this file has comments explaining its purpose; make sure you go through them.

Before continuing, make sure you replace <YOUR_API_KEY> with your actual API key, created in the API enabling steps above.
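For orientation, the JSON body that a class like this has to send can be sketched with Codable types. The field names (input, voice, audioConfig) follow the public REST API; the Swift type names, the helper makeSynthesisBody, and the LINEAR16 encoding choice are assumptions for illustration and may differ from the actual SpeechService file.

```swift
import Foundation

/// Mirrors the JSON body of a text:synthesize call (illustrative type names).
struct SynthesisRequest: Codable {
    struct Input: Codable {
        let text: String
    }
    struct Voice: Codable {
        let languageCode: String
        let name: String
    }
    struct AudioConfig: Codable {
        let audioEncoding: String
    }
    let input: Input
    let voice: Voice
    let audioConfig: AudioConfig
}

/// Encodes the request body for the given text and voice name.
func makeSynthesisBody(text: String, voiceName: String) throws -> Data {
    let body = SynthesisRequest(
        input: .init(text: text),
        voice: .init(languageCode: "en-US", name: voiceName),
        // LINEAR16 is uncompressed PCM audio; "MP3" is another accepted value.
        audioConfig: .init(audioEncoding: "LINEAR16")
    )
    return try JSONEncoder().encode(body)
}
```

The resulting Data would be assigned to the request’s httpBody before sending it.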

The way you interface with the SpeechService class is as simple as:

SpeechService.shared.speak(text: "My text") {
    // Finished speaking
}
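Behind a call like that, the service has to turn the API response into playable audio. The API returns JSON with a Base64-encoded audioContent field; the sketch below decodes it into raw bytes. The names SynthesisResponse, SpeechError, and decodeAudio are hypothetical and only serve this example.

```swift
import Foundation

/// The part of the synthesize response this sketch cares about.
struct SynthesisResponse: Decodable {
    let audioContent: String  // Base64-encoded audio bytes
}

enum SpeechError: Error {
    case invalidAudio
}

/// Extracts the raw audio bytes from the JSON response body.
func decodeAudio(from responseData: Data) throws -> Data {
    let response = try JSONDecoder().decode(SynthesisResponse.self, from: responseData)
    guard let audio = Data(base64Encoded: response.audioContent) else {
        throw SpeechError.invalidAudio
    }
    return audio
}
```

On iOS, the decoded bytes can then be handed to AVAudioPlayer(data:) for playback, with the completion closure called once the player reports that it has finished.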

Now, finally, let’s use the SpeechService class on our “Speak” button press action. For that, we need to update the didPressSpeakButton function as follows:
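A minimal sketch of that change, assuming the starter project’s view controller exposes the text view as an outlet named textView (an assumption; check the storyboard for the actual name). The SpeechService stub is only a stand-in so the snippet compiles by itself; in the real project you would use the downloaded class instead.

```swift
import UIKit

// Stand-in for the article's SpeechService so this sketch is self-contained.
// Remove it when the real class is in the project.
final class SpeechService {
    static let shared = SpeechService()

    func speak(text: String, completion: @escaping () -> Void) {
        // The real class calls the API and plays the audio; the stub just finishes.
        DispatchQueue.main.async(execute: completion)
    }
}

final class ViewController: UIViewController {
    // Assumed outlet name; match it to the starter project's storyboard.
    @IBOutlet weak var textView: UITextView!

    @IBAction func didPressSpeakButton(_ sender: Any) {
        SpeechService.shared.speak(text: textView.text) {
            // Finished speaking
        }
    }
}
```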

Now, run the app and press “Speak”. Did it start speaking? YAY!!!
We just converted the text view’s text into audio and played it.

So, what’s next? As you probably noticed, our UI has other options that we are not applying. To be able to switch between the different voice categories and genders, and to also disable the “Speak” button while speaking, we need to update the didPressSpeakButton function again:
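The voice-selection part of that update boils down to mapping the two segmented controls to a voice name that the API understands. Here is one way to sketch it; the enum names, the segment titles, and the four en-US voice names are assumptions for illustration, so check them against the starter project and the API’s list of supported voices.

```swift
/// Titles assumed for the "category" segmented control.
enum VoiceCategory: String {
    case waveNet = "WaveNet"
    case standard = "Standard"
}

/// Titles assumed for the "gender" segmented control.
enum VoiceGender: String {
    case female = "Female"
    case male = "Male"
}

/// Maps the selected options to a Cloud Text-to-Speech voice name.
func voiceName(category: VoiceCategory, gender: VoiceGender) -> String {
    switch (category, gender) {
    case (.waveNet, .female): return "en-US-Wavenet-F"
    case (.waveNet, .male): return "en-US-Wavenet-D"
    case (.standard, .female): return "en-US-Standard-E"
    case (.standard, .male): return "en-US-Standard-D"
    }
}
```

In the button action, the selected segment titles would be converted with VoiceCategory(rawValue:) and VoiceGender(rawValue:), the button disabled before speaking starts, and re-enabled inside the completion closure.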

Run the app one last time. Try the different voice options (category and gender). Did it work? NICE!!! Now you have your own personal J.A.R.V.I.S.!…hmm, maybe not quite yet.

The complete project is available here.

Conclusion

Google Cloud Text-to-Speech API is very easy to use and integrate, and it’s quite capable, with impressive audio results. But obviously, all those nice features don’t come for free; you can check the pricing details here, which I think are reasonable. If audio quality is not a priority in your app, definitely use Apple’s built-in AVSpeechSynthesizer, which is super easy to use and free.

That’s all for now, thanks for reading! See you next time!
