Basic text-to-speech in iOS

In the process of creating one of my side projects, I realized that I wanted to use text-to-speech to have the application read out words that appeared on the screen. I searched for a bit, thinking I’d have to use a third-party API; however, to my surprise, Apple ships a framework called AVFoundation that contains some fairly robust text-to-speech capabilities. AVFoundation also has a great deal of other audio functionality, so it’s worth checking out if you’re unfamiliar with it.

Making your application read text aloud is very straightforward and only requires a few lines of code. First, make sure you import <AVFoundation/AVFoundation.h>. Then it’s as simple as initializing an AVSpeechSynthesizer, initializing an AVSpeechUtterance with a string, and finally calling speakUtterance: on your AVSpeechSynthesizer with your AVSpeechUtterance as the argument.
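As a minimal sketch (the sample string is just a placeholder):

```objc
#import <AVFoundation/AVFoundation.h>

// Create a synthesizer, wrap a string in an utterance, and speak it.
AVSpeechSynthesizer *synthesizer = [[AVSpeechSynthesizer alloc] init];
AVSpeechUtterance *utterance =
    [[AVSpeechUtterance alloc] initWithString:@"Hello, world!"];
[synthesizer speakUtterance:utterance];
```

One gotcha worth noting: keep a strong reference to the synthesizer (a property works well), or ARC may deallocate it before it finishes speaking.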

Great! You should now be able to run your application and hear the cold, detached, robotic voice of Siri speaking your sample string.

AVSpeechUtterance has many more features; two that I decided to look at are rate and pitchMultiplier. These are pretty self-explanatory: rate determines how quickly the utterance is spoken, while pitchMultiplier determines the pitch of the voice used. Let’s build a quick interface to mess around with these two properties. Create a UITextField, two UISliders with labels, and a UIButton.

Use Ctrl+drag to connect the text field and sliders to properties in the header file, and the button to a method in the implementation. In viewDidLoad, set the minimum and maximum values for the sliders: pitchMultiplier accepts values from 0.5 to 2.0, while rate ranges from AVSpeechUtteranceMinimumSpeechRate to AVSpeechUtteranceMaximumSpeechRate (woo Apple!). The synthesizer should also be initialized here.
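Here’s a sketch of how that might look — the outlet and property names are my own choices, and I’m keeping the synthesizer as a property so it stays alive between taps:

```objc
// ViewController.h — outlet names here are illustrative, not required
#import <UIKit/UIKit.h>
#import <AVFoundation/AVFoundation.h>

@interface ViewController : UIViewController
@property (weak, nonatomic) IBOutlet UITextField *textField;
@property (weak, nonatomic) IBOutlet UISlider *pitchSlider;
@property (weak, nonatomic) IBOutlet UISlider *rateSlider;
@property (strong, nonatomic) AVSpeechSynthesizer *synthesizer;
@end

// ViewController.m
- (void)viewDidLoad {
    [super viewDidLoad];

    // pitchMultiplier accepts values between 0.5 and 2.0
    self.pitchSlider.minimumValue = 0.5f;
    self.pitchSlider.maximumValue = 2.0f;

    // Apple provides constants for the valid speech rate range
    self.rateSlider.minimumValue = AVSpeechUtteranceMinimumSpeechRate;
    self.rateSlider.maximumValue = AVSpeechUtteranceMaximumSpeechRate;

    // Keep the synthesizer around for the lifetime of the controller
    self.synthesizer = [[AVSpeechSynthesizer alloc] init];
}
```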

When the button is tapped, a new AVSpeechUtterance should be created from the contents of the text field (an utterance’s speechString is read-only, so it can’t be changed after initialization), its pitchMultiplier and rate should be set to the values of the sliders, and speakUtterance: should be called on the synthesizer with the new utterance as the argument:
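A sketch of the action method, using the same assumed names as above:

```objc
- (IBAction)speakTapped:(id)sender {
    // speechString is read-only, so build a fresh utterance on each tap
    AVSpeechUtterance *utterance =
        [[AVSpeechUtterance alloc] initWithString:self.textField.text];
    utterance.pitchMultiplier = self.pitchSlider.value;
    utterance.rate = self.rateSlider.value;
    [self.synthesizer speakUtterance:utterance];
}
```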

We’ve succeeded in making Siri sound even more ridiculous than she already does!

There’s a lot more that you can do with text-to-speech using synthesizers and utterances. NSHipster has a great article on AVSpeechSynthesizer, including a list of languages that can be used and other cool features. Beyond that, there are third-party APIs that offer more functionality, such as more natural-sounding voices (iSpeech being a good example). I’m looking forward to working with text-to-speech more in later projects.
