Text to speech and speech to text synthesizer using Swift

2 min read · Apr 25, 2023

Text to speech (TTS)

In Swift, you can work with text-to-speech (TTS) using the AVSpeechSynthesizer class from the AVFoundation framework, and with speech-to-text (STT) using the SFSpeechRecognizer class from the Speech framework.

Example of using AVSpeechSynthesizer to speak a piece of text:

import AVFoundation

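// Keep a strong reference to the synthesizer (for example, as a property)
// so it is not deallocated while it is still speaking.
let synthesizer = AVSpeechSynthesizer()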

let utterance = AVSpeechUtterance(string: "Hello, world!")
utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
utterance.rate = 0.5

synthesizer.speak(utterance)

This will use the default voice for the specified language (in this case, English (United States)) to speak the text “Hello, world!” at a rate of 0.5.
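The rate is a Float that AVFoundation bounds with the constants AVSpeechUtteranceMinimumSpeechRate and AVSpeechUtteranceMaximumSpeechRate. As a minimal sketch, a value that comes from user input could be clamped to that range before it is assigned. The helper name clampedRate and the 0.8 scaling factor are assumptions made for illustration, not part of the framework.

```swift
import AVFoundation

// Hypothetical helper (not part of AVFoundation): keeps a requested rate
// inside the range the framework accepts.
func clampedRate(_ rate: Float) -> Float {
    return min(max(rate, AVSpeechUtteranceMinimumSpeechRate),
               AVSpeechUtteranceMaximumSpeechRate)
}

let slowUtterance = AVSpeechUtterance(string: "Hello, world!")
// Speak a little slower than the framework's default rate.
slowUtterance.rate = clampedRate(AVSpeechUtteranceDefaultSpeechRate * 0.8)
```

Deriving the value from AVSpeechUtteranceDefaultSpeechRate rather than hard-coding a number keeps the intent ("slightly slower than normal") visible in the code.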

The built-in speech synthesizer is capable of speaking multiple languages such as Chinese, Japanese and French. To tell the synthesizer the language to speak, you have to pass the correct language code when creating the instance of AVSpeechSynthesisVoice.

To find out all the language codes that the device supports, you can call the speechVoices() class method of AVSpeechSynthesisVoice:

let voices = AVSpeechSynthesisVoice.speechVoices()
 
for voice in voices {
    print(voice.language)
}

Here are some of the supported language codes:

Japanese: ja-JP
Korean: ko-KR
French: fr-FR
Italian: it-IT
Cantonese (Hong Kong): zh-HK
Mandarin (Taiwan): zh-TW
Mandarin (mainland China, Putonghua): zh-CN
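Because AVSpeechSynthesisVoice(language:) is a failable initializer, it can return nil when no voice is available for a given code. The sketch below shows one possible way to handle that with a fallback language. The helper name voice(for:fallback:) and the choice of "en-US" as the fallback are assumptions for illustration only.

```swift
import AVFoundation

// Hypothetical helper: returns a voice for the requested language code,
// falling back to a second code when the first one is not available.
func voice(for languageCode: String, fallback: String = "en-US") -> AVSpeechSynthesisVoice? {
    return AVSpeechSynthesisVoice(language: languageCode)
        ?? AVSpeechSynthesisVoice(language: fallback)
}

// Collect the distinct language codes installed on this device, sorted for readability.
let installedCodes = Set(AVSpeechSynthesisVoice.speechVoices().map { $0.language }).sorted()
print(installedCodes)

let frenchUtterance = AVSpeechUtterance(string: "Bonjour tout le monde")
frenchUtterance.voice = voice(for: "fr-FR")
```

Several voices can share one language code, which is why the codes are passed through a Set before printing.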

If you need to interrupt the speech synthesizer, you can use the stopSpeaking(at:) method:

synthesizer.stopSpeaking(at: .immediate)
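To find out whether an utterance ran to completion or was interrupted, you can assign a delegate that conforms to AVSpeechSynthesizerDelegate. The following is a minimal sketch; the class name SpeechObserver and its events array are hypothetical and exist only for this example.

```swift
import AVFoundation

// Hypothetical observer class; the name is not part of AVFoundation.
final class SpeechObserver: NSObject, AVSpeechSynthesizerDelegate {
    private(set) var events: [String] = []

    // Called when an utterance has been spoken to the end.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        events.append("finished")
    }

    // Called when an utterance was stopped before it finished.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didCancel utterance: AVSpeechUtterance) {
        events.append("cancelled")
    }
}

let observedSynthesizer = AVSpeechSynthesizer()
let observer = SpeechObserver()
// The delegate property is weak, so `observer` must be kept alive elsewhere.
observedSynthesizer.delegate = observer

observedSynthesizer.speak(AVSpeechUtterance(string: "This sentence may be cut short."))

// stopSpeaking(at:) returns a Bool indicating whether speech was stopped.
let stopped = observedSynthesizer.stopSpeaking(at: .immediate)
print("Stopped: \(stopped)")
```

Passing .word instead of .immediate lets the synthesizer finish the current word before it stops.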

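You can also control other aspects of the speech, such as the pitch, the volume, and the delay before an utterance starts. Note that speak(_:) is always asynchronous: it adds the utterance to the synthesizer's queue and returns immediately. To pause and resume playback, use pauseSpeaking(at:) and continueSpeaking().

utterance.pitchMultiplier = 1.5   // 0.5 to 2.0; 1.0 is the default
utterance.volume = 0.7            // 0.0 (silent) to 1.0 (loudest)
utterance.preUtteranceDelay = 0.5 // seconds to wait before speaking

// Queue the utterance; this call returns before the speech has finished
synthesizer.speak(utterance)

// Pause at the next word boundary, then resume later
synthesizer.pauseSpeaking(at: .word)
synthesizer.continueSpeaking()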
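Since speak(_:) queues utterances and returns immediately, several utterances can be enqueued in a row and a single control can switch between pausing and resuming. The sketch below illustrates this under those assumptions; the helper name togglePause and the 0.3 second delay are made up for the example.

```swift
import AVFoundation

let queueSynthesizer = AVSpeechSynthesizer()

// Each call to speak(_:) enqueues an utterance; they are spoken in order.
for sentence in ["First sentence.", "Second sentence."] {
    let queued = AVSpeechUtterance(string: sentence)
    queued.postUtteranceDelay = 0.3 // short gap after each sentence, in seconds
    queueSynthesizer.speak(queued)
}

// Hypothetical helper: pauses if speaking, resumes if paused, otherwise does nothing.
func togglePause(_ synthesizer: AVSpeechSynthesizer) {
    if synthesizer.isPaused {
        _ = synthesizer.continueSpeaking()
    } else if synthesizer.isSpeaking {
        _ = synthesizer.pauseSpeaking(at: .word)
    }
}
```

A button handler could call togglePause(queueSynthesizer) without needing to track the playback state itself, because the synthesizer exposes isPaused and isSpeaking.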

Speech-to-text (STT) Recogniser

The following code sets up an SFSpeechRecognizer object with the English (United States) locale, creates a recognition request, and starts a recognition task. When the task produces a result, the recognized text is printed to the console.

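import Speech

// The initializer returns nil if the locale is not supported
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

// SFSpeechRecognitionRequest is an abstract base class, so use a concrete subclass.
// "recording.m4a" is a placeholder path; replace it with your own audio file.
let audioURL = URL(fileURLWithPath: "recording.m4a")
let request = SFSpeechURLRecognitionRequest(url: audioURL)

recognizer?.recognitionTask(with: request, resultHandler: { (result, error) in
    if let error = error {
        print(error.localizedDescription)
        return
    }

    // The handler can be called several times with partial results;
    // wait for the final one before printing.
    guard let result = result, result.isFinal else { return }

    print(result.bestTranscription.formattedString)
})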

Speech recognition requires user permission and may not be available in all regions or languages. Make sure to handle errors and edge cases appropriately.
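As a rough sketch of the permission step, the app can call SFSpeechRecognizer.requestAuthorization and then check isAvailable before starting a task. This assumes the app's Info.plist contains an NSSpeechRecognitionUsageDescription entry, which the system requires before it will show the permission prompt. The helper names describe(_:) and checkSpeechAccess(locale:completion:) are hypothetical.

```swift
import Speech

// Hypothetical helper: turns an authorization status into a readable message.
func describe(_ status: SFSpeechRecognizerAuthorizationStatus) -> String {
    switch status {
    case .authorized: return "authorized"
    case .denied: return "denied by the user"
    case .restricted: return "restricted on this device"
    case .notDetermined: return "not requested yet"
    @unknown default: return "unknown"
    }
}

// Hypothetical helper: asks for permission, then reports whether recognition can start.
func checkSpeechAccess(locale: Locale, completion: @escaping (Bool) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        // The callback is not guaranteed to run on the main queue.
        DispatchQueue.main.async {
            print("Speech recognition is \(describe(status))")
            guard status == .authorized else {
                completion(false)
                return
            }
            // The initializer returns nil for unsupported locales.
            let recognizer = SFSpeechRecognizer(locale: locale)
            completion(recognizer?.isAvailable ?? false)
        }
    }
}
```

A caller might use it as checkSpeechAccess(locale: Locale(identifier: "en-US")) { canStart in ... } and only create the recognition task when canStart is true.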

Written by Nayana N P