Create a Speech Synthesis System with gTTS

Jyoti Dabass, Ph.D
2 min read · Nov 22, 2023

In this post, we will learn how to create a speech synthesis system in Python using gTTS (Google Text-to-Speech). The gTTS library is an easy-to-use tool that converts text to speech and lets you save the output as an audio file. It sends the text to Google Translate's text-to-speech service, so it needs an internet connection. This is useful for applications such as voice assistants and audiobooks.

Step 1: Install and import the libraries

!pip install gtts
from gtts import gTTS  # converts text to speech and creates the audio file
import os  # optional: file-system helpers; gTTS saves the file on its own
import IPython.display  # plays the audio file inside the notebook
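
Before moving on, it can help to confirm that the packages from Step 1 are importable in the current environment. The snippet below is a minimal sketch and not part of the original notebook: `missing_packages` is a hypothetical helper name, and it only checks that the modules can be found, not that they work.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used in this post: gtts (the gTTS library) and IPython.
missing = missing_packages(["gtts", "IPython"])
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported as missing, rerun the `pip install` command for that package before continuing.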

Step 2: Define the text that we want to convert to audio

text_file = "Generative AI is a form of artificial intelligence in which algorithms automatically produce content in the form of text, images, audio and video. Generative AI covers a range of machine learning and deep learning techniques, including Transformer models: Transformers are neural networks that learn context by identifying and tracking relationships in sequential data, such as words in a sentence. They’re commonly used for natural language processing (NLP) tasks. Transformer architectures now underpin most foundation models. Generative Adversarial Networks (GANs): GANs use two neural networks, a generator and a discriminator. The generator creates new content that it presents to the discriminator, which tries to determine whether it’s real or fake. Over time, the generator learns to create more realistic content that can fool the discriminator, while the discriminator gets better at distinguishing content. Though GANs have famously been used to generate fake videos or images of real people saying or doing things they haven’t done, known as deepfakes, there’s enormous potential for using GAN technology in legitimate business applications, from product design to art and content creation. Variational Autoencoders (VAEs): VAEs learn to generate new content by analyzing patterns in a dataset. They do this by compressing data into a lower-dimensional space and then learning how to generate new data by sampling from this compressed space."
text_file
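
Text pasted from web pages often loses the space after a sentence-ending period (for example "video.Generative"), and the speech engine may then read the two words as one. The following is a minimal sketch of a clean-up step, assuming a hypothetical helper named `normalize_text` that is not part of gTTS or the original notebook.

```python
import re


def normalize_text(text):
    """Repair common copy-paste spacing problems before synthesis."""
    # Insert a space after ., ! or ? when an uppercase letter follows directly,
    # e.g. "video.Generative" -> "video. Generative".
    text = re.sub(r"([.!?])(?=[A-Z])", r"\1 ", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("audio and video.Generative AI covers   a range."))
# prints: audio and video. Generative AI covers a range.
```

This is a simple heuristic: it would also split dotted abbreviations such as "U.S.A", and it cannot repair fused words like "includingTransformer", so review the result before passing it to gTTS.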

Step 3: Define audio speed, language and create an object

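# our text is in English ("en"); slow=False uses the normal speaking speed

audio_file = gTTS(text=text_file, lang="en", slow=False)
audio_file.save("audio.mp3")  # writes the MP3 to the current directory
file = "./audio.mp3"
IPython.display.display(IPython.display.Audio(file))  # shows an inline audio player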

The output is an audio file, audio.mp3, which plays in an inline player in the notebook.
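
The calls in Step 3 can also be wrapped in a small reusable function. The sketch below is one possible arrangement, not the author's code: `synthesize` and its `tts_factory` parameter are hypothetical names introduced here, and the only gTTS calls it relies on are the `gTTS(text=..., lang=..., slow=...)` constructor and `save()` used above.

```python
from pathlib import Path


def synthesize(text, out_path="audio.mp3", lang="en", slow=False, tts_factory=None):
    """Convert `text` to speech and save it as an MP3 at `out_path`."""
    if not text or not text.strip():
        raise ValueError("text must contain something to speak")
    if tts_factory is None:
        # Imported lazily so the function can be defined without gTTS installed.
        from gtts import gTTS
        tts_factory = gTTS
    tts = tts_factory(text=text, lang=lang, slow=slow)
    # With the real gTTS class, saving requires an internet connection.
    tts.save(str(out_path))
    return Path(out_path)


# Example call in the notebook (needs gTTS and network access):
# path = synthesize(text_file, "audio.mp3")
```

The optional `tts_factory` argument is only there so the function can be tried with a stand-in class when gTTS or a network connection is unavailable; in normal use it can be left out.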

The full code is available on Kaggle (Text to speech generation using gTTS Library) and on GitHub (jyotidabass/Text-to-speech-generation-using-gTTS-Library).

Cheers!! Happy reading.

Please upvote if you liked the post!! Thanks!!

Jyoti Dabass, Ph.D

Researcher and engineer with an interest in data science, analytics, marketing, image analysis, computer vision, fuzzy logic, and natural language processing.