How to Create a Text-to-Speech API Using the `pyttsx3` Library
Have you ever been bored by reading a huge chunk of text, wishing that someone could read it aloud to you, or that you could save it as an audio file to play back at any time?
In this article, I will briefly guide you through creating your own API that converts text to speech using the Python library pyttsx3. We will explore two different methods:
- Converting text to speech in real time
- Taking in a text and converting it to an audio file
The pyttsx3 module in Python converts text to speech and, unlike many other libraries, works offline. First, we need to install this module along with the following requirements to create a Flask API.
You can read more on the Flask implementation over here.
pip3 install -r requirements.txt
Now that the environment is set up, we can create our first endpoint.
Converting Text to Speech in Real Time
Let’s break down the code.
First, we import all the relevant modules for both Flask and pyttsx3. Next, we create a POST endpoint, `/text-to-speech`. When this file is run, you can send a POST request to http://0.0.0.0/text-to-speech and get a real-time voice response.
This endpoint takes a JSON request body as shown above. In our code, we first validate the request and then do some simple string processing on the text. To optimise the conversion from text to audio, we remove punctuation and extra spaces from the text.
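The clean-up step could look something like the sketch below. It is a minimal illustration rather than the article's exact code: the name `clean_text` is hypothetical, and it assumes that removing "space" means collapsing extra whitespace rather than deleting every space between words.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Strip punctuation and collapse runs of whitespace (hypothetical helper)."""
    without_punctuation = text.translate(_PUNCTUATION_TABLE)
    # split() with no arguments drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return " ".join(without_punctuation.split())
```

Note that this approach also strips apostrophes, so "don't" becomes "dont"; whether that matters depends on how the speech engine pronounces the result.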
Next, we have a helper function, `set_up`, that initialises the text-to-speech engine and configures the speed and volume of the speech. After setting up the engine, we call the `say` method, which plays the audio when this endpoint is called.
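As a rough sketch of this step, the helper below configures an engine and then speaks a string. The default rate and volume values, the optional `engine` parameter, and the `speak` wrapper are assumptions made for illustration, not the article's original code. In pyttsx3, `say` only queues the utterance; `runAndWait` is the call that processes the queue and plays the audio.

```python
def set_up(rate=150, volume=1.0, engine=None):
    """Create (if needed) and configure a text-to-speech engine.

    rate is in words per minute and volume ranges from 0.0 to 1.0.
    The defaults here are illustrative assumptions.
    """
    if engine is None:
        # Imported lazily because pyttsx3 needs a platform speech driver.
        import pyttsx3
        engine = pyttsx3.init()
    engine.setProperty("rate", rate)
    engine.setProperty("volume", volume)
    return engine


def speak(text, engine):
    """Queue the text and block until the engine has finished speaking."""
    engine.say(text)
    engine.runAndWait()


# Example usage (requires pyttsx3 and a working audio driver):
# speak("Hello world", set_up(rate=120, volume=0.8))
```

Accepting an existing engine makes the helper easy to exercise without audio hardware, which is the only reason for the extra parameter.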
This endpoint returns a JSON response. If an error occurs while the text is being processed, it returns status code 500 with the relevant error message.
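One way to structure the endpoint and its error handling is sketched below. This is an assumption-laden illustration, not the original code: the `text` field name, the response messages, the 400 status for invalid input, and the names `handle_request` and `create_app` are all hypothetical. The `speak` argument stands for whatever function actually plays the audio.

```python
def handle_request(payload, speak):
    """Validate the JSON payload, speak it, and return (body, status_code)."""
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        # Assumed behaviour: reject missing or empty text as a client error.
        return {"message": "Request body must include a non-empty 'text' string."}, 400
    try:
        speak(text)
    except Exception as exc:
        # Any failure during processing is reported with status code 500.
        return {"message": f"Failed to process the text: {exc}"}, 500
    return {"message": "Text processed successfully."}, 200


def create_app(speak):
    """Build a Flask app exposing POST /text-to-speech."""
    # Imported here so handle_request stays usable without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/text-to-speech", methods=["POST"])
    def text_to_speech():
        # silent=True yields None instead of raising on malformed JSON.
        body, status = handle_request(request.get_json(silent=True), speak)
        return jsonify(body), status

    return app


# Example usage: create_app(my_speak_function).run(host="0.0.0.0")
```

Keeping the request logic in a plain function separates it from Flask, so the validation and error paths can be checked without starting a server.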
Taking in a Text and Converting It to an Audio File
The second method is slightly more complex, as we will be returning an audio file as the response instead of playing the text in real time.
We will be using AWS S3 to store the audio file.
You will need an AWS account with your access key information. Refer to this guide to set up your access key.
We will add this code to the code above.
Let’s break down the code into two parts.
Processing of Text
We will create a POST endpoint, `/text-to-speech/audio-file`. When this file is run, you can send a POST request to http://0.0.0.0/text-to-speech/audio-file and receive a response with an S3 URL for accessing your audio file.
This endpoint takes a JSON request body as shown above. The text processing is the same as in the real-time conversion setup. The only difference is the method used here: we use the `save_to_file(text, "filename")` method instead of the `say` method.
You can see from the code that the filename is hardcoded as `audio.mp3`. When the `save_to_file()` method runs, it saves an audio file in your current directory. To enhance this implementation further, we want to store this audio file somewhere secure and easily accessible through a URL. Because the audio file name is hardcoded, the S3 upload function can easily reference the file. Next, we will go into detail on how we use Boto3, the AWS SDK for Python, to upload and retrieve the audio file.
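Before moving on to S3, the save-to-file step can be sketched as follows. The wrapper name `save_audio` and its parameters are hypothetical; only `save_to_file` and `runAndWait` come from pyttsx3. As with `say`, `save_to_file` queues a command, so the file is written only once `runAndWait` processes the queue.

```python
def save_audio(text, engine, file_name="audio.mp3"):
    """Write the spoken text to file_name and return the file name.

    engine is expected to be a pyttsx3 engine, e.g. pyttsx3.init().
    The default file name mirrors the hardcoded name described above.
    """
    engine.save_to_file(text, file_name)
    # Nothing is written until the queued command is processed.
    engine.runAndWait()
    return file_name


# Example usage (requires pyttsx3 and a platform speech driver):
# import pyttsx3
# save_audio("Hello world", pyttsx3.init())
```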
Before uploading the audio file, we have to set up the Boto3 SDK configuration. This is where we create the boto3.client with our access_key and secret_key.
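A minimal sketch of this configuration is shown below, assuming the keys are kept in the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables rather than in the source code. The helper names `s3_credentials` and `make_s3_client` are hypothetical; the original code may pass the keys differently.

```python
import os


def s3_credentials(env=None):
    """Read the access key pair from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


def make_s3_client(env=None):
    """Create a boto3 S3 client from the credentials above."""
    # Imported lazily so the credential helper works without boto3 installed.
    import boto3
    return boto3.client("s3", **s3_credentials(env))
```

Reading the keys from the environment keeps secrets out of version control, which is the main reason for this design.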
Next, we create the `upload_data` function, which takes the following parameters: file_name (audio.mp3), object_name (the filename in the request body) and bucket_name. Within the function, we call the S3 upload file API. You will then be able to see the audio file uploaded to your S3 bucket.
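The upload function might look roughly like this. The extra `s3_client` parameter is an assumption added so the sketch does not depend on a global client; `upload_file(Filename, Bucket, Key)` is the Boto3 S3 client call it wraps.

```python
def upload_data(s3_client, file_name, object_name, bucket_name):
    """Upload a local file to S3 under object_name and return that key.

    file_name   -- local path, e.g. the hardcoded "audio.mp3"
    object_name -- key to store the file under (the filename from the request)
    bucket_name -- target S3 bucket
    """
    # Errors raised by boto3 are left to propagate so the caller can
    # turn them into an error response.
    s3_client.upload_file(file_name, bucket_name, object_name)
    return object_name


# Example usage (requires boto3 and valid credentials):
# import boto3
# upload_data(boto3.client("s3"), "audio.mp3", "speech.mp3", "my-bucket")
```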
After uploading the audio file, you will want to retrieve and access it. We have a function, `get_presigned_s3_url`, that takes bucket_name and file_name as parameters. This function returns a URL to the audio file that expires in 60 minutes (configurable).
Note that file_name here is the filename passed in the request body.
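A sketch of the retrieval function is given below. The `s3_client` and `expires_in` parameters are assumptions added for illustration; `generate_presigned_url` is the Boto3 S3 client method, and its `ExpiresIn` argument is given in seconds, so 3600 corresponds to the 60 minutes mentioned above.

```python
def get_presigned_s3_url(s3_client, bucket_name, file_name, expires_in=3600):
    """Return a temporary download URL for an object in S3.

    file_name must be the key the object was uploaded under, that is,
    the filename from the request body rather than the local audio.mp3.
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket_name, "Key": file_name},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds
    )


# Example usage (requires boto3 and valid credentials):
# import boto3
# url = get_presigned_s3_url(boto3.client("s3"), "my-bucket", "speech.mp3")
```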
"message": "You have successfully process the text...",
So far, you have learned how to use the Python library to create a text-to-speech API with additional functions. I have deployed sample Swagger documentation with all the endpoints. You can test all the endpoints by accessing the documentation here. Do comment below or reach out to me via LinkedIn if you need clarification on the implementation.