
How to Create a Simple Text-to-Speech Application

By nonsodaniel
Published in Geek Culture, May 7, 2021


If you have ever been too bored or tired to read an article, or wished someone would read a magazine aloud to you, you are not alone. I have felt that way too! That is the main reason I wrote this piece: to help web developers build something very simple that addresses this problem.

In this article, I’ll explain what a Web API is and how to use the SpeechSynthesis API to convert text to speech in different voices.

According to Wikipedia, an API (Application Programming Interface) is an interface that defines interactions between multiple software applications. It defines the kinds of requests that can be made, how to make them, the data formats that should be used, the conventions to follow, and so on. It can also provide extension mechanisms so that users can extend existing functionality in various ways.

APIs are commonly grouped in two ways: by audience (Open, Internal, Partner, and Composite APIs) and by protocol or architectural style (REST, JSON-RPC, XML-RPC, and SOAP).

Web APIs often handle machine-to-machine interactions using styles such as REST and SOAP.

Our primary focus in this article is the Web Speech API, which is one of the Web/Browser APIs. The other API types will be discussed in a separate article.

What are Web/Browser APIs?

Browser APIs (or Web APIs) are APIs built into the browser. They expose data from the browser and the surrounding computer environment, which helps developers perform complex operations. With these APIs, we can build applications that use native features such as notifications and vibration.
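Since not every browser ships every API, it helps to check for a feature before using it. Here is a minimal sketch of that idea. The helper name detectFeatures is hypothetical, and the global objects are passed in as parameters so the function also runs outside a browser.

```javascript
// Hypothetical helper: reports which browser APIs the given global-like
// objects expose. Nothing here is part of any library.
function detectFeatures(win, nav) {
  return {
    notifications: Boolean(win) && "Notification" in win,
    vibration: Boolean(nav) && "vibrate" in nav,
    speechSynthesis: Boolean(win) && "speechSynthesis" in win,
  };
}

// In a browser these are the real globals; elsewhere they may be undefined.
const features = detectFeatures(
  typeof window !== "undefined" ? window : undefined,
  typeof navigator !== "undefined" ? navigator : undefined
);
console.log(features);
```

If a flag comes back false, the app can hide the matching control instead of failing when the user clicks it.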

Web Speech API

According to MDN, the Web Speech API is a Web API that enables you to incorporate voice data into web apps. It has two parts: SpeechSynthesis (text-to-speech) and SpeechRecognition (asynchronous speech recognition).

In this article, we’ll look at SpeechSynthesis (text-to-speech) and how to use it.

SpeechSynthesis

According to MDN, the SpeechSynthesis interface of the Web Speech API is the controller interface for the speech service; this can be used to retrieve information about the synthesis voices available on the device, start and pause speech, and other commands besides.
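To make this concrete, here is a minimal sketch of speaking one sentence. The helper name sayText is hypothetical. The controller and the utterance constructor are passed in as parameters (in a page they would be window.speechSynthesis and SpeechSynthesisUtterance) so the sketch can also run outside a browser.

```javascript
// Hypothetical helper: wraps `text` in an utterance and queues it for speech.
function sayText(synth, Utterance, text) {
  const utterance = new Utterance(text);
  synth.speak(utterance); // queued; spoken after anything already in the queue
  return utterance;
}

// Only runs in a browser that exposes the Web Speech API.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  sayText(
    window.speechSynthesis,
    window.SpeechSynthesisUtterance,
    "Hello from the Web Speech API"
  );
}
```

Note that some browsers only allow speech after a user interaction such as a click, so in practice a call like this usually lives inside an event handler.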

Properties (SpeechSynthesis also inherits properties from its parent interface, EventTarget):

SpeechSynthesis.paused: A Boolean that returns true if the SpeechSynthesis object is in a paused state.

SpeechSynthesis.pending: A Boolean that returns true if the utterance queue contains as-yet-unspoken utterances.

SpeechSynthesis.speaking: A Boolean that returns true if an utterance is currently in the process of being spoken, even if SpeechSynthesis is in a paused state.
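As a rough illustration of how these three Booleans combine, the sketch below turns them into a single status label. The function name describeState and the label strings are my own assumptions, not part of the API.

```javascript
// Hypothetical helper: maps paused/speaking/pending to a status label.
function describeState(synth) {
  // paused is checked first because speaking can stay true while paused
  if (synth.paused) return "paused";
  if (synth.speaking) return "speaking";
  if (synth.pending) return "queued";
  return "idle";
}

// Only runs in a browser that exposes the Web Speech API.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  console.log(describeState(window.speechSynthesis));
}
```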

Methods (SpeechSynthesis also inherits methods from its parent interface, EventTarget):

SpeechSynthesis.cancel(): Removes all utterances from the utterance queue.

SpeechSynthesis.getVoices(): Returns a list of SpeechSynthesisVoice objects representing all the available voices on the current device.

SpeechSynthesis.pause(): Puts the SpeechSynthesis object into a paused state.

SpeechSynthesis.resume(): Puts the SpeechSynthesis object into a non-paused state: resumes it if it was already paused.

SpeechSynthesis.speak(): Adds an utterance to the utterance queue; it will be spoken when any other utterances queued before it have been spoken.
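The methods above could be combined roughly as follows. This is only a sketch: togglePause and speakNow are hypothetical helper names, and the controller is passed in as a parameter so the logic can be tried without a browser.

```javascript
// Hypothetical helper: resumes paused speech, or pauses speech in progress.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
    return "resumed";
  }
  if (synth.speaking) {
    synth.pause();
    return "paused";
  }
  return "idle"; // nothing to pause or resume
}

// Hypothetical helper: empties the queue first, so the new utterance
// replaces whatever was waiting instead of lining up behind it.
function speakNow(synth, utterance) {
  synth.cancel();
  synth.speak(utterance);
}
```

In a page, both helpers would receive window.speechSynthesis, and speakNow would receive a SpeechSynthesisUtterance.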

It’s now time to build our simple Text-to-Speech application.

Demo Link: https://mytextspeech.netlify.app/

Final result

Step 1:

Create a folder and, inside it, create an index.html file, then paste the code below into it. The snippet contains the HTML used to build the layout of our application.

index.html

Step 2:

Create a JavaScript file named main.js and paste the code below into it.

main.js

Here, we declare all the variables needed in main.js. The first variable, synth, holds a reference to the speechSynthesis object, while the other variables hold the DOM elements that will be used to handle events.
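As a rough sketch of what such declarations might look like: every selector below is an assumption and has to match the ids in your own index.html. The lookups are wrapped in a hypothetical getElements function so they can be tried with any document-like object.

```javascript
// Hypothetical helper: collects the DOM elements the app needs.
// All selectors are assumed; adjust them to your markup.
function getElements(doc) {
  return {
    textForm: doc.querySelector("form"),
    textInput: doc.querySelector("#text-input"),
    voiceSelect: doc.querySelector("#voice-select"),
    rate: doc.querySelector("#rate"),
    rateValue: doc.querySelector("#rate-value"),
    pitch: doc.querySelector("#pitch"),
    pitchValue: doc.querySelector("#pitch-value"),
    body: doc.querySelector("body"),
  };
}

// Only runs in a browser.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  const synth = window.speechSynthesis; // the speech controller
  const elements = getElements(document);
  console.log(synth, elements);
}
```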

Step 3:

Inside the main.js file, copy and paste the code below.

The voices array will hold all the voices that the browser’s Web Speech API provides, while the getVoices function retrieves those voices along with their names and languages. The voices are then appended as options to the select element on our web page, which lets the user pick a preferred voice.
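One way this step could look is sketched below. The function name populateVoiceSelect, the #voice-select id, and the data attributes are assumptions for illustration. The document and select element are passed in so the function can be exercised without a browser.

```javascript
// Hypothetical helper: builds one <option> per voice, labelled "Name (lang)".
function populateVoiceSelect(doc, select, voices) {
  select.innerHTML = ""; // avoid duplicates when the voices are reloaded
  for (const voice of voices) {
    const option = doc.createElement("option");
    option.textContent = `${voice.name} (${voice.lang})`;
    // Assumed attributes, handy for finding the chosen voice again later.
    option.setAttribute("data-name", voice.name);
    option.setAttribute("data-lang", voice.lang);
    select.appendChild(option);
  }
  return select;
}

// Only runs in a browser that exposes the Web Speech API.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const synth = window.speechSynthesis;
  const select = document.querySelector("#voice-select"); // assumed id
  const loadVoices = () =>
    populateVoiceSelect(document, select, synth.getVoices());
  loadVoices();
  // Some browsers load voices asynchronously and fire voiceschanged when ready.
  if (synth.onvoiceschanged !== undefined) {
    synth.onvoiceschanged = loadVoices;
  }
}
```

Calling loadVoices once directly and once from the voiceschanged handler covers both browsers that return the list immediately and those that fill it in later.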

Step 4:

Inside the main.js file, copy and paste the code below.

At this point, we create a speak function that handles the speaking events: it adds a wave background while a voice is speaking, applies the selected voice, sets the rate and pitch of the voice, and handles any errors that occur in the process.
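Here is a hedged sketch of a function along those lines. The option names (voiceName, onStart, onEnd, and so on) are my own assumptions. In a page, synth would be window.speechSynthesis, Utterance would be SpeechSynthesisUtterance, and onStart/onEnd could add and remove the wave background.

```javascript
// Sketch of a speak function; all dependencies are passed in as options.
function speak({ synth, Utterance, text, voices, voiceName, rate, pitch, onStart, onEnd }) {
  if (synth.speaking) {
    console.error("Already speaking...");
    return null;
  }
  if (text.trim() === "") return null; // nothing to say

  const utterance = new Utterance(text);
  utterance.onstart = () => onStart && onStart(); // e.g. show the wave background
  utterance.onend = () => onEnd && onEnd();       // e.g. remove it again
  utterance.onerror = (event) => {
    console.error("Speech synthesis failed", event);
    if (onEnd) onEnd();
  };

  // Use the selected voice if it exists; otherwise the browser default applies.
  const voice = voices.find((v) => v.name === voiceName);
  if (voice) utterance.voice = voice;

  utterance.rate = Number(rate);   // allowed range 0.1 to 10, default 1
  utterance.pitch = Number(pitch); // allowed range 0 to 2, default 1
  synth.speak(utterance);
  return utterance;
}
```

Slider values arrive as strings, which is why rate and pitch are converted with Number before being assigned.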

Step 5:

Inside the main.js file, copy and paste the code below.

Finally, we have our event handlers, which are responsible for handling all the clickable and selectable events in our application, i.e., submitting the text, increasing or decreasing the rate and pitch, and changing the voice.
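The wiring could look something like the sketch below. The function name bindEvents, the shape of the elements object, and the speakFn callback are all assumptions made for illustration.

```javascript
// Hypothetical helper: connects the controls to a callback that starts speech.
function bindEvents(elements, speakFn) {
  const { textForm, textInput, voiceSelect, rate, rateValue, pitch, pitchValue } = elements;

  textForm.addEventListener("submit", (event) => {
    event.preventDefault(); // keep the page from reloading
    speakFn();
    textInput.blur();
  });

  // Show the current slider values next to the sliders.
  rate.addEventListener("change", () => {
    rateValue.textContent = rate.value;
  });
  pitch.addEventListener("change", () => {
    pitchValue.textContent = pitch.value;
  });

  // Speak again when a different voice is chosen.
  voiceSelect.addEventListener("change", () => speakFn());
}
```

Passing the elements and the callback in keeps the handlers free of globals, so the same function works with whatever ids your markup uses.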

Complete Code

Complete index.html code
Complete main.js code

Yaay! 😃 We did it. We just created a simple text-to-speech application that lets users type or paste words or sentences, which are then converted into speech. Now you can sit comfortably, paste your desired text, and relax while a voice reads it to you 😂.

Conclusion

In my other articles, I explain more about Browser APIs and discuss the different types of Browser APIs available.

Similar Articles

Further explanation/references
