How to Create a Simple Text-to-Speech Application
If you’ve ever felt bored or tired of reading an article, or wished someone would read a magazine aloud to you, you are not alone. I have felt that way too! That is the main reason I decided to write this piece: to help web developers build something very simple that solves this problem.
In this article, I’ll discuss what a web API is and how to use the SpeechSynthesis API to convert text to speech in different voices.
According to Wikipedia’s definition, an API (Application Programming Interface) is an interface that defines interactions between multiple software applications. It defines the kinds of requests that can be made, how to make them, the data formats that should be used, the conventions to follow, etc. It can also provide extension mechanisms so that users can extend existing functionality in various ways.
APIs are commonly grouped by audience (open, internal, partner, and composite APIs) and by protocol or architectural style (REST, JSON-RPC, XML-RPC, and SOAP).
Web APIs often use machine-based interactions such as REST and SOAP.
Our primary focus in this article is the Web Speech API, which is a web/browser API; the other kinds of APIs will be discussed extensively in a different article.
What are Web/Browser APIs?
Browser APIs (or web APIs) are the APIs that are built into the browsers. They are able to expose data from the browser and surrounding computer environment which helps developers to perform complex operations. With these APIs, we can build applications that make use of native features such as Notifications, Vibrations, etc.
Web Speech API
According to MDN, the Web Speech API is a web API that enables you to incorporate voice data into web apps. The Web Speech API has two parts: SpeechSynthesis (text-to-speech) and SpeechRecognition (asynchronous speech recognition).
In this article, we’ll look at the power of SpeechSynthesis (text-to-speech) and how to make use of it.
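Since browser support for the two parts differs, it can help to check for them before using either. The snippet below is a minimal sketch of such a check; the helper name detectSpeechSupport and its scope parameter are assumptions made for illustration, not part of the Web Speech API.

```javascript
// Report which parts of the Web Speech API a global object exposes.
// `scope` is normally `window`; it is a parameter here (an assumption of this
// sketch) so the check can also run outside a browser.
function detectSpeechSupport(scope) {
  return {
    synthesis: 'speechSynthesis' in scope,
    // Chromium-based browsers expose recognition under a webkit prefix.
    recognition:
      'SpeechRecognition' in scope || 'webkitSpeechRecognition' in scope,
  };
}

// In a browser, inspect the real global object.
if (typeof window !== 'undefined') {
  console.log(detectSpeechSupport(window));
}
```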
SpeechSynthesis
According to MDN, the SpeechSynthesis interface of the Web Speech API is the controller interface for the speech service; this can be used to retrieve information about the synthesis voices available on the device, start and pause speech, and other commands besides.
Properties (SpeechSynthesis also inherits properties from its parent interface, EventTarget):
SpeechSynthesis.paused
A Boolean that returns true if the SpeechSynthesis object is in a paused state.
SpeechSynthesis.pending
A Boolean that returns true if the utterance queue contains as-yet-unspoken utterances.
SpeechSynthesis.speaking
A Boolean that returns true if an utterance is currently in the process of being spoken, even if SpeechSynthesis is in a paused state.
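To see how the three flags combine, here is a small sketch that turns them into a readable status. The function describeSynthState and its status strings are hypothetical; only the paused, pending, and speaking properties come from the API.

```javascript
// Summarise the three read-only state flags of a SpeechSynthesis-like object.
// The returned strings are made up for this sketch.
function describeSynthState(synth) {
  // `speaking` stays true while an utterance is paused part-way through.
  if (synth.speaking && synth.paused) return 'paused mid-utterance';
  if (synth.speaking) return 'speaking';
  if (synth.pending) return 'utterances waiting in the queue';
  return 'idle';
}

// In a browser, describe the real speech service.
if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  console.log(describeSynthState(window.speechSynthesis));
}
```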
Methods (SpeechSynthesis also inherits methods from its parent interface, EventTarget):
SpeechSynthesis.cancel()
Removes all utterances from the utterance queue.
SpeechSynthesis.getVoices()
Returns a list of SpeechSynthesisVoice objects representing all the available voices on the current device.
SpeechSynthesis.pause()
Puts the SpeechSynthesis object into a paused state.
SpeechSynthesis.resume()
Puts the SpeechSynthesis object into a non-paused state: resumes it if it was already paused.
SpeechSynthesis.speak()
Adds an utterance to the utterance queue; it will be spoken when any other utterances queued before it have been spoken.
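The methods above can be combined as in the sketch below. The helpers replaceQueue and togglePause are hypothetical names, and passing the speech service and the utterance constructor in as arguments is an assumption made so the functions do not depend on browser globals.

```javascript
// Replace whatever is queued with a fresh list of sentences.
// `synth` is a SpeechSynthesis-like object and `Utterance` is the
// SpeechSynthesisUtterance constructor (both passed in for this sketch).
function replaceQueue(synth, Utterance, sentences) {
  synth.cancel(); // drop anything still queued or being spoken
  const utterances = sentences.map((sentence) => new Utterance(sentence));
  utterances.forEach((utterance) => synth.speak(utterance)); // spoken in order
  return utterances.length;
}

// Switch between the paused and non-paused states.
function togglePause(synth) {
  if (synth.paused) {
    synth.resume();
  } else {
    synth.pause();
  }
}
```

In a browser, this could be called from a click handler as replaceQueue(window.speechSynthesis, SpeechSynthesisUtterance, ['Hello.', 'How are you?']), since some browsers only allow speech after a user gesture.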
It’s now time to build our Simple Text-to-Speech application
Demo Link: https://mytextspeech.netlify.app/
Step 1:
Create a folder and, inside it, create an index.html file, then paste the code below into it. The snippet contains the HTML used to build the layout of our application.
Step 2:
Create a JavaScript file, main.js, and paste the code below inside it.
Here, we declare all the variables needed in our main.js file. The first variable, synth, holds the speechSynthesis object, while the other variables are the DOM elements that will be used to handle events.
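As a rough illustration of this step, the references could be gathered as follows. This is a sketch rather than the original snippet: the function getAppRefs and every selector in it (#text-input, #voice-select, #rate, #rate-value, #pitch, #pitch-value) are assumed names that would have to match your own index.html.

```javascript
// Collect the speech service and the DOM elements the app works with.
// All selectors below are hypothetical; adjust them to your markup.
function getAppRefs(win, doc) {
  return {
    synth: win.speechSynthesis,
    textForm: doc.querySelector('form'),
    textInput: doc.querySelector('#text-input'),
    voiceSelect: doc.querySelector('#voice-select'),
    rate: doc.querySelector('#rate'),
    rateValue: doc.querySelector('#rate-value'),
    pitch: doc.querySelector('#pitch'),
    pitchValue: doc.querySelector('#pitch-value'),
  };
}

// In a browser, resolve the references against the real page.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  console.log(getAppRefs(window, document));
}
```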
Step 3:
Inside the main.js file, copy and paste the code below
The voices array will contain all the voices the browser exposes through the Web Speech API, while the getVoices function retrieves the available voices along with their names and languages. Finally, the voices are appended as options to the select element on our web page. This enables users to select their preferred voice.
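One way this step might look is sketched below. The function populateVoiceSelect, the #voice-select id, and the data-name and data-lang attributes are assumptions for illustration; getVoices() and the voiceschanged event are part of the API.

```javascript
// Fill a <select> with one <option> per available voice.
// `synth`, `select` and `doc` are passed in (an assumption of this sketch)
// so the function does not rely on browser globals.
function populateVoiceSelect(synth, select, doc) {
  const voices = synth.getVoices();
  select.innerHTML = ''; // clear options left over from an earlier call
  voices.forEach((voice) => {
    const option = doc.createElement('option');
    option.textContent = `${voice.name} (${voice.lang})`;
    option.setAttribute('data-name', voice.name);
    option.setAttribute('data-lang', voice.lang);
    select.appendChild(option);
  });
  return voices;
}

if (typeof window !== 'undefined' && 'speechSynthesis' in window) {
  const select = document.querySelector('#voice-select'); // hypothetical id
  if (select) {
    const refresh = () =>
      populateVoiceSelect(window.speechSynthesis, select, document);
    refresh();
    // Some browsers load voices asynchronously, so the first call can return
    // an empty list; they fire `voiceschanged` once the list is ready.
    if (window.speechSynthesis.onvoiceschanged !== undefined) {
      window.speechSynthesis.onvoiceschanged = refresh;
    }
  }
}
```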
Step 4:
Inside the main.js file, copy and paste the code below
At this point, we create a speak function that handles the speaking events, adds a wave background while a voice is speaking, applies the selected voice, sets the rate and pitch of the voice, and handles errors in the process.
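A possible shape for such a function is sketched here, not reproduced from the original snippet. The options object and the onStart and onEnd hooks are assumptions; the hooks stand in for toggling the wave background, whose markup and styling are not shown in this article.

```javascript
// Speak `options.text` with the chosen voice, rate and pitch.
// `synth` and `Utterance` (the SpeechSynthesisUtterance constructor) are
// passed in as an assumption of this sketch.
function speak(synth, Utterance, options) {
  const { text, voices = [], voiceName, rate = 1, pitch = 1, onStart, onEnd } = options;

  if (synth.speaking) {
    console.error('Already speaking...');
    return null;
  }
  if (!text || text.trim() === '') {
    return null; // nothing to say
  }

  const utterance = new Utterance(text);

  // Hypothetical hooks, e.g. to show and hide a wave background.
  utterance.onstart = () => { if (onStart) onStart(); };
  utterance.onend = () => { if (onEnd) onEnd(); };
  utterance.onerror = (event) => {
    console.error('Something went wrong:', event.error);
    if (onEnd) onEnd();
  };

  // Use the selected voice if it is in the list, else the browser default.
  const voice = voices.find((v) => v.name === voiceName);
  if (voice) utterance.voice = voice;

  // Form inputs give strings; rate ranges 0.1 to 10 and pitch 0 to 2.
  utterance.rate = Number(rate);
  utterance.pitch = Number(pitch);

  synth.speak(utterance);
  return utterance;
}
```

Returning the utterance (or null when nothing was queued) is a design choice of this sketch that lets the caller tell whether speech actually started.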
Step 5:
Inside the main.js file, copy and paste the code below
Finally, we have our event handlers, which are responsible for handling all the clickable and selectable events in our application, i.e. submitting our text content, increasing or decreasing the rate and pitch, and changing the voice.
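The wiring might look like the sketch below. The function wireEvents, the shape of the refs object, and the onSpeak callback are all assumptions for illustration; onSpeak is expected to read the current form values and start speaking.

```javascript
// Attach the form, slider and select handlers.
// `refs` holds the DOM elements (names are hypothetical) and `onSpeak` is a
// callback that starts speech with the current form values.
function wireEvents(refs, onSpeak) {
  const { textForm, textInput, voiceSelect, rate, rateValue, pitch, pitchValue } = refs;

  // Submitting the text: speak it instead of reloading the page.
  textForm.addEventListener('submit', (event) => {
    event.preventDefault();
    onSpeak();
    textInput.blur();
  });

  // Moving a slider: show the new value next to it.
  rate.addEventListener('change', () => {
    rateValue.textContent = rate.value;
  });
  pitch.addEventListener('change', () => {
    pitchValue.textContent = pitch.value;
  });

  // Choosing another voice: speak again with that voice.
  voiceSelect.addEventListener('change', () => onSpeak());
}
```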
Complete Code
Yaay! 😃 We did it. We just created a simple text-to-speech application that lets users type or paste words or sentences, which are then converted into speech. Now you can sit comfortably, paste your desired text, and relax while a voice reads it to you 😂.
Conclusion
In my other articles, I explain more about browser APIs and discuss the different types of browser APIs available.
Further explanation/references
- “Web Speech API,” Mozilla Developer Network
- “Web Speech API Specification: Editor’s Draft,” W3C
- “Speech Synthesis API,” Can I Use? (chart of browser support)
- “Web Apps That Talk: Introduction to the Speech Synthesis API,” Eric Bidelman, Google Developers
- “Speech Synthesis API,” Microsoft Developer (demo for Edge)