The power of APIs: Convert any Web Text to Audio in Less than 30 Minutes

Published in API World · 6 min read · Apr 28, 2021

As a programmer, you’re using APIs on a daily basis. The role of an API is to make your life easier, reduce the number of headaches, and save you a lot of time.

Usually, in a web project, you use Payment, Newsletter, Social, and Map APIs. But there’s an unwritten rule for APIs: if you think about it, there’s an API for it.

Are you tired of hearing the horoscope from the radio or from the TV? You can connect to an API and get it from there.

Did you just hack your car’s dashboard and want to add the Bible to it? No worries, there’s an API for that as well.

Do you want to find the gender of a person just by their name? I’ve got you! There’s an API for that.

You get my point — there is an API for anything.

Creating the text-to-speech scraper

On average, adults spend 11 hours a day staring at a screen (computer, phone, tablet). Now that spring is here, I want to spend more time outside. When I go out, I want to combine my outdoor time with learning. What if I can take some interesting Wikipedia articles with me and listen to them while I walk around the noisy city?

This is where the text-to-speech scraper comes in handy.

We will create an app that scrapes an article and converts it into audio in less than 30 minutes and with fewer than 100 lines of code.

Defining the scope of the software

What we want to accomplish is simple:

Scrape a web page. We will scrape a Wikipedia page through an API.
Extract the content. We need heading tags and paragraphs only.
Clean the content. We remove any links or artifacts that come with the article.
Convert the extracted content to audio. We use an external API that converts the text to MP3.
Store the file locally. What’s the point of converting text to audio if you won’t be able to listen to it on your favorite MP3 player?

Finding the right APIs

I’m a fan of free APIs. I’m also a big fan of API Marketplaces where I can find free APIs. I value free APIs because they allow me to turn an idea into a prototype in a matter of hours instead of days (or even weeks).

For our project, we need two APIs:

1. A web scraping API

The winner is WebScrapingAPI. They offer 1,000 requests each month with the free package, no credit card required. The integration with the API is as simple as sending a GET request.

2. A Text-to-speech API

Sentient offers a text-to-speech microservice. Like WebScrapingAPI, they have a free package that comes with 50 calls each month. The integration is simple, and they don’t ask you for a credit card if you want to test the service.

Writing the code

I split the code into five easy-to-understand sections. I’ll talk about each of them in detail:

Bootstrap
Scrape
Extract and parse
Text-to-speech
Store

Bootstrap

Before we write any code, let’s install the dependencies:

npm install cheerio axios

This command will install cheerio, a markup parser with jQuery syntax, and axios, a promise-based HTTP client.

After the dependencies are installed, create the index.js file and paste the following code into it:

const cheerio = require('cheerio');
const axios = require('axios');
const fs = require('fs');

// Add the WebScrapingAPI key here
const scraping_api_key = '*******************************';

// Add the sentient.io Text to Speech API key
const tts_api_key = '*******************';

// URL to scrape and convert to audio
const target_url = 'https://en.wikipedia.org/wiki/API';

const scraping_api_url = `https://api.webscrapingapi.com/v1?api_key=${scraping_api_key}&url=${encodeURIComponent(target_url)}`;

(async () => {
  // Paste the code from the next sections here
})();

This code loads the libraries we need (cheerio, axios, fs), defines the API keys for WebScrapingAPI and the Sentient Text to Speech API, sets the URL we want to scrape (you can replace it with any Wikipedia article URL), and builds the WebScrapingAPI request URL.

Don’t forget to create an account with WebScrapingAPI and Sentient and add your API keys.
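If you plan to scrape more than one article, it may help to wrap the URL building in a small function. The sketch below is my own illustration: the buildScrapingUrl name is hypothetical, and it only repeats the URL format used in the bootstrap code. It shows why encodeURIComponent matters: without it, a "?" or "&" inside the target URL would be read as part of the WebScrapingAPI query string.

```javascript
// Hypothetical helper (not part of the original code): builds the request URL
// in the same format as the bootstrap code.
function buildScrapingUrl(apiKey, targetUrl) {
  // encodeURIComponent escapes ':', '/', '?', '&' and similar characters,
  // so the whole target URL travels as a single query-string value.
  return `https://api.webscrapingapi.com/v1?api_key=${apiKey}&url=${encodeURIComponent(targetUrl)}`;
}

// Example with a target URL that carries its own query string.
const exampleUrl = buildScrapingUrl('YOUR_KEY', 'https://en.wikipedia.org/w/index.php?title=API&action=view');
console.log(exampleUrl);

// Parsing the result back shows the target URL survives intact.
console.log(new URL(exampleUrl).searchParams.get('url'));
```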

Scrape

Replace the // Paste the code from the next sections here text with this piece of code:

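/**
 * Scrape the content
 */
let response;

try {
  response = await axios.get(scraping_api_url);
} catch (error) {
  console.log(error);
  // Exit with a non-zero code so the failure is visible to the shell
  process.exit(1);
}

// Load the returned HTML and keep only the article body
const $ = cheerio.load(response.data);
const $content = $('.mw-parser-output');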

This code uses WebScrapingAPI to scrape the Wikipedia page. Once the API returns the page HTML, we load it into a cheerio instance. To make the parsing easier, we keep only the content found in the .mw-parser-output element.

Extract and parse

Paste this code after the one from the Scrape section:

/**
 * Extract and parse the content
 */
const text = [];

$content.children().each(function (index, child) {
  const tagName = child.tagName;
  const $child = $(child);
  let childText = '';

  // We want to parse only paragraphs and heading tags
  if (['p', 'h2', 'h3'].indexOf(tagName) === -1) return;

  // Get the child text by tag type
  childText = tagName === 'p' ?
    // Strip the tags, remove the reference markers (one bracket pair at a time) and newlines
    $child.text().replace(/\[[^\]]*\]/g, '').replace(/\n/g, '') :
    // Get the heading tag text
    $child.find('.mw-headline').text();

  // Skip sections that come without paragraphs
  if (['Examples', 'See also', 'References', 'Further reading'].indexOf(childText) !== -1) return;

  // Store the text
  text.push(childText);
});

We iterate over the children of the .mw-parser-output element, and we do some parsing:

We skip the element if it’s not a paragraph or a heading tag
We remove newlines (\n) and references (numbers between square brackets)
There are some sections in a Wiki page that come without paragraphs (Examples, See also, References, Further reading). We skip these sections as well.
We store the parsed text in a list.
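The cleaning step is easy to try on its own, without scraping anything. Here is a minimal sketch, assuming the paragraph text has already been extracted as a plain string; the cleanParagraph name is mine, not part of the code above. It also shows a pitfall worth knowing: a greedy pattern such as \[.+\] removes everything between the first "[" and the last "]" in a paragraph, while a pattern that stops at the first closing bracket removes only the reference markers.

```javascript
// Hypothetical helper: removes reference markers like [1] or [note 3]
// and newlines from a paragraph's text.
function cleanParagraph(rawText) {
  return rawText
    // [^\]]* stops at the first closing bracket, so each marker is removed on its own
    .replace(/\[[^\]]*\]/g, '')
    .replace(/\n/g, '');
}

const sample = 'An API[1] is an interface.[2][note 3]\n';

console.log(cleanParagraph(sample));
// → An API is an interface.

// For comparison, a greedy pattern swallows the text between the markers too.
console.log(sample.replace(/(\[.+\])/g, '').replace(/\n/g, ''));
// → An API
```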

Text-to-speech

Paste this code after the one from the previous section:

/**
 * Convert the text to audio
 */
// The text-to-speech API has a 2,000-character limit
const textToAudio = text.join('\n').substring(0, 2000);

try {
  response = await axios.post('https://apis.sentient.io/microservices/voice/ttseng/v0.1/getpredictions', {
    text: textToAudio,
  }, {
    headers: {
      'content-type': 'application/json',
      'x-api-key': tts_api_key
    }
  });
} catch (error) {
  console.log(error.message);
  // Exit with a non-zero code so the failure is visible to the shell
  process.exit(1);
}

We use the Sentient API to convert our text to an MP3 file. Keep in mind the free API comes with a 2,000-character limit, so we cut the text off at that point. This should be more than enough for a proof of concept.
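Cutting the text at 2,000 characters means most of a long article is dropped. If you want to go past the proof of concept, one option is to split the text into chunks that each fit under the limit and send them one by one. The sketch below is only an illustration under that assumption: splitIntoChunks is a hypothetical helper, each chunk would cost one API call from the monthly quota, and you would end up with several audio files rather than one.

```javascript
// Hypothetical helper: groups paragraphs into chunks of at most maxLength
// characters, joining paragraphs with a newline where they fit together.
function splitIntoChunks(paragraphs, maxLength = 2000) {
  const chunks = [];
  let current = '';

  for (const paragraph of paragraphs) {
    // Hard-split any single paragraph that is longer than the limit
    const pieces = [];
    for (let i = 0; i < paragraph.length; i += maxLength) {
      pieces.push(paragraph.slice(i, i + maxLength));
    }

    for (const piece of pieces) {
      const candidate = current ? `${current}\n${piece}` : piece;
      if (candidate.length <= maxLength) {
        // The piece still fits in the chunk we are building
        current = candidate;
      } else {
        // Close the current chunk and start a new one with this piece
        chunks.push(current);
        current = piece;
      }
    }
  }

  if (current) chunks.push(current);
  return chunks;
}

// Three paragraphs of 1,500, 1,000 and 400 characters
const demoChunks = splitIntoChunks(['a'.repeat(1500), 'b'.repeat(1000), 'c'.repeat(400)]);
console.log(demoChunks.map((chunk) => chunk.length));
// → [ 1500, 1401 ]
```

In the app, you would pass the text list from the Extract and parse section and send one request per chunk.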

Store

Insert this piece of code after the Text-to-speech section:

/**
 * Store the audio file
 */
// Buffer.from replaces the deprecated new Buffer() constructor
fs.writeFileSync(`audio-${Date.now()}.mp3`, Buffer.from(response.data.audioContent, 'base64'));

// Kill the process
process.exit();

The Sentient API returns a base64 encoded MP3 file. We use a Buffer to convert the string to binary, and we store the MP3 file on the local computer.
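If the base64 step feels like magic, you can try it in isolation. This is a minimal sketch with a stand-in payload instead of a real API response: saveBase64Audio is a hypothetical helper, and the string "SUQz" is just the three bytes "ID3" encoded as base64, not real audio. It uses Buffer.from, which is the non-deprecated way to create a Buffer from a string in current Node.js versions.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: decodes a base64 string and writes the raw bytes to disk.
function saveBase64Audio(base64Audio, filePath) {
  // Buffer.from(string, 'base64') turns the encoded text back into binary data
  const audioBuffer = Buffer.from(base64Audio, 'base64');
  fs.writeFileSync(filePath, audioBuffer);
  return audioBuffer.length;
}

// Stand-in payload (not real audio), written to the system temp directory
const demoPath = path.join(os.tmpdir(), `audio-demo-${Date.now()}.mp3`);
const bytesWritten = saveBase64Audio('SUQz', demoPath);
console.log(`Wrote ${bytesWritten} bytes to ${demoPath}`);
```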

Testing it out

Now that the code is in place, let’s hope it works. We can only find out by running it. Use this command to run the code:

node index.js

An MP3 file will show up in your project’s directory. Play it to make sure the data returned by the API is not corrupted. If it plays, it means our project is completed.
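If you would rather not rely on your ears alone, a quick sanity check on the first bytes of the file can catch obvious problems, such as an error page saved with an .mp3 extension. The sketch below is a rough heuristic of my own, not something the API provides: looksLikeMp3 is a hypothetical helper that only checks whether the data starts with an ID3 tag or an MPEG frame sync. Passing the check does not prove the file is playable.

```javascript
// Hypothetical helper: returns true when the buffer starts like an MP3 file.
function looksLikeMp3(buffer) {
  if (buffer.length < 3) return false;

  // Many MP3 files begin with an ID3v2 tag, marked by the ASCII text "ID3"
  const hasId3Tag = buffer.toString('latin1', 0, 3) === 'ID3';

  // Otherwise an MPEG audio frame starts with 11 set bits (the frame sync)
  const hasFrameSync = buffer[0] === 0xff && (buffer[1] & 0xe0) === 0xe0;

  return hasId3Tag || hasFrameSync;
}

console.log(looksLikeMp3(Buffer.from([0xff, 0xfb, 0x90, 0x00]))); // → true
console.log(looksLikeMp3(Buffer.from('<html>Error</html>')));      // → false
```

To check a generated file, read it with fs.readFileSync and pass the resulting Buffer to the function.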

APIs let us be more creative with our code

You might be asking yourself, why not use libraries instead of APIs? You could pull this project off with libraries, but let’s be realistic: how long would it actually end up taking?

The time you’re spending on reinventing the wheel could be used for something else. Making a scraper is not an easy task. You need to worry about user agents, rotating IPs, proxies, and a lot more. Converting text to speech sounds easy, but it’s not. The most used text-to-speech library in Node.js has 20 contributors, and it’s been abandoned for the past two years.

You can’t build your next startup venture on libraries that are not actively maintained.

When starting a new project (or venture), the cost of implementation is something that you need to pay attention to, and the best way to cut development time (and cost) is by using APIs.

Maybe it’s time to write smarter software and make your business more profitable. What do you think?