Beyond APIs: Exploring The Basics of Web Scraping with Puppeteer

6 min read · Nov 8, 2023

As a developer, I love working with APIs, but I’ve often struggled to find one that suits what I want to build. In situations like this, I’ve found it useful to scrape data from websites programmatically by spinning up a headless version of Google Chrome and pulling data from specific text values on the page 🤓. I know that in my attempts to sound knowledgeable I’ve made it sound challenging, but in reality it’s quite easy 😃.

If you’re interested, please join me as I take a hands-on look at the basics of web scraping. We will scrape text from HTML and save scraped images locally. I will also briefly describe how to schedule an automated task so that it repeats over a certain period.

An illustration showing the processes involved in web scraping

In this section, we will scrape data from a landing page for a fictional climate change conference event.
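To give a feel for where this is heading, here is a minimal sketch of scraping one piece of text from a page with Puppeteer. It assumes Puppeteer is already installed (covered in the next section). The function name scrapeText, the URL https://example.com, and the h1 selector are placeholders of my own, not the conference page used in this article.

```javascript
// Sketch only: scrapeText is a hypothetical helper, and the URL and selector
// in the usage comment below are placeholders, not the article's demo page.
async function scrapeText(url, selector) {
  // Loaded inside the function so this file can be read or imported
  // even before Puppeteer has been installed.
  const puppeteer = require('puppeteer');

  const browser = await puppeteer.launch(); // runs headless by default
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });

    // $eval runs the callback inside the page, against the first element
    // that matches the selector, and returns the callback's result.
    return await page.$eval(selector, (el) => el.textContent.trim());
  } finally {
    // Always close the browser, even if navigation or the selector fails.
    await browser.close();
  }
}

// Usage (assumed URL and selector):
// scrapeText('https://example.com', 'h1').then(console.log).catch(console.error);
```

Closing the browser in a finally block is a deliberate choice here: a failed selector or a navigation timeout would otherwise leave a Chrome process running in the background.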

Installing Puppeteer

To begin with, we will run npm init -y in the terminal to add a package.json file to the project. Please ensure that Node.js is installed on your computer for this command to work.

With this file in place, we can now install Puppeteer by running npm install puppeteer. Please take note: it’s easy to misspell; at least it was for me when I first…
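Since the package name is easy to mistype, one quick way to confirm the install worked is to ask Node whether it can resolve the package from the project folder. This is a small sketch of my own rather than part of Puppeteer; the helper name isInstalled is hypothetical.

```javascript
// Hypothetical helper: returns true when Node can resolve the named
// package from the current project, false otherwise.
function isInstalled(packageName) {
  try {
    require.resolve(packageName);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(
  isInstalled('puppeteer')
    ? 'Puppeteer is installed.'
    : 'Puppeteer was not found; check the spelling in the install command.'
);
```

Save it in the project folder (for example as check-install.js, an assumed file name) and run it with node check-install.js.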

Written by Roy Jumah