Beyond APIs: Exploring The Basics of Web Scraping with Puppeteer
As a developer, I love working with APIs, but I’ve often struggled to find one that suits what I want to build. In situations like this, I’ve found it useful to scrape data from websites programmatically by spinning up a headless version of Google Chrome and pulling out specific text values 🤓. I know that in my attempts to sound knowledgeable I’ve made this sound challenging, but in reality it’s quite easy 😃.
If you’re interested, please join me as I take a hands-on look at the basics of web scraping. We will scrape text from HTML and save scraped images locally. I will also briefly describe how we can schedule an automated task so that it repeats at a set interval.
In this section, we will scrape data from a landing page for a fictional climate change conference event.
Installing Puppeteer
To begin with, we will run npm init -y in the terminal to add a package.json file to the project. Please ensure that Node.js is installed on your computer for this command to work.
With this file in place, we can now install Puppeteer by running npm install puppeteer. Please take note: it’s easy to misspell; at least it was for me when I first…
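Since the package name is so easy to mistype, one way to double-check the install is to look at what ended up in package.json, because npm normally records an installed package under dependencies. The snippet below is only a minimal sketch of that idea: hasDependency is a hypothetical helper of my own (it is not part of Puppeteer or npm), and the two sample objects are made-up stand-ins for a real package.json.

```javascript
// Hypothetical helper (not part of Puppeteer or npm): reports whether a parsed
// package.json object lists a package under dependencies or devDependencies.
function hasDependency(pkg, name) {
  const sections = [pkg.dependencies, pkg.devDependencies];
  return sections.some(
    (section) =>
      section != null && Object.prototype.hasOwnProperty.call(section, name)
  );
}

// Made-up sample data standing in for the contents of a real package.json.
const correctInstall = { name: "scraper", dependencies: { puppeteer: "*" } };
const misspelledInstall = { name: "scraper", dependencies: { pupppeteer: "*" } };

console.log(hasDependency(correctInstall, "puppeteer")); // true
console.log(hasDependency(misspelledInstall, "puppeteer")); // false
```

In a real project, the object passed in would come from the project’s own file, for example by parsing the contents of package.json with JSON.parse. A result of false suggests the install either failed or pulled in a package under a different name.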