Dynamic Web Scraping with Puppeteer and Node.js

Anto Haryanto
Apr 14, 2020

Sometimes a website's content is dynamic, generated with JavaScript or WebAssembly. In that situation a plain HTML parser is not enough, because the content is not in the raw HTML the server sends; we need to actually open a web browser such as Chrome and run it in headless mode. Several tools can do this from Node.js, such as Selenium, PhantomJS, and Puppeteer.

Puppeteer uses the Chrome DevTools Protocol to control the browser automatically.

To install Puppeteer, we can use this command (it also downloads a browser build that Puppeteer can drive):

npm i puppeteer

I will use Puppeteer to retrieve item names from a Walmart search results page. The principle is simple: I inject JavaScript into the page through Puppeteer and return the results to a Node.js variable.

Before writing the JavaScript to inject, I open the target site in a normal web browser and look for a unique marker in the HTML tags, something that can be used to select elements with document.querySelectorAll, document.getElementById, or similar functions. You can find it by right-clicking an element on the page and choosing Inspect.
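The selection logic itself is plain DOM code, so it can be written as a self-contained function for Puppeteer to inject later. In this sketch, `extractTitles` and its parameters are my own illustration, not the original code. The `root` parameter defaults to `document` in the browser but accepts any object with `querySelectorAll`, which makes the function easy to check outside Chrome.

```javascript
// Sketch of a browser-side extraction function (hypothetical name).
// It uses only standard DOM methods, so it can be passed to page.evaluate.
function extractTitles(itemSelector, titleSelector, root = document) {
  // querySelectorAll returns a static NodeList; Array.from makes it an array.
  const items = Array.from(root.querySelectorAll(itemSelector));
  const titles = [];
  for (const item of items) {
    // Search for the title only inside the current item.
    const titleEl = item.querySelector(titleSelector);
    if (titleEl) {
      // Strip leading/trailing whitespace common in real markup.
      titles.push(titleEl.textContent.trim());
    }
  }
  return titles;
}

module.exports = { extractTitles };
```

Inside Puppeteer it could be called as `await page.evaluate(extractTitles, itemSelector, titleSelector)`; because `root` is omitted there, it falls back to the page's own `document`.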

Each item has a search-result-gridview-item class that we can use.

Because this tutorial only needs the name of each item, I then look for the title element inside an item, and I find it like this:

Alright, now everything is ready and we can write the code:

We run this code with the following command:

node main.js

And the result looks like this: