Google Web Scraper Tutorial

Luxi Wang
5 min read · Jul 17, 2019


Web Scraper is an automated data extraction tool that makes web data extraction easy and accessible for everyone. You can install it from the Chrome Web Store and add it to your developer tools. Web Scraper can handle sites with pagination links, popup links, AJAX pagination links, “Load more” buttons, page scrolling, etc.

This tutorial introduces the tool and walks you through applying the extension to typical data extraction situations. If you have any questions or problems using it, please feel free to let me know @Luxi.

What is Web Scraper?

Web Scraper is a useful extension from the Chrome Web Store. It has the following features:

  • Point-and-click interface: no coding required. You configure the scraper by simply pointing and clicking on elements.
  • Data extraction from dynamic websites: it navigates a website on all levels, including categories and subcategories, pagination, and product pages.
  • Built for the modern web: full JavaScript execution (better for sites built on JavaScript frameworks), pagination handlers, page scroll-down, waiting for AJAX requests, etc.
  • Modular selector system: you can build sitemaps from different types of selectors and tailor data extraction to different site structures.
  • CSV export: export data in CSV format directly from your browser.

Installation

  • You can install the extension from the Chrome Web Store and add it to your developer tools.
  • After installing it, restart Chrome to make sure the extension is fully loaded.
  • Web Scraper is integrated into Chrome Developer Tools. Figure 1 shows how you can open it.

Scraping a Site

As mentioned above, Web Scraper can recognize and handle complex sites with multiple levels, pagination, and popup links. You just follow the steps, pointing and clicking to create a structured sitemap, and the tool does the scraping for you.

Now, let’s start with a simple case.

Here, we want to collect the following information about one specific hospital from this webpage (Dingxiang):

  • Hospital’s name, address, phone number, and introduction;
  • Hospital’s expertise;
  • List of departments with hyperlinks to specific physicians.

Step 1. Create Sitemap

The first thing you need to do when creating a sitemap is to specify the start URL. This is the URL from which the scraping will start. You can also specify multiple start URLs if the scraping should start from multiple places.

For example, a range URL such as http://example.com/page/[1-3] covers pages 1 through 3.
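To make the range notation concrete, here is a minimal Python sketch of what a pattern like [1-3] stands for. It is only an illustration of the idea, not the extension's own code; the function name expand_range_url is made up for this example, and it handles only the simple [start-end] form shown above.

```python
import re

# Matches a simple numeric range such as [1-3] inside a URL.
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range_url(url):
    """Expand the first [start-end] range in a URL into one URL per number."""
    match = RANGE_PATTERN.search(url)
    if match is None:
        return [url]  # no range in the URL, so it is a single start URL
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{number}{suffix}" for number in range(start, end + 1)]


for start_url in expand_range_url("http://example.com/page/[1-3]"):
    print(start_url)
```

So one range URL saves you from typing each page address as a separate start URL.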

Step 2. Create Selectors

After you have created the sitemap, you can add selectors to it. In the Selectors panel you can add new selectors, modify them, and navigate the selector tree. Selectors are organized in a tree structure.

Then repeat the same process for the address, phone number, and expertise.

However, to collect the hyperlinks for each department, we need to change the selector type to Element Click. The Element Click selector can interact with the web page by clicking on buttons to load new elements. For example, a page might use JavaScript and AJAX for pagination or item loading (see the selector’s configuration options).

Now, back in the sitemap, we have five selectors. We can check the data preview to confirm that the information is scraped accurately. To collect the hyperlinks, we click the last selector and add a new sub-selector.

First, change the selector’s type. Then, when you click Select, a yellow highlighted area appears so you can extract the information from that area.

Here are the links we collected. The tool automatically clicks the next-page button and scrapes all the links.
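If it helps to see the result as data, the selectors built so far can be pictured as a small JSON-like structure. The Python sketch below is a simplified approximation of that shape, not a file you can import as-is. It assumes the field names (_id, startUrl, selectors, parentSelectors) and type names (SelectorText, SelectorElementClick, SelectorLink) match what your version of the extension exports, and real exports carry extra fields that are left out here. The start URL, selector ids, and CSS selectors are all placeholders, not the ones used on the Dingxiang page.

```python
import json


def make_selector(selector_id, selector_type, css, parent="_root", multiple=False):
    """Build one selector entry; parentSelectors is what gives the sitemap its tree shape."""
    return {
        "id": selector_id,
        "type": selector_type,
        "parentSelectors": [parent],
        "selector": css,  # placeholder CSS selector
        "multiple": multiple,
    }


sitemap = {
    "_id": "train1",
    "startUrl": ["http://example.com/hospital/1"],  # placeholder start URL
    "selectors": [
        # Text selectors attached directly to the start page.
        make_selector("name", "SelectorText", "h1.name"),
        make_selector("address", "SelectorText", "p.address"),
        make_selector("phone", "SelectorText", "p.phone"),
        make_selector("expertise", "SelectorText", "div.expertise"),
        # Element Click selector that pages through the department list.
        make_selector("departments", "SelectorElementClick", "li.department", multiple=True),
        # Sub-selector that collects the hyperlink inside each department element.
        make_selector("department_link", "SelectorLink", "a", parent="departments"),
    ],
}

print(json.dumps(sitemap, indent=2))
```

The point to notice is the parent field: the link selector hangs off the Element Click selector rather than the start page, which is exactly what adding a sub-selector does in the interface.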

If we are still not satisfied with the links we collected and want to scrape more data from the linked pages, we can open each hyperlink and take the same actions there.

Step 3. Scrape & Export

The following figure is the Selector Graph Tree produced by our sitemap train1. Since our sample case is very simple, the selector tree doesn’t have many branches or leaf nodes.
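For readers who want to reason about the tree outside the browser, here is a small Python sketch that rebuilds a selector tree from parent links and prints it with indentation. The selector ids are illustrative stand-ins for the ones in train1, and "_root" is assumed to represent the start page.

```python
from collections import defaultdict

# (selector id, parent id) pairs; the ids are illustrative, "_root" is the start page.
SELECTORS = [
    ("name", "_root"),
    ("address", "_root"),
    ("phone", "_root"),
    ("expertise", "_root"),
    ("departments", "_root"),
    ("department_link", "departments"),
]


def build_tree(selectors):
    """Group selector ids under their parent id."""
    children = defaultdict(list)
    for selector_id, parent_id in selectors:
        children[parent_id].append(selector_id)
    return children


def render_tree(children, node="_root", depth=0):
    """Return the tree as indented lines, one selector per line."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        lines.extend(render_tree(children, child, depth + 1))
    return lines


print("\n".join(render_tree(build_tree(SELECTORS))))
```

A simple sitemap like ours gives a tree that is only two levels deep; a sitemap that follows links into further pages just adds more levels.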

Here is another tree graph I created for our project. It looks very concise, but it can scrape hospital information for all provinces, including the departments’ data and specific doctors’ profiles. All you need to do is click Scrape and wait.

You can export the following output as a CSV file.
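Once you have the CSV file, a few lines of Python are enough for a quick sanity check. The sketch below counts rows per value of one column using only the standard library. The sample text and its column names (hospital, department, department_link) are made up for illustration; your export will have one column per selector id you defined.

```python
import csv
import io
from collections import Counter


def count_rows_by_column(csv_text, column):
    """Count how many exported rows share each value of one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Made-up sample standing in for an exported file; the column names are placeholders.
sample_export = """hospital,department,department_link
Example Hospital,Cardiology,http://example.com/dept/1
Example Hospital,Neurology,http://example.com/dept/2
"""

print(count_rows_by_column(sample_export, "hospital"))
```

For a real export, read the file with open(path, newline="", encoding="utf-8") and pass its contents to the same function.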

Let’s look at the output for the other sitemap I created above. It scraped 50,000+ physician profiles in one night.

Selector Type

Web Scraper has multiple selectors that can be used for different types of data extraction and for different kinds of interaction with the website. The selectors can be divided into three groups:

  • Data extraction selectors for data extraction.
  • Link selectors for site navigation.
  • Element selectors for selecting the elements that separate multiple records.

Data extraction selectors

Data extraction selectors simply return data from the selected element. For example, the Text selector extracts text from the selected element. These selectors can be used as data extraction selectors:

Video Tutorials

Here are some video tutorials that walk you through scraping e-commerce sites.

Check them out and let me know if you have any questions.
