Web Scraping Using a Google Chrome Extension
Web scraping is a technique for extracting large amounts of data from websites and saving it locally, for example as a spreadsheet or a CSV file. There are several ways to extract data from websites, but here I want to show how to do it with a Chrome extension. So, let’s start.
First of all, add the “Web Scraper - Free Web Scraping” extension from the Chrome Web Store. As I have already added it, the button shows “Remove from Chrome” for me. In your case it will show “Add to Chrome”.
That’s it, everything is set up and we are ready to scrape data from websites. Suppose we want to scrape data from “amazon.com” for a particular category, such as electronics. So let’s go.
Step-01: Go to “amazon.com” web page.
Step-02: Now open the inspect element panel (the browser developer tools), or press Ctrl + Shift + I.
After opening the inspect element panel you will see the “Web Scraper” tab.
Step-03: Then go to “Create new sitemap” and select “Create Sitemap”. After that you will see the following:
Step-04: Give the sitemap any name you like and enter the URL of the website you want to scrape.
After clicking “Create Sitemap” you will see the following:
Step-05: Click the “Add new selector” option:
Here, you should fill in all the fields.
Id: This is the header of your column. Give it any name you like.
Type: Here you can see several options. You can extract the data as text, link, image, table and so on.
Selector: Here you can see three options. “Select” lets you pick elements on the page. “Element preview” shows which elements you have selected on the website. “Data preview” shows the data that will be collected from the page.
After clicking “Select” you can see the following:
Note: If you want to select multiple elements on the page, check the “Multiple” box.
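To make the fields above more concrete, here is a minimal sketch of how one selector could be written down as data. The field names follow the JSON that the extension shows when a sitemap is exported, as far as I recall, so treat them as an assumption. The CSS selector string is purely hypothetical and is not Amazon's real markup.

```python
import json

# Hypothetical selector definition mirroring the form fields described above.
# Field names are assumed from the extension's exported sitemap JSON.
category_selector = {
    "id": "category",               # Id: column header in the exported data
    "type": "SelectorLink",         # Type: extract links (text, image, table are others)
    "parentSelectors": ["_root"],   # the selector sits directly under the start page
    "selector": "a.category-link",  # assumed CSS selector chosen with "Select"
    "multiple": True,               # the "Multiple" checkbox
}

# Print the selector in the JSON form a sitemap export would use.
print(json.dumps(category_selector, indent=2))
```

Writing the selector down like this is only a way to see the pieces side by side. In the extension itself you fill in the same information through the form.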
You can see that I am selecting multiple items in the electronics section of the site, and each selected item is outlined with a red border.
After that, click the “Done selecting” option. That’s it. Now we check whether the data is extracted.
Now click the “Data preview” option.
Here we can see that the data is extracted. We chose link as the Type, which is why it gives us all the links of the “Electronics” section.
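For readers curious about what a link selector with “Multiple” checked does conceptually, here is a rough sketch using only the Python standard library. It is not how the extension is implemented. The sample HTML and the class name category-link are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        # Keep every matching anchor, like a selector with "Multiple" checked.
        if self.css_class in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


# Made-up page fragment standing in for a category menu.
sample_html = """
<ul>
  <li><a class="category-link" href="/accessories">Accessories</a></li>
  <li><a class="category-link" href="/camera">Camera</a></li>
  <li><a class="other" href="/help">Help</a></li>
</ul>
"""

collector = LinkCollector("category-link")
collector.feed(sample_html)
print(collector.links)
```

Only the two anchors with the matching class are collected, which is the same idea as seeing only the selected links in “Data preview”.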
Now click the “Save selector” option and you will get the following:
Here you can check the data again by clicking “Data preview”, or edit the links and category by clicking “Edit”.
Step-06: Now click the category row and you will get the following:
Here you can see “_root/category” at the top left. That means we are now inside the category selector.
Now go to “Accessories & Supplies”, or any other section on your page, and click the “Add new selector” option again.
Here you can see the page inside “Audio & Video Accessories”, along with the same scraper panel we saw in Step-05.
Now fill in all the information as in Step-05.
Here we get:
Here we select multiple elements and give the Id the name products-link. After that, click the “Done selecting” option.
Now click “Data preview” to check whether the data is extracted.
We can see that the data is extracted. Now click “Save selector” and you will get the following:
You can also check the data preview here, as we did in Step-05.
Now click the products-link row and you will get the following:
Here you can see “_root/category/products-link” at the top left. That means we are now inside the products-link selector.
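The path shown at the top left is simply the chain of parent selectors. As a small sketch, assuming each selector remembers its parent, the path can be rebuilt like this. The dictionary below is my own illustration, not something the extension exposes.

```python
# Each selector id maps to its parent id; "_root" stands for the start page.
parents = {
    "category": "_root",
    "products-link": "category",
}


def breadcrumb(selector_id, parents):
    """Walk up the parent chain and join the ids with '/'."""
    path = [selector_id]
    while path[-1] != "_root":
        path.append(parents[path[-1]])
    return "/".join(reversed(path))


print(breadcrumb("products-link", parents))
```

Any selector added while this path is shown becomes a child of products-link, which is why the row you click before “Add new selector” matters.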
Now open one of the product links (go inside any product) and select “Add new selector”. You will get the following:
This is similar to what we did in Step-05 and Step-06. After finishing that we get:
Here we have collected the name of the product. Now click “Done selecting” and “Save selector”.
You can also check here whether the product name is extracted. Try it yourself by clicking “Data preview”.
Then click “Add new selector” again. Don’t get confused: previously we clicked the row first, but this time we only click “Add new selector”, so the new selector stays at the same level.
After selecting the price, click “Done selecting” and “Save selector” to get the following:
To collect the image from the page, do the following:
After selecting the image, click “Done selecting” and “Save selector”. Now click “Add new selector” again and collect whatever other information you need. After completing this you will get the following:
Now click the sitemap menu (“Sitemap amazon-electronics”) and you will see several options.
If you want to see the tree structure of the scraper, click “Sitemap amazon-electronics” > “Selector graph”.
Here is our selector graph:
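The same tree can be sketched in text form. The snippet below assumes the selectors built in this walkthrough, with the ids name, price and image chosen by me for illustration, and prints them as an indented tree.

```python
from collections import defaultdict

# (selector id, parent id) pairs; name, price and image are assumed ids.
selectors = [
    ("category", "_root"),
    ("products-link", "category"),
    ("name", "products-link"),
    ("price", "products-link"),
    ("image", "products-link"),
]


def render_tree(selectors, root="_root"):
    """Return the selector hierarchy as a list of indented lines."""
    children = defaultdict(list)
    for selector_id, parent in selectors:
        children[parent].append(selector_id)

    lines = []

    def walk(node, depth):
        lines.append("  " * depth + node)
        for child in children[node]:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


tree_lines = render_tree(selectors)
print("\n".join(tree_lines))
```

Reading the tree from top to bottom gives the order in which the scraper moves: start page, category pages, product pages, and then the fields on each product page.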
We are all set for scraping. Now click “Sitemap amazon-electronics”.
Then click the “Scrape” option.
Leave everything as it is and just click “Start scraping”.
After you click “Start scraping”, a new browser window opens, loads the pages automatically and collects the data.
Note: Scraping will take some time.
When scraping is finished, a notification tells you so.
Now click “refresh” and you will see your extracted data.
Step-07: Now click “Sitemap amazon-electronics” and then click “Export as CSV”.
After this, the web scraping is done. We get the file in our downloads folder as “amazon-electronics.csv”.
Here is the output file:
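If you want to check the exported file with a script, something like the following sketch could work. The column names and rows here are made-up sample data standing in for the real “amazon-electronics.csv”, since the actual columns depend on the Ids you chose.

```python
import csv
import io

# Made-up sample rows standing in for the exported file.
sample_csv = io.StringIO(
    "category,products-link,name,price\n"
    "Accessories,Example Cable,Example Cable 2m,9.99\n"
    "Accessories,Example Adapter,Example Adapter,14.50\n"
)


def load_rows(file_obj):
    """Read a CSV file object into a list of dictionaries keyed by column header."""
    return list(csv.DictReader(file_obj))


rows = load_rows(sample_csv)
print(len(rows), "rows; columns:", list(rows[0].keys()))

# For the real export, open the downloaded file instead:
# with open("amazon-electronics.csv", newline="", encoding="utf-8") as f:
#     rows = load_rows(f)
```

Each row becomes a dictionary keyed by the selector Ids, so a value such as rows[0]["name"] is easy to pick out.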
Thanks for reading this article. I hope it helps you.