Scrape data from web to CSV in minutes

Jawakar
3 min read · Aug 31, 2021


This blog will help you to scrape data from any website and convert that to a CSV file.

I’ve used AutoScraper, a smart, lightweight web-scraping library for Python that learns scraping rules from examples and scrapes web pages automatically.

By following the code snippets below, you can generate your own scraped CSV file. Let’s get started!

First, let’s install the autoscraper library and import it:

!pip install autoscraper

from autoscraper import AutoScraper

Now, let us scrape data from Amazon and build an Amazon scraper. You can scrape any website you wish; I showcase Amazon here as an example.

We need the URL of the page to be scraped. To make it more personalized, use your own product’s URL. Here I use the search URL for Alexa.

Create a wanted_list; this holds sample values of the information to be scraped. I consider the product’s name, price and rating. You can add whatever you want to scrape and alter it your own way by following my code below.

amazon_url = "https://www.amazon.in/s?k=alexa"

wanted_list = ['Echo Dot (3rd Gen) - #1 smart speaker brand in India with Alexa (Black)', '₹3,499 ', '67,789']

We build our web scraper by creating an AutoScraper() object. Then, using the build() and get_result_similar() methods, we generate the scraped data.

scraper = AutoScraper()
result = scraper.build(amazon_url,wanted_list)
data = scraper.get_result_similar(amazon_url, grouped=True)
print(data)

The scraped data will be a Python dictionary with dynamically assigned rule keys, each holding the data matched for one item in our wanted list. You can find your keys with the following code:

key = list(data.keys())
print(key)
#output
#['rule_69qt', 'rule_jt8w', 'rule_lwm8', 'rule_bjvp', 'rule_7lzq']

Let us assign appropriate names to the keys and save our scraper. It is similar to training a model and saving it to a file.

scraper.set_rule_aliases({'rule_69qt': 'Title', 'rule_7lzq': 'Rating', 'rule_lwm8': 'Price'})

# keep the same rules for other queries as well
scraper.keep_rules(['rule_69qt', 'rule_7lzq', 'rule_lwm8'])
scraper.save('amazon-search')

By doing this, we can reuse the scraper to scrape similar data for other products from Amazon just by updating the URL. Yes, it is that easy.

Let us say this time I want to scrape details of PlayStation products. Load your scraper, update the URL and get the result:

# specify the path
scraper.load("/content/amazon-search")
result = scraper.get_result_similar("https://www.amazon.in/s?k=playstation", group_by_alias=True)

The result we get is a dictionary. Since we want our dataset as a CSV file, we convert the dictionary to a pandas DataFrame with the following code:

import pandas as pd

df = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in result.items()]))

# Converting our DataFrame to a CSV file
df.to_csv('PlayStation_Data.csv')
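One detail worth noting: the scraped lists can have unequal lengths (not every product exposes every field), and a plain pd.DataFrame(result) would raise an error in that case. Wrapping each list in pd.Series pads the shorter columns with NaN instead. Here is a minimal, self-contained sketch of that behavior using made-up sample values (not real scraped output):

```python
import pandas as pd

# Hypothetical scraped result: columns of unequal length
result = {
    'Title': ['Echo Dot', 'Echo Show', 'Echo Studio'],
    'Price': ['₹3,499', '₹8,999'],   # one price missing
    'Rating': ['67,789'],            # only one rating captured
}

# Wrapping each list in pd.Series pads shorter columns with NaN
df = pd.DataFrame({k: pd.Series(v) for k, v in result.items()})
print(df.shape)  # (3, 3)
```

The missing cells simply come out as NaN in the CSV, so the export never fails on ragged data.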

Below is a screenshot of the output of df.head().

Congrats on scraping successfully!

You can visit my GitHub by clicking here, where I have built an API to scrape the data.

I hope this blog was insightful for you. You can show your love by liking it.

Was this blog helpful for you? Then leave me a comment below.

Thank you!
