How to use Import.io to scrape data from Amazon
Scraping web data from e-commerce websites is a time-consuming and repetitive process that simply begs for automation. Import.io turns this into an easy task, allowing you to get the data you need in a fraction of the time.
Let’s take a look at an example: scraping data from an Amazon results page that lists laptops.
The setup
Getting ready for the task ahead is easy and straightforward.
1. Log in to your Import.io account by visiting this link.
2. Navigate to the Extractors tab using the panel on the left.
3. Head to the web page you wish to scrape data from. In our case, that’s a results page on Amazon.co.uk.
4. Copy the full URL of the page.
Scraping data — the automatic process
Now that you have the URL and you are at the right place in Import.io, it’s time to get scraping.
1. Click on the New Extractor button located near the top left of the page.
2. Enter the URL you copied earlier.
3. If the data is behind a login, click on the toggle to set it up. Otherwise, just click on Go.
At this point, Import.io will load the page and attempt to automatically extract the type of data it thinks you want.
Its success rate depends on factors such as the website’s structure and the kind of data involved. However, its true power comes from the ability to customise the data scraping to your specific requirements.
Scraping data — the customisation
To customise the kind of data Import.io is scraping, change over to the Edit tab by clicking the respective button at the top of the page.
Once there, you will be able to see all of the columns that the scraper has automatically organised. You now have two options: work with the existing columns or start anew by clicking on Start over with empty table.
Regardless of your choice, the processes that follow are quite similar. After clicking on your desired column, click on the data you wish for Import.io to scrape.
After you click on a couple of similar entries, the scraper’s machine-learning algorithms should understand what you are trying to scrape and populate the rest of the column automatically.
For instance, after clicking on a few price points, the scraper will get all the prices on that page so that you don’t have to click on everything manually.
More advanced queries are also possible, such as feeding the scraper with paginated URLs so that it can scrape data from multiple pages.
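As a rough illustration of what a list of paginated URLs might look like, the Python sketch below builds one from a single results-page URL. The `page` query parameter, the example search URL, and the `paginated_urls` helper are all assumptions made for illustration; check how the site you are scraping actually numbers its pages before relying on it.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def paginated_urls(base_url, pages, param="page"):
    """Return one URL per page number, from 1 to `pages` inclusive."""
    parts = urlparse(base_url)
    # Keep any existing query parameters, such as the search term.
    query = parse_qs(parts.query, keep_blank_values=True)
    urls = []
    for number in range(1, pages + 1):
        # Assumption: the site selects the page via a query parameter.
        query[param] = [str(number)]
        new_query = urlencode(query, doseq=True)
        urls.append(urlunparse(parts._replace(query=new_query)))
    return urls


if __name__ == "__main__":
    # Hypothetical results-page URL, used here only as an example.
    for url in paginated_urls("https://www.amazon.co.uk/s?k=laptops", 3):
        print(url)
```

The printed list can then be pasted into the scraper as the set of pages to extract from.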
Extracting data
Once you are happy with the data you’ve set Import.io to scrape, click on Extract data from website. At this point, you can also choose whether to run the scraper on a schedule, and whether you’d like to be emailed when it has finished running.
Both options are useful in situations such as monitoring the prices on a particular website or finding out when certain ratings change.
After clicking on the Save and run button, give the scraper some time to process all of the data. When it’s done, you will be able to download the data in different formats such as CSV, see the scraping log, and preview the data in a basic table before downloading.
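Once the CSV is downloaded, a few lines of Python are enough to start working with it. The sketch below is only an illustration: the column names `name` and `price`, the `read_prices` and `parse_price` helpers, and the sample rows are assumptions, since the real headers depend on how you named the columns in the Edit tab.

```python
import csv
import io
import re
from decimal import Decimal


def parse_price(text):
    """Turn a price string such as '£1,249.00' into a Decimal, or None if empty."""
    # Drop currency symbols, thousands separators and spaces.
    cleaned = re.sub(r"[^0-9.]", "", text)
    return Decimal(cleaned) if cleaned else None


def read_prices(csv_file, name_column="name", price_column="price"):
    """Map each product name to its price, skipping rows without a price."""
    prices = {}
    for row in csv.DictReader(csv_file):
        price = parse_price(row[price_column])
        if price is not None:
            prices[row[name_column]] = price
    return prices


if __name__ == "__main__":
    # Made-up rows standing in for a downloaded file; with a real export,
    # pass open("your_export.csv", newline="", encoding="utf-8") instead.
    sample = 'name,price\nLaptop A,£499.99\nLaptop B,"£1,249.00"\n'
    print(read_prices(io.StringIO(sample)))
```

Prices are kept as `Decimal` values rather than floats so that later comparisons are not affected by rounding.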
And if you wish to run the scraper on the same URL to get similar data again, you can just click on Run Urls to get updated data on the spot.
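If you keep the results of two runs, comparing them is straightforward. The following Python sketch assumes each run has already been loaded into a dictionary that maps a product name to its price; the `price_changes` helper and the sample figures are hypothetical and not part of Import.io.

```python
def price_changes(previous, current):
    """Return {name: (old_price, new_price)} for items whose price differs."""
    changes = {}
    for name, new_price in current.items():
        old_price = previous.get(name)
        # Items that are new in the latest run have no earlier price to compare.
        if old_price is not None and old_price != new_price:
            changes[name] = (old_price, new_price)
    return changes


if __name__ == "__main__":
    # Made-up prices from two runs of the same extractor.
    last_week = {"Laptop A": 500, "Laptop B": 700}
    today = {"Laptop A": 450, "Laptop B": 700, "Laptop C": 900}
    for name, (old, new) in price_changes(last_week, today).items():
        print(f"{name}: {old} -> {new}")
```

Products that appear only in the newer run are ignored here; whether to report them separately is a design choice that depends on what you are monitoring.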