Scraping Data for Digital Marketers: No Coding Experience Required
As a digital marketer, you often have to work with a lot of data from many different sources, some of which cannot easily be exported into a format you can work with. I, for one, love it when I see an export-to-CSV option in the tools and services I use.
Whether you are performing an SEO audit for a client or checking out opportunities hidden within the data on a competitor’s website, scraping is one of those things that will save you a lot of time.
Now, some of you might be thinking that extracting data from websites is going to be an expensive process. I have good news for you: it does not have to be. Sure, there are expensive tools that can do this for you, but with some time and effort you can gather the data you need at little to no cost.
A word of warning before you decide to go down this path: scraping might raise legal issues in some scenarios. Some sites explicitly state that scraping their website is against their terms and conditions. Google is one of them... go figure. So please make sure you don’t get into trouble using this.
What is Data Scraping?
According to Wikipedia,
Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites.
When you load up a webpage and extract any data from it, you are essentially scraping the information. If you automate that process, it becomes the data scraping that we are talking about.
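You do not need any code to follow this tutorial, but for the curious, here is a minimal sketch of what "automating that process" can look like in Python. It uses only the standard library. The page content in SAMPLE_PAGE and the names LinkCollector and extract_links are made up for illustration; a real scraper would first download the page, which is left out here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects a (text, href) pair for every <a href="..."> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return every link in the given HTML as a list of (text, href) tuples."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content, standing in for a page you loaded in the browser.
SAMPLE_PAGE = """
<html><body>
  <h1>Resources</h1>
  <p><a href="https://example.com/a">Site A</a></p>
  <p><a href="https://example.com/b">Site B</a></p>
</body></html>
"""

links = extract_links(SAMPLE_PAGE)
for text, href in links:
    print(text, href)
```

Copying those two links by hand would be manual scraping; the loop above is the automated version of the same job.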
For simple scraping, dedicated scraping tools can be overkill. So today we are going to look at a simple Google Chrome plugin called Scraper that lets you easily extract any set of data from a web page in your browser.
Once you install the plugin, you can access it by right-clicking any element on a web page and choosing “Scrape similar…”. This brings up the Scraper console, where you can define an XPath or jQuery selector to scrape all matching data. Once you have the data, you can easily export it to Google Docs.
Google Chrome Scraper plugin tutorial
Assuming you have installed the plugin, let us take a closer look at how to use it. In this example, you want to scrape the list of sites mentioned on a page of free stock photo websites.
Open the page in your browser (Chrome in this case). You can see the list of free stock photo sites in the sidebar on the left. Right-click one of the items and select “Scrape similar…”.
The Scraper console should now open with the desired data. Click “Export to Google Docs…” and you will get that data in a Google spreadsheet.
This is one of the basic things you can do with the plugin. While it is an easy way to scrape data from a webpage, the plugin has one major limitation: it works more like a screen scraper, which means it can only fetch data from the page you launch it from. In many situations, that is exactly what you want.
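If you ever want the scraped rows in a plain CSV file instead of a Google spreadsheet, the export step is easy to reproduce yourself. The sketch below is not how the plugin works internally; it simply shows the idea with Python's standard csv module. The site names, URLs, column headers and the rows_to_csv helper are all hypothetical.

```python
import csv
import io


def rows_to_csv(rows, header=("Name", "URL")):
    """Turn scraped rows into CSV text that any spreadsheet tool can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows, like the ones the Scraper console would list.
scraped_rows = [
    ("Example Photos", "https://example.com/photos"),
    ("Sample Stock, Inc.", "https://example.org/stock"),
]

csv_text = rows_to_csv(scraped_rows)
print(csv_text)
```

The csv module takes care of quoting, so a name that contains a comma (like the second row) still ends up in a single column.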
Now let us look at some more advanced stuff. You need a basic understanding of HTML for this. We will also be working with XPath, so I will give you a quick intro that will help you get started.
Basic XPath Tutorial
XPath is a query language for selecting nodes from an XML document. In other words, you can use an XPath expression to find specific information within an XML structure. Using XPath, we can tell the scraper to look for specific information on any given webpage, and the scraper will collect that data for you.
Now let us take a look at a sample XPath expression:
//div[@class="test"]/ul/li[4]/a
In layman’s terms, this is what each part means:
// - search the entire document
div[@class="test"] - select all div tags with the class “test”
ul - select all unordered lists from those selected divs
li[4] - select the 4th list item from each of the selected unordered lists
a - select all links from the selected list items
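To see those steps in action outside the browser, here is a small sketch in Python. It assumes the parts listed above join up into the expression //div[@class="test"]/ul/li[4]/a, and it runs that expression against a made-up page (SAMPLE_PAGE and its links are hypothetical). It uses the standard library's ElementTree, which understands only a subset of XPath and needs well-formed markup, so treat it as an illustration rather than a full scraper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page with two lists; only the first div has class "test".
SAMPLE_PAGE = """
<html>
  <body>
    <div class="test">
      <ul>
        <li><a href="/one">One</a></li>
        <li><a href="/two">Two</a></li>
        <li><a href="/three">Three</a></li>
        <li><a href="/four">Four</a></li>
        <li><a href="/five">Five</a></li>
      </ul>
    </div>
    <div class="other">
      <ul>
        <li><a href="/x1">X1</a></li>
        <li><a href="/x2">X2</a></li>
        <li><a href="/x3">X3</a></li>
        <li><a href="/x4">X4</a></li>
      </ul>
    </div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE_PAGE.strip())

# ElementTree wants a leading "." so the search starts from the root element;
# otherwise this is the same path: div with class "test" -> ul -> 4th li -> a.
matches = root.findall(".//div[@class='test']/ul/li[4]/a")

results = [(link.text, link.get("href")) for link in matches]
print(results)
```

Only the fourth link of the list inside the "test" div is returned. The fourth item of the other list is skipped because its div has a different class.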
Now let us see how we can use custom XPath to fetch the data we need from any webpage.
I hope this tutorial gives you a basic understanding of how this plugin can be used to scrape data from websites. It is only one of the many options available for scraping data from a webpage.