Using Web Scraping to Solve Problems and Create Value
This past year I’ve used web scraping as a tool in different software projects, and I’ve really loved the possibilities it brings to the table, so much so that I’ve had the chance to develop products that revolve around it, using it as a main source of value. In this article, I’ll talk a bit about my experience and how you can leverage web scraping to add value to your projects.
For the uninitiated, web scraping means analyzing a web page’s source code to retrieve some data of interest. There’s a way to do it in practically every language, and there are nice tools and libraries available that make the process really easy. I won’t go into the technical details here because there are plenty of resources out there for whichever language you’re interested in; a simple Google search for “web scraping in X language” should suffice.
A brief introduction
Let’s say you want to display conversion rates for different currencies on your website, and no public API exists that you can use to get these rates. You could go to your local bank’s site, look at the rates they’re buying or selling for and then update your database or website accordingly, but let’s be honest… Ain’t nobody got time for that. Furthermore, you’d have to do this every morning to ensure that the rates for the day are correct.
Obviously, there’s a better way. Web scraping allows you to make a program that scans that website for changes and automatically updates your database with the relevant information.
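To make that concrete, here is a minimal sketch in Python using only the standard library. The HTML snippet, the three-column table layout and the names `RateTableParser` and `parse_rates` are all assumptions made up for illustration; a real bank page will be structured differently, and you would download it first (for example with `urllib.request.urlopen`) before parsing.

```python
from html.parser import HTMLParser


class RateTableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # cells of the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows only contain <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_rates(html):
    """Return {currency: (buy, sell)} from a three-column rates table."""
    parser = RateTableParser()
    parser.feed(html)
    return {
        cells[0]: (float(cells[1]), float(cells[2]))
        for cells in parser.rows
        if len(cells) == 3
    }


# Hypothetical page content standing in for the bank's website.
SAMPLE_HTML = """
<table>
  <tr><th>Currency</th><th>Buy</th><th>Sell</th></tr>
  <tr><td>USD</td><td>58.10</td><td>58.95</td></tr>
  <tr><td>EUR</td><td>62.40</td><td>64.00</td></tr>
</table>
"""

rates = parse_rates(SAMPLE_HTML)
print(rates)
```

From here, the dictionary in `rates` is what you would write to your database on a schedule instead of copying the numbers by hand.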
Practical example: Gas Prices app
In my country, the prices for different fuel types are released by the government every Friday to be applied the following week. I found myself having to search for the website every Friday morning to check if the prices were going up or down, so I decided I’d make it easier on myself by making an app that would notify me when the prices changed and let me easily check whether they went up or down.
This is a pretty simple example of a way you can use web scraping to solve a problem. I needed a way to get the current and previous prices to display them in the app, and to check them constantly so I could send notifications when they changed. To achieve this, I simply set up a script to run every hour on a server to handle the notifications, and implemented scraping on the client app as well to load the latest information when the user opens the app.
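The hourly check boils down to comparing the prices scraped now against the ones stored from the last run. The sketch below shows one way that comparison could look in Python. It is not the app’s actual code: the function names, the fuel names and the prices are hypothetical, and the scraping and push-notification parts are left out.

```python
def price_changes(previous, current):
    """Return {fuel: (old, new, direction)} for every fuel whose price moved."""
    changes = {}
    for fuel, new_price in current.items():
        old_price = previous.get(fuel)
        if old_price is None or old_price == new_price:
            continue  # newly listed fuel or unchanged price: nothing to report
        direction = "up" if new_price > old_price else "down"
        changes[fuel] = (old_price, new_price, direction)
    return changes


def notification_text(changes):
    """Build a short message body, one line per changed fuel."""
    return "\n".join(
        f"{fuel}: {old:.2f} -> {new:.2f} ({direction})"
        for fuel, (old, new, direction) in sorted(changes.items())
    )


# Hypothetical values: what was stored last run vs. what was just scraped.
last_week = {"regular": 3.10, "premium": 3.45, "diesel": 2.90}
this_week = {"regular": 3.25, "premium": 3.45, "diesel": 2.80}

changes = price_changes(last_week, this_week)
if changes:
    print(notification_text(changes))
```

If `changes` comes back empty, the script has nothing to send and can simply exit until the next run.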
All we’re doing here is practically relaying the information from the website directly to our app and notification server. Nothing too interesting, but we can start to see how having web scraping as a tool in our arsenal could be valuable to develop new ideas.
Web scraping is really powerful when, rather than using the scraped data outright, you use it to make something that wasn’t possible before.
At my university, lots of students use empty classrooms as a quiet place to study and do homework since the library fills up pretty quickly. However, to see which rooms aren’t holding classes at the moment, a student needs to go room by room, building by building, until they find a suitable one. We identified a need for an app that could easily show you which rooms are available for use. Right away, I thought of web scraping as a way to get the data we needed.
The university publishes a list of classes containing the schedule for different sections, along with the room and building they’re in.
To get available rooms, all we had to do was make a program that scrapes this page and then use this data to figure out when a particular room was empty. I could explain more about how this was done, but I think you get the idea.
Since we had information about all the classes in a particular room, we could also display a schedule for any given room, which was something you could see posted on each room’s door, but you couldn’t see online or anywhere else.
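As a rough illustration of both ideas, here is a minimal Python sketch. It assumes the class list has already been scraped into simple records; the field names, room codes and times are invented for the example and are not the university’s real data or the app’s real code.

```python
from datetime import time

# Hypothetical records, as they might look after scraping the class list.
CLASSES = [
    {"room": "A-101", "day": "Mon", "start": time(8, 0), "end": time(10, 0)},
    {"room": "A-101", "day": "Mon", "start": time(14, 0), "end": time(16, 0)},
    {"room": "B-202", "day": "Mon", "start": time(9, 0), "end": time(11, 0)},
    {"room": "B-202", "day": "Tue", "start": time(8, 0), "end": time(10, 0)},
]

DAY_ORDER = {"Mon": 0, "Tue": 1, "Wed": 2, "Thu": 3, "Fri": 4, "Sat": 5, "Sun": 6}


def free_rooms(classes, day, at):
    """Return the rooms with no class in session on `day` at time `at`."""
    all_rooms = {c["room"] for c in classes}
    busy = {
        c["room"]
        for c in classes
        if c["day"] == day and c["start"] <= at < c["end"]
    }
    return sorted(all_rooms - busy)


def room_schedule(classes, room):
    """Return (day, start, end) tuples for one room, in weekly order."""
    slots = [(c["day"], c["start"], c["end"]) for c in classes if c["room"] == room]
    return sorted(slots, key=lambda slot: (DAY_ORDER[slot[0]], slot[1]))


print(free_rooms(CLASSES, "Mon", time(10, 30)))
print(room_schedule(CLASSES, "B-202"))
```

Note that a room only shows up here if it appears somewhere in the class list, so a room that never hosts a class would need to come from another source.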
This is a case where we were able to use scraped data to create an entirely new product that’s not a substitute for the scraped one, but rather something that gives you extremely valuable information that you couldn’t get any other way.
It’s still not a very complex use case, but you can easily see how powerful this tool can be when you use it to make things that weren’t possible before.
Scraping as an API alternative
Nine times out of ten, if something has an API available you’ll want to use it over making a scraper. However, scraping is really useful as a way to programmatically access data provided by other services when they don’t have a public API available.
I’ll be writing a post soon about the legal gray area web scraping is in, as well as some experience I’ve had dealing with the “scraped” party not wanting to be scraped, so stay tuned for that.
Thanks for reading! If nothing else, I wanted to introduce the concept of web scraping and get you thinking about ways you can leverage it to make new things. I’d love to know your thoughts in the comments below! :)