Scraping data from websites to create your own app (for dummies) Part 1
I’ve recently been developing an aggregator app that requires me to have up-to-date data scraped from several sites. I’ve been using ParseHub’s tool to do this, as I’m on a reasonably tight schedule and don’t really have time to write a custom scraper for each of the sites I’m accessing. The tool is super simple to use once you’ve got the hang of it. However, I’d highly recommend going through the tutorial videos several times BEFORE trying to create your scrape, and making notes of the steps you’re going to take, because it’s tricky to leave your project and restart if you need a reminder halfway through.
Once you’ve run a scrape, you can access the results at the URL
I’ve been using JSON to view the data because it’s an easier format for me to use when seeding my database, but you can also choose to view it as a CSV file. I’d also highly recommend installing a JSON viewer extension in your browser (I use Chrome), as it makes the data SO much easier to read. The one I’m using is very easy to install and works automatically.
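You can also poke at the data in code instead of the browser. Ruby’s built-in `json` library can parse it and pretty-print it, much like a viewer extension does. This is a minimal sketch, and the sample payload is hypothetical: a single `selection1` list of items with `name` and `price` fields. Your own scrape’s keys will depend on how you named your selections in ParseHub.

```ruby
require "json"

# Hypothetical sample of a scrape's JSON output; real keys depend on your project.
raw = <<~JSON
  {"selection1": [{"name": "Blue Mug", "price": "12"}, {"name": "Red Mug", "price": "15"}]}
JSON

data = JSON.parse(raw)            # Hash with string keys
items = data.fetch("selection1")  # the list of scraped records

# Pretty-print the whole thing, roughly what a browser JSON viewer shows you
puts JSON.pretty_generate(data)

# Print one line per scraped item
items.each { |item| puts "#{item["name"]}: #{item["price"]}" }
```

In this sample the prices arrive as strings. You would convert them (for example with `Integer(item["price"])`) before storing them in an integer column.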
Once you have your data online, you’ll need a database in your application to store it. I’ve been using a Rails backend to produce my app’s API. If you’re really reallllllly starting from the beginning, you can install Ruby, RVM (the Ruby Version Manager) and Rails in one go by copying and pasting the following command into your terminal.
\curl -sSL https://get.rvm.io | bash -s stable --rails
This should also install SQLite (the database engine Rails uses by default) if it isn’t already on your computer; most Macs come with it preinstalled. Then navigate into the folder you want to create your app in and run:
rails new yourappsname
You’ll also want a code editor installed (VS Code is a very good one). At the very minimum, this app needs:
- A database to store the data you’re scraping
- A model to describe the ‘class’ of this data
- A way for the outside world to send you new data when it’s updated!
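Before Rails generates anything, it can help to picture those three pieces in plain Ruby. This is only a conceptual sketch and not how Rails implements them. An array stands in for the database table, a `Struct` stands in for the model, and a hypothetical `receive_data` method stands in for the controller that accepts new data. The `Listing` name and its `title`/`price` columns are made up for the example.

```ruby
# The "model": describes what one record looks like (hypothetical columns).
Listing = Struct.new(:title, :price, keyword_init: true)

# The "database table": just an in-memory array here.
LISTINGS = []

# The "way in": accepts incoming records, keeps only the known columns,
# and stores them. Rails controllers do similar filtering with strong parameters.
def receive_data(records)
  records.each do |record|
    attrs = record.transform_keys(&:to_sym).slice(*Listing.members)
    LISTINGS << Listing.new(**attrs)
  end
end

receive_data([
  { "title" => "Blue Mug", "price" => 12, "junk" => "ignored" },
  { "title" => "Red Mug", "price" => 15 }
])

puts LISTINGS.size
```

The unknown `junk` key is dropped rather than saved. That is the same idea behind `params.require(...).permit(...)` in a real Rails controller.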
We can do some of this in one go. In your terminal (I’d recommend using VS Code’s built-in integrated terminal for simplicity), run the command:
rails g model tablenamesingular columnname1singular:datatype columnname2singular:datatype
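Each `name:type` argument in that command describes one column. The snippet below is a simplified, hypothetical stand-in (not Rails’ actual code) showing how those arguments are read. It splits each one on the colon and falls back to `string` when no type is given. The column names `title`, `price` and `description` are made up.

```ruby
# Hypothetical, simplified stand-in for how `rails g model` reads its
# column arguments; Rails' own parser also handles extras like `:index`.
def parse_columns(args)
  args.to_h do |arg|
    name, type = arg.split(":")
    type = "string" if type.to_s.empty? # no type given => string column
    [name, type]
  end
end

columns = parse_columns(%w[title price:integer description:text])
p columns
```

Here `title` ends up as a string column, `price` as an integer and `description` as text. Note that `string` and `text` are separate Rails types. `string` suits short values such as names, while `text` is for longer content.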
You can add as many columns as you like here. The different data types can be found here, but generally I use integer (for numbers) or leave :datatype out altogether for strings (short text), since Rails treats a column with no type as a string. This command will create your model and a migration (the blueprint for the table) in the db/migrate folder. To create the table in your database, run
This will create or update your schema file (db/schema.rb), where you can see your table and make sure it’s correct. If it’s incorrect, you can run
and then edit the migration before running it again. You’ll also need a controller to allow the outside world to send data to your database. Create one by running:
rails g controller tablenameplural
In your controller, you’ll need to put the following code:
And in your config/routes.rb file:
In the second part of this post, I’ll cover putting the data into your database and testing the webhook.