Scraping data from websites to create your own app (for dummies) Part 1

Katherine Wren
Jul 26 · 3 min read

I’ve recently been developing an aggregator app that relies on up-to-date data scraped from several sites. I’ve been using ParseHub’s tool to do this, as I’m on a reasonably tight schedule and don’t really have time to write a custom scraper for each of the sites I’m accessing. The tool is super simple to use once you’ve got the hang of it. However, I’d highly recommend going through the tutorial videos several times BEFORE trying to create your scrape, and making notes of the steps you’re going to take, because it’s tricky to leave your project and come back to it if you need a reminder halfway through.

Once you’ve run a scrape you can access its data at the following URL, swapping the two placeholders for your own run token and API key

https://www.parsehub.com/api/v2/runs/INSERTYOURRUNTOKEN/data?api_key=INSERTYOURAPIKEY&format=json
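If you’d rather pull the data from a script than from the browser, here’s a minimal sketch in plain Ruby using only the standard library. The method names parsehub_data_url and fetch_run_data are my own (hypothetical), and the URL is just the one above rebuilt from its parts, so treat this as a starting point rather than the official way to do it.

```ruby
require "json"
require "net/http"
require "uri"

# Build the data URL for a finished run. The path and query parameters
# mirror the URL shown above; the method name is hypothetical.
def parsehub_data_url(run_token, api_key, format: "json")
  uri = URI("https://www.parsehub.com/api/v2/runs/#{run_token}/data")
  uri.query = URI.encode_www_form(api_key: api_key, format: format)
  uri
end

# Download the run's data and parse it as JSON.
# This makes a real network request, so it needs a valid token and key.
def fetch_run_data(run_token, api_key)
  body = Net::HTTP.get(parsehub_data_url(run_token, api_key))
  JSON.parse(body)
end
```

Calling fetch_run_data("INSERTYOURRUNTOKEN", "INSERTYOURAPIKEY") with your real values should hand back the same data you see in the browser, as a Ruby hash.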

I’ve been using JSON to view the data as it’s an easier format for me to use when seeding my database, but you can also choose to view it as a CSV file. I’d also highly recommend installing a JSON viewer extension in your browser (I use Chrome), as it makes the data SO much easier to read. The one I’m using is very easy to install and works automatically.
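To show why JSON is handy for seeding, here’s a rough sketch of turning a scrape result into rows ready for a database. The sample data, the "listings" key and the title/price fields are all made up for illustration; your real key names depend on how you named the selections in your ParseHub project.

```ruby
require "json"

# Hypothetical sample of what a run's JSON might look like.
SAMPLE_JSON = <<~SAMPLE
  {
    "listings": [
      { "title": "First item",  "price": "10", "url": "https://example.com/1" },
      { "title": "Second item", "price": "25", "url": "https://example.com/2" }
    ]
  }
SAMPLE

# Turn the JSON text into an array of attribute hashes, keeping only the
# columns the table will have and converting the price to an integer.
# Returns an empty array if the list key is missing.
def rows_from(json_text, list_key: "listings")
  JSON.parse(json_text).fetch(list_key, []).map do |item|
    { title: item["title"], price: Integer(item["price"]) }
  end
end
```

Each hash that rows_from(SAMPLE_JSON) returns has the same shape as a row in the table, which is what makes the seeding step straightforward later on.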

Once you have your data online you’ll need to create a database to store it in your application. I’ve been using a Rails backend to produce my app’s API. If you’re really reallllllly starting from the beginning, you can install RVM (the Ruby Version Manager), Ruby and Rails in one go by copying and pasting the following command into your terminal.

\curl -sSL https://get.rvm.io | bash -s stable --rails

This will also install SQLite (the database manager) that you’re going to need for your database, if it isn’t already on your computer. Most Mac systems tend to come with it preinstalled. Then navigate into the folder you want to create your app in and run

rails new yourappsname

You’ll want to have a code editor installed (VS Code is a very good one). In this app the very minimum you’re going to need is

  1. A database to put the data you’re accessing online into
  2. A model to describe the ‘class’ of this data
  3. A way for the outside world to reach your app, so more data can be added when the source is updated!

We can do some of this in one go. In your terminal (I’d recommend using the integrated terminal in VS Code for simplicity) run the command

rails g model tablenamesingular columnname1singular:datatype columnname2singular:datatype

You can add as many columns as you like here. The different data types can be found here, but I generally use integer (for numbers) or leave :datatype out altogether for strings (text), since string is the default. This command creates your model and a migration (the blueprint for the table) in the db/migrate folder. To create the database and add the table to it, run

rails db:create

rails db:migrate
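For reference, here’s roughly what the generated migration looks like if you run the generator with a hypothetical model called listing, one column with no datatype (title) and one integer column (price). The names are my own examples, and the version number in square brackets will match whichever Rails version you have installed, so yours may differ.

```ruby
# db/migrate/<timestamp>_create_listings.rb
# Produced by: rails g model listing title price:integer
# ("listing", "title" and "price" are hypothetical example names.)
class CreateListings < ActiveRecord::Migration[6.0]
  def change
    create_table :listings do |t|
      t.string :title    # no datatype given, so it defaults to string
      t.integer :price

      t.timestamps       # adds created_at and updated_at columns
    end
  end
end
```

This file only runs inside a Rails app. If a column name or type is wrong, this is the file to edit after rolling back.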

Running the migration generates a schema file (db/schema.rb) where you can see your table and make sure it’s correct. If it’s incorrect, you can run

rails db:rollback

and change the migration before running it again. You’ll also need a controller to allow the outside world to send data to your database. Create this by running

rails g controller tablenameplural

In your controller you’ll need to put the following code
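Here’s a minimal sketch of what such a controller might contain, assuming a hypothetical Listing model with title and price columns (swap in your own names). It’s an illustration of the general shape, not necessarily the exact code this app uses, and it doesn’t assume anything about the format ParseHub sends.

```ruby
# app/controllers/listings_controller.rb
# A sketch only: "Listing", "title" and "price" are hypothetical names.
class ListingsController < ApplicationController
  # Requests from other services won't carry Rails' CSRF token, so skip
  # that check here. raise: false keeps this line harmless in apps where
  # the check isn't enabled in the first place.
  skip_before_action :verify_authenticity_token, raise: false

  # GET /listings returns every stored row as JSON.
  def index
    render json: Listing.all
  end

  # POST /listings creates one row from the submitted attributes.
  def create
    listing = Listing.new(listing_params)
    if listing.save
      render json: listing, status: :created
    else
      render json: listing.errors, status: :unprocessable_entity
    end
  end

  private

  # Only allow the columns we expect to be set from outside.
  def listing_params
    params.require(:listing).permit(:title, :price)
  end
end
```

Skipping the CSRF check opens this endpoint to anyone who knows the URL, so you’d want some form of authentication before putting it anywhere public.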

And in your config/routes file
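A minimal sketch of the routes, assuming a hypothetical ListingsController with index and create actions (use your own plural table name in place of listings):

```ruby
# config/routes.rb
# "listings" is a hypothetical resource name.
Rails.application.routes.draw do
  # Exposes GET /listings (index) and POST /listings (create) only.
  resources :listings, only: [:index, :create]
end
```

Running rails routes in your terminal lists the URLs this produces, which is a quick way to check it’s wired up.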

In the second part of this I’ll cover putting the data into your database and testing the webhook.
