Using Nokogiri to scrape Wikipedia and save the results to your database

Zach Cusimano
Published in loriscode
3 min read · Feb 26, 2017

How to use Nokogiri to scrape Wikipedia and then save the scraped data to your database.

A web scraper is a powerful way to collect data for analysis or for use in your application.

In this example I wanted to scrape all of the names on Wikipedia’s national parks page and then save them into my database for use in a campsite rating app.

First I needed to find the best place to anchor my web scraper in order to grab the correct information from the page. This can be done by inspecting the document’s elements in the browser. From there I can quickly determine that the elements I want sit at the CSS path “tr th a”: a link inside a header cell inside a table row.

[Image: using developer tools in Chrome]

From here I pass the URL into Nokogiri and give it the CSS path I’m interested in parsing out. I collect the names into an array and then save them into my campsite database through find_or_create_by, which only creates a record when no matching one exists, so only unique items are saved.

[Image: scraping like crazy]

Then make the new data available to the campsites#index view.

[Image: campsites controller index method]

The names are saved into the database according to the campsites schema.

[Image: campsites database table]

Finally, display the names on the page through campsites#index.

[Image: campsites#index view]
[Image: campsites#index route]
[Image: campsites#index site view]
[Image: campsites#show site view]
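As a rough stand-in for the index view, the sketch below renders a list of campsites with plain ERB, the templating language Rails views use, outside of Rails. The `Site` struct, the template markup and the `/campsites/:id` link format are assumptions. In the app, the list would be supplied by the controller rather than passed in by hand.

```ruby
require "erb"

# Stand-in for a campsite record; in the app this would be the ActiveRecord model.
Site = Struct.new(:id, :name)

# Hypothetical template, roughly what an index view could contain.
INDEX_TEMPLATE = <<~TEMPLATE
  <h1>Campsites</h1>
  <ul>
  <% campsites.each do |campsite| %>
    <li><a href="/campsites/<%= campsite.id %>"><%= ERB::Util.html_escape(campsite.name) %></a></li>
  <% end %>
  </ul>
TEMPLATE

# Renders the template with `campsites` available as a local variable.
def render_index(campsites)
  ERB.new(INDEX_TEMPLATE).result_with_hash(campsites: campsites)
end

puts render_index([Site.new(1, "Acadia"), Site.new(2, "Zion")])
```

Each list item links to a per-campsite path, which is where a show page like the one pictured above would live.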

Now that the data lives in your own database, you control it: you can query it, display it, and build on it however your app needs.

// Using Rails 5 with the Nokogiri gem. If you have any questions, feel free to comment.
