Using Nokogiri to scrape Wikipedia and save the results to your database
How to use Nokogiri to scrape Wikipedia and then save the results to your database.
A web scraper is a powerful way to pull data from a site for analysis or for use in your own application.
In this example I wanted to scrape all of the park names on Wikipedia’s national parks page and then save them into my database for use in a campsite rating app.
First I needed to find the best place to anchor my web scraper so that it would grab the correct information from the page. This can be done by inspecting the document’s elements in the browser. A quick look showed that the names I wanted sit in links inside table header cells, which gives the CSS path “tr th a”.
From here I pass the URL to Nokogiri and give it the CSS path I’m interested in parsing out. I collect the names into an array and then save them into my campsite database through find_or_create_by, which only creates a record when no matching one exists, so duplicate names are not saved twice.
Then make the new data available in the campsites#index view.
Save it into the database through the campsite schema.
Finally, display it on the page through campsites#index.
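For the display step, here is a rough stand-alone sketch of what an index template could look like. It uses Ruby's built-in ERB so it runs outside Rails. The template, the render_index helper, and the Campsite struct are all hypothetical stand-ins; in the real app the records would come from the Campsite model and the template would live in the campsites index view.

```ruby
require "erb"

# Stand-in for the ActiveRecord model, only so the sketch runs by itself.
Campsite = Struct.new(:name)

# Hypothetical index template listing every campsite name.
INDEX_TEMPLATE = <<~HTML
  <h1>Campsites</h1>
  <ul>
  <% campsites.each do |campsite| %>
    <li><%= ERB::Util.html_escape(campsite.name) %></li>
  <% end %>
  </ul>
HTML

# Render the template with the given list of campsites.
def render_index(campsites)
  ERB.new(INDEX_TEMPLATE).result_with_hash(campsites: campsites)
end

puts render_index([Campsite.new("Acadia"), Campsite.new("Zion")])
```

Rails views escape output for you, so the explicit html_escape call is only needed in this plain ERB version.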
Now that the data lives in your own database, you can query, sort, and display it however your application needs.
// Using Rails 5 with the Nokogiri gem. If you have any questions, feel free to comment.