Nokogiri in 5
This past week I gave a quick lecture on how to set up Nokogiri in 5 minutes. What is Nokogiri, you ask? It’s definitely not sushi. It’s a Ruby gem that parses HTML and XML, which makes it a go-to tool for web scraping. Basically, if you want to pull information off a website from Ruby, this is the way to do it.
I’ll go over how to set it up with Rails, but first, here’s a great resource I used to teach myself how to do it:
The Nokogiri gem is a fantastic library that serves virtually all of our HTML scraping needs. Once you have it… (ruby.bastardsbook.com)
First and foremost, let’s make a new Rails app. If you don’t know how to make a Rails app, bear with me for now. Maybe I’ll make a post on that down the road.
Run this command:
rails new nokogiri_project --database=postgresql --skip-turbolinks
This will generate your new nokogiri_project. Next, you’re going to want to cd into nokogiri_project.
Nokogiri is a gem, so next up let’s install that:
gem install nokogiri
This part easily takes the longest, so it’s a good spot to go grab the most delicious drink of your choice. Mine is coconut water, but unfortunately it’s too cold outside to make that trip worth it.
Next, you’re going to want to go into your app folder and create a new services folder. We can do that from the command line, in the project root, by typing mkdir app/services.
You could probably combine this next step with the one above, but this is a how-to, so hopefully spelling it out helps. Next, create a file named nokogiri_service.rb inside that folder (touch app/services/nokogiri_service.rb will do it).
Sweet, now open up your editor. I use Atom, but you can use whatever is most comfortable for you. I’ll fire it up with the command…
This part is the most copy and paste. If you’ve built a service before, this should look pretty familiar:
attr_reader :doc, :doc2, :doc3
# URI.open comes from open-uri (Ruby 2.5+); on older Rubies, plain open does the same job
@doc = Nokogiri::HTML(URI.open("https://www.turing.io/"))
@doc2 = Nokogiri::HTML(URI.open("http://ibogaineclinic.com/"))
@doc3 = Nokogiri::HTML(URI.open("http://www.livescience.com/21275-color-red-blue-scientists.html"))
You need to require nokogiri and open-uri at the top of the file to get this going. Next we create some instance variables, each holding one parsed page. I’ve put in my school’s website and then two cool articles I’ve read that I think are worth sharing. One is about a possible miracle drug, and the other is about how we see colors differently. The next step is where we actually get the data.
Let’s fire up a console with rails console and create the service:
service = NokogiriService.new
Congrats! You just set up your own Nokogiri parser!! All you have to do now is change the website url, read up on your Nokogiri skills and voila!! You are a hacker!!!
To make sure you have everything working, try these…
=> "We turn great people into outstanding developers.The CommunityProven ResultsDedication to Student SuccessA New Tech IndustryThe Turing DifferenceTestimonialsFinancing 101 Checklist"
The arrow points to the expected outcome. Try out doc2 in place of doc3 and see what you get there too!! If you have any questions or comments, please feel free to reach out. But that should be Nokogiri in 5.