Nokogiri in 5

This past week I gave a quick lecture on how to set up Nokogiri in 5 minutes. What is Nokogiri, you ask? It’s definitely not sushi. It’s a Ruby gem that parses HTML, which makes it a handy web scraping tool. Basically, if you want to pull information from a website, this is one way to do it.

I’ll go over how to set it up with Rails, using what I picked up from a few great links while teaching myself how to do it.

First and foremost, let’s make a new Rails app. If you don’t know how to make a Rails app, bear with me for now. Maybe I’ll make a post on that down the road.

Run this command

rails new nokogiri_project --database=postgresql --skip-turbolinks

This will generate your new nokogiri_project. Next, you’re going to want to cd into nokogiri_project

cd nokogiri_project

Nokogiri is a gem, so next up let’s install that

gem install nokogiri
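
Since this lives in a Rails app, Bundler decides which gems get loaded. Rails normally already pulls Nokogiri in as a dependency, so this is optional, but if you’d like to be explicit about it, a line like this in your Gemfile (followed by bundle install) is a common approach:

```ruby
# Gemfile (fragment): declare Nokogiri explicitly instead of relying on
# Rails pulling it in as a dependency of its own gems.
gem 'nokogiri'
```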

This part easily takes the longest, so this is a good spot to go grab the most delicious drink of your choice. Mine is coconut water, but unfortunately it’s too cold outside to make that trip worth it.

Next you’re going to want to go into your app folder and create a new services folder. We can do that from the command line by typing

mkdir app/services

You could probably combine this next step with the one above, but this is a how-to, so hopefully spelling it out helps. Next, create a file named nokogiri_service.rb inside that services folder

touch app/services/nokogiri_service.rb

Sweet, open up your editor. I use Atom, but you can use whatever is most comfortable for you. I’ll fire it up with the command…

atom .

This part is mostly copy and paste. If you’ve built a service before, this should look pretty familiar:

require 'nokogiri'
require 'open-uri'

class NokogiriService
  attr_reader :doc, :doc2, :doc3

  # Fetch each page and parse its HTML into a Nokogiri document.
  # URI.open comes from open-uri; a bare open no longer accepts URLs on Ruby 3+.
  def initialize
    @doc = Nokogiri::HTML(URI.open('https://www.turing.io/'))
    @doc2 = Nokogiri::HTML(URI.open('http://ibogaineclinic.com/'))
    @doc3 = Nokogiri::HTML(URI.open('http://www.livescience.com/21275-color-red-blue-scientists.html'))
  end
end

You need the require lines for nokogiri and open-uri at the top to get this going. Next we create some instance variables, each holding one parsed page. I’ve put in my school’s website and then two cool articles I’ve read that I think are worth sharing. One is about a possible miracle drug and the other is about how we see colors differently. The next step is where we actually get the data.

Let’s fire up a console

rails c

Then type

service = NokogiriService.new

And then...

service.doc.text

Congrats! You just set up your own Nokogiri parser!! All you have to do now is change the website URL, read up on your Nokogiri skills and voila!! You are a hacker!!!

To make sure you have everything working, try these…

service.doc.css('h1').text
=> "We turn great people into outstanding developers.The CommunityProven ResultsDedication to Student SuccessA New Tech IndustryThe Turing DifferenceTestimonialsFinancing 101 Checklist"

The arrow points to the expected outcome. Try out doc2 or doc3 in place of doc and see what you get there too!! If you have any questions or comments, please feel free to reach out. But that should be Nokogiri in 5.