The 100th Scraping Tutorial You’ll Probably Read

I get it! You are exhausted, frustrated, and probably a beginner when it comes to Ruby. I can totally relate. You have probably read numerous articles on this subject and still felt they didn’t give you the clarity you wanted, or your exhaustion is getting the best of you and you want something digestible and straight to the point. I understand all of that. Hopefully, by the end of this article, you will understand scraping with Nokogiri much better and have all the tools you need to remember how to go about it without having to look it up every time.

This article is broken down into four parts:

  1. Before Beginning
  2. Questions you may have
  3. An explanation that’s straight to the point and allows you to grab the basics
  4. An explanation that’s in depth and helps people who are beginner level

Let’s get to it!

Before Beginning

Before beginning, make sure Nokogiri is installed.

Questions You May Have:

How Do I Install Nokogiri?
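
Nokogiri is a Ruby gem, so it is normally installed by typing gem install nokogiri into the terminal. If you are not sure whether it is already installed, a tiny script along these lines can tell you. This is only a sketch, and the nokogiri_ready variable name is made up for illustration:

```ruby
# Nokogiri is usually installed from the terminal with: gem install nokogiri
begin
  require "nokogiri"
  nokogiri_ready = true
  puts "Nokogiri #{Nokogiri::VERSION} is installed"
rescue LoadError
  # require raises LoadError when the gem cannot be found
  nokogiri_ready = false
  puts "Nokogiri is missing. Run `gem install nokogiri` in the terminal."
end
```

If the script reports that Nokogiri is missing, run the install command and try again.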

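How do I find the correct CSS selector?

  1. Open the website in your browser (for example, by searching for it on Google)
  2. Right click on the page and click INSPECT
  3. Find the element-selector icon, which usually looks like an arrow pointing into a box near the top-left corner of the panel that opens
  4. Click it and hover over the data you want to scrape. It will show you the name of the selector.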

Straight To The Point Version

(Feel free to swap the names for whatever is convenient for you.)

Step One

Open Terminal and create a directory named “scrape”.

If you need help, follow Steps One through Four of the Detailed Explanation further down.

Step Two

Create a file within the directory and name it “scrape.rb”

If you need help, follow Step Five of the Detailed Explanation further down.

Step Three

Open Visual Studio Code and go to the file “scrape.rb”

Step Four

Require nokogiri and open-uri at the top of the file

Step Five

Type the following into Visual Studio Code

⬇EXAMPLE ⬇

Step Six

Type the following into the terminal:

ruby scrape.rb

After pressing Enter, you should see the page’s entire HTML printed in the terminal. Don’t worry, that’s supposed to happen.

Now we move on to the next step to get the data we need.

Step Seven

Comment out puts website by putting a # in front of it, so the line reads # puts website.

Step Eight

To grab only the text we want, we have to find its CSS selector (see Questions You May Have if you need help finding it) and put it into the code like this:

⬇EXAMPLE⬇

AND YOU’RE DONE! Once you have grabbed your data, you can do whatever you want with it and continue with your project.

Detailed Explanation:

(Don’t be intimidated by how many steps there are! Also, feel free to swap the names for whatever is convenient for you.)

Step One

Open the terminal and go to your desired directory. To move to a different directory, type:

cd (directory name)

If you don’t know which directory to go to, type

ls

into the terminal to see which directories you have. If you are already in a directory that you are comfortable keeping your work in, skip to Step Two.

Step Two

Once you have found a directory where you want to keep your work, create a new directory inside it by typing

mkdir scrape

The mkdir command creates a new directory from the terminal.

Step Three

To check if the directory was created, type:

ls

This will list all the files and directories in the current directory. Once you see the directory that you created, continue to Step Four.

Step Four

Go into the newly created directory by typing:

cd scrape

The cd command means ‘change directory’.

Step Five

Once you’re in the directory, type

touch scrape.rb

The touch command creates the file, and the .rb extension tells you which language the file is written in. In this case, it is Ruby.

Step Six

Open up Visual Studio Code and find and open the file you just created.

Step Seven

Require Nokogiri and Open-URI at the top of the file like this:

require 'nokogiri'

require 'open-uri'

Step Eight

Then add the following 2 lines of code like so:

⬇EXAMPLE ⬇

Step Nine

To see the output of the website in the terminal, we have to run the file by typing ruby before the file name, which is scrape.rb.

Type into the terminal:

ruby scrape.rb

Step Ten

When you press Enter, you will see a lot of data. Don’t panic, it just means you did it correctly.

It should look like the page’s raw HTML: line after line of tags and text.

Step Eleven

Comment out puts website by putting a # in front of it, so the line reads # puts website.

You have to comment out puts website because we want specific data, not the entire website, in our terminal. This lets us move forward without the big mess.

Step Twelve

To grab only the text we want, we have to find its CSS selector and put it into the code like this:

⬇EXAMPLE⬇

Need help finding the correct CSS selector? See Questions You May Have near the top of this article.

Conclusion

And you are done!!! YAY! I hope that was helpful and that I accomplished my goal of making this the last scraping tutorial you have to read! Hope you do well, and good luck with everything you are working on :).

If you’re curious to see what I did with Nokogiri then click here.
