The 100th Scraping Tutorial You’ll Probably Read
I get it! You’re exhausted, frustrated, and probably a beginner when it comes to Ruby. I can totally relate. You’ve probably read numerous articles on this subject and still didn’t get the clarity you wanted, or exhaustion is getting the best of you and you want something digestible and straight to the point. I understand all of that. Hopefully, by the end of this article, you’ll understand scraping with Nokogiri much better, and you’ll have all the tools you need to remember how to go about it without having to look it up every time.
This article is broken down into four parts:
- Before Beginning
- Questions you may have
- An explanation that’s straight to the point and allows you to grab the basics
- An explanation that’s in depth and helps people who are beginner level
Let’s get to it!
Before Beginning
Before beginning, make sure Nokogiri is installed.
Questions You May Have:
How Do I Install Nokogiri?
- To install Nokogiri, run gem install nokogiri in the terminal, or click here for the installation guide.
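How Do I Find the Correct CSS Selector?
1. Type the website into Google and open it
2. Right-click the page and click INSPECT
3. Find the element-picker icon (an arrow pointing at a box, usually in the top-left corner of the panel that opens)
4. Click it and hover over the data you want to scrape. It will show you the name of the selector.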
Straight To The Point Version
(Feel free to switch the names out for whatever is convenient for you.)
Step One
Open Terminal and create a directory named “scrape”.
If you need help, click here and follow steps 1–4
Step Two
Create a file within the directory and name it “scrape.rb”
If you need help, click here and follow step 5
Step Three
Open Visual Studio Code and go to the file “scrape.rb”
Step Four
Require nokogiri and open-uri at the top of the file
Step Five
Type the following into Visual Studio Code
⬇EXAMPLE ⬇
Step Six
Type the following into the terminal:
ruby scrape.rb
After pressing enter, you should see the page’s HTML printed in the terminal, usually a long wall of tags and text. Don’t worry, that’s supposed to happen.
Now we go onto the next step to get the data we need.
Step Seven
Comment out puts website by adding a # at the start of the line:
# puts website
Step Eight
To grab just the text we want, we have to get the CSS selector and put it into the code like this:
If you need help finding the correct CSS selector, click here.
⬇EXAMPLE⬇
AND YOU’RE DONE! Once you’ve grabbed your data, you can do whatever you want with it and continue with your project.
Detailed Explanation:
(Don’t be intimidated by how many steps there are! Also, feel free to switch the names out for whatever is convenient for you.)
Step One
Open the terminal and go to the desired directory. To get to a different directory, type:
cd (directory name)
If you don’t know which directory to go to, type
ls
into the terminal to see what directories you have. If you are already in a directory that you are comfortable keeping your work in, skip to Step Two.
Step Two
Once you have found a directory where you want to keep your work, create a new directory by typing:
mkdir scrape
The mkdir command creates a new directory from the terminal.
Step Three
To check if the directory was created, type:
ls
This will list everything in the current directory. Once you see the directory that you created, continue to Step Four.
Step Four
Go into the newly created directory by typing:
cd scrape
The cd command means ‘change directory’.
Step Five
Once you’re in the directory, type
touch scrape.rb
The touch command creates the file, and the .rb extension tells you what language the file will be in. In this case, it will be Ruby.
Step Six
Open up Visual Studio Code and find and open the file you just created.
Step Seven
Require Nokogiri and Open-URI at the top of the file like this:
require 'nokogiri'
require 'open-uri'
Step Eight
Then add the following 2 lines of code like so:
⬇EXAMPLE ⬇
Step Nine
To run the file and see the output in the terminal, we type ruby before the file name, which is scrape.rb.
Type into the terminal:
ruby scrape.rb
Step Ten
When you press enter, you will see a lot of data. Don’t panic, it just means you did it correctly.
It should look like the raw HTML of the page: a long stream of tags and text.
Step Eleven
Comment out puts website by adding a # at the start of the line:
# puts website
We comment out puts website because we want specific data, not the entire website, in our terminal. This lets us move forward without the big mess.
Step Twelve
To grab just the text we want, we have to get the CSS selector and put it into the code like this:
⬇EXAMPLE⬇
Conclusion
And you are done!!! YAY! I hope that was helpful and that I accomplished my goal of making this the last scraping tutorial you have to read! I hope you do well, and good luck with everything you are working on :).
If you’re curious to see what I did with Nokogiri then click here.