Introduction to R for Data Science (Part Six)

This is the sixth part of an introduction to R. It covers the pipe operator, tidyr, ggplot2, histograms, scatterplots, and more.

Ivan Huang
3 min read · Apr 2, 2023

PS: Please read ‘Introduction to R for Data Science (Part Five)’ before this one, since this part continues directly from Part Five.

Pipe Operator

The pipe operator (%>%) lets us chain multiple operations together.

The pipe is simply a way to keep code neater. Instead of writing several separate statements or nesting function calls inside each other, you pass the result of one step straight into the next. In this example, I filtered for rows with mpg greater than 30, took a sample of four rows, and arranged the result by mpg from highest to lowest. The data comes first, then %>%, and then each operation.
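The chain described above can be sketched as follows. This is a minimal example that assumes the built-in mtcars dataset (the original data is not shown here) and that dplyr, which provides %>%, filter(), sample_n(), and arrange(), is already installed.

```r
# Assumption: mtcars stands in for the data used in the original example.
library(dplyr)

set.seed(42)  # make the random sample reproducible

result <- mtcars %>%
  filter(mpg > 30) %>%      # keep only cars with mpg above 30
  sample_n(size = 4) %>%    # draw a random sample of four rows
  arrange(desc(mpg))        # sort by mpg, highest first

print(result)
```

Each step receives the output of the step before it, so the code reads top to bottom in the same order the operations run.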

Using Tidyr

tidyr helps us clean and reshape data. To install and load it, run these steps in your R console:

  1. install.packages('tidyr')
  2. install.packages('data.table')
  3. library(tidyr)
  4. library(data.table)

In order to clean data, we can use:

  • gather()

gather() collapses the columns Qtr1 through Qtr4 into key-value pairs: a key column named Quarter and a value column named Revenue. This is useful for wide data such as stock prices or quarterly sales figures. The selection Qtr1:Qtr4 picks every column from Qtr1 to Qtr4 (Qtr1, Qtr2, Qtr3, Qtr4). Quarter records which column each value came from, and Revenue holds the value itself.
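Here is a minimal sketch of that reshaping. The sales data frame, its Company and Year columns, and all of the numbers are made up for illustration; only the Qtr1:Qtr4, Quarter, and Revenue names come from the description above. Newer versions of tidyr recommend pivot_longer() over gather(), so an equivalent call is included for comparison.

```r
library(tidyr)

# Hypothetical wide data: one row per company, one column per quarter.
sales <- data.frame(
  Company = c("A", "B", "C"),
  Year    = c(2021, 2021, 2021),
  Qtr1    = c(10, 20, 30),
  Qtr2    = c(11, 21, 31),
  Qtr3    = c(12, 22, 32),
  Qtr4    = c(13, 23, 33)
)

# Collapse Qtr1..Qtr4 into a key column (Quarter) and a value column (Revenue).
long <- gather(sales, Quarter, Revenue, Qtr1:Qtr4)
print(long)

# Equivalent reshaping with the newer pivot_longer() interface.
long2 <- pivot_longer(sales, cols = Qtr1:Qtr4,
                      names_to = "Quarter", values_to = "Revenue")
```

The three wide rows become twelve long rows, one per company and quarter.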

Data Visualization


ggplot2 follows a distinct philosophy: you build a visualization by adding layers, starting with the data and aesthetic mappings and then adding geometries and other components on top.
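As a rough illustration of the layering idea, the sketch below builds a scatterplot step by step. It assumes the built-in mtcars dataset and that ggplot2 is already installed; the choice of wt and mpg as columns is only for illustration.

```r
library(ggplot2)

# Layer 0: data and aesthetic mappings only. This draws empty axes.
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add a geometry layer with `+` to draw the points.
with_points <- base + geom_point()

# Add labels on top of the existing plot.
final <- with_points + labs(x = "Weight", y = "Miles per gallon")

print(final)
```

Each `+` adds something to the plot without changing the pieces that came before it.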

This article will give you a great explanation of layers:


We’re going to install ggplot2. To install and load it, run these steps in your R console:

  1. install.packages('ggplot2')
  2. install.packages('ggplot2movies')
  3. library(ggplot2)
  4. library(ggplot2movies)

Here is a cheat sheet to make it easier to understand:

Note that ggplot2movies is the package that provides the movies dataset we’re going to work with.

These are the basics of a histogram. We pass the movies data to ggplot(), map the x-axis to rating with aes(), and then add geom_histogram() to draw the histogram. geom_histogram() is listed in the cheat sheet, so refer to the cheat sheet when choosing a geom for other types of graphs.
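The steps above can be sketched like this, assuming ggplot2 and ggplot2movies are installed as described earlier. The variable name pl is just a placeholder.

```r
library(ggplot2)
library(ggplot2movies)  # provides the movies data frame

# Data: movies. Aesthetics: rating on the x-axis. Geometry: histogram.
pl <- ggplot(movies, aes(x = rating)) +
  geom_histogram()

print(pl)
```

With no arguments, geom_histogram() falls back to a default number of bins and prints a message suggesting you choose a binwidth yourself.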

binwidth sets how wide each bin is, in the units of the x-axis.

color sets the outline color of the bars.
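A short sketch of both options together, again assuming ggplot2 and ggplot2movies are installed. The values 0.1, "red", and "pink" are arbitrary choices for illustration; fill is an extra argument shown here because it controls the inside of the bars, which color does not.

```r
library(ggplot2)
library(ggplot2movies)

pl2 <- ggplot(movies, aes(x = rating)) +
  geom_histogram(
    binwidth = 0.1,   # each bin spans 0.1 rating points
    color = "red",    # outline of each bar
    fill = "pink"     # interior of each bar
  )

print(pl2)
```

A smaller binwidth gives more, narrower bars and shows finer detail; a larger one smooths the shape out.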

