Introduction to R for Data Science (Part Six)

This is the sixth part of my introduction to R. It covers the pipe operator, tidyr, ggplot2, histograms, scatterplots, and more.

Apr 2, 2023

PS: Please read ‘Introduction to R for Data Science (Part Five)’ before this one, since this part continues where part five left off.

Part five: Introduction to R for Data Science (Part Five)

Pipe Operator

The pipe operator, %>%, lets us chain multiple operations together.

It keeps code neater: instead of nesting one function call inside another or saving every intermediate result, you pass the output of each step straight into the next. In this example I filtered for mpg greater than 30, took a random sample of four rows, and arranged the result by mpg from highest to lowest. The data always comes first, then %>%, and then your operations.
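The steps described above can be sketched as follows. This is a minimal sketch that assumes the built-in mtcars dataset (the original example's data isn't reproduced here) and the dplyr package, which provides filter(), sample_n(), and arrange() and re-exports %>%. The same steps are written nested and piped so you can compare them.

```r
library(dplyr)  # filter(), sample_n(), arrange(), desc(); also re-exports %>%

# Assumption: the built-in mtcars dataset stands in for the original example's data.

# Nested version: has to be read from the inside out
nested <- arrange(sample_n(filter(mtcars, mpg > 30), 4), desc(mpg))

# Piped version: data first, then %>%, then each operation in reading order
piped <- mtcars %>%
  filter(mpg > 30) %>%  # keep cars with mpg greater than 30
  sample_n(4) %>%       # randomly sample four rows
  arrange(desc(mpg))    # sort by mpg, highest first

print(piped)
```

In mtcars exactly four cars have mpg above 30, so here the sample keeps all of them; on a larger dataset sample_n(4) would pick four rows at random. Newer versions of dplyr mark sample_n() as superseded by slice_sample(n = 4), but both work.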

Using Tidyr

The tidyr package helps us clean and reshape data. To install and load it, along with data.table, put these lines into your R console:

install.packages('tidyr')       # install once
install.packages('data.table')
library(tidyr)                  # load in every new R session
library(data.table)

One tidyr function for cleaning data is:

gather()

gather() collapses the columns Qtr1:Qtr4 into key-value pairs. The key column, Quarter, holds the old column names, and the value column, Revenue, holds the numbers from each quarter. Qtr1:Qtr4 selects every column from Qtr1 through Qtr4, that is Qtr1, Qtr2, Qtr3, and Qtr4. This is useful for wide data such as stock prices or quarterly sales figures.
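To make the reshape concrete, here is a small sketch. The sales data frame and its Company column are hypothetical, invented only to have something with Qtr1 to Qtr4 columns to gather. It also shows pivot_longer(), which current tidyr documentation recommends over gather() and which produces the same columns.

```r
library(tidyr)

# Hypothetical wide data: one row per company, one revenue column per quarter
sales <- data.frame(
  Company = c("A", "B"),
  Qtr1 = c(100, 200),
  Qtr2 = c(110, 210),
  Qtr3 = c(120, 220),
  Qtr4 = c(130, 230)
)

# gather(data, key, value, columns): Qtr1:Qtr4 become rows;
# the old column names go into Quarter and the numbers into Revenue
long <- gather(sales, Quarter, Revenue, Qtr1:Qtr4)
print(long)

# Same reshape with pivot_longer(), which supersedes gather()
long2 <- pivot_longer(sales, Qtr1:Qtr4,
                      names_to = "Quarter", values_to = "Revenue")
print(long2)
```

Two companies times four quarters gives eight rows in the long format.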

Data Visualization

ggplot2

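ggplot2 follows a distinct philosophy, the grammar of graphics, which is built on the idea of adding layers to your visualization: you start with the data and the aesthetic mappings (which variable goes on which axis), then add layers such as geoms with +.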

This article will give you a great explanation of layers: https://englelab.gatech.edu/useRguide/introduction-to-ggplot2.html
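As a small illustration of layering, here is a sketch that builds one plot from two geoms. It assumes the built-in mtcars dataset and picks wt and mpg purely for illustration; ggplot() sets the data and mapping, and each + adds something on top.

```r
library(ggplot2)

# ggplot() sets the data and aesthetic mapping (mtcars is an assumption)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                # layer 1: one point per car
  geom_smooth(method = "lm", formula = y ~ x) + # layer 2: linear trend line
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")  # labels, not a data layer

print(p)
```

Removing either geom line removes just that layer; the rest of the plot is unchanged.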

Histograms

Next we'll install ggplot2 and ggplot2movies. To install and load them, put these lines into your R console:

install.packages('ggplot2')         # install once
install.packages('ggplot2movies')
library(ggplot2)                    # load in every new R session
library(ggplot2movies)

Here is a ggplot2 cheat sheet to make things easier to follow: https://static1.squarespace.com/static/584e336fe3df28e18000d637/t/60ef3aae750ba457d74c3a88/1626290862751/data-visualization-2.1.pdf

Note that ggplot2movies is the package that contains movies, the dataset we're going to work with.

This is a basic histogram. We pass the movies data to ggplot(), map the x-axis to rating with aes(x = rating), and then add geom_histogram() as a layer to draw the histogram. geom_histogram() is listed on the cheat sheet, so refer to it when creating other types of graphs.
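The histogram described above can be written as follows. This is a sketch: binwidth = 0.5 is my own choice for illustration, not part of the original example. Without it, geom_histogram() falls back to 30 bins and prints a message suggesting you pick a better binwidth.

```r
library(ggplot2)
library(ggplot2movies)  # provides the movies data frame

# Data + aesthetic mapping: rating on the x-axis
p <- ggplot(movies, aes(x = rating)) +
  geom_histogram(binwidth = 0.5)  # bar height = number of movies in each rating bin

print(p)
```

Try changing binwidth to see how the choice of bin size changes the shape of the distribution.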