Introduction to R for Data Science (Part Six)

This is the sixth part of my introduction to R. It covers the pipe operator, tidyr, ggplot2, histograms, scatterplots, and more.

Ivan Huang
3 min read · Apr 2, 2023

PS: Please read ‘Introduction to R for Data Science (Part Five)’ before this one, since this article continues directly from Part Five.

Part five: Introduction to R for Data Science (Part Five)

Pipe Operator

The pipe operator, %>%, lets us chain multiple operations together.

The pipe operator simply keeps code neater. Instead of nesting several function calls inside each other or saving intermediate results, you pass the output of one step straight into the next. In this example, I filtered for mpg greater than 30, took a random sample of four rows, and arranged the result by mpg from highest to lowest. The data always comes first, then %>%, and then your operations.
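The original example was shown as an image, so here is a minimal sketch of the same idea. It assumes the built-in mtcars dataset and dplyr from Part Five, which provides %>%. The code filters for mpg greater than 30, samples four rows with sample_n(), and sorts by mpg from highest to lowest. For comparison, it then writes the same steps as nested calls.

```r
library(dplyr)  # provides filter(), sample_n(), arrange(), and re-exports %>%

set.seed(42)  # make the random sample reproducible

# Piped version: read top to bottom as "take mtcars, then filter, then sample, then arrange"
fuel_efficient <- mtcars %>%
  filter(mpg > 30) %>%   # keep cars with mpg greater than 30
  sample_n(4) %>%        # randomly sample four rows
  arrange(desc(mpg))     # sort by mpg, highest first

# The same steps without the pipe, which must be read from the inside out
nested <- arrange(sample_n(filter(mtcars, mpg > 30), 4), desc(mpg))

print(fuel_efficient)
```

Only four cars in mtcars exceed 30 mpg, so here sample_n(4) just returns those four rows in random order. The piped version reads in the order the steps happen. That ordering is what makes it neater than the nested call.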

Using tidyr

The tidyr package helps us clean and reshape data. To install and load it (along with data.table), run these commands in your R console:

  1. install.packages('tidyr')
  2. install.packages('data.table')
  3. library(tidyr)
  4. library(data.table)

To clean data, we can use:

  • gather()

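gather() collapses several columns into key-value pairs. In this example, it turns the columns Qtr1:Qtr4 into two new columns:

  • Quarter, the key, which holds each original column name.
  • Revenue, the value, which holds the number from that quarter.

Qtr1:Qtr4 selects every column from Qtr1 through Qtr4, so it picks up Qtr1, Qtr2, Qtr3, and Qtr4. This is useful for data such as stock prices or quarterly sales, where one measurement is spread across several columns.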
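Here is a minimal sketch of gather(). It uses a small hypothetical sales data frame: the Company and Year columns and all the numbers are made up for illustration. The second call shows pivot_longer(), which newer versions of tidyr recommend in place of gather(). It produces the same long data, though the rows may come out in a different order.

```r
library(tidyr)

# Hypothetical quarterly revenue data (not the dataset from the original article)
sales <- data.frame(
  Company = c("A", "B"),
  Year    = c(2022, 2022),
  Qtr1    = c(100, 80),
  Qtr2    = c(120, 90),
  Qtr3    = c(110, 95),
  Qtr4    = c(130, 105)
)

# Collapse Qtr1 through Qtr4 into a Quarter (key) column and a Revenue (value) column
long <- gather(sales, Quarter, Revenue, Qtr1:Qtr4)
print(long)

# The same reshape with pivot_longer(), the newer replacement for gather()
longer <- pivot_longer(sales, Qtr1:Qtr4, names_to = "Quarter", values_to = "Revenue")
print(longer)
```

The wide table has 2 rows and 4 quarter columns, so the long table ends up with 2 × 4 = 8 rows.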

Data Visualization

ggplot2

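ggplot2 is built on the idea of adding layers to your visualization. You start with your data and a mapping of variables to the axes, then add layers such as geoms (the shapes that draw the data) one at a time with +.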
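To see layering in action, here is a minimal sketch that assumes the built-in mtcars dataset, which the original article does not use here. Each + adds one layer to the plot object: first the points, then a fitted linear trend line.

```r
library(ggplot2)

# Base plot: data plus aesthetic mappings, but no layers yet (draws empty axes)
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Layer 1: a scatterplot of weight against mpg
with_points <- base + geom_point()

# Layer 2: a linear trend line drawn on top of the points
with_line <- with_points + geom_smooth(method = "lm")

print(with_line)
```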

This article gives a good explanation of layers: https://englelab.gatech.edu/useRguide/introduction-to-ggplot2.html

Histograms

To install and load ggplot2, along with the ggplot2movies package, run these commands in your R console:

  1. install.packages('ggplot2')
  2. install.packages('ggplot2movies')
  3. library(ggplot2)
  4. library(ggplot2movies)

Here is a ggplot2 cheat sheet to use as a reference: https://static1.squarespace.com/static/584e336fe3df28e18000d637/t/60ef3aae750ba457d74c3a88/1626290862751/data-visualization-2.1.pdf

Note that the ggplot2movies package provides the movies dataset that we’re going to work with.

This is a basic histogram. We pass the movies data to ggplot(), map rating to the x-axis, and then add geom_histogram() to draw the histogram. geom_histogram() is listed in the cheat sheet, so refer to your cheat sheet when creating other types of graphs.
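Here is a minimal sketch of that basic histogram. It assumes ggplot2 and ggplot2movies are installed and loaded as above.

```r
library(ggplot2)
library(ggplot2movies)  # provides the movies dataset

# Map rating to the x-axis, then add a histogram layer
p <- ggplot(movies, aes(x = rating)) +
  geom_histogram()

print(p)
```

Without a binwidth or bins argument, geom_histogram() falls back to 30 bins and prints a message suggesting you pick a better value.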

binwidth sets how wide each bin is.

color sets the color of the bar outlines.

fill sets the color inside the bars.
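Here is a sketch that combines these three arguments. The specific binwidth and colors are placeholder values chosen for illustration, not the ones from the original article.

```r
library(ggplot2)
library(ggplot2movies)

# binwidth = 0.5 groups ratings into half-point bins;
# color sets the bar outlines and fill sets the bar interiors
p2 <- ggplot(movies, aes(x = rating)) +
  geom_histogram(binwidth = 0.5, color = "red", fill = "pink")

print(p2)
```

Because these values are set directly in geom_histogram() rather than inside aes(), they apply to every bar instead of being mapped from a column of the data.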
