Introduction to R for Data Science (Part Six)
This is part six of the introduction to R series. It covers the pipe operator, tidyr, ggplot2, histograms, scatterplots, and more.
PS: Please read ‘Introduction to R for Data Science (Part Five)’ before reading this one. This post continues where part five left off.
Pipe Operator
The pipe operator (%>%) lets us chain multiple operations together.
The pipe operator is just a way to keep code neater. Instead of writing several separate statements or nesting function calls inside each other, you can chain the steps with the pipe. In this example, I filtered for mpg greater than 30, took a sample of four rows, and arranged mpg from highest to lowest. The data has to come first, then %>%, and then your operations.
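Here is a minimal sketch of the chain described above. It assumes the dplyr package is installed and uses R's built-in mtcars dataset; both are assumptions, since the text does not name the data or the package.

```r
library(dplyr)  # provides %>%, filter(), sample_n(), arrange(), desc()

# Data first, then %>%, then each operation in order
result <- mtcars %>%
  filter(mpg > 30) %>%      # keep cars with mpg greater than 30
  sample_n(size = 4) %>%    # take a random sample of four rows
  arrange(desc(mpg))        # sort mpg from highest to lowest

print(result)

# The same steps without the pipe have to be read from the inside out:
nested <- arrange(sample_n(filter(mtcars, mpg > 30), size = 4), desc(mpg))
```

Both versions do the same work. The piped version simply reads top to bottom in the order the steps happen.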
Using Tidyr
tidyr is going to help us clean data. To install and load it (along with data.table), run these steps in your R console:
- install.packages('tidyr')
- install.packages('data.table')
- library(tidyr)
- library(data.table)
In order to clean data, we can use:
- gather()
gather() collapses the columns Qtr1 through Qtr4 into key-value pairs: a key column named Quarter and a value column named Revenue. This is useful for data such as stock prices or quarterly sales. Qtr1:Qtr4 selects Qtr1, Qtr2, Qtr3, and Qtr4. The Revenue column holds the values that were stored under each quarter.
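The call described above might look like the sketch below. The data frame, its comp and year columns, and all revenue figures are made up for illustration; only the Qtr1:Qtr4, Quarter, and Revenue names come from the text.

```r
library(tidyr)

# Hypothetical quarterly revenue data (values invented for illustration)
sales <- data.frame(
  comp = c("A", "B", "C"),
  year = c(2020, 2020, 2020),
  Qtr1 = c(10, 20, 30),
  Qtr2 = c(11, 21, 31),
  Qtr3 = c(12, 22, 32),
  Qtr4 = c(13, 23, 33)
)

# Collapse Qtr1..Qtr4 into a key column (Quarter) and a value column (Revenue)
long <- gather(sales, Quarter, Revenue, Qtr1:Qtr4)

print(long)
```

Each row of the original table becomes four rows, one per quarter. Note that gather() still works, but newer versions of tidyr mark it as superseded by pivot_longer().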
Data Visualization
ggplot2
ggplot2 follows a distinct philosophy that is built on the idea of adding layers to your visualization.
This article will give you a great explanation of layers: https://englelab.gatech.edu/useRguide/introduction-to-ggplot2.html
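To make the idea of layers concrete, here is a small sketch that builds a plot one layer at a time. It assumes ggplot2 is already installed (the install steps are in the Histograms section) and uses the built-in mtcars dataset as a stand-in; the choice of columns is only for illustration.

```r
library(ggplot2)

# Data and aesthetics: which data frame, and which columns map to x and y
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# Geometry: add a layer of points on top with +
p_points <- p + geom_point()

# Labels: add titles and axis labels on top of that
p_final <- p_points + labs(title = "Weight vs. mpg", x = "Weight", y = "Miles per gallon")

print(p_final)
```

Each + adds one more piece to the plot, which is why ggplot2 code tends to grow line by line.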
Histograms
We’re going to install ggplot2 and ggplot2movies, then load them. To do that, run these steps in your R console:
- install.packages('ggplot2')
- install.packages('ggplot2movies')
- library(ggplot2)
- library(ggplot2movies)
Here is a cheat sheet to make it easier to understand: https://static1.squarespace.com/static/584e336fe3df28e18000d637/t/60ef3aae750ba457d74c3a88/1626290862751/data-visualization-2.1.pdf
Note that ggplot2movies is the package that provides the movies dataset we’re going to work with.
These are the basics of a histogram. We pass the movies data to ggplot(), map rating to the x-axis, and then add geom_histogram() to create the histogram. geom_histogram() is listed in the cheat sheet, so refer to your cheat sheet when choosing a type of graph.
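A basic histogram along the lines described above could be written like this. It assumes both ggplot2 and ggplot2movies are installed; the variable name pl is just a placeholder.

```r
library(ggplot2)
library(ggplot2movies)  # provides the movies data frame

# Data and aesthetics: movies data, with rating on the x-axis
pl <- ggplot(movies, aes(x = rating)) +
  geom_histogram()      # geometry layer: draw a histogram

print(pl)
```

Running this without any arguments to geom_histogram() prints a message suggesting you pick a bin width yourself.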
binwidth changes how wide your bins are.
color changes the outline color of the bars.
fill changes the color that fills the inside of the bars.
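Putting the three arguments together might look like the sketch below. The specific values (0.1, "red", "pink") are arbitrary choices for illustration, not values from the original example.

```r
library(ggplot2)
library(ggplot2movies)

pl <- ggplot(movies, aes(x = rating)) +
  geom_histogram(
    binwidth = 0.1,   # width of each bin, in rating units
    color = "red",    # outline color of the bars
    fill = "pink"     # color inside the bars
  )

print(pl)
```

Try a few different binwidth values to see how the shape of the histogram changes.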
Read the full article here: https://ivanh.substack.com/p/introduction-to-r-for-machine-learning-28d