Visualizing Relationships In Data Using R

Syed Hamed Raza
Jul 27, 2022

Data visualization is the process of transforming information into a visual form, such as graphs and maps, so that people can understand the data and draw insights from it more easily. Its main goal is to make patterns, trends, and outliers in large data sets easier to identify.

In this article, we will see how to draw the following:

  • Scatter Plot
  • Bar Plot
  • Pie Chart
  • Histogram

Scatter Plot

A scatter plot displays the values of two quantitative variables as dots in a 2-D plane. We want to visualize house price against square footage using a scatter plot and draw a line of best fit, which lets us estimate the price of a new house from its square footage.

sqft <- c(40,30,20,10,50,12,14,25,26,24,60,31,42,45,50)
price <- c(4000,2900,1700,1200,4800,1500,2200,2900,
           3200,2100,5500,3850,3900,5800,4500)
plot(sqft, price, xlab = "Size (Sqft)", ylab = "Price (USD$)",
     main = "House Price vs. Square Footage")
# Fit a linear model and overlay the line of best fit
abline(lm(price ~ sqft))
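
The same approach works on R's built-in iris dataset. First, look at its first five rows and its structure:

head(iris, n = 5)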
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 ...

Now plot petal width against petal length and add the line of best fit:

plot(iris$Petal.Length, iris$Petal.Width,
     xlab = "Petal Length", ylab = "Petal Width",
     main = "Petal Width vs. Petal Length")
abline(lm(Petal.Width ~ Petal.Length, data = iris))
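
Returning to the house price example, the fitted line can also be used to estimate a price numerically instead of reading it off the plot. The snippet below is a minimal sketch using base R's lm() and predict(). The 35 sqft query value and the names houses, fit, new_house, and predicted are assumptions made for illustration and are not part of the original example.

```r
# House data from the scatter plot example
sqft <- c(40,30,20,10,50,12,14,25,26,24,60,31,42,45,50)
price <- c(4000,2900,1700,1200,4800,1500,2200,2900,
           3200,2100,5500,3850,3900,5800,4500)
houses <- data.frame(sqft = sqft, price = price)

# Fit a simple linear regression: price as a linear function of sqft
fit <- lm(price ~ sqft, data = houses)

# Hypothetical new house (35 sqft is an assumed value for illustration)
new_house <- data.frame(sqft = 35)
predicted <- predict(fit, newdata = new_house)

print(coef(fit))   # intercept and slope of the line of best fit
print(predicted)   # estimated price for the hypothetical house
```

Estimates like this are most reliable inside the range of the observed data (10 to 60 sqft here); going far beyond it assumes the linear trend continues.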

Bar Plot

A bar plot presents categorical data with rectangular bars whose heights or lengths are proportional to the values they represent.

users <- c(200, 400, 300, 100, 50)
progLang <- c('Java', 'R', 'Python', 'C++', 'Other')
barplot(users,
        names.arg = progLang,
        xlab = "Programming Language",
        ylab = "Number of Users",
        main = "Number of Users for Various Programming Languages")
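
Since bar lengths can carry the values just as well as bar heights, the same data can be drawn horizontally and sorted so the ranking is easier to read. This is a small sketch using the horiz and las arguments of base R's barplot(); the ord variable is a helper introduced here for illustration.

```r
users <- c(200, 400, 300, 100, 50)
progLang <- c('Java', 'R', 'Python', 'C++', 'Other')

# barplot() draws horizontal bars from bottom to top,
# so ascending order puts the largest bar at the top
ord <- order(users)

barplot(users[ord],
        names.arg = progLang[ord],
        horiz = TRUE,   # draw the bars horizontally
        las = 1,        # keep the category labels horizontal
        xlab = "Number of Users",
        main = "Number of Users for Various Programming Languages")
```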

To draw a bar plot with ggplot2, we need to load the ggplot2 library, which is part of the tidyverse collection of packages. We will follow the steps below:

  • import tidyverse package
  • show first 5 rows of chickwts dataset
  • display structure of chickwts dataset
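  • make a bar plot of the chickwts dataset with feed on the x-axis and weight on the y-axis. Because geom_col() stacks every row that shares the same feed, each bar's height is the total weight recorded for that feed.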

library(tidyverse)
head(chickwts, n = 5)
##   weight      feed
## 1    179 horsebean
## 2    160 horsebean
## 3    136 horsebean
## 4    227 horsebean
## 5    217 horsebean
str(chickwts)
## 'data.frame': 71 obs. of 2 variables:
## $ weight: num 179 160 136 227 217 168 108 124 143 140 ...
## $ feed : Factor w/ 6 levels "casein","horsebean",..: 2 2 2 2 2 2 2 2 2 2 ...

chickwts %>%
  ggplot(aes(x = feed, y = weight)) +
  geom_col()
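
Because geom_col() stacks every observation, the bars above show the summed weight per feed, which also depends on how many chicks were in each group. If the goal is to compare feeds, plotting the mean weight per feed may be more informative. The following is a sketch of that variation, not the original author's code: it uses base R's aggregate() for the summary, and the mean_wt name and axis labels are assumptions chosen for illustration.

```r
# Average weight per feed type (base R, no extra packages needed)
mean_wt <- aggregate(weight ~ feed, data = chickwts, FUN = mean)
print(mean_wt)

# Draw the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mean_wt, aes(x = feed, y = weight)) +
    geom_col() +
    labs(x = "Feed", y = "Mean weight",
         title = "Mean Chick Weight by Feed Type")
  print(p)
}
```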

Pie Chart

A pie chart presents data as a circle divided into slices, where the size of each slice shows that category's share of the total.

users <- c(200, 400, 300, 100, 50)
progLang <- c('Java', 'R', 'Python', 'C++', 'Other')
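pie(users,
    labels = progLang,
    main = "Number of Users for Various Programming Languages")

# ggplot2 version: a single stacked bar drawn in polar coordinates,
# so each slice reflects the total weight for one feed type
ggplot(chickwts, aes(x = "", y = weight, fill = feed)) +
  geom_col() +
  coord_polar(theta = "y")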
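
Slice sizes are hard to compare by eye, so it can help to put the percentages in the labels. This is a minimal sketch for the programming language data; the pct and slice_labels names are introduced here for illustration. The percentages are rounded independently, so they may not add up to exactly 100.

```r
users <- c(200, 400, 300, 100, 50)
progLang <- c('Java', 'R', 'Python', 'C++', 'Other')

# Share of each language as a whole-number percentage
pct <- round(100 * users / sum(users))

# Labels such as "Java (19%)"
slice_labels <- paste0(progLang, " (", pct, "%)")

pie(users,
    labels = slice_labels,
    main = "Number of Users for Various Programming Languages")
```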

Histogram

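A histogram shows the distribution of numeric data by grouping the values into bins and drawing a bar for each bin whose height is the number of values that fall into it.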

grades <- c(51,53,64,67,68,71,73,76,78,79,81,85,88,91,95)
hist(grades, breaks = 5)
hist(iris$Petal.Length, breaks = 10)
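
When breaks is a single number, hist() treats it only as a suggestion and picks "pretty" breakpoints, so the number of bins drawn may differ from the number requested. The sketch below inspects the bins that were actually chosen and then passes explicit breakpoints instead. The variable names h_auto and h_fixed, the bin width of 10, and the plot labels are assumptions made for illustration.

```r
grades <- c(51,53,64,67,68,71,73,76,78,79,81,85,88,91,95)

# plot = FALSE returns the histogram object without drawing it
h_auto <- hist(grades, breaks = 5, plot = FALSE)
print(h_auto$breaks)   # breakpoints R actually chose
print(h_auto$counts)   # number of grades in each bin

# A vector of breakpoints is used exactly as given
h_fixed <- hist(grades, breaks = seq(50, 100, by = 10),
                main = "Distribution of Grades", xlab = "Grade")
print(h_fixed$counts)
```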

Conclusion

In this article, we explored relationships within data using scatter plots, bar plots, pie charts, and histograms.

Syed Hamed Raza

Master's degree in Computer Applied Technology from Huazhong University, Wuhan, China. Expert in ML, DL, Computer Vision, NLP. Passionate mentor and innovator.