Pearson Correlation in R

Rahardito Dio Prastowo
Feb 20, 2022

Correlation analysis is an inferential statistical method used to measure the strength of the relationship between two variables, that is, how they move in relation to each other. Correlation measures association, but it does not tell you whether A causes B or vice versa.

There are several types of correlation analysis methods, as shown in the table. Which one to use depends on the type of variables being tested. For relationships between numerical variables, there are the Pearson correlation and the Spearman correlation.
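To see why the choice between the two matters, here is a small sketch on made-up toy vectors (not the article's data). When a relationship is monotonic but not linear, Spearman's coefficient reaches 1 because it works on ranks, while Pearson's coefficient stays below 1. Both are available through base R's `cor()` via its `method` argument.

```r
# Toy data (hypothetical): y always increases with x, but not along a straight line
x <- 1:10
y <- exp(x)

pearson  <- cor(x, y, method = "pearson")   # measures linear association
spearman <- cor(x, y, method = "spearman")  # measures monotonic association via ranks

print(c(pearson = pearson, spearman = spearman))
```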

The Pearson correlation measures the strength and direction of the linear relationship between two numerical variables.
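The coefficient is the covariance of the two variables divided by the product of their standard deviations. Below is a minimal sketch that computes it directly from that definition and checks it against base R's `cor()`. It uses the built-in `mtcars` data (weight `wt` against fuel economy `mpg`) purely as a stand-in, since the house price CSV does not ship with R; `pearson_r` is a hypothetical helper written for illustration.

```r
# Pearson's r from its definition:
# r = sum((x - mean(x)) * (y - mean(y))) /
#     sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations of x from its mean
  dy <- y - mean(y)  # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Stand-in data: built-in mtcars, not the article's house price dataset
r_manual  <- pearson_r(mtcars$wt, mtcars$mpg)
r_builtin <- cor(mtcars$wt, mtcars$mpg, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```

Heavier cars have lower fuel economy here, so both values come out strongly negative and agree up to floating-point error.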

Characteristics of the Pearson correlation:
- it takes a value between −1 and 1
- 0 means no linear correlation
- 1 is total positive correlation, −1 is total negative correlation
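These characteristics can be checked on toy vectors made up for illustration. An exact increasing straight line gives 1, an exact decreasing one gives −1, and a symmetric parabola gives 0: `y` depends perfectly on `x` there, but there is no *linear* trend for Pearson's r to pick up.

```r
x <- 1:10
r_pos <- cor(x, 2 * x + 3)   # exact increasing linear relation: total positive correlation
r_neg <- cor(x, -0.5 * x)    # exact decreasing linear relation: total negative correlation

# y = z^2 on a range symmetric around 0 has no linear trend, so r is (numerically) 0
z <- -5:5
r_zero <- cor(z, z^2)

print(c(positive = r_pos, negative = r_neg, zero = r_zero))
```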

Hypothesis Testing:

- Correlation hypothesis:
tests whether there is a linear relationship between two variables
Ho: There is no linear relationship between the two variables
Ha: There is a linear relationship between the two variables
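For the Pearson method, `cor.test()` tests Ho (population correlation equal to 0) with the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2). When Ho holds, this statistic follows a t distribution with n - 2 degrees of freedom. The sketch below reproduces the two-sided p-value by hand and compares it with `cor.test()`. It again uses the built-in `mtcars` data as a stand-in for the house price data.

```r
x <- mtcars$wt
y <- mtcars$mpg
n <- length(x)

r <- cor(x, y)
t_stat   <- r * sqrt(n - 2) / sqrt(1 - r^2)     # test statistic under Ho: rho = 0
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)    # two-sided p-value

res <- cor.test(x, y, method = "pearson")
print(c(t_manual = t_stat, t_cor_test = unname(res$statistic)))
print(c(p_manual = p_manual, p_cor_test = res$p.value))
```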

When the p-value is less than or equal to the significance level (here 0.05), you reject the null hypothesis. We then use the following table to interpret the strength of the correlation.
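The decision rule can be written directly in code. This is a minimal sketch on the stand-in `mtcars` data with the article's significance level of 0.05. Reading off the strength from the interpretation table is still done separately.

```r
alpha <- 0.05  # significance level used in the article

res <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")  # stand-in data

decision <- if (res$p.value <= alpha) {
  "reject Ho: evidence of a linear relationship"
} else {
  "fail to reject Ho: no evidence of a linear relationship"
}
print(decision)
```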

Case Study

In this case study, we will use a house price dataset.

Suppose you work in the marketing department of a property company and have a house price dataset consisting of several variables, including house price, area, house age, swimming pool, garage, and university location.
To understand how house prices relate to the other variables, your team starts by using the Pearson correlation to find out whether house price is correlated with total area.

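# load dataset (the CSV file is read from the current working directory)
dataset <- read.csv('House Price.csv', stringsAsFactors = TRUE)
# show table (View() opens a spreadsheet-style viewer, e.g. in RStudio)
View(dataset)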

Check the data types to make sure the variables we are going to test (Price and Area) are numerical.

# show data type
str(dataset)
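Instead of reading the `str()` output by eye, you can also make the check explicit, so a script stops early if a column was read as text or as a factor. This is a minimal sketch: `check_numeric` is a hypothetical helper, and it is demonstrated on `mtcars` and a tiny toy data frame. With the article's data you would call `check_numeric(dataset, c("Price", "Area"))`.

```r
# Hypothetical helper: stop with an error if any requested column is not numeric
check_numeric <- function(df, cols) {
  is_num <- vapply(df[cols], is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric column(s): ", paste(cols[!is_num], collapse = ", "))
  }
  invisible(is_num)
}

check_numeric(mtcars, c("mpg", "wt"))  # passes silently

# A factor column (as stringsAsFactors = TRUE produces for text) fails the check
toy <- data.frame(a = c(1.5, 2.5), b = factor(c("x", "y")))
msg <- tryCatch(check_numeric(toy, c("a", "b")), error = conditionMessage)
print(msg)
```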

Plot the data to see whether the points form a linear pattern.

# show plot
library(ggplot2)
ggplot(dataset, aes(x=Area, y=Price)) + geom_point()

Compute the Pearson correlation coefficient between the two variables and test its significance.

# correlation test
cor.test(dataset$Price, dataset$Area, method = "pearson")
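The printed output is convenient to read, but the object returned by `cor.test()` (class `htest`) also stores each number, which is handy for reporting. R prints very small p-values as `< 2.2e-16`, which is the machine epsilon `.Machine$double.eps` shown with few digits. The exact, smaller value is kept in `$p.value`. A sketch on the stand-in `mtcars` data:

```r
res <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")  # stand-in data

r_value <- unname(res$estimate)  # sample correlation coefficient r
p_value <- res$p.value           # exact p-value, not the rounded printed one
ci_95   <- res$conf.int          # 95% confidence interval for the population correlation

print(c(r = r_value, p = p_value))
print(ci_95)
```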

Correlation Test Result Interpretation:
p-value < 2.2e-16
p-value < 0.05, so we reject the null hypothesis

Ho: There is no linear relationship between house price and total area
Ha: There is a linear relationship between house price and total area

r = 0.5946
The correlation between the two variables is positive and of moderate ("normal") strength.

Based on the Pearson correlation test, there is a significant linear relationship between house price and total area, and the correlation is positive: the larger the area, the higher the house price tends to be.

That’s how to apply the Pearson correlation test in R using a simple dataset. Hopefully it is easy to understand for everyone who needs this explanation.

https://github.com/RaharditoDio/Hypothesis-Testing/blob/main/Pearson%20Correlation%20Medium.R
