Phi Correlation Coefficient in R

Rahardito Dio Prastowo
Feb 25, 2022

The phi correlation coefficient measures the strength of the relationship between two categorical variables that each have exactly 2 levels (binary variables).

Characteristics of the Phi Correlation Coefficient:
- phi itself ranges from -1 to 1; its absolute value, which is what this article uses, lies between 0 and 1
- 0 means there is no correlation between the two variables

- Correlation hypothesis:
The test checks whether there is a correlation between the two variables
Ho: There is no correlation between the two variables
Ha: There is a correlation between the two variables
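
As a rough illustration of how the coefficient is computed, the sketch below works out phi by hand for a 2x2 table using the standard formula: the difference of the diagonal products divided by the square root of the product of the row and column totals. The counts, the level names and the helper `phi_2x2` are hypothetical and are not taken from the bookstore dataset.

```r
# Hypothetical 2x2 counts (NOT from the article's dataset)
counts <- matrix(c(30, 10,
                   15, 45),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Genre = c("Fiction", "Non Fiction"),
                                 Sales = c("Good", "Under")))

# Hypothetical helper: phi for a 2x2 table of counts
phi_2x2 <- function(m) {
  n11 <- m[1, 1]; n12 <- m[1, 2]
  n21 <- m[2, 1]; n22 <- m[2, 2]
  numerator   <- n11 * n22 - n12 * n21
  denominator <- sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
  numerator / denominator
}

phi_value <- phi_2x2(counts)
phi_value        # signed value, somewhere between -1 and 1
abs(phi_value)   # strength only, between 0 and 1
```

The sign only reflects how the levels happen to be ordered in the table (swapping the two columns flips it), which is one reason to report the absolute value when only the strength matters.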

When the p-value is less than or equal to the significance level (0.05), we reject the null hypothesis and then use the following table to interpret the correlation

Case Study

In this case, we will use a bookstore dataset.

Suppose you work in the Sales Department of a bookstore, and your superior asks you to analyze customer reading interest in the types of books available in the store.

The dataset consists of several variables, including Author, User Rating, Review, Price, Year, Genre and Sales. Your team has to find out whether there is a correlation between the genre of a book and its sales performance (good or under) using the Phi Correlation Coefficient method.

# load dataset (text columns are read as factors)
dataset <- read.csv('BooksSales.csv', stringsAsFactors = TRUE)
# show table in the data viewer
View(dataset)

Check the data types to make sure that the variables we are going to test, Sales and Genre, are categorical variables with 2 levels each

# show data type
str(dataset)
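
Besides reading the `str()` output by eye, the two-level requirement can also be checked programmatically. The following is a minimal sketch on a hypothetical toy data frame (`toy` and its values are made up for illustration); with the real data, `dataset` would take the place of `toy`, assuming Genre and Sales were read in as factors.

```r
# Hypothetical toy data standing in for the bookstore dataset
toy <- data.frame(
  Genre = c("Fiction", "Non Fiction", "Fiction", "Non Fiction"),
  Sales = c("Good", "Under", "Under", "Good"),
  stringsAsFactors = TRUE
)

# number of levels in each variable to be tested
level_counts <- sapply(toy[c("Genre", "Sales")], nlevels)
level_counts

# stop with an error if either variable does not have exactly 2 levels
stopifnot(all(level_counts == 2))
```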

Plot the data to see whether the two categorical variables appear to be correlated

# show plot
library(ggplot2)
ggplot(dataset, aes(x=Sales, fill=Genre)) + geom_bar(position='fill')
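
The filled bar chart shows, for each Sales category, the share of each Genre. The same proportions can be read as numbers with `prop.table()`. This is only a sketch on hypothetical counts (the `toy` data frame is made up and does not come from BooksSales.csv).

```r
# Hypothetical toy data: 100 books with made-up genre/sales combinations
toy <- data.frame(
  Genre = rep(c("Fiction", "Non Fiction", "Fiction", "Non Fiction"),
              times = c(30, 15, 10, 45)),
  Sales = rep(c("Good", "Good", "Under", "Under"),
              times = c(30, 15, 10, 45)),
  stringsAsFactors = TRUE
)

# share of each genre within each sales category (each row sums to 1),
# i.e. the heights of the segments in a position = 'fill' bar chart
genre_share <- prop.table(table(toy$Sales, toy$Genre), margin = 1)
round(genre_share, 2)
```

If the genre shares differ clearly between the two rows, the bars in the plot will look different as well, which hints at a relationship worth testing.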

Next, we create a frequency table that counts each combination of Genre and Sales; the data in this table is then used for the test

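# create frequency table (named freq_table so it does not mask base::table)
freq_table <- table(dataset$Genre, dataset$Sales)
freq_table
# correlation test
library(psych)
# absolute value of phi, since only the strength is of interest here
abs(phi(freq_table))
# chi-squared test of independence, used for the p-value
chisq.test(dataset$Genre, dataset$Sales)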
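
The two calls above are closely related: for a 2x2 table, the absolute value of phi equals the square root of the uncorrected chi-squared statistic divided by the number of observations. Note that `chisq.test()` applies Yates' continuity correction to 2x2 tables by default, so its reported statistic is slightly smaller than the uncorrected one. The sketch below illustrates this on hypothetical counts (not the bookstore data) using base R only.

```r
# Hypothetical 2x2 counts (NOT from the article's dataset)
counts <- matrix(c(30, 10,
                   15, 45),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Genre = c("Fiction", "Non Fiction"),
                                 Sales = c("Good", "Under")))
n <- sum(counts)

# chi-squared test without the continuity correction
chi_uncorrected <- chisq.test(counts, correct = FALSE)

# |phi| recovered from the uncorrected statistic
phi_from_chisq <- sqrt(unname(chi_uncorrected$statistic) / n)
phi_from_chisq

# default call: Yates' continuity correction is applied for 2x2 tables
chi_default <- chisq.test(counts)

# compare the two statistics and p-values
c(uncorrected = unname(chi_uncorrected$statistic),
  corrected   = unname(chi_default$statistic))
c(uncorrected = chi_uncorrected$p.value,
  corrected   = chi_default$p.value)
```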

- Correlation Test Result Interpretation:
p-value = 5.967e-08
p-value < 0.05, so we reject the null hypothesis

Ho: There is no correlation between genre and sales performance
Ha: There is a correlation between genre and sales performance

Based on the Phi Correlation Coefficient test carried out above, there is a statistically significant, strong correlation (phi = 0.24) between genre and sales performance, and it's likely that fiction books get better sales performance

That’s the application of the Phi Correlation Coefficient test in R using a simple dataset. Hopefully it’s easy to understand for everyone who needs this explanation

https://github.com/RaharditoDio/Hypothesis-Testing/blob/main/Phi%20Correlation%20Coefficient%20Medium.R
