Simple Movie Recommender System with the Correlation Coefficient in Python

Ashwin Prasad
Published in Analytics Vidhya · 3 min read · Oct 22, 2020

Introduction

Recommender systems are designed to recommend items to a user based on factors such as the user’s past ratings and the preferences of similar users.

Pearson’s correlation coefficient is a simple yet effective way to measure how strongly one variable changes linearly with another. We can use this to our advantage and build a recommender system around it.

Working of correlation coefficient

If the correlation coefficient of two variables is close to 1, the variables have a strong positive linear relationship: as one increases, the other tends to increase.

If it is close to -1, the variables have a strong negative linear relationship: as one increases, the other tends to decrease.

If the magnitude of the correlation coefficient is close to 0, the variables probably do not have a strong linear dependency on each other.
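
As a quick illustration of these three cases, here is a minimal sketch that computes Pearson’s coefficient with NumPy’s np.corrcoef. The arrays and the helper name pearson are made up for illustration and have nothing to do with the MovieLens data used later.

```python
import numpy as np

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2 * x + 1                               # rises linearly with x
y_down = -3 * x + 10                           # falls linearly with x
y_flat = np.array([2.0, 1.0, 3.0, 1.0, 2.0])   # no linear trend in x

def pearson(a, b):
    """Pearson correlation between two 1-D arrays (hypothetical helper)."""
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(a, b)
    return np.corrcoef(a, b)[0, 1]

r_up = pearson(x, y_up)        # close to 1
r_down = pearson(x, y_down)    # close to -1
r_flat = pearson(x, y_flat)    # close to 0
print(r_up, r_down, r_flat)
```

Note that a coefficient near 0 only rules out a linear relationship; the variables could still be related in some non-linear way.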

Python Code:

1. Importing the necessary libraries
#importing the libraries
import numpy as np
import pandas as pd

2. Dataset (MovieLens Dataset)
To implement the recommender system, I have used the MovieLens 100k dataset, which contains 100,000 ratings given by 943 users to 1,682 movies.

#data import
df1 = pd.read_csv('./ml-100k/u.data',sep='\t',names=['user_id','item_id','rating','timestamp'])
# u.item has 24 pipe-separated columns: 5 descriptive fields followed by 19 genre flags
item_columns = ["item_id", "item_name", "release_date", "video_release_date", "imdb_url"] + [f"genre_{i}" for i in range(1, 20)]
df2 = pd.read_csv("./ml-100k/u.item", sep="|", encoding="iso-8859-1", names=item_columns)
print(df1.head())

The first dataframe, df1, holds the user id, the movie id (item_id), the rating and a timestamp for each rating.

df2 = df2.iloc[:,0:2]
df2.head()

The second dataframe, df2, now holds only each movie’s item_id and its name.

3. Merging the dataframes

data = df1.merge(df2,on="item_id")
data.drop(['timestamp'],inplace=True,axis=1)
data.head()

Merging df1 with df2 on item_id attaches the movie name to every rating and gives us the complete dataset. The timestamp column is dropped because it is not used.
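
To see what the merge does on a small scale, here is a sketch with two toy dataframes standing in for df1 and df2. The values and the names ratings, titles and merged are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for df1 (ratings) and df2 (titles); values are invented
ratings = pd.DataFrame({
    "user_id": [1, 1, 2],
    "item_id": [10, 20, 10],
    "rating": [5, 3, 4],
})
titles = pd.DataFrame({
    "item_id": [10, 20],
    "item_name": ["Movie A", "Movie B"],
})

# Inner join on item_id: every rating row gains the matching movie name
merged = ratings.merge(titles, on="item_id")
print(merged)
```

Each of the three rating rows ends up with an item_name column, which is what lets the next step use movie names as column labels.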

4. Pivot Table

data_table = pd.pivot_table(data,values='rating',columns='item_name',index='user_id')
data_table.head()

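We use the pivot_table function from pandas to create a table in which each row represents a user, each column represents a movie and each cell holds that user’s rating for that movie. Movies that a user has not rated appear as NaN.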
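
The reshaping can be sketched on a tiny, invented set of ratings. The names toy and toy_table are hypothetical; the pivot_table call mirrors the one used above.

```python
import pandas as pd

# Invented long-format ratings: one row per (user, movie) pair
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "item_name": ["Movie A", "Movie B", "Movie A", "Movie A", "Movie B"],
    "rating": [5, 4, 3, 1, 2],
})

# Rows become users, columns become movies, cells hold the rating
toy_table = pd.pivot_table(toy, values="rating", index="user_id", columns="item_name")
print(toy_table)

# User 2 never rated "Movie B", so that cell is NaN
missing = int(toy_table.isna().sum().sum())
print("missing ratings:", missing)
```

On the real dataset most cells are empty like this, because each user rates only a small fraction of the movies.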

5. Start Recommending

That is all the preparation this basic recommender system needs. To make a prediction, we take a movie name from the user and return a list of movies that the user might like. This is where the correlation coefficient comes into play.

Let’s assume that the user liked the movie 101 Dalmatians (1996). We have to give a list of movies that we think the user might like.

print("Here is a list of 20 movies to recommend to a user who liked '101 Dalmatians (1996)':")
# Note: the chosen movie itself is included in the result, with a correlation of 1.0
print(data_table.corr()['101 Dalmatians (1996)'].sort_values(ascending=False).iloc[:20])

This is how we can use Pearson’s correlation coefficient to recommend movies to users based on the movies they liked.
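
One caveat with the one-liner above is that a correlation computed from only a handful of shared raters can be misleadingly high. Here is a minimal sketch of one way to guard against that. The helper similar_movies, its min_common threshold and the toy table are all assumptions made for illustration; they are not part of the original code.

```python
import numpy as np
import pandas as pd

def similar_movies(table, title, top_n=5, min_common=3):
    """Rank movies by Pearson correlation with `title` (hypothetical helper).

    Movies that share fewer than `min_common` raters with `title` are skipped.
    """
    target = table[title]
    # Series.corr returns NaN when fewer than min_periods users rated both movies
    corr = table.apply(lambda col: col.corr(target, min_periods=min_common))
    # Drop the chosen movie itself and any movie without enough shared raters
    corr = corr.drop(labels=title).dropna()
    return corr.sort_values(ascending=False).head(top_n)

# Invented user x movie table; "Movie D" shares only 2 raters with "Movie A"
toy_ratings = pd.DataFrame(
    {
        "Movie A": [5, 4, 3, 2, 1],
        "Movie B": [5, 5, 3, 1, 1],
        "Movie C": [1, 2, 3, 4, 5],
        "Movie D": [5, np.nan, np.nan, np.nan, 4],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="user_id"),
)

strict = similar_movies(toy_ratings, "Movie A", top_n=2, min_common=3)
loose = similar_movies(toy_ratings, "Movie A", top_n=1, min_common=2)
print(strict)
print(loose)
```

With the looser threshold, "Movie D" ranks first even though only two users rated both movies; requiring at least three shared raters removes it from the list. A suitable threshold for the real dataset is a judgment call and would need to be tuned.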

Conclusion

We use the pivot table and the correlation coefficient to recommend movies here. If the user likes a particular movie, we take that movie’s column, compute its correlation with every other movie column and return the movies that correlate most strongly with the chosen one.
This works because the rows represent users, and a given user tends to rate similar movies in a similar way. Hence, movies whose ratings rise and fall together across users are likely to appeal to the same people, and we can use the correlation coefficient to recommend them.

The code file is available on GitHub for reference: https://github.com/ashwinhprasad/RecommenderSystem-CorrelationCoefficient

Thank You
