Recommendation Systems: Collaborative Filtering just with numpy and pandas, A-Z

Santhosh
6 min read · Apr 30, 2018

Let's start with “Why” recommendations?

Recommendation systems are everywhere. Netflix, YouTube, Spotify, etc. all use recommendation systems to surface better content for their users. These recommendations help users find the most relevant content.

Without further ado, let's dive right into recommendation systems and how to build one using a simple technique (Collaborative Filtering).

What are the different types of recommendation systems?

  1. Content Based
  2. Collaborative Filtering
    1. User-User
    2. Item-Item
  3. Matrix Factorization
  4. Deep Learning (Neural Networks)

In this story we will be focusing on Collaborative Filtering (I will let you know why later, keep reading).

But whaaat is collaborative filtering?

Wiki says: Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating).

If the Wiki definition doesn't help you, don't worry, I will break it down for you.

Collaborative filtering is a mathematical method for predicting how a user would rate a particular item by comparing that user with all other users.
For example:
To predict PersonA's rating for a particular item, you compute the similarity between PersonA and every other user, take the top users who are most similar to PersonA, and then compute PersonA's predicted rating for the item from the ratings those similar users gave it.

Three steps to design collaborative filtering:
Step 1
* Find the top similar users w.r.t. the particular user. (Centered Cosine Similarity)
Step 2
* Predict the user's rating on an item based on other users.
Step 3
* Recommend the items which have the highest predicted value.

Let's dive deeper into these steps:

Step 1

“Find the top similar users w.r.t. the particular user. (Centered Cosine Similarity)”

I took the movies data set from MovieLens. It contains movies, users and tags.
Let's import the data (Ratings, Movies and Tags) and merge Ratings and Movies into a single table just to understand how it looks.

Import the libraries and CSV
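
A minimal sketch of this step, not the exact code behind the table. It assumes the usual MovieLens column names (userId, movieId, rating, title) and reads a few made-up rows from in-memory strings so that it runs on its own. With the real data set you would pass the paths of the ratings and movies CSV files to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up stand-ins for the ratings and movies CSV files (assumed columns).
ratings_csv = io.StringIO(
    "userId,movieId,rating\n"
    "1,10,4.0\n"
    "1,20,2.0\n"
    "2,10,5.0\n"
    "2,30,3.0\n"
)
movies_csv = io.StringIO(
    "movieId,title\n"
    "10,Movie A\n"
    "20,Movie B\n"
    "30,Movie C\n"
)

ratings = pd.read_csv(ratings_csv)
movies = pd.read_csv(movies_csv)

# Attach the movie title to every rating row.
ratings_movies = pd.merge(ratings, movies, on="movieId", how="left")
print(ratings_movies)
```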

Now, from the above table we can identify the rating given by a user to a particular movie.

Before moving forward we need to normalise the ratings.

How and why do we normalise the ratings? Two reasons.
1. We don't know what scale the ratings are on: is it 0–5, 0–10 or 0–100?
2. If the user didn't rate a movie, instead of placing 0 in that position (which would be read as a negative rating) we should treat it as an average rating.

To solve the above problems we implement Centered Cosine.

Centered Cosine

  1. We take the mean of a user's ratings (the sum of their ratings divided by the number of movies they rated) and subtract that mean from each of their individual ratings.
  2. For all the movies the user has not rated, we fill in 0, which after centering corresponds to the user's average rating.

Example centered cosine on sample data

If we do the centered cosine on the above data set, it looks like:

I took the mean of each user's ratings, merged the ratings with the means, and created a new column ‘ratings_adjusted’ which contains the centered rating for a movie by the user.
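
Roughly, that could look like the sketch below. The rows are made up and the column names userId, movieId, rating and rating_mean are assumptions. Only ‘ratings_adjusted’ comes from the description above.

```python
import pandas as pd

# Made-up ratings: user 1 rated three movies, user 2 rated two.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2],
    "movieId": [10, 20, 30, 10, 30],
    "rating":  [5.0, 3.0, 4.0, 2.0, 4.0],
})

# Mean rating per user (sum of ratings / number of rated movies).
mean = ratings.groupby("userId", as_index=False)["rating"].mean()
mean = mean.rename(columns={"rating": "rating_mean"})

# Merge the mean back onto every rating row and subtract it.
ratings = pd.merge(ratings, mean, on="userId")
ratings["ratings_adjusted"] = ratings["rating"] - ratings["rating_mean"]
print(ratings)
```

A positive ‘ratings_adjusted’ means the user liked the movie more than their own average, a negative one means less.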

For a better understanding of the data, let's change its format into a user-movie-rating table.

Row: Users, Column: Movie, Values: Ratings
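
One way to get this shape is a pandas pivot table, sketched here on made-up centered ratings. The column names are assumptions, and filling missing cells with 0 follows rule 2 of Centered Cosine.

```python
import pandas as pd

# Made-up centered ratings in long format (one row per user-movie pair).
adjusted = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2],
    "movieId": [10, 20, 30, 10, 30],
    "ratings_adjusted": [1.0, -1.0, 0.0, -1.0, 1.0],
})

# Rows: users, columns: movies, values: centered ratings.
# Movies a user has not rated become 0 (that user's average after centering).
user_movie = adjusted.pivot_table(
    index="userId", columns="movieId", values="ratings_adjusted"
).fillna(0)
print(user_movie)
```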

Simple enough? Cool. Now we need to compute the similarity between two users using Cosine Similarity.

How to compute the cosine similarity of users?

Cosine Similarity is a mathematical function that calculates the similarity of two users.

A — User whom you want to recommend
B — Other users

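You need to compute the sum of the products of A and B, divided by the product of their lengths, where the length of a vector is the square root of the sum of its squared values:
cos(A, B) = sum(A_i * B_i) / ( sqrt(sum(A_i^2)) * sqrt(sum(B_i^2)) )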

Computing cosine similarity on example values:

The values are just for understanding purposes; they are not Centered Cosine values. Cos(A/B) means “Computing Cosine Similarity between A and B”.
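
The same computation as a small numpy function. This is a sketch with made-up vectors, and the function name cosine_similarity and the zero-vector guard are my own choices for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Sum of products of a and b divided by the product of their lengths."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denominator = np.linalg.norm(a) * np.linalg.norm(b)
    if denominator == 0:
        # A user with no (or all-average) ratings is similar to nobody.
        return 0.0
    return float(np.dot(a, b) / denominator)


A = [1.0, 0.0, -1.0]   # user we want to recommend to
B = [2.0, 0.0, -2.0]   # same taste as A, just stronger opinions
C = [-1.0, 0.0, 1.0]   # opposite taste

print(cosine_similarity(A, B))  # close to 1
print(cosine_similarity(A, C))  # close to -1
```

The result lies between -1 (opposite taste) and 1 (same taste), regardless of how strong the individual ratings are.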

Let's compute the cosine similarity of user316 with all users and get the top N similar users (in my example N = 10, but feel free to pick any number you want for N).

113673, 117918, … are the users most similar to user316.
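
A sketch of how the top N lookup could be written, assuming a user-movie table of centered ratings like the one built earlier. The helper name top_similar_users and all values below are made up for illustration, and only the id 316 is borrowed from the example.

```python
import numpy as np
import pandas as pd

# Made-up centered ratings: rows are users, columns are movies.
user_movie = pd.DataFrame(
    [[1.0, -1.0, 0.0],
     [0.5, -0.5, 0.0],
     [-1.0, 1.0, 0.0],
     [1.0, 0.0, -1.0]],
    index=pd.Index([316, 11, 22, 33], name="userId"),
    columns=pd.Index([10, 20, 30], name="movieId"),
)


def top_similar_users(user_movie, user_id, n=10):
    """Return the n users with the highest cosine similarity to user_id."""
    target = user_movie.loc[user_id].to_numpy()
    others = user_movie.drop(index=user_id)
    vectors = others.to_numpy()

    dots = vectors @ target
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(target)
    # Users with an all-zero row get similarity 0 instead of a division error.
    sims = np.divide(dots, norms, out=np.zeros_like(dots), where=norms != 0)

    return pd.Series(sims, index=others.index).sort_values(ascending=False).head(n)


similar = top_similar_users(user_movie, 316, n=2)
print(similar)
```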

This is how it looks if you map them in table view.

Row: Users; Column: Movies, Values: Ratings

With this we have finished Step 1, let's move on to Step 2.

Step 2

“Predict the user's rating on an item based on other users.”

We use the similarity-weighted average of the similar users' ratings to predict the user's rating on a particular item.

Example of computing the weighted sum:

Let's say we have USERS on the rows, MOVIES on the columns and RATINGS as the values, and we need to predict the rating of user1 for movie5.
1. We take the most similar users using cosine similarity (here N = 2).

Images taken from Stanford class (https://www.youtube.com/watch?v=h9gpufJFF-0)

2. Compute the weighted average:
weighted average = sum over the similar users of (similarity * rating) / sum of the similarities

Images taken from Stanford class (https://www.youtube.com/watch?v=h9gpufJFF-0)
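
The weighted average formula can be sketched in a few lines of numpy. The similarities and ratings here are made up (they are not the values from the Stanford example), and the function name weighted_average is only for illustration.

```python
import numpy as np


def weighted_average(similarities, ratings):
    """Sum of (similarity * rating) divided by the sum of the similarities."""
    s = np.asarray(similarities, dtype=float)
    r = np.asarray(ratings, dtype=float)
    denominator = s.sum()
    if denominator == 0:
        return 0.0  # no usable neighbours, fall back to the average (0 when centered)
    return float((s * r).sum() / denominator)


# Two neighbours (N = 2): the more similar one rated the movie 4, the other 2.
prediction = weighted_average([0.8, 0.2], [4.0, 2.0])
print(prediction)  # roughly 3.6, pulled towards the more similar user's rating
```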

Now if we compute the weighted average for our MovieLens dataset, it looks like:

I pulled the denominator out into a separate variable, as it appears in each computation.
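
A sketch of that computation on a made-up table of two similar users. The names neighbours, similarity and predicted are assumptions for illustration, and the values are not taken from MovieLens.

```python
import pandas as pd

# Centered ratings of the top similar users (rows) for each movie (columns).
neighbours = pd.DataFrame(
    [[1.0, 0.0, -1.0],
     [0.0, 2.0, 1.0]],
    index=pd.Index([11, 33], name="userId"),
    columns=pd.Index([10, 20, 30], name="movieId"),
)
# Cosine similarity of each neighbour to the target user.
similarity = pd.Series([0.75, 0.25], index=neighbours.index)

# The denominator is the same for every movie, so compute it once.
denominator = similarity.sum()

# Weight every neighbour's row by its similarity, then sum per movie.
predicted = neighbours.mul(similarity, axis=0).sum(axis=0) / denominator
print(predicted)
```

Each entry of predicted is the target user's predicted (centered) rating for one movie.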

Notice the rating values for the user-movie pairs: some of them have changed. For example, look at user316 and movie 2: before it was 0.0 and after it is -0.2.

We have finished Step 2, let's go to Step 3.

Step 3

“Recommend the items which have the highest predicted value”

We have the complete DataFrame with the predicted ratings of a particular user for all movies. All we have to do is pick the highest predicted ratings (here I am just taking the top 10 for user316).

A simple algorithm would be sufficient for this.

Traverse all the values of user 316 (row-wise), append them to “top recommendations”, sort them in descending order and retrieve the top 10 movies.
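
With pandas the traverse, sort and retrieve steps collapse into a sort followed by head. This sketch swaps that shortcut in for the explicit loop, on made-up predicted ratings and with a top 2 instead of a top 10.

```python
import pandas as pd

# Made-up predicted ratings of one user for four movies (index = movieId).
predicted = pd.Series({10: 0.75, 20: 0.5, 30: -0.5, 40: 1.25})
predicted.index.name = "movieId"

# Sort descending and keep the best N movies (N = 2 here).
top_recommendations = predicted.sort_values(ascending=False).head(2)
print(list(top_recommendations.index))
```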

Now user316 would like movies 2215, 584, 2201, …

We have completed Step 3 now.

What we have just done is the user-user collaborative filtering method. You can also do item-item collaborative filtering using the same method, but there we compare movies with one another (based on the users who rated them) instead of comparing users with one another. (Please watch this)
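
As a rough idea of the item-item variant, the sketch below transposes a made-up user-movie table so that each movie becomes a vector of user ratings, then computes the cosine similarity between every pair of movies. The variable names and values are assumptions, and this shows only the similarity part, not a full item-item recommender.

```python
import numpy as np
import pandas as pd

# Made-up centered ratings: rows are users, columns are movies.
user_movie = pd.DataFrame(
    [[1.0, -1.0, 0.0],
     [1.0, -1.0, 0.0],
     [-1.0, 0.0, 1.0]],
    index=pd.Index([1, 2, 3], name="userId"),
    columns=pd.Index([10, 20, 30], name="movieId"),
)

# Transpose: now every row is a movie described by the users' ratings.
movie_user = user_movie.T
vectors = movie_user.to_numpy()

# Scale every movie vector to length 1 (all-zero rows are left as zeros).
norms = np.linalg.norm(vectors, axis=1)
norms[norms == 0] = 1.0
unit = vectors / norms[:, None]

# Dot products of unit vectors are the pairwise cosine similarities.
item_similarity = pd.DataFrame(
    unit @ unit.T, index=movie_user.index, columns=movie_user.index
)
print(item_similarity.round(2))
```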

What next? I am going to write another story on Matrix Factorization, which is another cool technique for recommendation systems (again just with numpy and pandas).

That's it! You have learnt the most important recommendation system (Collaborative Filtering, user-user) and how to compute it just with numpy and pandas.

Unleash your skills

Please clap and comment if you like this story. Until then ✌

References:

  1. Youtube (the best resource I found for recommendation systems. If you couldn't understand the content of the video, I strongly recommend you rewatch it, because you cannot get better content on recommendation than this video.)
  2. Wiki (for understanding cosine similarity and how it works)
  3. Medium (best blog for understanding recommendation systems)
