User-Based Collaborative Filtering

Premashanth Kumanan
Movie Recommendation System
4 min read · Apr 19, 2019

User-based collaborative filtering finds other users whose rating history is similar to yours and recommends movies they liked that you haven’t watched yet. The recommendations come from the collective behavior of other users, which is why it is called user-based collaborative filtering. Along with the theory, I will discuss the code, implemented in Python.

As the first step, we need to find similarities between users. There are various methods to measure similarity:

  1. Cosine Similarity
  2. Adjusted Cosine
  3. Pearson
  4. MSD
  5. Jaccard

Cosine similarity is one of the most widely used of these metrics, and it is the one I will use here.

Cosine Similarity

The cosine similarity metric works well in most cases. Think of each attribute as a dimension, and of similarity as the angle between two vectors in that multi-dimensional space. In this case, each movie is a dimension and a user's ratings are the coordinates of that user's vector, so the dimensions are based on user behavior. The angle is hard to picture in many dimensions, but it is easy to compute: the closer the cosine of the angle is to 1, the more similar the two users are.
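To make the idea concrete, here is a minimal sketch of cosine similarity between two users in plain Python. It is not the Surprise implementation; the function name `cosine_similarity`, the dictionary layout, and the sample ratings are all assumptions made up for this illustration. It only compares the movies that both users have rated.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users, using only movies both have rated.

    Each argument is a hypothetical dict mapping movie title -> rating.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0  # no shared movies, so there is nothing to compare
    dot = sum(ratings_a[m] * ratings_b[m] for m in common)
    norm_a = math.sqrt(sum(ratings_a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(ratings_b[m] ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)

# Made-up sample ratings, for illustration only.
user_a = {"Avengers 1": 5.0, "Avengers 2": 4.0}
user_b = {"Avengers 1": 5.0, "Avengers 2": 4.0, "Iron-man": 2.0}
user_c = {"Avengers 1": 1.0, "Avengers 2": 5.0}

same_taste = cosine_similarity(user_a, user_b)       # identical on shared movies
different_taste = cosine_similarity(user_a, user_c)  # ratings point another way
```

Users who rate their shared movies in the same proportions get a score close to 1, and the score drops as their ratings diverge.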

A big challenge in measuring this similarity is the sparsity of the data. Not every user has seen and rated every movie, so collaborative filtering is hard to make work until we have a lot of user behavior data. For example, if two users have no rated movies in common, these algorithms cannot compute a similarity between them. For our purpose, we need a lot of data to work with.
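A quick way to see how sparse a data set is: measure what fraction of the user-by-movie grid actually holds a rating, and list the user pairs that share no movie at all. The sketch below does that in plain Python; the helper names `rating_density` and `pairs_without_overlap` and the sample ratings are assumptions for illustration, not part of any library.

```python
from itertools import combinations

def rating_density(ratings):
    """Fraction of the user x movie grid that actually holds a rating."""
    movies = {movie for user_ratings in ratings.values() for movie in user_ratings}
    filled = sum(len(user_ratings) for user_ratings in ratings.values())
    return filled / (len(ratings) * len(movies))

def pairs_without_overlap(ratings):
    """User pairs that share no rated movie, so no similarity can be computed."""
    return [
        (a, b)
        for a, b in combinations(sorted(ratings), 2)
        if not set(ratings[a]) & set(ratings[b])
    ]

# Made-up sample ratings, for illustration only.
ratings = {
    "user01": {"Avengers 1": 5.0, "Avengers 2": 4.0},
    "user02": {"Avengers 1": 4.0},
    "user03": {"Titanic": 3.0},
}

density = rating_density(ratings)          # 4 ratings over a 3 x 3 grid
isolated = pairs_without_overlap(ratings)  # pairs with nothing in common
```

In this toy data, user03 shares no movie with anyone, so user-based collaborative filtering has nothing to say about that user.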

Python Implementation

After loading the data set and building the training set, in line 16 we specify that we are going to use the cosine similarity metric (from the Surprise library). In lines 26 and 27 we gather the top N users similar to the test subject, and in line 34 we quickly sort those users by similarity and pick the top k results.
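The "pick the top k" step can be sketched on its own, without Surprise. The snippet below assumes the similarity scores for the test subject are already available as a dictionary; the name `top_k_neighbors`, the user IDs, and the scores are hypothetical. It drops the test subject and keeps the k highest-scoring users with `heapq.nlargest` from the standard library.

```python
import heapq

def top_k_neighbors(similarity_row, test_user, k):
    """Return the k users most similar to test_user as (user, score) pairs."""
    # A user is always perfectly similar to themselves, so leave them out.
    others = [(user, score) for user, score in similarity_row.items() if user != test_user]
    return heapq.nlargest(k, others, key=lambda pair: pair[1])

# Made-up similarity scores between user02 and everyone, for illustration only.
similarity_row = {"user01": 0.98, "user02": 1.0, "user03": 0.41, "user04": 0.75}

neighbors = top_k_neighbors(similarity_row, "user02", k=2)
```

Using `heapq.nlargest` avoids sorting the whole list when only a few neighbors are needed, which matters once there are many users.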

Sample Data 01

For example, if we want to recommend a movie to user 02, we first find the users who have watched and rated the same movies that user 02 has rated. In the example above, user 01 has more in common with user 02 than user 03 does, so user 01 is selected as the similar user. If both user 01 and user 02 liked Avengers 1 and Avengers 2, there is a good chance user 02 would also like Avengers 3. So our algorithm will recommend Avengers 3 and Iron-man to user 02.

Sample data 02

Even after finding a similar user, we only want to recommend good stuff to user 02. If we take a close look at sample data 02, Avengers 3 has a good rating and Iron-man has a lower one. This is just one user's score for Avengers 3; many other users could have rated it too. All of those ratings are added up and taken into consideration. We want to display well-rated movies at the top and poorly rated movies at the bottom, so we add to a movie's score when a similar user rated it well and subtract from it when a similar user rated it badly. Then we can recommend the movies with the highest scores at the top of the list. These steps are implemented in the following code.
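One possible way to turn this into code is to weight each neighbor's rating by how similar that neighbor is, after centering the rating on a midpoint so that bad ratings count against a movie. This is only a sketch under that assumption: the function `score_candidates`, the midpoint of 3.0 on a 1 to 5 scale, and the sample data are all made up for illustration.

```python
from collections import defaultdict

def score_candidates(neighbors, ratings, midpoint=3.0):
    """Sum similarity-weighted ratings per movie across all similar users.

    Ratings above `midpoint` push a movie's score up and ratings below it
    push the score down (the midpoint is an assumed 1-5 rating scale).
    """
    scores = defaultdict(float)
    for user, similarity in neighbors:
        for movie, rating in ratings[user].items():
            scores[movie] += (rating - midpoint) * similarity
    return dict(scores)

# Made-up neighbors (user, similarity) and their ratings, for illustration only.
neighbors = [("user01", 1.0), ("user03", 0.5)]
ratings = {
    "user01": {"Avengers 3": 5.0, "Iron-man": 2.0},
    "user03": {"Avengers 3": 4.0},
}

scores = score_candidates(neighbors, ratings)
```

With this data Avengers 3 ends up with a positive score from both neighbors, while Iron-man ends up negative because its only rating is below the midpoint.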

After sorting the similar users, in lines 37 to 43 we get the movies they rated and add up the ratings for each item. In line 46 we build a dictionary of the movies the test user has already watched, so we can leave those out. In line 52 we gather the top-rated movies from the similar users.
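The last two steps, dropping movies the user has already watched and keeping the best-scored ones, can be sketched in a few lines. The function name `recommend`, the scores, and the watched set below are hypothetical; the scores stand in for whatever the previous step produced.

```python
from operator import itemgetter

def recommend(scores, watched, n=10):
    """Return up to n unwatched movies, best score first."""
    ranked = sorted(scores.items(), key=itemgetter(1), reverse=True)
    return [movie for movie, _ in ranked if movie not in watched][:n]

# Made-up candidate scores and watch history, for illustration only.
scores = {"Avengers 1": 3.0, "Avengers 3": 2.5, "Iron-man": -1.0}
watched = {"Avengers 1", "Avengers 2"}

recommendations = recommend(scores, watched, n=2)
```

Avengers 1 has the highest score but is skipped because user 02 has already seen it, so Avengers 3 leads the list and Iron-man follows.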

Summary

We discussed user-based collaborative filtering, an algorithm that is very helpful when building a recommendation system. We got a clear understanding of similarity metrics. And finally, we saw the algorithm implemented in Python.

I hope you all got a clear understanding of the topic. There are many other articles on this topic; you can read those to deepen your understanding.

You can find a more detailed explanation of these topics in the following course on Udemy: https://www.udemy.com/building-recommender-systems-with-machine-learning-and-ai/
