Content based filtering and Collaborative filtering

Recently, I started working on a recommendation system for books and came across these two techniques. This post gives a brief overview of collaborative and content-based filtering.

Recommendation Systems

Before the advent of the internet, movies, books, and essentially any other consumer product were recommended by word of mouth. Now, with e-commerce giants like Alibaba and Amazon, we are shown product recommendations every time we browse their websites. Ever wondered how these websites recommend relevant products to you? Content-based filtering and collaborative filtering are two types of recommendation systems that can be used to recommend products to consumers.

Content Based Filtering

This method recommends products similar to what a user has liked in the past.

Basic idea of content based filtering

Let’s say you like action movies. The recommendation system is then more likely to suggest action movies to you.

A simple way to calculate the similarity between two items is cosine similarity:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

Think of A as a vector containing the movie preferences of a particular user and B as a vector containing the features of a single movie, such as genre, cast, and director. Cosine similarity is computed between the user’s preference vector and the feature vector of each candidate movie. Pearson correlation or Euclidean distance can also be used for the same purpose.
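
As a rough illustration of the cosine similarity described above, here is a minimal pure-Python sketch. The three-element feature layout (action, romance, comedy) and the sample vectors are made up for the example; a real system would derive these vectors from its own catalogue data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: all-zero vectors match nothing
    return dot / (norm_a * norm_b)

# Hypothetical feature order: [action, romance, comedy]
user_profile = [1.0, 0.0, 0.5]   # A: what the user has liked so far
action_movie = [1.0, 0.0, 0.0]   # B: features of one candidate movie
romance_movie = [0.0, 1.0, 0.0]  # B: features of another candidate movie

print(cosine_similarity(user_profile, action_movie))
print(cosine_similarity(user_profile, romance_movie))
```

With these made-up vectors, the action movie scores much higher than the romance movie, which matches the intuition that a user who likes action movies gets action movies suggested.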

After calculating the similarity between the user’s preferences and each of the other movies, you can do one of the following:
1. Suggest the top n movies to the user, for some n > 0.
2. Suggest every movie whose similarity value is above a chosen threshold.
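
Both selection strategies can be sketched in a few lines. The function names `top_n` and `above_threshold` and the similarity scores below are invented for illustration; the scores would normally come from a similarity measure such as cosine similarity.

```python
def top_n(scores, n):
    """Return the n titles with the highest similarity, best first."""
    if n <= 0:
        raise ValueError("n must be positive")
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in ranked[:n]]

def above_threshold(scores, threshold):
    """Return every title whose similarity is at least `threshold`, best first."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [title for title, score in ranked if score >= threshold]

# Made-up similarity scores between one user and four movies
scores = {"Movie A": 0.91, "Movie B": 0.40, "Movie C": 0.75, "Movie D": 0.10}

print(top_n(scores, 2))
print(above_threshold(scores, 0.5))
```

The top-n approach always returns a fixed number of suggestions, while the threshold approach may return none at all if nothing is similar enough, so which one fits depends on how the recommendations are displayed.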

Although this method is quite straightforward and intuitive, it has some major drawbacks. The algorithm will always recommend products that are similar to what the user has purchased before, so it will rarely recommend anything unlike what the user has already bought or liked. If a user has only liked romantic movies in the past, the algorithm will keep recommending romantic movies to that user. The algorithm also struggles to make recommendations for new users, since they have no history to compare against.

To improve upon this system, we need an algorithm that looks not just at the content of a user’s purchase history but also at user behavior, such as how other users with similar tastes have rated items. This brings us to the idea of collaborative filtering.

Collaborative Filtering

Collaborative filtering covers several techniques for making recommendations. Two common ones are user-user collaborative filtering and item-item collaborative filtering.

Figure: user-user collaborative filtering (left) and item-item collaborative filtering (right)

User-User Collaborative Filtering
This method uses the similarity between users to recommend products. Based on these similarity scores, it recommends to a user the products bought by the users most similar to them. I personally like using Pearson correlation to find similarities, since it centers each user’s ratings on their mean and so corrects for users who rate consistently high or low.
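
Here is one way Pearson correlation between two users might be computed over the items both have rated. This is a sketch under assumptions: ratings are stored as plain dicts, the means are taken over the common items only (one of several variants in use), and the names and ratings are made up.

```python
import math

def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation over the items both users rated.

    Each argument is a dict mapping item -> rating. Returns 0.0 when there
    are fewer than two common items or when either user's common ratings
    have no variance (a convention chosen for this sketch).
    """
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    # Deviations from each user's own mean: this is the normalization step
    dev_u = [ratings_u[i] - mean_u for i in common]
    dev_v = [ratings_v[i] - mean_v for i in common]
    numerator = sum(a * b for a, b in zip(dev_u, dev_v))
    denominator = (math.sqrt(sum(a * a for a in dev_u))
                   * math.sqrt(sum(b * b for b in dev_v)))
    if denominator == 0:
        return 0.0
    return numerator / denominator

# Made-up ratings: bob rates one point lower than alice but agrees on the order
alice = {"Book 1": 5, "Book 2": 3, "Book 3": 4}
bob = {"Book 1": 4, "Book 2": 2, "Book 3": 3, "Book 4": 5}
carol = {"Book 1": 1, "Book 2": 5, "Book 3": 2}

print(pearson_similarity(alice, bob))
print(pearson_similarity(alice, carol))
```

Because the ratings are mean-centered, alice and bob come out as highly similar even though bob scores everything lower, while carol, who ranks the books in roughly the opposite order, gets a negative score.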

Formula for predicting items:

$$P(u, i) = \frac{\sum_{v} r(v, i) \, S(u, v)}{\sum_{v} S(u, v)}$$

1. P(u, i) is the predicted rating of item i for user u.
2. r(v, i) is the rating given to item i by user v.
3. S(u, v) is the similarity between user u and user v.

Important notes:
1. For predictions, we need the similarity between user u and each other user v. We can use Pearson correlation for this.
2. To compute that similarity, we find the items rated by both users, u and v, and compare their respective ratings.
3. Users with a high correlation tend to have similar tastes.
4. The formula is essentially a weighted average of the ratings that other users gave to item i, where each rating is weighted by that user’s similarity to u and the result is normalized by the sum of the similarity values.
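
The weighted average described in these notes could be sketched as follows. The data layout (nested dicts for ratings, a dict keyed by user pairs for similarities) and the decision to skip users with zero or negative similarity are assumptions made for this example, not part of the formula itself.

```python
def predict_rating(user, item, ratings, similarity):
    """Similarity-weighted average of other users' ratings for `item`.

    ratings:    dict user -> dict item -> rating, i.e. r(v, i)
    similarity: dict (user, other_user) -> similarity score, i.e. S(u, v)
    Returns None when no user with positive similarity has rated the item.
    """
    weighted_sum = 0.0
    similarity_sum = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        s = similarity.get((user, other), 0.0)
        if s <= 0:
            continue  # assumption for this sketch: ignore dissimilar users
        weighted_sum += s * other_ratings[item]
        similarity_sum += s
    if similarity_sum == 0:
        return None
    return weighted_sum / similarity_sum

# Made-up data: only v1 and v2 have rated "Book 2"
ratings = {
    "u": {"Book 1": 5},
    "v1": {"Book 1": 4, "Book 2": 4},
    "v2": {"Book 1": 2, "Book 2": 1},
    "v3": {"Book 1": 5},
}
similarity = {("u", "v1"): 0.75, ("u", "v2"): 0.25, ("u", "v3"): 0.9}

print(predict_rating("u", "Book 2", ratings, similarity))
```

In this example the prediction lands closer to v1's rating than to v2's, because v1 is three times as similar to u.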

Item-Item Collaborative Filtering

The basic idea of item-item collaborative filtering is quite similar to that of user-user collaborative filtering. Where user-user collaborative filtering finds similar users to make predictions, item-item collaborative filtering computes the similarity between items and estimates a user’s rating for an item from the ratings that user gave to similar items.

Here is what the rating function looks like:

Rating function for item-based collaborative filtering:

$$r_{xi} = \frac{\sum_{j \in N(i;x)} s_{ij} \, r_{xj}}{\sum_{j \in N(i;x)} s_{ij}}$$

r_{xj} = rating of user x on item j
s_{ij} = similarity between item i and item j
N(i;x) = set of items rated by user x that are similar to item i
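
A minimal sketch of this rating function is shown below. It assumes that N(i;x) is taken to be the k most similar items the user has rated, with k = 2 chosen arbitrarily, and that item similarities are already available in a dict; the names and numbers are invented for the example.

```python
def predict_item_rating(user_ratings, target_item, item_similarity, k=2):
    """Estimate a user's rating for `target_item` from similar items they rated.

    user_ratings:    dict item j -> rating r_xj given by user x
    item_similarity: dict (item_i, item_j) -> similarity s_ij
    k:               size of the neighbourhood N(i;x) (assumed for this sketch)
    Returns None when the user has rated no item with positive similarity.
    """
    candidates = [
        (item_similarity.get((target_item, j), 0.0), rating)
        for j, rating in user_ratings.items()
        if j != target_item
    ]
    # Keep the k rated items most similar to the target item
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    total_similarity = sum(s for s, _ in neighbours)
    if total_similarity == 0:
        return None
    return sum(s * r for s, r in neighbours) / total_similarity

# Made-up data: user x has rated items A, B and C; we want a rating for D
user_ratings = {"A": 4, "B": 2, "C": 5}
item_similarity = {("D", "A"): 0.75, ("D", "B"): 0.25, ("D", "C"): 0.1}

print(predict_item_rating(user_ratings, "D", item_similarity))
```

Item C is left out of the average here because it falls outside the two nearest neighbours, so the estimate is driven by the ratings for A and B.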

1. Although item-item and user-user collaborative filtering look similar, item-item collaborative filtering usually outperforms user-user collaborative filtering in practice.
2. A common explanation is that items are easier to describe: an item tends to fall into a small number of categories, such as a genre.
3. Users tend to differ from one another and often have varied tastes, so they are harder to describe. As a result, similarity between items is usually a more reliable signal than similarity between users.

Hope you found this article to be useful!