Overview of collaborative filtering algorithms

ak2400
Apr 14 · 6 min read

Content

Section 0. Introduction

Section 1. User-based method

Section 2. Item-based method

Section 3. Model-based method

Section 4. Summary

©️Copyright

Section 0. Introduction

The motivation for collaborative filtering comes from the idea that people often get the best recommendations from someone with tastes similar to themselves. Collaborative filtering encompasses techniques for matching people with similar interests and making recommendations on this basis.

Generally speaking, there are three types of collaborative filtering recommendations.

  • The user-based method
  • The item-based method
  • The model-based method

Section 1. User-based method

The user-based method focuses on the similarity between users. It finds the items that similar users like, predicts the target user's ratings for those items, and recommends the items with the highest predicted ratings.

The general process of the user-based CF method is as follows.

  • Collect each user's ratings of items, either explicit or inferred from browsing records, purchase records, etc.
  • Calculate the similarity between every pair of users based on their ratings.
  • Select the Top-N users most similar to the current user.
  • Recommend the items that those Top-N users rated highest and that the current user has not yet viewed.

Example:

(1) Build a user-item rating matrix from the site's records, i.e., each user's rating of each item.

(2) Calculate the similarity between each pair of users using cosine similarity; the larger the value, the more similar the two users are.

(3) To recommend items for user A, find the N users (let N=2) most similar to user A, take the items they have rated, and remove the items user A has already rated. The remaining items are the recommendation result.
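The three steps above can be sketched in plain Python. This is only a minimal illustration: the users, items and ratings are made up, 0 is assumed to mean "not rated", and ranking the remaining items by a similarity-weighted average of the neighbours' ratings is one common choice rather than the only one.

```python
import math

# Hypothetical user-item ratings (0 = not rated); names and values are made up.
ratings = {
    "A": {"item1": 5, "item2": 3, "item3": 0, "item4": 0},
    "B": {"item1": 4, "item2": 3, "item3": 4, "item4": 0},
    "C": {"item1": 5, "item2": 2, "item3": 0, "item4": 5},
    "D": {"item1": 0, "item2": 0, "item3": 5, "item4": 1},
}

def cosine_similarity(u, v):
    """Cosine similarity of two rating dicts that share the same item keys."""
    dot = sum(u[i] * v[i] for i in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend_user_based(target, ratings, n_neighbours=2):
    """Rank items the target has not rated, using the N most similar users."""
    sims = {
        other: cosine_similarity(ratings[target], ratings[other])
        for other in ratings if other != target
    }
    neighbours = sorted(sims, key=sims.get, reverse=True)[:n_neighbours]
    scores = {}
    for item, own_rating in ratings[target].items():
        if own_rating != 0:
            continue  # skip items the target user has already rated
        rated_by = [u for u in neighbours if ratings[u][item] != 0]
        weight = sum(sims[u] for u in rated_by)
        if weight == 0:
            continue  # no similar neighbour has rated this item
        scores[item] = sum(sims[u] * ratings[u][item] for u in rated_by) / weight
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return neighbours, ranked

neighbours, recommendations = recommend_user_based("A", ratings)
```

With this toy data, users B and C are the two nearest neighbours of user A, and the items A has not rated are ranked by what B and C thought of them.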

Pros and cons:

  • Pros: With a complete data set and rich rating records, the method can achieve high accuracy. It needs no mining of item profiles, and it captures the relevance between items and users' preferences implicitly.
  • Cons: As the number of users grows, the time required to compute the Top-N similar users grows significantly, which makes the method hard to apply in systems whose user base is large or changes quickly. It is also hard to find the Top-N similar users accurately for new users, who have few rating records.

Section 2. Item-based method

Item-based collaborative filtering is similar to the user-based method, except that it looks for similarity between items. Given the target user's ratings of certain items, we predict ratings for the items most similar to them and recommend the similar items with the highest predicted ratings.

The general process of the item-based CF method:

  • Analyse each user's browsing records for items.
  • Calculate the similarity between all items based on users' browsing records.
  • For the items the user has rated, find the Top-N most similar items.
  • Recommend this Top-N list of similar items to the user.
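A rough sketch of this process in plain Python follows. The data is hypothetical, explicit ratings stand in for browsing records, and scoring each candidate by a similarity-weighted average of the user's own ratings is an assumption made for illustration.

```python
import math

# Hypothetical ratings keyed by item, then by user (missing key = not rated).
item_ratings = {
    "item1": {"A": 5, "B": 4, "C": 5},
    "item2": {"A": 3, "B": 3, "C": 2},
    "item3": {"B": 4, "D": 5},
    "item4": {"C": 5, "D": 1},
}

def item_similarity(a, b):
    """Cosine similarity between two items' rating vectors (missing = 0)."""
    dot = sum(r * b[u] for u, r in a.items() if u in b)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend_item_based(user, item_ratings, top_n=2):
    """Score unrated items by their similarity to items the user has rated."""
    rated = {i: r[user] for i, r in item_ratings.items() if user in r}
    scores = {}
    for candidate, cand_ratings in item_ratings.items():
        if candidate in rated:
            continue  # only recommend items the user has not rated
        sims = {i: item_similarity(cand_ratings, item_ratings[i]) for i in rated}
        total = sum(sims.values())
        if total > 0:
            scores[candidate] = sum(sims[i] * rated[i] for i in rated) / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

recommendations = recommend_item_based("A", item_ratings)
```

In a real system the item-to-item similarities would be precomputed rather than recalculated for every request.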

Pros and cons:

  • Pros: The calculation is simple and real-time response is easy to achieve. Because the set of rated items changes slowly, item similarity can generally be computed offline and updated periodically, which reduces the need for online calculation.
  • Cons: The item-based method takes less account of the differences between users and can therefore be less accurate than the user-based method. It also suffers from data sparsity and cold-start problems.

Section 3. Model-based method

The data in a collaborative filtering model generally form a matrix of m items by n users, in which only some users have rated some items. The task is to use the existing sparse ratings to predict the blank entries, i.e., the ratings users would give to items they have not rated, and then recommend the items with the highest predicted ratings. Machine learning methods can be used to solve this problem.

  • Association algorithm for collaborative filtering

Generally, we find the frequent item sets or sequences among all items purchased by users, keeping the Top-N related items that satisfy the support threshold. We can then recommend the other items in a frequent item set to the user, ranked by a scoring criterion such as support, confidence or lift. Commonly used association recommendation algorithms are Apriori, FP Tree (FP-Growth) and PrefixSpan.
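To make support, confidence and lift concrete, here is a small sketch over hypothetical purchase baskets. It checks every item pair by brute force, which is fine for a toy example; it is not an implementation of Apriori, FP Tree or PrefixSpan, and the 0.4 support threshold is an arbitrary assumption.

```python
from itertools import combinations

# Hypothetical purchase baskets, one set of items per user.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def pair_rules(baskets, min_support=0.4):
    """Score rules lhs -> rhs for all frequent item pairs (brute force)."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue  # the pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, baskets)
            lift = confidence / support({rhs}, baskets)
            rules.append((lhs, rhs, pair_support, confidence, lift))
    return rules

rules = pair_rules(baskets)
for lhs, rhs, sup, conf, lift in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items are bought together more often than chance would predict, so a user who bought the left-hand item is a reasonable candidate for the right-hand one.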

  • Clustering algorithm for collaborative filtering

Collaborative filtering with clustering algorithms is somewhat similar to the user-based and item-based methods. We can cluster either users or items based on a chosen distance metric. With user-based clustering, users are divided into groups, and the items rated highly within a group are recommended to the target users in that group. With item-based clustering, items similar to those the user has rated highly are recommended. Commonly used clustering recommendation algorithms are K-Means, BIRCH, DBSCAN and spectral clustering.
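The user-based clustering case might look like the following sketch, which uses a bare-bones K-Means written in plain Python. The ratings are hypothetical, unrated items are treated as 0 in the distance calculation (a simplification), and the starting centroids are fixed to two users so that the result is deterministic.

```python
# Hypothetical rating vectors (one list of item ratings per user, 0 = unrated).
users = {
    "A": [5, 4, 0, 0],
    "B": [4, 5, 5, 0],
    "C": [0, 1, 0, 5],
    "D": [1, 0, 4, 4],
}

def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def k_means(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; returns the cluster index of each point."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

names = list(users)
labels = k_means([users[n] for n in names], centroids=[list(users["A"]), list(users["C"])])
cluster_of = dict(zip(names, labels))

def recommend_from_cluster(target):
    """Average the ratings of same-cluster users for items the target has not rated."""
    peers = [n for n in names if n != target and cluster_of[n] == cluster_of[target]]
    scores = {}
    for item, own in enumerate(users[target]):
        peer_ratings = [users[p][item] for p in peers if users[p][item] != 0]
        if own == 0 and peer_ratings:
            scores[item] = sum(peer_ratings) / len(peer_ratings)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

recommendations = recommend_from_cluster("A")
```

Here users A and B end up in one group and C and D in the other, so user A is offered the item (index 2) that B rated highly.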

  • Classification algorithm for collaborative filtering

If we divide user ratings into segments, the problem becomes a classification problem. The most straightforward approach is to set a rating threshold: items rated above the threshold are recommended and items rated below it are not, which turns the task into binary classification. Common classification recommendation algorithms are logistic regression and Naïve Bayes, both of which are highly interpretable.

  • Regression algorithm for collaborative filtering

Using regression algorithms for collaborative filtering looks more natural than using classification algorithms. A rating can be a continuous value rather than a discrete one, and a regression model gives the target user's predicted rating for an item.
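As one very small illustration of the regression view, the sketch below fits a straight line that maps a similar user's ratings to the target user's ratings on items both have rated, then uses it to estimate a rating for an item only the similar user has rated. The data and the single-neighbour setup are assumptions chosen for brevity; a practical model would use many more inputs.

```python
# Hypothetical co-rated items: a similar user's ratings (x) and the target user's (y).
neighbour_ratings = [1, 2, 4, 5]
target_ratings = [2, 3, 4, 5]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(neighbour_ratings, target_ratings)

# The neighbour gave a new item a rating of 3; estimate the target user's rating.
predicted = slope * 3 + intercept
```

The prediction is a continuous value, which can be used directly to rank candidate items.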

  • Matrix factorisation for collaborative filtering

Matrix factorisation is also widely used for collaborative filtering nowadays. Traditional singular value decomposition (SVD) requires a dense matrix with no missing data, whereas the user-item rating matrix is typically very sparse, so applying traditional SVD to collaborative filtering directly is complicated. The current mainstream matrix factorisation recommendation algorithms are mainly variants of SVD, such as FunkSVD, BiasSVD and SVD++.
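The idea behind FunkSVD-style factorisation can be sketched in a few lines: learn a short factor vector for each user and each item by stochastic gradient descent on the observed ratings only, then take dot products to fill in the blanks. The ratings, the number of factors, the learning rate and the regularisation strength below are all illustrative assumptions, and bias terms (as in BiasSVD) are left out.

```python
import random

# Hypothetical observed ratings as (user index, item index, rating) triples.
observed = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 2, 5), (3, 3, 4),
]
n_users, n_items, n_factors = 4, 4, 2

rng = random.Random(0)  # fixed seed so the sketch is reproducible
P = [[rng.uniform(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
Q = [[rng.uniform(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]

def predict(u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def rmse():
    """Root mean squared error over the observed ratings only."""
    return (sum((r - predict(u, i)) ** 2 for u, i, r in observed) / len(observed)) ** 0.5

initial_rmse = rmse()
learning_rate, regularisation = 0.01, 0.02
for _ in range(2000):
    for u, i, r in observed:
        error = r - predict(u, i)
        for f in range(n_factors):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += learning_rate * (error * qi - regularisation * pu)
            Q[i][f] += learning_rate * (error * pu - regularisation * qi)
final_rmse = rmse()

# Fill in a blank cell of the matrix: user 1 has no rating for item 1.
estimate = predict(1, 1)
```

Because only observed entries contribute to the updates, the missing entries never need to be filled in beforehand, which is what makes this approach workable on sparse rating matrices.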

Section 4. Summary

Collaborative filtering, a classical family of recommendation algorithms, is widely used in industry. It has many advantages: the model is general, it does not require much expertise in the data domain, it is simple to implement, and it works well. These are the reasons for its popularity. Collaborative filtering also has some unavoidable challenges, such as the "cold start" problem, where we cannot recommend items to new users because we have no data about them. It does not take differences in context into account, such as the user's current mood or location. It also struggles to capture niche or unique preferences, which is what content-based recommendation is better at.

©️Copyright: Medium ak2400
