1. ‘USER BASED’ COLLABORATIVE FILTERING

Having already seen what Neighbourhood Based Collaborative Filtering is, it will be fairly easy to understand the mechanism of User Based Collaborative Filtering.

  • KNN — K Nearest Neighbours algorithm is used.

Note: The math behind cosine similarity is such that if two users have only one movie in common, they always end up 100% similar, no matter what their actual ratings are!

Even if Bob loved Star Wars and Ann hated it, in a SPARSE DATA SITUATION they come out 100% similar!!!

  • Sparse data leads to problems with collaborative filtering in general
  • To avoid weird results, apply thresholds (e.g. a minimum number of co-rated items) and preprocess the data
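A minimal sketch of this pitfall, assuming raw (unnormalized) ratings on a 1–5 scale and a hypothetical `min_common` threshold:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Bob and Ann have rated only one movie in common (Star Wars).
# Restricted to the co-rated items, their vectors are 1-dimensional:
bob = np.array([5.0])    # Bob loved it
ann = np.array([1.0])    # Ann hated it
print(cosine(bob, ann))  # 1.0 -- "100% similar" despite opposite opinions

def similarity_with_threshold(a, b, min_common=2):
    """Return 0 unless the users share at least `min_common` co-rated items."""
    if len(a) < min_common:
        return 0.0
    return cosine(a, b)

print(similarity_with_threshold(bob, ann))  # 0.0 -- filtered out
```

With a single shared rating the vectors are one-dimensional, so cosine similarity can only be 1 (or -1), which is why a minimum-support threshold helps.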

SORTING

In this step, we choose and rank our recommendations using the best predictions.

  • Score your recommendations somehow
  • Taking ratings into consideration — not a bad idea!
  • Recommend things similar to what they love, not things similar to what they hate. So it's better to normalize the ratings!
  • Strengthen an item's score if it appears in more than one neighbour's ratings
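The bullets above can be sketched as a simple scoring loop. This is an illustrative implementation, not the course's exact code; the `min_rating` cutoff and the division by 5.0 (to normalize ratings to [0, 1]) are assumptions:

```python
from collections import defaultdict

def score_candidates(neighbours, min_rating=3.0):
    """Score items from the K nearest neighbours.

    neighbours: list of (similarity, {item: rating}) pairs
    for the target user's neighbours."""
    scores = defaultdict(float)
    for sim, ratings in neighbours:
        for item, rating in ratings.items():
            # Only propagate items the neighbour actually liked,
            # weighted by how similar that neighbour is to us.
            if rating >= min_rating:
                scores[item] += sim * (rating / 5.0)
    # Items liked by several neighbours accumulate a higher score.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical neighbours (similarity, their ratings):
neighbours = [
    (0.9, {"Star Wars": 5, "Alien": 4, "Gigli": 1}),
    (0.7, {"Star Wars": 4, "Blade Runner": 5}),
]
top_n = score_candidates(neighbours)
print(top_n)  # "Star Wars" ranks first: liked by both neighbours
```

Note how "Gigli" never enters the candidate list: the neighbour hated it, so it is excluded rather than recommended.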

FILTERING

This step involves removing items the user has already seen, as well as items that are offensive.
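A minimal sketch of this filtering step, assuming the ranked list produced by the sorting step and hypothetical `seen` / `banned` sets:

```python
def filter_recommendations(ranked, seen, banned=frozenset()):
    """Drop items the user has already seen or that are on a block list."""
    return [(item, score) for item, score in ranked
            if item not in seen and item not in banned]

ranked = [("Star Wars", 1.46), ("Alien", 0.72), ("Blade Runner", 0.70)]
print(filter_recommendations(ranked, seen={"Star Wars"}))
# [('Alien', 0.72), ('Blade Runner', 0.7)]
```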

All the steps involved in this process are shown in the block diagram below.

USER BASED COLLABORATIVE FILTERING

Note: User Based Collaborative Filtering is used strictly for generating top-N recommendations.
At no point do we try to predict user ratings (for that, use a rating-prediction KNN).
Our framework is built on surpriselib, which revolves around rating predictions, but we can still evaluate top-N lists using Hit Rate.
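Hit rate can be computed with a leave-one-out protocol: hold out one item per user, generate that user's top-N list, and count how often the held-out item shows up. A sketch, with hypothetical data:

```python
def hit_rate(top_n_per_user, left_out_per_user):
    """Fraction of users whose left-out item appears in their top-N list."""
    hits = sum(1 for user, item in left_out_per_user.items()
               if item in top_n_per_user.get(user, []))
    return hits / len(left_out_per_user)

# Hypothetical leave-one-out test: one held-out item per user.
top_n = {"bob": ["Alien", "Blade Runner"], "ann": ["Gigli"]}
held_out = {"bob": "Alien", "ann": "Star Wars"}
print(hit_rate(top_n, held_out))  # 0.5 -- Bob's item was recommended, Ann's wasn't
```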

KEY FEATURES OF COLLABORATIVE FILTERING

  • NOT recommended for SPARSE DATA!
  • Quality as well as quantity of data is important
  • Less time-consuming (compared to SVD or a random algorithm)
  • High efficiency
  • Only for top-N recommendations
  • Not meant for user rating predictions (though it can be used for them if you want)
  • That's why Amazon, with its huge amounts of data, uses this method

Note: Even 100,000 ratings is considered relatively sparse! (But it still showed better performance in our experiments compared to Content Based Recommendations.)

2. ‘ITEM BASED’ COLLABORATIVE FILTERING

The same procedure as above, but it uses an ITEM-SIMILARITY MATRIX instead of a USER-SIMILARITY MATRIX.

ITEM BASED COLLABORATIVE FILTERING

Even here, the K Nearest Neighbours (KNN) algorithm is used. It is interesting to note that similarities between ‘items’ can sometimes be BETTER than similarities between ‘users’! Some of the reasons are discussed below.

  • Items show permanence, whereas people's tastes change over time
  • Items are fewer in number, which leads to a smaller similarity matrix. Amazon and Netflix use it!
  • Better for new users:
    — A new user selecting just one item lets us provide recommendations immediately
    — With user-based filtering, a new user has to wait until the next build of the similarity matrix (which is the only computationally expensive part of the framework)
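The new-user advantage can be sketched as follows: once an item-item similarity matrix exists, a single selected item is enough to recommend its nearest neighbours. The rating matrix here is hypothetical, and cosine similarity over item columns is an assumption about the similarity metric:

```python
import numpy as np

def item_similarity_matrix(R):
    """Cosine similarity between item columns of a user x item rating matrix."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated items
    X = R / norms
    return X.T @ X

# Hypothetical user x item matrix (rows: users, cols: items), 0 = unrated.
items = ["Star Wars", "Empire", "Gigli"]
R = np.array([[5.0, 4.0, 0.0],
              [4.0, 5.0, 1.0],
              [0.0, 0.0, 5.0]])
S = item_similarity_matrix(R)

# A brand-new user picks "Star Wars"; recommend the most similar other item.
i = items.index("Star Wars")
order = np.argsort(-S[i])
recs = [items[j] for j in order if j != i]
print(recs[0])  # "Empire" -- its ratings pattern closely matches Star Wars
```

The matrix `S` is built offline; serving a new user is just a row lookup, which is why no rebuild is needed.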

POINTS TO REMEMBER — ITEM BASED

  • Here in ‘item-based’ collaborative filtering, we get more recommendations compared to ‘user-based’. Interesting!
  • In practice, we got all movies from the 1990s recommended because of bias in the given data — caution needed!
  • Test it out on real people — A/B tests
  • Even small changes in the algorithm affect the recommendations
  • During evaluation, note that
    — 0.055 (or 5.5%) is a pretty good Hit Rate
    — 0.005 (or 0.5%) is pretty bad, but can be misleading if the data is biased
  • Take away: it is hard to evaluate recommenders offline, especially with historical and smaller data sets
