1. ‘USER BASED’ COLLABORATIVE FILTERING

Having already seen what Neighbourhood Based Collaborative Filtering is, it will be fairly easy to understand the mechanism of User Based Collaborative Filtering.

  • KNN — K Nearest Neighbours algorithm is used.

Note: The math behind cosine similarity is such that if two users have only one movie in common, they always end up 100% similar, no matter what their actual ratings are!

Even if Bob loved Star Wars and Ann hated it, in a SPARSE DATA SITUATION they come out 100% similar!!!

  • Sparse data leads to problems with collaborative filtering in general
  • To avoid weird results, apply thresholds (e.g. a minimum number of co-rated items) and preprocess the data
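A minimal sketch of this pitfall, assuming raw (unnormalized) ratings on a 1–5 scale and a hypothetical `min_common` threshold:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Bob and Ann have rated only one movie in common (Star Wars).
# Restricted to the co-rated items, their vectors are 1-dimensional:
bob = np.array([5.0])    # Bob loved it
ann = np.array([1.0])    # Ann hated it
print(cosine(bob, ann))  # 1.0 -- "100% similar" despite opposite opinions

def similarity_with_threshold(a, b, min_common=2):
    """Return 0 unless the users share at least `min_common` co-rated items."""
    if len(a) < min_common:
        return 0.0
    return cosine(a, b)

print(similarity_with_threshold(bob, ann))  # 0.0 -- filtered out
```

With a single shared rating the vectors are one-dimensional, so cosine similarity can only be 1 (or -1), which is why a minimum-support threshold helps.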

SORTING

In this step, we choose and rank our recommendations using the best predictions.

  • Score your recommendations somehow
  • Taking ratings into consideration — not a bad idea!
  • Recommend things similar to what they love, not things similar to what they hate. So it's better to normalize the ratings!
  • Strengthen an item's score if it appears in more than one neighbour's ratings
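The bullets above can be sketched as a simple scoring loop. This is an illustrative implementation, not the course's exact code; the `min_rating` cutoff and the division by 5.0 (to normalize ratings to [0, 1]) are assumptions:

```python
from collections import defaultdict

def score_candidates(neighbours, min_rating=3.0):
    """Score items from the K nearest neighbours.

    neighbours: list of (similarity, {item: rating}) pairs
    for the target user's neighbours."""
    scores = defaultdict(float)
    for sim, ratings in neighbours:
        for item, rating in ratings.items():
            # Only propagate items the neighbour actually liked,
            # weighted by how similar that neighbour is to us.
            if rating >= min_rating:
                scores[item] += sim * (rating / 5.0)
    # Items liked by several neighbours accumulate a higher score.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical neighbours (similarity, their ratings):
neighbours = [
    (0.9, {"Star Wars": 5, "Alien": 4, "Gigli": 1}),
    (0.7, {"Star Wars": 4, "Blade Runner": 5}),
]
top_n = score_candidates(neighbours)
print(top_n)  # "Star Wars" ranks first: liked by both neighbours
```

Note how "Gigli" never enters the candidate list: the neighbour hated it, so it is excluded rather than recommended.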

FILTERING

This step involves removing items the user has already seen, as well as items that are offensive.
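A minimal sketch of this filtering step, assuming the ranked list produced by the sorting step and hypothetical `seen` / `banned` sets:

```python
def filter_recommendations(ranked, seen, banned=frozenset()):
    """Drop items the user has already seen or that are on a block list."""
    return [(item, score) for item, score in ranked
            if item not in seen and item not in banned]

ranked = [("Star Wars", 1.46), ("Alien", 0.72), ("Blade Runner", 0.70)]
print(filter_recommendations(ranked, seen={"Star Wars"}))
# [('Alien', 0.72), ('Blade Runner', 0.7)]
```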

All the steps involved in this process are shown in the block diagram below.

USER BASED COLLABORATIVE FILTERING

Note: User Based Collaborative Filtering is used strictly for generating top-N recommendations.
At no point do we try to predict user ratings (for that, use a rating-prediction KNN).
Our framework is built on surpriselib, which revolves around rating predictions, but we can still evaluate top-N lists using Hit Rate.
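Hit rate can be computed with a leave-one-out protocol: hold out one item per user, generate that user's top-N list, and count how often the held-out item shows up. A sketch, with hypothetical data:

```python
def hit_rate(top_n_per_user, left_out_per_user):
    """Fraction of users whose left-out item appears in their top-N list."""
    hits = sum(1 for user, item in left_out_per_user.items()
               if item in top_n_per_user.get(user, []))
    return hits / len(left_out_per_user)

# Hypothetical leave-one-out test: one held-out item per user.
top_n = {"bob": ["Alien", "Blade Runner"], "ann": ["Gigli"]}
held_out = {"bob": "Alien", "ann": "Star Wars"}
print(hit_rate(top_n, held_out))  # 0.5 -- Bob's item was recommended, Ann's wasn't
```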

KEY FEATURES OF COLLABORATIVE FILTERING

  • NOT recommended for SPARSE DATA!
  • Quality as well as quantity of data is important
  • Less time-consuming (compared to SVD or a random algorithm)
  • High efficiency
  • Only for top-N recommendations
  • Not meant for user rating predictions (though it can be used for them if you want)
  • That's why Amazon, with its huge amounts of data, uses this method

Note: Even 100,000 ratings is considered relatively sparse! (But it still showed better performance in our experiments compared to Content Based Recommendations.)

2. ‘ITEM BASED’ COLLABORATIVE FILTERING

The same procedure as above, but it uses an ITEM-SIMILARITY MATRIX instead of a USER-SIMILARITY MATRIX.

ITEM BASED COLLABORATIVE FILTERING

Even here, the K Nearest Neighbours (KNN) algorithm is used. It is interesting to note that similarities between ‘items’ can sometimes be BETTER than similarities between ‘users’! Some of the reasons are discussed below.

  • Items show permanence, whereas people's tastes change over time
  • Items are fewer in number, which leads to a smaller similarity matrix. Amazon and Netflix use it!
  • Better for new users:
    — A new user selecting just one item lets us provide recommendations immediately
    — With user-based filtering, a new user has to wait until the next build of the similarity matrix (which is the only computationally expensive part of the framework)
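The new-user advantage can be sketched as follows: once an item-item similarity matrix exists, a single selected item is enough to recommend its nearest neighbours. The rating matrix here is hypothetical, and cosine similarity over item columns is an assumption about the similarity metric:

```python
import numpy as np

def item_similarity_matrix(R):
    """Cosine similarity between item columns of a user x item rating matrix."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated items
    X = R / norms
    return X.T @ X

# Hypothetical user x item matrix (rows: users, cols: items), 0 = unrated.
items = ["Star Wars", "Empire", "Gigli"]
R = np.array([[5.0, 4.0, 0.0],
              [4.0, 5.0, 1.0],
              [0.0, 0.0, 5.0]])
S = item_similarity_matrix(R)

# A brand-new user picks "Star Wars"; recommend the most similar other item.
i = items.index("Star Wars")
order = np.argsort(-S[i])
recs = [items[j] for j in order if j != i]
print(recs[0])  # "Empire" -- its ratings pattern closely matches Star Wars
```

The matrix `S` is built offline; serving a new user is just a row lookup, which is why no rebuild is needed.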

POINTS TO REMEMBER — ITEM BASED

  • Here in ‘item-based’ collaborative filtering, we get more recommendations compared to ‘user-based’. Interesting!
  • In practice, we got all movies from the 1990s recommended because of bias in the given data — caution needed!
  • Test it out on real people — A/B tests
  • Even small changes in the algorithm affect the recommendations
  • During evaluation, note that
    — 0.055 (or 5.5%) is a pretty good Hit Rate
    — 0.005 (or 0.5%) is pretty bad, but can be misleading if the data is biased
  • Take away: it is hard to evaluate recommenders offline, especially with historical and smaller data sets
