Neighborhood vs Latent Factors Methods in Collaborative Filtering Recommender Systems — Part 1

Medford Xie
6 min read · Mar 1, 2019


Foreword: This is part 1 of a 3-part series.

Collaborative Filtering (CF) recommender systems have become an essential and powerful tool that many companies rely upon to drive their business. Notable examples include Netflix’s movie recommender engine, Amazon’s product recommendation system, and Spotify’s song recommender, among many others.

Source: researchgate.net

For those who are unfamiliar with Collaborative Filtering (CF), CF utilizes past transactions or user behavior to find connections or similarities between users and products. In essence, you are trying to find items that are similar to one another based on how your collective user base has rated these items, or to find users who are similar based on the items that they have mutually rated similarly. I am using the term “rated” here loosely, as it can mean that the user actually provided feedback (explicit feedback) or that they have purchased an item repeatedly (implicit feedback). I will talk more about explicit and implicit feedback later.

Introduction

There are numerous articles and tutorials on building CF recommender systems, but not many go deep under the hood. In this post I will do a deep dive into the inner workings of the Surprise package, a Python library for building collaborative filtering recommender systems. The goal is to help you:

  1. Develop a solid understanding of the differences between neighborhood models (NM) and latent factors (LF) models commonly used in CF.
  2. Develop a deep enough intuition and understanding so that you can write your own code or customize the code in Surprise for your own purpose.
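
Before going under the hood, here is a minimal sketch of what a typical Surprise workflow looks like, assuming the built-in MovieLens 100k dataset (which Surprise downloads on first use):

```python
from surprise import SVD, KNNBasic, Dataset
from surprise.model_selection import cross_validate

# Load the built-in MovieLens 100k dataset (downloaded on first use).
data = Dataset.load_builtin("ml-100k")

# A latent factors model (SVD) and a neighborhood model (k-NN), side by side.
for algo in (SVD(), KNNBasic()):
    cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```

Both the neighborhood model (KNNBasic) and the latent factors model (SVD) share the same interface, which is what makes the package convenient for comparing the two families.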

Most of this post is inspired by the work of Nicolas Hug (the author of Surprise) and a research paper by Yehuda Koren, one of the key members of the team that won the famous Netflix Prize and whose work the Surprise package is based upon.

Neighborhood Model vs Latent Factors Model — A Visualization

A visual way to think about the difference between NM and LF is illustrated in the figures below.

Neighborhood Model
Latent Factors Model

Neighborhood methods are centered on computing the relationships between items or alternatively, between users. An item-item approach evaluates the preference of a user for an item based on ratings of similar items by the same user. In a sense, these methods transform users to the item space by viewing them as baskets of rated items. This way we no longer need to compare users to items, but rather directly relate items to items.

Latent factor models, such as singular value decomposition (SVD), comprise an alternative approach by transforming both items and users to the same latent factor space, thus making them directly comparable. The latent space tries to explain ratings by characterizing both products and users on factors automatically inferred from user feedback. For example, when the products are movies, factors might measure obvious dimensions such as comedy vs. drama, amount of action, or orientation to children; less well defined dimensions such as depth of character development or quirkiness; or completely uninterpretable dimensions. For users, each factor measures how much the user likes movies that score high on the corresponding movie factor. (Koren)

As Koren puts it, neighborhood models require computing the relationships between items (item-item) or users (user-user). The most common way to do this is to compute a similarity matrix using cosine distance or Pearson’s correlation coefficient (Pearson’s R); we will get more into this later. If you are trying to compare items, you need to transform the user data into the item space (and vice versa to compare users).
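
As a rough, illustrative sketch (not Surprise’s actual implementation), here is how the similarity between two items might be computed from their rating vectors over the users who rated both; the vectors below are hypothetical:

```python
import numpy as np

# Hypothetical rating vectors for two items, restricted to users who rated both.
item_a = np.array([5.0, 3.0, 4.0])
item_b = np.array([4.0, 2.0, 5.0])

# Cosine similarity: dot product divided by the product of the vector norms.
cosine_sim = item_a @ item_b / (np.linalg.norm(item_a) * np.linalg.norm(item_b))

# Pearson's R is cosine similarity computed on mean-centered vectors.
pearson_r = np.corrcoef(item_a, item_b)[0, 1]

print(cosine_sim, pearson_r)
```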

With LF you are comparing the features that users have an affinity for against the features that items have an affinity for. To get a better intuition of what latent factors represent, I highly encourage you to read Hug’s work referenced above.
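
To make that concrete, a latent factor model scores a user-item pair as the inner product of the user’s factor vector and the item’s factor vector. The three-factor vectors below are purely hypothetical:

```python
import numpy as np

# Hypothetical 3-factor vectors (e.g., comedy-vs-drama, action, kid-orientation).
user_factors = np.array([0.8, -0.2, 0.5])   # how much this user likes each factor
item_factors = np.array([0.9,  0.1, 0.3])   # how strongly this item exhibits each factor

# Predicted affinity is the dot product; the full model would also add bias terms.
predicted_affinity = user_factors @ item_factors
print(predicted_affinity)  # 0.8*0.9 + (-0.2)*0.1 + 0.5*0.3 = 0.85
```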

Baseline Estimate

With recommender systems, you will almost certainly be dealing with sparse matrices with a lot of missing values, since the vast majority of users provide little to no feedback. Scenarios where 99% of all possible ratings are missing are the norm rather than the exception.

A critical component of recommender systems is to impute or “guess” what these missing ratings (r̂_ui) are, given the available data.

Ratings Matrix

For the rest of this post, I am going to use this matrix, R, as an example to carry out calculations. R is a 4 x 4 matrix where the rows represent users (u) and the columns represent items (i). Each row-column intersection holds a specific user’s rating for a particular item.

A crude yet robust way to impute these ratings is to use a baseline estimate that takes into account user bias and item bias. The formula for this baseline estimate is as follows:

b_ui = μ + b_u + b_i        (Equation 1)

b_ui is the baseline estimate that we are trying to find; μ is the mean of all known ratings; b_u is the user bias — which accounts for a specific user’s tendency to rate lower or higher than μ; and b_i is the item bias — which accounts for a specific item’s tendency to receive ratings that are lower or higher than μ. As an example, we are going to compute b_ui for user 2’s rating on item 1.

Calculations are as follows:

μ : (2+3+5+4+1+3+5+4+2)/9 = 3.22

b_u : (5+4) / 2 - 3.22 = 1.28

b_i : (2+1) / 2 - 3.22 = -1.72

b_ui: 3.22 + 1.28 - 1.72 = 2.78

In summary, based on the ratings in R, the baseline estimate predicts that user 2 would rate item 1 at 2.78.
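
The same arithmetic in code, using only the values stated above (the nine known ratings in R, user 2’s two known ratings, and item 1’s two known ratings):

```python
import numpy as np

all_ratings   = [2, 3, 5, 4, 1, 3, 5, 4, 2]  # every known rating in R
user2_ratings = [5, 4]                        # user 2's known ratings
item1_ratings = [2, 1]                        # item 1's known ratings

mu   = np.mean(all_ratings)            # 3.22
b_u  = np.mean(user2_ratings) - mu     # 1.28
b_i  = np.mean(item1_ratings) - mu     # -1.72
b_ui = mu + b_u + b_i                  # 2.78

print(round(b_ui, 2))
```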

In the real world, you would be dealing with much bigger datasets, and rather than calculating each user bias and item bias individually, you would estimate them by solving the least squares problem:

min over b_u, b_i of:  Σ_{(u,i) ∈ K} (r_ui − μ − b_u − b_i)² + λ (Σ_u b_u² + Σ_i b_i²)        (Equation 2)

The sum runs over the set K of (u, i) pairs for which the rating r_ui is known, and the last part of the equation is a regularization term to prevent overfitting; λ (lambda) is a hyperparameter. Surprise can estimate these baselines with either Alternating Least Squares (ALS) or Stochastic Gradient Descent (SGD). For more information you can read the documentation.
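
As a rough sketch of how the ALS approach to Equation 2 might look, assuming the ratings are given as a list of (user, item, rating) tuples (this is illustrative, not Surprise’s actual implementation):

```python
from collections import defaultdict

def baselines_als(ratings, n_epochs=10, reg=10.0):
    """Estimate user and item biases by alternating least squares.

    `ratings` is a list of (user, item, rating) tuples. `reg` plays the
    role of lambda in Equation 2 (a single shared value, for simplicity).
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u, b_i = defaultdict(float), defaultdict(float)

    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    for _ in range(n_epochs):
        # Fix user biases, solve for item biases (regularized average residual).
        for i, pairs in by_item.items():
            b_i[i] = sum(r - mu - b_u[u] for u, r in pairs) / (reg + len(pairs))
        # Fix item biases, solve for user biases.
        for u, pairs in by_user.items():
            b_u[u] = sum(r - mu - b_i[i] for i, r in pairs) / (reg + len(pairs))

    return mu, b_u, b_i
```

Each pass alternately fixes one set of biases and solves for the other in closed form; the regularized denominator shrinks the biases of users and items with few ratings toward zero.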

This is much better than imputing by the mode or mean of each column or row as this takes into account both user and item biases.

If you are trying to impute your matrix with the baseline estimate only, then you can just denote r̂_ui = b_ui. However, this can be further improved.

In part 2 we will go over Neighborhood and Latent Factors Models.
