Recommender Systems Intro Notes (Stanford Mining Massive Datasets, Lectures 41-43)

Shengyu Chen
Nov 20, 2018

--

These are my key notes from the Stanford "Mining Massive Datasets" lectures. The lectures give theoretical explanations of the key steps behind different recommender-system techniques and the intuitions behind the methods. However, they don't go into the details of actual implementation or the tips and tricks of building specific recommenders.

Choices are often problems nowadays

The notes are structured as follows:

  1. [A] Overview
  2. [B] Content-based recommenders
  3. [C] Collaborative filtering
  4. [D] Hybrid recommenders
  5. [E] Evaluating recommender systems

[A] Overview of the various types of recommender systems:

Items to recommend include products, movies, music, news articles, and so on.

There are two ways to interact with a catalog of items:

  1. Search/Filter
  2. Recommendations from the catalog

Why recommendations are needed (moving from scarcity to abundance):

  1. Shelf space is a scarce commodity for traditional retailers, but online shelf space is effectively unlimited
  2. The web enables near-zero-cost dissemination of information about products (this gives rise to the long-tail phenomenon)
  3. More choices necessitate better filters (recommendation engines)
The long tail

The area under the curve for the long tail is quite large if we can efficiently capture all these long-tail items. A good recommendation system surfaces the long tail/hidden gems to users.

Types of recommendations:

Editorial and hand-curated items include lists of favorites and lists of essential items. These don't scale well and require a lot of human input and fine-tuning. Simple aggregates include the top 10, the most popular, and recent uploads. These simple aggregates don't take user personalization into account.

Formal Model of the recommender system:

A utility function maps every customer-item pair to a rating: u: X × S → R, where X is the set of customers, S is the set of items, and R is the set of ratings (for example, 0-5 stars).

An example of the utility matrix looks like this:
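
Since the original figure isn't reproduced here, below is a minimal sketch of a utility matrix in Python (the users and movies echo the lecture's movie example, but the exact ratings are made up; NaN marks an unknown rating):

```python
import numpy as np
import pandas as pd

# Rows are users, columns are items; NaN marks an unknown rating
# that the recommender must predict.
utility = pd.DataFrame(
    [[4, np.nan, np.nan, 5],
     [5, 5, 4, np.nan],
     [np.nan, np.nan, np.nan, 3],
     [np.nan, 3, np.nan, np.nan]],
    index=["Alice", "Bob", "Carol", "David"],
    columns=["Avatar", "LOTR", "Matrix", "Pirates"],
)
print(utility)
```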

The key problem for a recommender system is to fill in the blank values in the utility matrix, and then push items with high predicted ratings to the user. There are three key problems for recommenders:

  1. Gathering known ratings for the matrix: how to collect the data in the utility matrix
  2. Extrapolating unknown ratings from the known ones: we are mainly interested in the high unknown ratings
  3. Evaluating extrapolation methods: how to measure the success/performance of recommendation methods

Gathering Ratings:

  1. Explicit: ask people to rate items. This doesn't scale: only a small fraction of users leave ratings and reviews
  2. Implicit: learn ratings from user actions (a purchase implies a high rating, but what about low ratings?)

Extrapolating Utilities:

Key problem:

  1. The utility matrix U is sparse: most people have not rated most items
  2. Cold start: new items have no ratings, new users have no history

Three approaches to recommender systems:

  1. Content-based
  2. Collaborative filtering
  3. Latent factor models

[B] Details of the content-based recommenders

Main idea: recommend to customer x items similar to the previous items rated highly by x

  1. Build item profiles: an item profile is a description of the item, i.e., the item's metadata.
  2. Based on the item profiles, build user profiles. A user profile captures what the user likes.
  3. Based on the user profile, match the user against the item catalog.

1. Building Item profiles:

For each item, create an item profile. A profile is a set of features, for example:

  1. Movies: authors, title, actor, director
  2. Images, videos: metadata and tags
  3. People: set of friends

It is convenient to think of the item profile as a vector, with one entry per feature (each actor, director, …). The vector might be boolean or real-valued.

Example of news article recommendation:

Text features: the set of important words in the item (document), for example when recommending news articles. The question becomes how to pick the important words; the usual heuristic from text mining is TF-IDF (term frequency × inverse document frequency).

TF-IDF refresher
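
The refresher graphic isn't reproduced here; the standard definitions are: TF_ij = f_ij / max_k f_kj (the frequency of term i in document j, normalized by the most frequent term in that document), IDF_i = log(N / n_i) (N = total number of documents, n_i = number of documents containing term i), and the TF-IDF score w_ij = TF_ij × IDF_i. The document profile is then the set of words with the highest TF-IDF scores. A minimal sketch in Python:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score tokenized documents (lists of words) with TF-IDF.

    TF is normalized by the most frequent term in each document,
    and IDF is log2(N / n_i), matching the definitions above.
    """
    N = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))  # n_i
    scores = []
    for doc in docs:
        counts = Counter(doc)
        max_f = max(counts.values())
        scores.append({
            term: (f / max_f) * math.log2(N / doc_freq[term])
            for term, f in counts.items()
        })
    return scores
```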

2. Building User profile:

A user has rated items with profiles i1, …, in.

The simplest way to construct a user profile is to take the average of the rated item profiles. This simple construction doesn't take the strength of the user's preferences into account.

Variant: normalize weights using average rating of user. More sophisticated aggregations are possible.
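
A minimal sketch of the normalized variant (the function name and array shapes are my own choices, not from the lecture):

```python
import numpy as np

def user_profile(item_profiles, ratings):
    """Aggregate rated item profiles into a user profile.

    item_profiles: (n_items, n_features) array, one row per rated item.
    ratings: (n_items,) array of the user's star ratings.
    Weights are the ratings centered on the user's mean, so items
    rated above average pull the profile toward their features and
    items rated below average push it away.
    """
    weights = ratings - ratings.mean()
    return weights @ item_profiles / len(ratings)
```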

Simple User profile examples:

Boolean Utility Matrix
Star rating capturing positive and negative ratings

3. Building predictions (after the user profile and item profiles are constructed)

In a high-dimensional space, the angle θ between two vectors is a good measure of their distance.

Given the user profile x, we compute the cosine similarity between x and each item profile y. We then pick the top-n items with the highest cosine similarity to the user profile and push those recommendations to the user.
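
A minimal sketch of this step (the names are my own; it assumes the profiles are NumPy arrays with nonzero norms):

```python
import numpy as np

def recommend(user_profile, item_profiles, n=10):
    """Return indices of the top-n items by cosine similarity,
    cos(x, y) = x . y / (|x| * |y|)."""
    x = user_profile / np.linalg.norm(user_profile)
    Y = item_profiles / np.linalg.norm(item_profiles, axis=1, keepdims=True)
    sims = Y @ x                       # cosine similarity to each item
    return np.argsort(sims)[::-1][:n]  # highest similarity first
```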

Pros and Cons of Content based filtering:

Pros

  1. No need for data on other users: the user and the user's own history are enough
  2. Able to recommend to users with unique tastes
  3. Able to recommend new and unpopular items: no item history is needed, so there is no first-rater problem. When an item is new and no one has rated it yet, it can still be recommended to users whose profiles have a high cosine similarity with its item profile.
  4. Explanations for recommended items: the content features that caused an item to be recommended (for example, the explanatory text could be: this was recommended to you because you watched similar movies, read similar news, or cared about specific topics)

Cons

  1. Finding the appropriate features is hard. In movies, some item categorizations are blurry.
  2. Overspecialization: the system never recommends items outside the user's content profile. If the user has never interacted with a class of items, those items don't get recommended. People also have multiple interests that the profile may not capture, and the system is unable to exploit the quality judgments of other users.
  3. Cold-start problem for new users: how do we build a user profile? If the user hasn't rated any items, there is no user profile. To deal with this, many recommenders start with an average profile and gradually improve it over time.

[C] Details of the Collaborative Filtering based recommenders

Basic idea of collaborative filtering (user-user similarity):

The key trick is to find the neighborhood of users who are similar to user x.

The notion of similarity between users is important.

The key in defining a similarity function is how we deal with unknown values.

Similarity Measure 1: Jaccard Similarity

Similarity Measure 2: Cosine Similarity

This treats the unknown values as zeros, which implicitly treats a missing rating as a negative one.

Similarity Measure 3: Centered cosine similarity

Normalize ratings by subtracting the row mean. The average of every row then becomes 0. This is significant because 0 now means neutral: if we treat a blank as 0, it no longer means negative.
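
A minimal sketch of centered cosine (also known as Pearson correlation) between two users' rating rows; representing blanks as NaN is my own convention:

```python
import numpy as np

def centered_cosine(a, b):
    """Centered cosine similarity between two rating rows.

    a, b: 1-D arrays with np.nan for unrated items. Each row is
    centered on its own mean over rated items, so after filling
    blanks with 0 a missing rating counts as neutral, not negative.
    """
    a = np.nan_to_num(a - np.nanmean(a))  # center, then blanks -> 0
    b = np.nan_to_num(b - np.nanmean(b))
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0
```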

Rating Predictions
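
The prediction figure isn't reproduced here. Let N be the set of the k users most similar to x who have rated item i. The simplest prediction is the plain average r_xi = (1/k) Σ_{y ∈ N} r_yi; a better option is the similarity-weighted average r_xi = Σ_{y ∈ N} s_xy · r_yi / Σ_{y ∈ N} s_xy. A minimal sketch of the weighted version:

```python
def predict_rating(sims, ratings):
    """Similarity-weighted average over the k nearest neighbors.

    sims: similarities s_xy between user x and each neighbor y in N.
    ratings: each neighbor's rating r_yi of item i.
    Implements r_xi = sum(s_xy * r_yi) / sum(s_xy).
    """
    return sum(s * r for s, r in zip(sims, ratings)) / sum(sims)
```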

Item-item collaborative filtering

For an item i, find other similar items, then estimate the rating for item i based on the ratings for those similar items. We can use the same similarity metrics and prediction functions as in the user-user model.
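
Concretely, with N(i; x) denoting the items similar to i that user x has rated, the weighted estimate mirrors the user-user case: r_xi = Σ_{j ∈ N(i;x)} s_ij · r_xj / Σ_{j ∈ N(i;x)} s_ij, where s_ij is the (for example, centered cosine) similarity between items i and j.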

Item-item collaborative filtering example:

In this example, item-item collaborative filtering finds similar movies based only on other users' ratings. It doesn't take item content, such as metadata (type, genre, release date, etc.), into consideration.

In practice, item-item outperforms user-user in many use cases. This is because items are simpler than users: items belong to a small set of genres, while users have varied tastes, so item similarity is more meaningful than user similarity.

Collaborative Filtering Implementation

The most expensive step is finding the k most similar users (or items). Done naively, this would have to be repeated for every item/user in the utility matrix.

This computation is too expensive to do at runtime. Therefore, the task of finding similar users or similar items can be pre-computed.

One option is near-neighbor search in high dimensions (locality-sensitive hashing, LSH): given an item, quickly find a set of neighbors. This precomputation can be run once a day or every few hours. Alternatively, we can cluster users/items and search only within clusters.
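
As a minimal sketch of the precompute-offline idea (scikit-learn's exact NearestNeighbors stands in for LSH here, and the toy matrix and parameters are assumptions of mine):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy stand-in for the utility matrix: rows are items, columns are
# users (e.g., centered ratings with blanks filled as 0).
R = np.random.rand(1000, 500)

# Precompute each item's 10 nearest neighbors offline (say, nightly),
# then serve recommendations from this lookup table at runtime.
knn = NearestNeighbors(n_neighbors=10, metric="cosine").fit(R)
_, neighbor_table = knn.kneighbors(R)  # neighbor_table[i] = item i's neighbors
```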

Pros and Cons of collaborative filtering

Pros:

  1. Works for any kind of item (no feature selection needed). This is good because finding features for complex items such as images and videos is hard.

Cons:

  1. Cold start: we need enough users in the system to find a match
  2. Sparsity: even with enough users, the user/ratings matrix is sparse, so it is hard to find users who have rated the same items
  3. First rater: cannot recommend an unrated item, so new and esoteric items don't get pushed to users
  4. Popularity bias: tends to recommend popular items, since many people rate them. The popular items crowd out/wash out the unique recommendations

[D] Details of Hybrid methods for recommender systems

Add content-based methods to collaborative filtering:

  1. Item profiles to deal with the new-item problem
  2. Demographics to deal with the new-user problem

Alternatively, we can implement two or more different recommenders and combine their predictions.

Global Baseline Estimate

Combining global baseline estimate with Collaborative filtering
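
The worked figures for these two headings aren't reproduced here. The global baseline estimate is b_xi = μ + b_x + b_i, where μ is the overall mean rating, b_x is user x's average deviation from μ, and b_i is item i's average deviation from μ. Combining it with item-item collaborative filtering applies CF to the residuals around the baseline: r_xi = b_xi + Σ_{j ∈ N(i;x)} s_ij · (r_xj − b_xj) / Σ_{j ∈ N(i;x)} s_ij. A minimal numeric sketch (the numbers are made up):

```python
def baseline(mu, b_user, b_item):
    """Global baseline estimate: b_xi = mu + b_x + b_i."""
    return mu + b_user + b_item

# e.g. overall mean 3.7 stars, an item rated 0.5 above average,
# and a user who rates 0.2 below average:
print(baseline(3.7, -0.2, 0.5))  # -> 4.0 (up to float rounding)
```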

[E] Evaluating Recommender Systems

Problems with error measures such as RMSE:

  1. A narrow focus on accuracy sometimes misses the point: prediction diversity (all recommendations may be too similar to one item), prediction context (when the context switches, the user's interests may change: at a given point a user may prefer a specific category of product, but once that context disappears, so does the preference), and the order of predictions (for example, for a sequence of movies or TV episodes, the recommended results should follow the set order).
  2. In practice, we only care about predicting high ratings. RMSE might penalize a method that does well on high ratings and badly on the others. An alternative is precision at top k: the percentage of predictions that fall in the user's top k withheld ratings (see the sketch below).
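
A minimal sketch of precision at top k (the names and the "relevant = withheld highly rated items" convention are my own):

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommendations the user actually
    rated highly (the withheld relevant set)."""
    return sum(item in relevant for item in recommended[:k]) / k
```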
