
Content-Based Recommendation Systems

Michael J. Pazzani and Daniel Billsus

Yoav Navon
2 min readAug 29, 2019


This paper discusses content-based recommendation systems in general. It explains how to represent items with structured data, and how to derive structured data from unstructured sources; for natural language text, for example, the tf-idf method is discussed. It then covers two approaches to maintaining a user profile: a learned model of the user's interests, or a history of the user's interactions. Finally, several popular models are presented as ways to build a content-based recommender system: Decision Trees, KNN, Relevance Feedback, Linear Methods, and Naïve Bayes are proposed as classification learners.
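To make the item representation concrete, here is a minimal sketch of the tf-idf idea using scikit-learn's TfidfVectorizer; the library choice and the toy item descriptions are mine, not the paper's.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy item descriptions (hypothetical data, just for illustration).
items = [
    "space opera with interstellar travel and alien empires",
    "romantic comedy set in a small coastal town",
    "hard science fiction about interstellar colonization",
]

# tf-idf weighs each word by how often it appears in an item (term frequency)
# and down-weights words that appear in many items (inverse document frequency).
vectorizer = TfidfVectorizer()
item_vectors = vectorizer.fit_transform(items)  # sparse matrix: one row per item

print(item_vectors.shape)                       # (3, number of distinct words)
```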

The methodology for content-based recommender systems is explained as follows:

… [the models] learn a function that models each user’s interests. Given a new item and the user model, the function predicts whether the user would be interested in the item…

Isn’t that inefficient? With this approach, to produce recommendations the system would have to pass every item through the model to check whether it matches the user. Some pre-computation seems necessary, since the number of items can be really large.
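As a rough illustration of what that looks like, here is a sketch (reusing the hypothetical item_vectors from the tf-idf snippet above) where the user model is simply the average vector of the items the user liked, and every item is scored against it. In a real system one would precompute an index over the items rather than scan them all at query time.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user profile: the mean tf-idf vector of the items the user liked.
liked = [0, 2]
user_profile = np.asarray(item_vectors[liked].mean(axis=0))

# The brute-force step questioned above: score every item against the profile.
scores = cosine_similarity(user_profile, item_vectors).ravel()
ranking = np.argsort(-scores)   # item indices, best match first
print(ranking)
```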

For all the models presented, there is no scalable way of incorporating new items into a user's profile; the only option would be to retrain the model from scratch. You would like, for example, your Decision Tree to adapt to the new items the user likes.
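For vector-based profiles, at least, the kind of incremental adaptation you would want is easy to write down. Here is a hedged sketch in the spirit of the Relevance Feedback (Rocchio) method the paper mentions, with weights picked arbitrarily by me; tree-based learners have no equally cheap update.

```python
import numpy as np

def update_profile(profile, item_vector, liked, beta=0.75, gamma=0.25):
    # Rocchio-style nudge: move the profile toward a newly liked item
    # or away from a disliked one, without retraining from scratch.
    # beta and gamma are illustrative weights, not values from the paper.
    return profile + (beta if liked else -gamma) * item_vector

# Hypothetical usage with small dense vectors.
profile = np.array([0.2, 0.0, 0.5, 0.1])
new_item = np.array([0.0, 0.3, 0.4, 0.0])
profile = update_profile(profile, new_item, liked=True)
print(profile)
```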

It’s surprising that the Naive Bayes method does as well as the authors claim. For text, the algorithm assumes that all words are independent of each other, which is of course not true: some words become far more probable when they follow another particular word. One way to address this might be to group words that tend to appear together (with some clustering method, maybe) and use a single word as the representative of each group.
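To see the assumption in action, here is a minimal like/dislike classifier built with scikit-learn's MultinomialNB on toy data of my own; the per-word independence criticized above is baked into the model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical item descriptions the user has already rated.
descriptions = [
    "space opera with interstellar travel",        # liked
    "hard science fiction about colonization",     # liked
    "romantic comedy in a coastal town",           # disliked
    "feel-good romance with a holiday setting",    # disliked
]
labels = [1, 1, 0, 0]  # 1 = liked, 0 = disliked

# Multinomial Naive Bayes treats every word as independent given the class,
# which is exactly the assumption questioned above.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(descriptions, labels)

# New, unrated items; overlapping vocabulary drives the prediction.
print(model.predict(["interstellar war between alien empires"]))  # likely [1]
print(model.predict(["a small town romance"]))                    # likely [0]
```

The grouping idea above would amount to replacing the raw word counts with cluster-level features before fitting the classifier.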
