Content-Based Recommender Systems: An Overview

Jackson Wu
6 min read · May 27, 2019


Content-based systems use item attributes (color, critical reception, etc.) to make recommendations to users based on their history of interactions. Because they do not rely on ratings from other users, they are especially effective for new items.

There are two main sources of information for content-based recommender systems. The first is textual item descriptions, usually posted by the manufacturer, which describe attributes relating to the content of the item. The second is the user profile, which contains ratings from the user, either explicit or implicit.

These attributes are often keywords; therefore, the most apt time to use content-based recommender systems is in text-rich, unstructured domains. If relational attributes such as manufacturer, genre, and price are listed, they can be combined into a relational database, which is considered a structured domain.

A song recommendation on Pandora with explanation (on right)

Content-based recommenders can offer explanations for recommendations based on the keywords they use (like Pandora, see above).

Some downsides of content-based recommender systems include a lack of diversity in recommendations and the inability to solve the cold-start problem for new users.

The latter can be fixed by hybridizing content-based recommenders with knowledge-based recommenders. This is an extremely common combination of recommender systems, used in a variety of applications.

Creating A Content-Based Recommender System

There are three steps to creating a content-based recommendation:

  1. Preprocessing and feature extraction
  2. Model-based learning for user profile
  3. Filtering and recommendation

First, one must determine which features are the most important, and extract them. This makes a “document” of keywords for each item.

Then we must clean these documents so they are ready for the learning step.


Cleaning

Cleaning has three steps:

  • Stop-word removal: stop-words are common words that have little to do with the content of the item. Words such as ‘a,’ ‘the,’ and ‘at’ are stop words in the English language. Generally, articles, conjunctions, prepositions, and pronouns are stop words.
  • Stemming is the process of consolidating words with similar roots. If an item’s description contains the words “fighting,” “fought,” and “fight,” we can treat them all as the single keyword “fight.” However, this can produce inaccuracies, as some words have similar roots but different meanings.
  • Phrase extraction is combining a series of words into a phrase with more meaning than the individual words carry on their own. For example, if the words “hot” and “dog” appear consecutively in that order, one can usually combine them into “hot dog.” (The first two steps are sketched in code below.)
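
Here is a minimal sketch of the first two cleaning steps, assuming NLTK is installed and its stopwords corpus is available; the tokenizer here is a simple regex rather than a full NLP pipeline.

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)  # fetch the stop-word list if missing

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def clean(description: str) -> list[str]:
    # Tokenize: lowercase the text and keep alphabetic runs only.
    tokens = re.findall(r"[a-z]+", description.lower())
    # Stop-word removal: drop articles, conjunctions, prepositions, etc.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Stemming: consolidate words with similar roots.
    return [STEMMER.stem(t) for t in tokens]

print(clean("Fighting the fight they fought at the dog show"))
# ['fight', 'fight', 'fought', 'dog', 'show']
```

Notice that the irregular form “fought” survives stemming, which illustrates the inaccuracy mentioned above.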

After cleaning, the keywords are converted to a vector space representation.

The inverse document frequency is used to discount words that are common across all documents. Rather than removing these words entirely, as we do with stop words, we merely discount them.

Sometimes a frequency damping function is applied before similarity comparison to lessen the effect of high-frequency words. The normalized frequency of a word is its damped term frequency multiplied by its inverse document frequency, which gives the tf-idf model. You might recognize this acronym from sklearn’s highly useful TfidfVectorizer.
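
As a brief sketch (toy documents, not a full pipeline), TfidfVectorizer computes these weights in a couple of lines; the sublinear_tf flag applies a logarithmic damping function to raw term counts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "action movie with car chases",
    "romantic comedy movie",
    "documentary about car manufacturing",
]

# sublinear_tf=True replaces the raw count tf with the damped 1 + log(tf).
vectorizer = TfidfVectorizer(stop_words="english", sublinear_tf=True)
tfidf = vectorizer.fit_transform(docs)  # sparse (n_docs x n_keywords) matrix

print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))  # common words like 'movie' get lower weight
```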

The above process produces a database of information about items, but we still need information on the preferences of the user. These could include ratings or implicit feedback, as we have dealt with before, but could also include text opinions, which can be mined with sentiment analysis or opinion mining. Preferences could also be cases, where the user lists an example or examples of items they are interested in. Case-based recommender systems are a type of knowledge-based recommender system, and are a subject of their own.

These likes and dislikes will eventually be converted to a unary, binary, interval, or real-rating system.

Feature selection and weighting

We must reduce the size of the feature space 1) for computational purposes, as many keywords typically exist, and 2) to prevent overfitting, as the actual number of documents to learn from is usually not that large.

The main idea is to determine the importance of each feature by evaluating how sensitive the dependent variable (the rating) is to changes in that feature.

Gini

If the appearance of a word in a given set of items is correlated with a significant trend in the ratings of those items, then its Gini index will be low. If the word “comedy” appears in six of the movies a user has rated, and those ratings were all five stars, the Gini index would be low (more significant). If those ratings were all zero stars, the Gini index would still be low. This is because the Gini index drops as the rating distribution becomes more skewed: inequality in the frequency distribution signals a discriminative word.
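
A hedged sketch of that computation: for each rating class i, let p_i(w) be the fraction of the documents containing w that received rating i, so that Gini(w) = 1 − Σ p_i(w)².

```python
from collections import Counter

def gini(ratings_of_docs_with_word: list[int]) -> float:
    # Gini(w) = 1 - sum_i p_i^2 over the rating distribution of the
    # documents that contain the word w.
    counts = Counter(ratings_of_docs_with_word)
    n = len(ratings_of_docs_with_word)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# "comedy" appears in six movies, all rated five stars: perfectly skewed.
print(gini([5, 5, 5, 5, 5, 5]))  # 0.0 -> highly significant word
# A word spread evenly over five rating classes is least informative.
print(gini([1, 2, 3, 4, 5]))     # 0.8 -> not a discriminative word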

Entropy

You can also select features using the concept of entropy from information theory; that is, less probable results contain more information, or more significance. This measure returns similar results to the Gini index, as it operates on similar principles.
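
For comparison, here is the entropy analogue of the Gini sketch above; lower entropy again marks a more discriminative word.

```python
import math
from collections import Counter

def entropy(ratings_of_docs_with_word: list[int]) -> float:
    # Entropy(w) = -sum_i p_i * log2(p_i) over the rating distribution.
    counts = Counter(ratings_of_docs_with_word)
    n = len(ratings_of_docs_with_word)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

print(entropy([5, 5, 5, 5, 5, 5]))  # 0.0: perfectly concentrated
print(entropy([1, 2, 3, 4, 5]))     # ~2.32: spread out, uninformative
```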

The χ² (chi-squared) statistic creates a contingency table for the co-occurrence of a word and a class. The expected counts under independence are compared with the observed data, and the normalized squared deviation between the expected and observed counts is summed over the cells.
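
A small sketch using scipy (my choice of library, not one named in the article): build the 2x2 contingency table for one word against a like/dislike class and test it.

```python
from scipy.stats import chi2_contingency

# Rows: word present / word absent. Columns: item liked / item disliked.
table = [[30, 5],    # word appears: 30 liked, 5 disliked
         [10, 55]]   # word absent: 10 liked, 55 disliked
chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value)  # a large statistic (tiny p-value) flags a useful word
```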

Feature weighting

Let’s say you are ranking movies and you want to select features. You might find that the title or the genre affects the final rating more than the individual words in the description do. However, you don’t want to ignore the description completely, as it still holds valuable information.

If you don’t want to remove features outright, but still want to assign varying levels of importance to each feature, you can use feature weighting.

Feature weighting can be simple to implement once you can do feature selection. For example, if you have the Gini index of a word w, you can compute its relative weight with:

g(w) = a − Gini(w)

where a is some value greater than 1.
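
As a sketch, the weighting step then just scales each keyword’s tf-idf column by g(w); the array names here are illustrative.

```python
import numpy as np

def weight_features(tfidf_matrix: np.ndarray, gini_per_word: np.ndarray,
                    a: float = 1.5) -> np.ndarray:
    # g(w) = a - Gini(w): low-Gini (discriminative) words get larger weights.
    g = a - gini_per_word
    return tfidf_matrix * g  # broadcast g over the keyword (column) axis

tfidf = np.array([[0.2, 0.7],
                  [0.5, 0.1]])
gini = np.array([0.1, 0.9])  # word 0 is discriminative, word 1 is not
print(weight_features(tfidf, gini))
```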

Examples of Models

Once feature selection/weighting is completed, we have to train our model and make predictions. Here is a brief overview of some types of models you can use, many of which we have seen in previous articles.

For the sake of discussion, let’s say the keywords for each item are stored in what we’ll call documents.

Nearest Neighbor Classifier

For each document in the test set, its k-nearest neighbors in the training set are determined using cosine similarity. The average rating of the k-nearest neighbors for a given document is the predicted value for that document.

This is expensive! This algorithm will run in O(|test| * |train|).
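
A minimal prediction sketch, assuming the documents already live in a common vector space (the toy vectors below are made up):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def predict_rating(target_vec, train_vecs, train_ratings, k=3):
    # Cosine similarity of the target document against every training document.
    sims = cosine_similarity(target_vec.reshape(1, -1), train_vecs).ravel()
    top_k = np.argsort(sims)[-k:]  # indices of the k nearest neighbors
    # Predicted rating: average rating of the k nearest neighbors.
    return float(np.mean(np.asarray(train_ratings)[top_k]))

# Four rated documents in a three-keyword vector space.
train = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]], dtype=float)
ratings = [5, 4, 1, 5]
print(predict_rating(np.array([1.0, 0.0, 1.0]), train, ratings, k=2))  # 5.0
```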

K-means clustering can be used to decrease the number of documents. In systems with discrete rating values, documents are grouped by rating and clustered with k-means. If s is the number of distinct rating values and p is the number of clusters per group, this produces s * p representative documents to compare against, which is significantly less computationally expensive.
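
A hedged sketch of that reduction with scikit-learn’s KMeans; the function and variable names are my own.

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_by_rating(X, ratings, p=3):
    # Group documents by their discrete rating, then replace each group
    # with p k-means centroids, leaving at most s * p vectors to compare.
    centroids, centroid_ratings = [], []
    for r in np.unique(ratings):
        group = X[ratings == r]
        k = min(p, len(group))  # guard against tiny rating groups
        km = KMeans(n_clusters=k, n_init=10).fit(group)
        centroids.append(km.cluster_centers_)
        centroid_ratings.extend([r] * k)
    return np.vstack(centroids), np.array(centroid_ratings)

X = np.random.rand(100, 20)                  # 100 rated documents
ratings = np.random.randint(1, 6, size=100)  # discrete ratings 1..5
reduced, reduced_ratings = compress_by_rating(X, ratings, p=3)
print(reduced.shape)  # (15, 20) when all five rating values occur
```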

Collaborative filtering also uses nearest-neighbor classification, so we can use clustering to accelerate the training process for it too. Read more about it here.

Association Rules

Another way to model content-based recommender systems is with association rules. I talk about this in depth and implement it in this article.

Bayes Classifier

We can also use a Bayesian model for content-based systems, as much of the work involved can be reduced to text classification.

For this model, we’ll use a binary rating system. We need to find the conditional probability that the active user likes or dislikes each item X given its keywords. We then normalize these probabilities and use the more likely outcome (1 or 0) as our predicted rating.
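
A hedged sketch with scikit-learn’s BernoulliNB on made-up binary data: rows are items, columns mark keyword presence, and the labels are the user’s like/dislike ratings.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Keyword-presence vectors for four items the active user has rated.
X_train = np.array([[1, 0, 1],
                    [1, 1, 0],
                    [0, 1, 1],
                    [0, 0, 1]])
y_train = np.array([1, 1, 0, 0])  # 1 = liked, 0 = disliked

clf = BernoulliNB().fit(X_train, y_train)
x_new = np.array([[1, 0, 0]])     # an unrated item's keyword vector
print(clf.predict(x_new))         # predicted binary rating: [1]
print(clf.predict_proba(x_new))   # normalized dislike/like probabilities
```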

Conclusions

Content-based recommender systems can be highly accurate and explainable ways to recommend items. They require less information from other users, and can utilize a wide range of input (title, author names, keywords, tags, etc.).

To see an implementation of a content-based recommender system, see my rule-based recommender system, or Tumas’ content-based factorization system. For more on recommender systems, visit the start of our article series.
