Content-Based Recommender Systems and Association Rules

Jackson Wu

--

Some things just go together — milk and cereal, Florida and strange headlines, etc. Forming associations between things is a fundamental way we learn about the world. We can carry this concept to the world of recommender systems.

The process of finding sets of items that frequently occur together is called frequent pattern mining, and has an entire field of study devoted to it. Fortunately, we only have to understand a few of its key concepts before we can start applying it to recommender systems.

Background

We find association rules over a set of transactions. We call this set the transaction history and denote it I.

The support of item set X in a user’s transaction history is the fraction of transactions in which X appears. If you happened to buy ketchup every time you went to a certain grocery store, the support of ketchup over all your transactions would be one.

The confidence of rule X → Y is the strength of the rule: how often the former implies the latter. More precisely, it is the conditional probability that Y will be in a transaction if X is in that transaction as well. We can derive this probability by dividing the support of X ∪ Y by the support of X.

If a rule X → Y has high confidence, the rule is strong and a set having X heavily implies that Y will also be in that set. If one knows X → Y, then they can suggest item Y to buyers of X.
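The two definitions above can be sketched in a few lines of code. This is a minimal illustration over a toy transaction history; the item names are invented for the example.

```python
# Toy transaction history: each transaction is a set of purchased items.
transactions = [
    {"milk", "cereal"},
    {"milk", "cereal", "ketchup"},
    {"milk", "bread"},
    {"cereal", "ketchup"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """P(Y in transaction | X in transaction) = support(X ∪ Y) / support(X)."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

print(support({"milk"}, transactions))                 # → 0.75
print(confidence({"milk"}, {"cereal"}, transactions))  # → 0.666...
```

Here milk appears in three of the four transactions, so its support is 0.75; milk and cereal appear together in two, giving the rule milk → cereal a confidence of 0.5 / 0.75 = 2/3.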

A rule X → Y is said to be an association rule at a minimum support of s and minimum confidence of c, if the following two conditions are satisfied:

1. The support of X ∪ Y is at least s.

2. The confidence of X → Y is at least c.
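Putting the two conditions together, a brute-force miner can enumerate every candidate rule and keep only those that clear both thresholds. This sketch is limited to single-item antecedents and consequents to keep it short, and the toy data is invented for the example.

```python
# Toy transaction history.
transactions = [
    {"milk", "cereal"},
    {"milk", "cereal"},
    {"milk", "bread"},
    {"cereal", "bread"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def mine_rules(min_support, min_confidence):
    """Brute-force: test every single-item rule x -> y against both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x in items:
        for y in items:
            if x == y:
                continue
            sup = support({x, y})
            if sup >= min_support and support({x}) > 0:
                conf = sup / support({x})
                if conf >= min_confidence:
                    rules.append(((x,), (y,), sup, conf))
    return rules

rules = mine_rules(min_support=0.5, min_confidence=0.6)
```

With these thresholds only milk → cereal and cereal → milk survive: each has support 0.5 and confidence 2/3. Real miners such as Apriori prune the search space instead of enumerating everything, but the acceptance test is the same.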

Benefits of Association Rules

One of the benefits of building a recommender system with association rules is that its recommendations are highly interpretable. While methods like latent factor models are powerful, the patterns from which their results are gleaned are cryptic. Association rules are simple: “The books you like frequently have these keywords? We’ll recommend you more books like that.”

While this may seem like a trivial point, remember our design principles! One of our goals was to have users trust that the recommendations we make will be relevant. Giving concrete explanations for each recommended item might give users more of an incentive to believe our recommendations.

Implementation

So how do these association rules apply to content-based recommender systems?

In this article, we’ll construct a content-based recommender on the Goodbooks-10k dataset using association rules as the basis for our model.

Start by creating a list of relevant tags for each book.
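This first step might look like the following. The tables here are a toy stand-in for the Goodbooks-10k tag files; the IDs and tag names are invented for illustration.

```python
# Toy stand-in for the dataset's book-tag tables:
# (book_id, tag_id) pairs plus a tag_id -> tag_name lookup.
book_tags = [
    (1, 101), (1, 102),
    (2, 102), (2, 103),
]
tag_names = {101: "sci-fi", 102: "adventure", 103: "romance"}

# Build book_id -> list of tag names.
tags_by_book = {}
for book_id, tag_id in book_tags:
    tags_by_book.setdefault(book_id, []).append(tag_names[tag_id])
```

In practice you would also filter out noisy or extremely rare tags before going further, since they generate rules with tiny support.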

Then, make a user profile for each user, storing each rated item with a list of its associated keywords.
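A profile can be as simple as a mapping from user to the keyword lists of their rated items, each labeled like or dislike. The rating threshold of 3 and all the data values below are assumptions for the sketch.

```python
# Tags per book (assumed already built) and toy ratings on a 1-5 scale.
tags_by_book = {1: ["sci-fi", "adventure"], 2: ["romance"]}
ratings = [("A", 1, 5), ("A", 2, 2), ("B", 1, 4)]

# Profile: user_id -> list of (keywords, "like"/"dislike"),
# thresholding ratings at 3 (an assumed cutoff).
profiles = {}
for user, book, rating in ratings:
    label = "like" if rating >= 3 else "dislike"
    profiles.setdefault(user, []).append((tags_by_book[book], label))
```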

The next step is to resolve all conflicts, such as {sci-fi=like} and {sci-fi=dislike}. If we have both of these rules and they have similar confidence levels, we can’t reliably say that the “sci-fi” tag is associated with a like or a dislike.
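One simple way to resolve such conflicts is to keep a tag's rule only when one label's confidence beats the other's by a clear margin. The margin value and the input rules below are assumptions for the sketch, not a prescribed method.

```python
# Toy rules mined from one user's profile: (tag, label, confidence).
rules = [
    ("sci-fi", "like", 0.55),
    ("sci-fi", "dislike", 0.50),
    ("romance", "dislike", 0.90),
]

def resolve_conflicts(rules, margin=0.2):
    """Keep a tag's label only if its confidence beats the opposing
    label's confidence by at least `margin` (an assumed heuristic)."""
    best = {}
    for tag, label, conf in rules:
        by_label = best.setdefault(tag, {})
        by_label[label] = max(conf, by_label.get(label, 0.0))
    resolved = {}
    for tag, by_label in best.items():
        like = by_label.get("like", 0.0)
        dislike = by_label.get("dislike", 0.0)
        if abs(like - dislike) >= margin:
            resolved[tag] = "like" if like > dislike else "dislike"
    return resolved

resolved = resolve_conflicts(rules)
```

Here "sci-fi" is dropped because its like and dislike confidences are too close to call, while "romance" resolves cleanly to a dislike.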

For each user profile, we mine all of its association rules for a given minimum support and minimum confidence.

These rules differ from those in collaborative filtering, where both the antecedent and the consequent are ratings. In a content-based system, a rule has a set of keywords on the left-hand side and a rating as its consequent on the right.
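Mining these keyword-to-rating rules from a single profile can be done by brute force over small keyword subsets. This is a sketch with invented data; real implementations use Apriori-style pruning, and the thresholds here are arbitrary.

```python
from itertools import combinations

# One user's rated items: (set of keywords, like/dislike label).
profile = [
    ({"sci-fi", "space"}, "like"),
    ({"sci-fi", "aliens"}, "like"),
    ({"romance"}, "dislike"),
]

def mine_user_rules(profile, min_support=0.3, min_confidence=0.7):
    """Mine rules {keywords} -> label, brute-forced over keyword
    subsets of size <= 2 (enough for a demo)."""
    n = len(profile)
    keywords = sorted(set().union(*(kw for kw, _ in profile)))
    rules = []
    for size in (1, 2):
        for lhs in combinations(keywords, size):
            matching = [label for kw, label in profile if set(lhs) <= kw]
            for label in ("like", "dislike"):
                sup = sum(l == label for l in matching) / n
                if matching and sup >= min_support:
                    conf = sum(l == label for l in matching) / len(matching)
                    if conf >= min_confidence:
                        rules.append((lhs, label, conf))
    return rules

user_rules = mine_user_rules(profile)
```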

Once we have all of these rules for each user, we can use them to start predicting ratings.

Let’s say we are predicting ratings for user A. For each item i, we check whether the attributes of i “fire” any of A’s rules. A rule is said to be fired by an item if the left-hand side of the rule is a subset of the item’s attributes. We then sort the fired rules by descending confidence.
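The firing-and-sorting step might look like this. Taking the single highest-confidence fired rule as the prediction is one simple strategy (averaging over the top-k fired rules is another); the rules and tags below are invented for the example.

```python
# User A's mined rules: (antecedent keyword set, predicted label, confidence).
rules_A = [
    ({"sci-fi", "space"}, "like", 0.9),
    ({"romance"}, "dislike", 0.8),
    ({"sci-fi"}, "like", 0.6),
]

def predict(item_tags, rules):
    """A rule fires if its antecedent is a subset of the item's tags;
    fired rules are sorted by descending confidence and the top one wins."""
    fired = [r for r in rules if r[0] <= set(item_tags)]
    fired.sort(key=lambda r: r[2], reverse=True)
    return fired[0][1] if fired else None

prediction = predict({"sci-fi", "space", "aliens"}, rules_A)  # → "like"
```

An item whose tags fire no rule at all gets no prediction, which is a known limitation of pure rule-based recommenders: coverage depends on how many rules survive the support and confidence thresholds.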

In conclusion, rule-based recommender systems are simple in concept, yet flexible and highly interpretable. They can serve as a model for content-based systems, but also for collaborative filtering (where a rule might look like {Likes x} → {Dislikes y}).

Find an overview of content-based recommender systems here.
