Building A Collaborative Filtering Model With Decision Trees

Jackson Wu
4 min read · May 27, 2019


As we discussed when we covered basic collaborative filtering methods, completing a ratings matrix is a generalization of a classification problem. So, what if we use other methods of modeling classification problems to predict ratings? In this article, we will use decision and regression trees to construct a model-based recommender system.

Anything is possible with an army of trees!

Decision Trees

First, a brief review of decision and regression trees. Decision trees partition data into classes with a series of hierarchical decisions. For binary trees, these decisions might be equivalent to a series of yes or no questions. We will be primarily focusing on univariate binary trees, which means each node has at most two children and each split is based on the value of one feature.

We call them decision trees when the dependent variable is categorical and regression trees when the dependent variable is numeric. You might choose different split criteria and evaluation techniques for decision and regression trees, but they operate on the same principles.
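
To make the distinction concrete, here is a minimal sketch (my own illustration, not code from the article) using scikit-learn with tiny made-up arrays:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy data: two numeric features per training instance.
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]])

# Categorical dependent variable -> decision (classification) tree.
y_class = np.array(["liked", "disliked", "liked", "disliked"])
clf = DecisionTreeClassifier(max_depth=2).fit(X, y_class)

# Numeric dependent variable -> regression tree.
y_reg = np.array([4.5, 2.0, 5.0, 1.5])
reg = DecisionTreeRegressor(max_depth=2).fit(X, y_reg)

print(clf.predict([[2.5, 6.0]]))  # a class label
print(reg.predict([[2.5, 6.0]]))  # a numeric prediction
```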

Split Criteria (Gini index)

In decision trees, the most common measure of split quality, or split criterion, is the Gini index. It lies on a scale between 0 and 1, with smaller values indicating a purer node, one whose instances are concentrated in a single class.
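
As a quick sketch of the computation (my own example, not code from the article), the Gini index of a node's labels is one minus the sum of squared class proportions:

```python
import numpy as np

def gini_index(labels):
    """Gini index of a set of class labels: 1 - sum(p_i^2).
    0 means the node is pure (a single class); the value grows
    as the classes become more evenly mixed."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_index(["liked", "liked", "liked"]))                 # 0.0, pure node
print(gini_index(["liked", "disliked", "liked", "disliked"]))  # 0.5, evenly mixed
```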

For trees with numeric independent variables, we can define the splits by partitioning each variable into ranges of values, or intervals. This effectively creates categories from the continuous independent variables, and we can apply many of the techniques from decision trees. If the dependent variable is numeric, then our split criterion has to be different. Typically, the split is chosen that minimizes the variance of the dependent variable in the resulting nodes. If the variance is low, it means a node contains training instances whose dependent-variable values lie close together. This variance can be measured against the average value in the leaf, or even against a linear regression model fit within it.
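
For the regression case, a rough sketch of that criterion (my own construction, with a single numeric feature and made-up ratings): score a threshold split by the weighted variance of the dependent variable in the two children, and keep the split with the lowest score.

```python
import numpy as np

def weighted_child_variance(feature, target, threshold):
    """Variance of the dependent variable in the children of the split
    'feature <= threshold', weighted by child size (lower is better)."""
    n, score = len(target), 0.0
    for child in (target[feature <= threshold], target[feature > threshold]):
        if len(child) > 0:
            score += len(child) / n * np.var(child)
    return score

feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ratings = np.array([2.0, 2.5, 4.5, 5.0, 4.8])
best = min([1.5, 2.5, 3.5, 4.5],
           key=lambda t: weighted_child_variance(feature, ratings, t))
print(best)  # 2.5, which separates the low ratings from the high ones
```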

Note that not all attributes are necessarily used to make a split.

For each split, the weighted Gini index of the children produced by each candidate attribute is calculated, and the attribute with the smallest index is selected.
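
Putting those pieces together, a simplified sketch (mine; it assumes binary attributes and a plain weighted Gini score) of choosing the attribute to split on:

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(X, y, attribute):
    """Weighted Gini index of the children created by splitting on a
    binary attribute (a column of X)."""
    mask = X[:, attribute] == 1
    return sum(len(child) / len(y) * gini(child)
               for child in (y[mask], y[~mask]) if len(child) > 0)

# Toy data: rows are training instances, columns are binary attributes.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y = np.array(["liked", "liked", "disliked", "disliked"])

best = min(range(X.shape[1]), key=lambda a: split_gini(X, y, a))
print(best)  # attribute 0 separates the two classes perfectly
```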

Applying Decision Trees to Collaborative Filtering

If you choose to build a collaborative filtering model with decision trees, you will face two problems:

  1. The rating matrix is sparse. If an item has an unspecified attribute, and that attribute is used to make a split, in which branch does the tree place that item?
  2. It is not clear in a ratings matrix which attributes (items) are dependent or independent. They are merely defined as users and items.

The second problem is solved by making a separate decision tree for each attribute. In each of these trees, one attribute is dependent and the rest are independent. So for item-based collaborative filtering, we would have a tree for each item. To determine the rating of a given item for a user, the tree associated with that item is used.
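
In code, that bookkeeping might look like the sketch below. It is only an outline under my own assumptions: `build_features(j)` is a hypothetical helper that returns a complete feature matrix built from the ratings of every item except j (the dimensionality-reduction step described in the next section), and the trees are plain scikit-learn regression trees.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_item_trees(ratings, rated_mask, build_features):
    """Fit one regression tree per item column.

    ratings        : (m, n) ratings matrix, unspecified entries arbitrary
    rated_mask     : (m, n) boolean array, True where a rating is specified
    build_features : hypothetical helper; build_features(j) returns a complete
                     (m, d) feature matrix derived from all items except j
    """
    item_trees = {}
    for j in range(ratings.shape[1]):
        rows = rated_mask[:, j]          # train only on users who rated item j
        if rows.sum() < 2:
            continue                     # too little data to fit a tree
        features = build_features(j)
        tree = DecisionTreeRegressor(max_depth=5)
        tree.fit(features[rows], ratings[rows, j])
        item_trees[j] = tree
    return item_trees

# Predicting user u's rating of item j then uses only that item's tree:
# item_trees[j].predict(build_features(j)[u:u + 1])
```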

The sparseness of the rating matrix is more difficult to deal with. Nonetheless, we can still solve this problem using dimensionality reduction!

Implementation

We will be implementing this on an Anime Recommendation Database.

(You can find the full kernel here.)

Say we are predicting the ratings of item j in an m x n ratings matrix M, building one tree per item as described above. The first step is to remove the jth column from the ratings matrix and treat the remaining columns as features, or independent variables. Now that the independent and dependent variables are designated, we can start to treat recommendation like classification.
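
As a small sketch of that first step (the toy matrix and variable names are mine; NaN marks unspecified ratings):

```python
import numpy as np

# Toy m x n ratings matrix; np.nan marks unspecified ratings.
M = np.array([
    [5.0, np.nan, 3.0, 1.0],
    [4.0, 2.0, np.nan, 1.0],
    [np.nan, 5.0, 4.0, np.nan],
])

j = 2                              # the item whose ratings we want to predict
X = np.delete(M, j, axis=1)        # m x (n - 1) independent variables
y = M[:, j]                        # dependent variable: ratings of item j
train_rows = ~np.isnan(y)          # fit the tree only on users who rated item j
print(X.shape, y[train_rows])
```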

The next step is lengthier: we need to further reduce this m x (n - 1) matrix to a complete m x d representation of the rating matrix, where d is the number of reduced dimensions. This will be the matrix with which we train our decision tree.

First, estimate the covariance matrix of the n - 1 feature items. The ratings data is sparse, so we may need an algorithm like the EM algorithm to estimate each entry. It is also important to calculate each covariance using only the specified ratings of the two items in question (if there are no such ratings, we estimate the covariance as 0). The top d eigenvectors of the covariance matrix are then determined, and each user's ratings are projected onto those eigenvectors.
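
A rough sketch of that step (my own simplification: pairwise estimates over co-rated users rather than a full EM algorithm, which the article suggests for better estimates):

```python
import numpy as np

def pairwise_covariance(X):
    """Covariance between item columns, estimated from only the users who
    rated both items; left at 0 when there are not enough co-ratings."""
    n = X.shape[1]
    cov = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            both = ~np.isnan(X[:, a]) & ~np.isnan(X[:, b])
            if both.sum() >= 2:
                cov[a, b] = np.cov(X[both, a], X[both, b])[0, 1]
    return cov

def top_d_eigenvectors(cov, d):
    """The (n - 1, d) matrix of eigenvectors with the largest eigenvalues."""
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # cov is symmetric
    order = np.argsort(eigenvalues)[::-1][:d]
    return eigenvectors[:, order]
```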

The average contribution of user u to eigenvector i: a_ui = (1 / |I_u|) Σ_{j ∈ I_u} r_uj · e_ji. Source: Recommender Systems: The Textbook

We’ve seen the equation used for projecting the ratings before, in this article on dimensionality reduction for collaborative filtering techniques. For user u, let I_u be the set of feature items u has rated, let r_uj be u’s rating of item j, and let e_ji be item j’s component of the ith eigenvector. The contribution of that single rating to eigenvector i is r_uj times e_ji.

The average contribution of user u to eigenvector i is then the sum of these contributions over the items in I_u, divided by |I_u|. By finding the average contribution of each user on each eigenvector, we get a complete m x d matrix of reduced ratings, to which decision trees can be applied like any classification/regression problem. However, this decision tree is only useful for predicting the ratings of item j, so, as mentioned before, we must repeat the process for every item j (1…n).
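
A sketch of the projection under my notation (`eigenvectors[j, i]` is item j's component of the ith eigenvector, and NaN marks unspecified ratings):

```python
import numpy as np

def project_users(X, eigenvectors):
    """Project each user's specified ratings onto the top-d eigenvectors.

    X            : (m, n - 1) ratings of the feature items, NaN if unspecified
    eigenvectors : (n - 1, d) top-d eigenvectors of the item covariance matrix
    Returns the complete (m, d) matrix of average contributions.
    """
    m, d = X.shape[0], eigenvectors.shape[1]
    reduced = np.zeros((m, d))
    for u in range(m):
        rated = ~np.isnan(X[u])        # I_u: the items user u actually rated
        if rated.sum() == 0:
            continue
        # a_ui = (1 / |I_u|) * sum over j in I_u of r_uj * e_ji
        reduced[u] = X[u, rated] @ eigenvectors[rated] / rated.sum()
    return reduced
```

This reduced matrix is the training input for item j's tree; a fresh covariance matrix, set of eigenvectors, and projection are computed for each target item.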

Conclusions

Using decision trees for model-based collaborative filtering illustrates how collaborative filtering can be generalized to classification. Many other model-based collaborative filtering methods operate on the same principle. In fact, you can use any classifier as a black box to solve a collaborative filtering problem. See Tumas’ implementation of this classifier black-box idea using neural networks for more.
