Today I’m writing about Slope One Predictors for Online Rating-Based Collaborative Filtering by Daniel Lemire and Anna Maclachlan.
The purpose of this paper is to show that Slope One schemes have some desirable characteristics while getting a competitive accuracy compared to other methods for item rating prediction. The paper is from 2008 so it’s a bit outdated, but their methodology could work as inspiration for many. I will paste here these desirable characteristics I mentioned:
- Easy to implement and maintain: All aggregated data
should be easily interpreted by the average engineer and
algorithms should be easy to implement and test.
- Updateable on the fly: The addition of a new rating
should change all predictions instantaneously.
- Efficient at query time: Queries should be fast, possibly
at the expense of storage.
- Expect little from first visitors: Auser with few ratings
should receive valid recommendations.
- Accurate within reason: The schemes should be competitive with the most accurate schemes, but a minor gain in accuracy is not always worth a major sacrifice in simplicity or scalability.
They compare their approach with other four methods. I will explain the first two which are simpler, the other two are better explained in the paper itself:
- Per User Average: Predict that a user will rate everything according to that user’s average rating.
- Bias From Mean / Non Personalized: The prediction is based on the user’s average plus the average deviation from the user mean for the item in question across all users in the training set.
- Adjusted Cosine Item-Based.
The authors’ method
Their first approach takes into account “both information from other users who rated the same item (like the Adjusted Cosine Item-Based) and from the other items rated by the same user (like the Per User Average)”, and they finally come up to a formula that only depends on the user’s average rating and which items has the user rated:
In this formula, u with a bar is the user average rating, j is the item to predict, R sub j are all the entries that don’t have item j, card(S) considering any set S is the number of entries in it, and dev is the average deviation of item i with respect to item j as follows:
Here’s a new term S sub i,j , which simply are all the entries with contain i and j.
In their second approach they include information of the number of ratings observed and name their approach Weighed Slope One. Here’s the new formula:
Here the new terms are: X which are all the evaluations in the training set and S(u) which are all the entries for the user u.
Finally, they divide the training set into liked and disliked items, depending on if the item has a rating smaller or greater than the user’s average rating, and the formula is a weighted approach similar to the Weighted Slope One but considering the liked and disliked items separately weighted. They name it Bi-polar Slope One and here’s its formula:
Where p is just the same used in Weighted Slope One:
They use All But One Mean Average Error as evaluation metric, for which they try to predict one item having the information of the rest and repeating this for all the items. Here’s what they found:
So their objective is completed. They achieved competitive results with a method that meet their requirements. With their Bi-Polar Slope One approach they event obtain the best results.
What do I think?
I found this paper very straightforward. First, it has a well defined and simple objective, mentioning all the characteristics that their method should have. Also, the explanations of related work and baselines are very good and compact. Then, the methodology used is very well explained, it would be easy for someone to replicate the experiments and understand why every step is for. And at the end, they return to their objective and show that it was accomplished.