
Performance of Recommender Algorithms on Top-N Recommendation Tasks

Paolo Cremonesi, Yehuda Koren, Roberto Turrin

Yoav Navon
2 min read · Aug 26, 2019


This paper explores the performance of diverse recommender systems in the top-N setting. It shows that algorithms optimized to minimize an error metric (e.g. RMSE) do not perform as well on this kind of task. The analysis covers different kinds of algorithms: non-personalized (Most Popular, Average Rating), neighborhood-based (itemKNN), and latent factor models. The authors analyze two versions of the same datasets: the full set of data, and the “long tail”, which is the data without the most popular items. It turns out that the Most Popular method does better than some algorithms, but on the “long tail” dataset its performance drops significantly.
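
To see why Most Popular is such a strong (and cheap) baseline, here is a minimal sketch of a non-personalized recommender: it scores every item by how many ratings it has received and ignores the user entirely. The `(user, item, value)` triple format is my assumption, not the paper's.

```python
from collections import Counter

def most_popular_scores(ratings):
    """Non-personalized baseline: an item's score is its number of ratings.

    ratings: iterable of (user, item, value) triples (assumed format).
    Returns a mapping item -> popularity score, identical for every user.
    """
    return Counter(item for _, item, _ in ratings)

def recommend_top_n(popularity, seen_items, n=10):
    # Recommend the N most-rated items the user has not already rated.
    candidates = ((item, c) for item, c in popularity.items() if item not in seen_items)
    return [item for item, _ in sorted(candidates, key=lambda x: x[1], reverse=True)[:n]]
```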

The methodology consisted of taking each item a user rated 5 stars and combining it with a random sample of 1000 additional items that the user had not rated, which were therefore considered not relevant. The model then outputs a set of N recommended items, and if the 5-star item is present, it counts as a hit. Precision and recall are computed by averaging over repetitions of this procedure. It is worth questioning how good the assumption is that the 1000 sampled items are not relevant. Perhaps these items could be sampled with a more elaborate heuristic, for instance one based on a neighborhood method, to reduce the probability of relevant items slipping into the 1000-item sample.
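
To make the protocol concrete, here is a minimal sketch of this evaluation loop. The `score(user, item)` interface, the `test_ratings` list of 5-star pairs, and `unrated_items` are placeholders I am assuming; the precision = recall / N relation follows the paper's definitions.

```python
import random

def evaluate_top_n(score, test_ratings, unrated_items, n=10, sample_size=1000, seed=42):
    """Estimate recall@N and precision@N with the sampled-negatives protocol.

    score(user, item) -> float : model's predicted relevance (assumed interface)
    test_ratings               : list of (user, item) pairs rated 5 stars
    unrated_items[user]        : items the user has not rated (candidate negatives)
    """
    rng = random.Random(seed)
    hits = 0
    for user, target in test_ratings:
        # Sample 1000 items the user has not rated; treated as irrelevant.
        negatives = rng.sample(unrated_items[user], sample_size)
        # Rank the 5-star item among the sampled negatives.
        ranked = sorted(negatives + [target], key=lambda i: score(user, i), reverse=True)
        if target in ranked[:n]:
            hits += 1
    recall = hits / len(test_ratings)
    precision = recall / n  # as defined in the paper
    return recall, precision
```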

The analysis was conducted on two datasets from a similar domain (movie ratings). Would these results remain unchanged for datasets with other characteristics? For example, a scientific paper dataset has fewer highly popular items while being sparser, so the popularity distribution would be less skewed. This could result in different methods being the most effective.
