Evaluating Recommendation Systems

Guy Shani and Asela Gunawardana

Published in Recommender Systems, Aug 25, 2019

This paper discusses ways to compare recommender systems. It first covers the three evaluation settings (offline experiments, user studies, and online evaluation) and the pros and cons of each for evaluating an algorithm. It then discusses the properties of recommender systems that matter when selecting a particular method.

User preference is proposed as one evaluation method: run user studies to determine which models users report being more satisfied with. Accuracy is discussed as the most common way to measure a recommender system, and the paper presents metrics for each type of accuracy (rating prediction, usage prediction, and ranking). Coverage can trade off against accuracy, since a system may provide highly accurate recommendations but only for a limited set of popular items. Confidence and Trust refer, respectively, to the system's trust in its own ratings and the users' trust in the system's ratings or rankings. Novelty and Serendipity are two similar concepts that measure the system's capacity to recommend items the user wasn't aware of; the main difference is that serendipity measures how surprising the recommendation is. Diversity is another property that can trade off against accuracy, as a user may like items that are not similar to one another. Utility depends on the domain, as it captures how the recommender system helps the business model. Risk may also be associated with a recommender system: in certain domains we want to keep it to a minimum because of the high cost of a false positive. Robustness measures the stability of the system, especially with respect to malicious users, and Privacy is important for retaining users' trust, as they provide personal information but don't want third parties to have access to it. Adaptivity is presented as an important property because, for example, the relevant items may change over time. Finally, Scalability is the ability to remain functional as more and more data is added.
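As a rough illustration of the accuracy and coverage properties, the Python sketch below computes one prediction metric (RMSE), one usage metric (precision at k), and catalog coverage. The function names, signatures, and toy data are assumptions made for this example; they are not taken from the paper.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually used."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def catalog_coverage(recommendation_lists, catalog):
    """Fraction of catalog items that appear in at least one user's list."""
    catalog = set(catalog)
    if not catalog:
        raise ValueError("catalog must not be empty")
    recommended = set()
    for items in recommendation_lists:
        recommended.update(items)
    return len(recommended & catalog) / len(catalog)


# Hypothetical toy data, for illustration only.
print(rmse([4.0, 3.0], [5.0, 3.0]))
print(precision_at_k(["a", "b", "c"], {"a", "c"}, k=2))
print(catalog_coverage([["a", "b"], ["a"]], ["a", "b", "c", "d"]))
```

A system that recommends only "a" and "b" to everyone could score well on the first two metrics while covering just half of this four-item catalog, which is the tension between accuracy and coverage.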

A question that arises in the section on diversity is that diversity is described as desirable, but only as a global measure of the system. Diversity is highly subjective: different users may want different amounts of diversity in their recommendations, and this should be taken into account.
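One way to make this concern concrete is to measure diversity per user instead of only as a system-wide average. The sketch below uses the average pairwise Jaccard distance between the genre sets of the items in each user's list. This particular distance, the helper names, and the toy data are assumptions chosen for illustration, not the paper's definition.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(items, features):
    """Average pairwise distance between the items in one recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard_distance(features[x], features[y]) for x, y in pairs)
    return total / len(pairs)


def diversity_per_user(recommendations, features):
    """Diversity score of each user's list, keyed by user."""
    return {
        user: intra_list_diversity(items, features)
        for user, items in recommendations.items()
    }


# Hypothetical toy data: item -> set of genres, user -> recommended items.
features = {
    "m1": {"action"},
    "m2": {"action"},
    "m3": {"drama"},
    "m4": {"action", "comedy"},
}
recommendations = {"alice": ["m1", "m2"], "bob": ["m1", "m3"]}

scores = diversity_per_user(recommendations, features)
print(scores)
```

Keeping the scores per user makes it possible to compare each list against how much variety that particular user seems to want, rather than tuning a single global target.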

A common problem with the adaptivity of recommender systems is that users may not like their current recommendations, but because they keep being recommended the same type of items, they keep consuming them. This traps the user in a kind of loop that they can't get out of. Sometimes it is desirable to let the user explicitly report how well the system is doing, rather than wait for the adaptation methods to kick in and change the content being presented.

Written by Yoav Navon