Countering Popularity Bias in Recommendations

Robin Verachtert
Froomle
Mar 28, 2023

Popularity Bias is a common issue for recommender systems, especially for news recommendations.
Most people read the breaking news; however, many are also interested in their own niches. Some like football, others basketball, and still others will read everything about the political scene or movie stars.

If the recommendation algorithm only propagates the breaking news, then users will have a hard time finding articles that satisfy their niche interests.
In most offline experiments, recommending popular items is rewarded by the chosen metric, e.g. NDCG or Recall, two common metrics in recommender systems evaluation.
This disparity between what offline metrics reward and what users actually want makes it easy to mistakenly deploy non-optimal solutions.
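To make the offline metric concrete, here is a minimal sketch of NDCG@k with binary relevance (an item is relevant if the user clicked it). The function name and structure are ours, not from the paper:

```python
import math

def ndcg_at_k(recommended, clicked, k=3):
    """NDCG@k for one user, with binary relevance (1 if clicked, else 0)."""
    rels = [1 if item in clicked else 0 for item in recommended[:k]]
    # DCG discounts hits logarithmically by rank.
    dcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels))
    # Ideal DCG: the same hits placed at the top of the list.
    ideal = sorted(rels, reverse=True)
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# The same single hit scores 1.0 at rank 1, but only 0.5 at rank 3.
print(ndcg_at_k(["a", "b", "c"], {"a"}))  # 1.0
print(ndcg_at_k(["b", "c", "a"], {"a"}))  # 0.5
```

Because clicks concentrate on popular items, a model that always recommends them tends to collect easy "hits" and score well on this metric.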

In a series of experiments on three newspapers, we investigated the impact of punishing popular items on fairness metrics such as item-catalogue coverage and the Gini index of the recommendations, as well as the impact on NDCG (offline) and the CTR (online) of the recommendations.

We expected that punishing popularity would increase coverage and lower the Gini coefficient, as more items get recommended and they get recommended more evenly.
On NDCG, we expected decreasing performance with increasing alpha, as the popularity bias in the data would value popular recommendations highly.
We had no expectations regarding the CTR, as we had no idea how the users would react to different sets of recommendations.
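The two fairness metrics can be sketched in a few lines. This is a generic illustration of catalogue coverage and the Gini index over recommendation counts, not the exact evaluation code used in the experiments:

```python
from collections import Counter

def coverage(recommendations, catalogue_size):
    """Fraction of the catalogue appearing in at least one recommendation list."""
    distinct = {item for rec_list in recommendations for item in rec_list}
    return len(distinct) / catalogue_size

def gini(recommendations, catalogue_size):
    """Gini index of recommendation counts: 0 = perfectly even, near 1 = concentrated."""
    counts = Counter(item for rec_list in recommendations for item in rec_list)
    # Items never recommended count as zeros.
    values = sorted(counts.values()) + [0] * (catalogue_size - len(counts))
    values.sort()
    n, total = len(values), sum(values)
    if total == 0:
        return 0.0
    cum = sum((i + 1) * v for i, v in enumerate(values))
    return (2 * cum) / (n * total) - (n + 1) / n

recs = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "e"]]
print(coverage(recs, 10))  # 0.5
print(round(gini(recs, 10), 3))  # high: "a" and "b" dominate, half the catalogue is unseen
```

More even recommendations push coverage up and the Gini index down, which is exactly the direction we hoped a higher alpha would move both metrics.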

Experimental Setup

In our tests, we recommended 3 items to a user at the end of an article, the idea being that the user should consider the recommended articles to continue their reading.

As a recommendation algorithm, we used an item-based nearest neighbour algorithm (ItemKNN), which computes the similarity between items using the conditional-probability-inspired similarity function as defined in the original paper describing ItemKNN for recommender systems (Deshpande et al., 2004).

Similarity function, where D_u is the set of items visited by user u and alpha is a hyperparameter:

sim(i, j) = |{u : i ∈ D_u and j ∈ D_u}| / ( |{u : i ∈ D_u}| · |{u : j ∈ D_u}|^alpha )

Increasing alpha increases the popularity punishing factor of the target item j in the denominator.
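A vectorised sketch of this similarity, under our reading of the formula (variable names are ours; a binary user-item matrix stands in for the D_u sets):

```python
import numpy as np

def itemknn_similarity(X, alpha=0.5):
    """Conditional-probability-style item similarity with a popularity punishment.

    X is a binary user-item matrix (n_users x n_items). The similarity of
    target item j to item i shrinks as j's popularity, raised to alpha,
    grows in the denominator.
    """
    co = X.T @ X                        # co-occurrence counts |{u : i,j in D_u}|
    pop = np.diag(co).astype(float)     # item popularity |{u : i in D_u}|
    with np.errstate(divide="ignore", invalid="ignore"):
        sim = co / (pop[:, None] * pop[None, :] ** alpha)
    np.fill_diagonal(sim, 0.0)          # an item is not its own neighbour
    sim[np.isnan(sim)] = 0.0
    return sim

# Toy data: item 0 is read by everyone, items 1 and 2 are niche.
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 0, 1]])
print(itemknn_similarity(X, alpha=0.0))  # popular item 0 scores high as a target
print(itemknn_similarity(X, alpha=1.0))  # item 0's scores as a target are damped
```

At alpha = 0 the denominator ignores the target's popularity entirely, so ubiquitous items dominate every neighbour list; at alpha = 1 they are punished in full proportion to their popularity.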

Before running our online trials, we also evaluated the algorithm offline, looking at the NDCG@3 for various values of alpha. In total we ran 4 online A/B tests, each comparing a different value of alpha to the control setting of 0.5.

Results

Fairness results of recommendations

Our two chosen fairness metrics show that increasing alpha does indeed lead to a more equal spread in recommendations. At the extreme (alpha = 1), we do notice that coverage goes down slightly, as popular items are now no longer recommended at all.

In terms of coverage, we see a big jump from alpha = 0.0 to 0.2 and 0.5; further increases add little. For the Gini index, the increase to 0.2 has barely any effect, and increasing beyond 0.5 has less effect than going from 0.0 to 0.5, except on the third newspaper.

Recommendation count distribution of each item, sorted from most read to least read

To look at why the fairness metrics behaved this way, we plot the recommendation count for each item, sorted from most read to least read.
In these distributions, we see that popular items remain the most recommended items unless we increase alpha all the way to 1. The recommendations do become more and more balanced as alpha increases to 0.5.
When moving beyond 0.5, we see a new imbalance appear, as unpopular items are recommended more frequently than some of the other items. With alpha = 1, the distribution is flipped, and recommendations are almost always unpopular items.
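Such a distribution can be produced with a simple count, ordered by read frequency (a generic sketch, not the plotting code from the experiments):

```python
from collections import Counter

def recommendation_distribution(recommendations, read_counts):
    """Recommendation count per item, ordered from most read to least read."""
    rec_counts = Counter(item for rec_list in recommendations for item in rec_list)
    order = sorted(read_counts, key=read_counts.get, reverse=True)
    return [(item, rec_counts.get(item, 0)) for item in order]

recs = [["a", "c"], ["a", "b"], ["a", "b"]]
reads = {"a": 100, "b": 10, "c": 1}
# Popularity-biased shape: the most read item is also the most recommended.
print(recommendation_distribution(recs, reads))  # [('a', 3), ('b', 2), ('c', 1)]
```

A low alpha produces the monotonically decreasing shape shown here; a high alpha flips it, so the least read items collect the most recommendations.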

In the comparison between NDCG and CTR, we clearly see the popularity bias propagated into the offline results: alpha = 0.2 seems to work best offline. The CTR results tell a different story: our control setting of 0.5 performs best, and the more popularity-biased settings show significant drops in performance.

Both metrics do agree on the poor performance of alpha = 1: the randomness introduced by recommending mostly unpopular items severely limits the quality of the recommendations.

The 0.5 setting strikes a balance between recommending popular articles, and those that are strongly related to the article the user is reading.

Our results indicate that we need to counter popularity bias when building recommendation models, but that users do not want recommendations devoid of popular items. Those items were popular for a reason, as many users are interested in them.
Using such a Goldilocks parameter to counter popularity bias helps us increase the relevance of the recommended items for our readers.

Resources

  • Deshpande, Mukund, and George Karypis. “Item-based top-n recommendation algorithms.” ACM Transactions on Information Systems (TOIS) 22.1 (2004): 143–177.
  • Verachtert, Robin, et al. “The Impact of a Popularity Punishing Hyperparameter on ItemKNN Recommendation Performance.” Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part II. Cham: Springer Nature Switzerland, 2023.
