Analysing “Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches”

Odunayo Ogundepo
Published in Analytics Vidhya
2 min read · May 7, 2021

Paper: Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches
Authors:
Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach
Code:
https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation

AI research is largely geared towards creating models and algorithms that outperform previous ones and achieve “state of the art” status. Many researchers build on prior work and try to improve its results with different methods. This has driven the immense progress seen in deep learning in recent years; however, this paper highlights the problems that arise when new research is built on flawed research. Flawed research, in this context, refers to work that is difficult to replicate or that is evaluated against weak baselines chosen by the authors.

This paper points to a problem that is common across research fields: the reproducibility crisis. It shows a systemic issue where the incentive is to publish papers, but not the data, code, or enough information for others to successfully replicate the experiments. As a result, a lot of research ends up as nothing more than a published paper, because it cannot be carried over into industry to tackle real-life problems. Of the 18 top-rated recommendation algorithms selected for evaluation in this experiment, fewer than half could be reproduced, because the artefacts of the research (data, code, models, etc.) were unavailable or the documentation was incomplete, and the original authors could not provide clarity on the problems encountered. In addition, in much of the published work, authors tend to compare against other complex neural baselines while ignoring classical machine learning approaches such as KNN (k-nearest neighbours) and graph-based models.
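To make concrete what such a classical baseline looks like, here is a minimal sketch of an item-based KNN recommender using cosine similarity. The toy interaction matrix and all names here are hypothetical illustrations, not the paper's actual datasets or code:

```python
import numpy as np

# Hypothetical implicit-feedback matrix: rows are users, columns are items,
# 1 means the user interacted with the item.
R = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
norms[norms == 0] = 1.0          # avoid division by zero for unseen items
sim = (R.T @ R) / np.outer(norms, norms)
np.fill_diagonal(sim, 0.0)       # an item should not recommend itself

# Score items for each user as similarity-weighted sums over their history.
scores = R @ sim

# Recommend the highest-scoring item the first user has not yet interacted with.
user = 0
scores[user, R[user] > 0] = -np.inf
top_item = int(np.argmax(scores[user]))
```

Despite its simplicity, a well-tuned variant of exactly this kind of neighbourhood method is among the baselines the paper found hard to beat.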

When comparing the reported results with other baselines, the authors reproduced the work under the same experimental conditions and with the hyperparameters reported in the papers. Under these conditions, only one of the models, Mult-VAE, matched the results of less complex machine learning algorithms. This is because researchers focus on outperforming a few selected baselines under specific experimental conditions. The results of this paper also show that some researchers use inappropriate methods to arrive at their results; for example, the MCRec (Metapath-based Context for RECommendation) and NCF (Neural Collaborative Filtering) papers used the test set when tuning hyperparameters.
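The test-set leakage criticised above is avoided with a three-way split. The sketch below uses a toy, hypothetical task (not the paper's setup): a single hyperparameter is tuned on a validation set, and the test set is touched exactly once at the end:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy classification data (purely illustrative).
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)

# Three-way split: the test set is held out until all tuning is finished.
idx = rng.permutation(len(X))
train_idx, val_idx, test_idx = idx[:600], idx[600:800], idx[800:]

def evaluate(k, subset):
    """Accuracy of a trivial model with one hyperparameter k:
    predict 1 when the mean of the first k features is positive."""
    pred = (X[subset, :k].mean(axis=1) > 0).astype(int)
    return (pred == y[subset]).mean()

# Correct protocol: choose k using the VALIDATION set only...
best_k = max(range(1, 6), key=lambda k: evaluate(k, val_idx))

# ...then report a single number on the untouched TEST set.
test_acc = evaluate(best_k, test_idx)
```

Selecting `best_k` directly on `test_idx`, as the criticised papers effectively did, would make the reported test accuracy an optimistically biased estimate.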

Some of these problems can be mitigated with better research practices: storing code in public repositories when possible, using virtualisation to manage packages and dependencies, providing enough documentation to justify the evaluation methods, and including simple algorithms as baselines.

Reference

Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches. https://arxiv.org/abs/1907.06902
