What I Learned during 4 Years of Recommender Systems Research

Robin Verachtert
Froomle
Feb 21, 2023

I’m currently in the process of writing my thesis, after four years of research into online recommender systems.
Here I’ll present the TL;DR of my work over these past years and share some of the ideas I still have for future work, in the hope that they might be interesting to others.


Summarising 4 years of research

For the past years, I have worked as a research scientist at Froomle, where I investigate solutions to problems that arise naturally when recommender systems are used outside the controlled environments usually found in research papers and experiments.

  • One of my proudest accomplishments of these four years is the collaboration with Lien Michiels on a Python library for experiments with implicit-feedback collaborative filtering algorithms.
    We built RecPack to simplify our own experiments and to share an easy-to-use interface with everyone interested in recommender systems experimentation.
    We paid close attention to keeping the interface simple, so that it feels familiar to anyone who has worked with sklearn in the past (a minimal sketch of this fit/predict pattern follows after this list).
    More information on RecPack can be found on the documentation page and in our publication, which won the “Best Demo” award at RecSys ’22.
  • In “Scheduling on a Budget: Avoiding Stale Recommendations with Timely Updates”, recently accepted in the Machine Learning with Applications journal, we define and compare several methods for scheduling model updates.
    Production recommender systems need frequent model updates to avoid stale recommendations and to account for concept drift in the data stream.
    We compare smart schedulers based on the number of events, the unexpectedness of events, and the expected change in the model against the baseline time-based scheduler (the simplest variant is sketched after this list).
    We find that the proposed smart schedulers make a real difference, improving recommendation performance without increasing the cost of training models.
  • In “Are We Forgetting Something? Correctly Evaluate a Recommender System With an Optimal Training Window”, published at the PERSPECTIVES ’22 workshop at RecSys ’22, we look at the impact of temporal dynamics on model quality. We show that the data used to train a recommendation model has a large impact on the quality of that model.
    When we train algorithms on only the most recent data points, popularity becomes an almost unbeatable baseline in offline experiments. Further, other simple methods, such as ItemKNN, benefit strongly from the reduction in training data and can then outperform more advanced and expensive neural networks (the windowing itself can be as simple as the filter shown after this list).
    These findings show that in offline experiments, we need to take care to fully optimise our baselines in order to make fair comparisons.
  • In the currently unpublished work “A Unified Framework for Time-Aware Item-Based Neighbourhood Recommendation Methods”, we continue the effort to address concept drift when training models, but without the rough and indiscriminate approach of ignoring most of the available data.
    Instead, we analyse all known ItemKNN methods that include a time-decay factor and propose a framework that generalises them and introduces additional, untried combinations (one example decay function is sketched after this list).
    We find that by applying the right decay, we can indeed improve on the crude forgetting mechanism proposed earlier.
  • In “The Impact of a Popularity Punishing Parameter on ItemKNN Recommendation Performance”, accepted at the upcoming ECIR ’23 conference, we look at the impact on recommendation quality and user engagement of showing less popular items in a news context.
    We analyse the discrepancies between offline and online results: the popularity bias present in the data also causes offline evaluation to favour algorithms that recommend more popular items, whereas online results show that users are more likely to click on the recommendations when we punish popular items and show a more balanced list (a textbook way to parameterise this is sketched after this list).
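
To give a flavour of the sklearn-style interface mentioned in the first point, here is a minimal, self-contained sketch of the fit/predict pattern. This is not RecPack code: the class, its parameters, and the toy data are all illustrative, and the real library offers much more (scenarios, pipelines, metrics).

```python
import numpy as np
from scipy.sparse import csr_matrix


class TinyItemKNN:
    """Cosine-similarity item-based CF with an sklearn-style interface."""

    def __init__(self, k: int = 50):
        self.k = k  # number of neighbours kept per item

    def fit(self, X: csr_matrix) -> "TinyItemKNN":
        # Normalise columns so that X_norm.T @ X_norm yields cosine similarity.
        norms = np.sqrt(X.multiply(X).sum(axis=0)).A1
        norms[norms == 0] = 1.0
        X_norm = X.multiply(1.0 / norms).tocsc()
        sim = (X_norm.T @ X_norm).toarray()
        np.fill_diagonal(sim, 0.0)  # an item is not its own neighbour
        for col in range(sim.shape[1]):  # keep only the k strongest neighbours
            weakest = np.argsort(sim[:, col])[:-self.k]
            sim[weakest, col] = 0.0
        self.similarity_ = csr_matrix(sim)
        return self

    def predict(self, X: csr_matrix) -> csr_matrix:
        # Score items by the user's history, weighted by item-item similarity.
        return X @ self.similarity_


# Rows are users, columns are items, values are implicit-feedback events.
X = csr_matrix(np.random.binomial(1, 0.1, size=(100, 40)).astype(float))
scores = TinyItemKNN(k=10).fit(X).predict(X)
```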
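
For the scheduling work, the simplest “smart” variant fits in a few lines: retrain once enough new events have arrived, rather than on a fixed clock. The sketch below uses illustrative names and is not the code from the paper; the unexpectedness- and model-change-based schedulers are more involved.

```python
import time


class EventCountScheduler:
    """Illustrative smart scheduler: retrain after `threshold` new events."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.events_since_update = 0

    def record_event(self) -> None:
        self.events_since_update += 1

    def should_update(self) -> bool:
        if self.events_since_update >= self.threshold:
            self.events_since_update = 0  # reset once an update is scheduled
            return True
        return False


class TimeBasedScheduler:
    """The baseline: retrain every `interval_seconds`, busy period or quiet."""

    def __init__(self, interval_seconds: float):
        self.interval = interval_seconds
        self.last_update = time.monotonic()

    def should_update(self) -> bool:
        now = time.monotonic()
        if now - self.last_update >= self.interval:
            self.last_update = now
            return True
        return False
```

Unlike the time-based baseline, the event-count scheduler automatically retrains more often during busy periods, when the data stream drifts fastest, and saves money during quiet ones.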
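
Mechanically, the optimal-training-window idea boils down to a filter like the one below; the hard part the paper addresses is choosing the window length. The DataFrame layout and `timestamp` column name are assumptions, not details from the paper.

```python
import pandas as pd


def keep_recent(interactions: pd.DataFrame, window_days: int) -> pd.DataFrame:
    """Drop all interactions older than `window_days` before the newest event.

    Assumes a datetime `timestamp` column; using the newest event as "now"
    is also an assumption made for this sketch.
    """
    cutoff = interactions["timestamp"].max() - pd.Timedelta(days=window_days)
    return interactions[interactions["timestamp"] >= cutoff]
```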
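
For the time-aware ItemKNN framework, exponential decay is just one member of the family of decay functions it generalises. A minimal sketch, assuming Unix-second timestamps; the half-life is a tunable hyperparameter, not a value from the paper.

```python
import numpy as np


def exponential_decay_weights(timestamps: np.ndarray, now: float,
                              half_life_days: float) -> np.ndarray:
    """Weight each interaction by 0.5 ** (age / half_life).

    Interactions lose half their weight every `half_life_days` days; these
    weights replace the binary values in the user-item matrix before the
    item-item similarities are computed.
    """
    age_days = (now - timestamps) / 86400.0  # seconds -> days
    return np.power(0.5, age_days / half_life_days)
```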
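
Finally, a textbook way to add a popularity-punishing parameter to an ItemKNN-style similarity is to raise item popularity to a power in the denominator. The sketch below uses that classic form; the exact parameterisation in the ECIR ’23 paper may differ.

```python
import numpy as np
from scipy.sparse import csr_matrix


def pop_punished_similarity(X: csr_matrix, beta: float) -> np.ndarray:
    """Co-occurrence similarity with a popularity-punishing exponent `beta`.

    sim(i, j) = |users(i) & users(j)| / (pop(i)**beta * pop(j)**beta)

    With beta = 0 this is raw co-occurrence; larger beta pushes popular
    items down the ranking.
    """
    X = (X > 0).astype(float)                # binarise implicit feedback
    co = (X.T @ X).toarray()                 # co-occurrence counts
    pop = np.asarray(X.sum(axis=0)).ravel()  # item popularity
    pop[pop == 0] = 1.0                      # avoid division by zero
    sim = co / np.outer(pop ** beta, pop ** beta)
    np.fill_diagonal(sim, 0.0)
    return sim
```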

From all of these publications, we have learned that online recommender systems are tricky to get right. For the majority of companies looking to use recommender systems, the challenge is not so much building the best model as making sure that model is used optimally.
In many experiments, we found that applying smart tweaks to simple baseline algorithms can put them ahead of complex methods.

The ability to use simple methods effectively is under-appreciated: many people reach straight for deep learning or other computationally heavy methods that should, in theory, be able to learn more by themselves.
While these methods are often good at learning from large amounts of data, they are also very expensive, and therefore not always usable for companies with smaller budgets than the Googles, Facebooks and Twitters out there.

Looking Forward to the Future

Long-Term Interests

In our work, we have focused on users’ short-term interests and on exploiting them, as they are the easiest to use efficiently for recommendations.
The goal of personalisation, though, should be to also use the longer-term history of each user, to provide an even more unique experience tailored to each individual.
Of particular interest in this area is the recommendation of niche items to users who are interested in a niche topic.
The challenge comes from the fact that niche interests are usually under-represented in the dataset compared to popular topics: there are fewer articles about them, and more time passes between the publication of those articles.
All of these factors produce extremely sparse patterns that are hard to surface with traditional collaborative filtering.
Specific methods that look for these sparse patterns could help news publishers get the right news to the right people.

Efficient Algorithms

Most companies using recommender systems have a tight budget for serving and training their recommendation models.
For them, the ability to efficiently compute small models that still deliver high quality is an important asset.
Recent advances in deep learning often yield algorithms that demand significant resources, long training times and large models.
Further research into algorithms that are both efficient to compute and cheap to serve would make these advances available to small- and medium-sized companies, not just the tech giants that are mostly served today.

Scheduling

Our scheduling systems were based on heuristics that showed a correlation with good moments to update.
Approaching the scheduling problem from a control-systems angle could improve scheduling further, and allow an even better return on investment for companies in their recommendation models.
We also considered, but never completed, avenues using reinforcement learning (RL), which seems well suited to the scheduling problem.
Using RL, the agent would have to decide on one of two actions (update or don’t update), and the reward function would combine the cost (negative) and the performance (positive).
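
A hedged skeleton of that formulation, assuming a standard agent-environment loop; we never built this, so every name below is illustrative rather than taken from an existing implementation.

```python
from dataclasses import dataclass

UPDATE, NO_UPDATE = 1, 0  # the agent's two possible actions


@dataclass
class RewardConfig:
    update_cost: float         # cost incurred whenever the model is retrained
    performance_weight: float  # scales the (positive) performance term


def reward(action: int, observed_performance: float,
           cfg: RewardConfig) -> float:
    """Combine recommendation performance (positive) with training cost (negative)."""
    cost = cfg.update_cost if action == UPDATE else 0.0
    return cfg.performance_weight * observed_performance - cost
```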

Robin Verachtert is a Research Scientist at Froomle, specialised in Temporal Dynamics for Recommender Systems.