A Complete Tutorial on Off-Policy Evaluation for Recommender Systems
How to reduce the offline-online evaluation gap
Introduction
In this tutorial, I will first explain why you should care about off-policy evaluation. Then I'll introduce just enough theory on the matter: I won't dive deep into the latest methods, stopping instead at what works well empirically. That will be sufficient for the final part, which covers the practical aspects of implementation.
Evaluating a recommender system
Let's say you have an idea to improve the performance of your recommendation system. Say you have gained access to new metadata about your items, such as each item's main topic, which should help predict user engagement on the platform more accurately. So how do you leverage that new feature? After a thorough exploration of the new data, you make the necessary modifications to your data pipeline to transform the feature and feed it to the model along with the existing ones. You can then trigger the learning phase to produce a new version of the model.
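As a minimal sketch of that step, here is what feeding a new categorical "topic" feature into an existing training pipeline might look like. The column names, the toy interaction log, and the choice of a logistic-regression click model are all hypothetical, just to make the idea concrete:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical interaction log: two existing features plus the new
# "item_topic" metadata, with a binary click label.
interactions = pd.DataFrame({
    "user_activity": [12, 3, 45, 7, 30, 2],
    "item_popularity": [0.9, 0.1, 0.7, 0.4, 0.8, 0.2],
    "item_topic": ["sports", "news", "sports", "music", "news", "music"],
    "clicked": [1, 0, 1, 0, 1, 0],
})

# Encode the new categorical feature; pass numeric features through unchanged.
preprocess = ColumnTransformer(
    [("topic", OneHotEncoder(handle_unknown="ignore"), ["item_topic"])],
    remainder="passthrough",
)

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])

X = interactions.drop(columns="clicked")
model.fit(X, interactions["clicked"])

# The retrained model now scores items using the topic metadata as well.
scores = model.predict_proba(X)[:, 1]
```

The retraining itself is routine; the hard part, which the rest of the article addresses, is judging offline whether this new model actually performs better.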
Great! Now, how do you evaluate the performance of the new model? How can you conclude whether the new feature helps the recommendation and will drive more…