Seeking the Netflix Prize

Andrés Espinosa
Aug 28, 2017 · 2 min read

Today I’m writing a review of Matrix Factorization Techniques for Recommender Systems by Yehuda Koren, Robert Bell and Chris Volinsky.

The authors wrote it while competing in the Netflix Prize, after winning the Progress Prize ($50,000) twice. In the paper they describe the techniques they used to win those prizes.

The paper starts with a brief explanation of content-based and collaborative filtering, and even compares some aspects of the two. Then it introduces Matrix factorization, which is a type of collaborative filtering technique. Matrix factorization methods are popular because they combine good scalability with predictive accuracy, but also because they allow the incorporation of additional information. While other methods rely on explicit feedback (ratings given by the users), Matrix factorization can also use implicit feedback or temporal effects.
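
To make the core idea concrete, here is a minimal sketch (my own illustration, not code from the paper) of how a rating is predicted as the dot product of a user's and an item's latent factor vectors:

```python
import numpy as np

# Hypothetical example: 3 latent factors per user and per item.
# In the paper's notation the predicted rating is r_ui_hat = dot(q_i, p_u).
p_u = np.array([0.8, -0.2, 1.1])   # latent factors for user u
q_i = np.array([1.0,  0.5, 0.3])   # latent factors for item i

predicted_rating = q_i.dot(p_u)    # 0.8 - 0.1 + 0.33 = 1.03
print(predicted_rating)
```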

I’ll list some of the points they cover to explain how the method works:

  • Regularization, which is a way of preventing overfitting (see the sketch after this list).
  • Two different learning algorithms that can be used, and when to use each.
  • Biases: an increment or reduction of the predicted rating due to users who tend to rate lower or higher than average, or items that tend to be rated in a particular way.
  • How to include additional input sources in the equation to be optimized. They talk specifically about implicit feedback.
  • Temporal dynamics: how time matters when predicting a rating, and how to take it into account in the equation.
  • How to get a confidence level for a prediction, to know how certain it is, or how likely it is to be right.
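
As a rough illustration of how regularization and biases fit together, here is a hedged sketch of the squared error the model minimizes for a single observed rating, following the paper's general form (the variable names and the value of lambda are mine):

```python
import numpy as np

def regularized_error(r_ui, mu, b_u, b_i, p_u, q_i, lam=0.02):
    """Squared error plus L2 penalty for one observed rating r_ui.

    mu  : global average rating
    b_u : user bias, b_i : item bias
    p_u : user latent factors, q_i : item latent factors
    lam : regularization strength (lambda), which prevents overfitting
    """
    prediction = mu + b_u + b_i + q_i.dot(p_u)
    error = (r_ui - prediction) ** 2
    penalty = lam * (b_u ** 2 + b_i ** 2 +
                     np.dot(p_u, p_u) + np.dot(q_i, q_i))
    return error + penalty
```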

What I liked about the article is that it summarizes several aspects of how Matrix factorization works very well. Personally, I really liked the comparison between the two learning algorithms that are mentioned: a stochastic gradient descent implementation popularized by Simon Funk, and alternating least squares. The first is easier to implement, but the second can be convenient in two cases: when the system can use parallelization, and when the system is centered on implicit data. It's that simple; after a few sentences you know the benefits of both and, hopefully, when to use each one.
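
To give a feel for the first of those, here is a minimal sketch of a Funk-style stochastic gradient descent step on the bias-free prediction from above (the learning rate and regularization values are placeholders of my own, not the paper's):

```python
import numpy as np

def sgd_step(r_ui, p_u, q_i, gamma=0.005, lam=0.02):
    """One stochastic gradient descent update for a single observed rating.

    gamma : learning rate, lam : regularization strength.
    Returns the updated user and item factor vectors.
    """
    error = r_ui - q_i.dot(p_u)                    # prediction error e_ui
    p_u_new = p_u + gamma * (error * q_i - lam * p_u)
    q_i_new = q_i + gamma * (error * p_u - lam * q_i)
    return p_u_new, q_i_new
```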

Another concept that is clear to me now is how to incorporate new information into the model, one of the hottest features of Matrix factorization. The equation the model tries to minimize grows throughout the paper, with one example given per feature, so you get a notion of how the new information should be incorporated.
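
As an example of that growth, here is a rough sketch of how the user representation is extended with implicit feedback (the paper's SVD++-style term); the code is my own paraphrase, not the authors':

```python
import numpy as np

def predict_with_implicit(mu, b_u, b_i, p_u, q_i, y_implicit):
    """Prediction that also uses the set of items the user interacted with.

    y_implicit : list of factor vectors y_j, one per implicitly preferred
                 item j (for example, every item the user rated at all).
    The implicit sum is scaled by |N(u)|^(-1/2), as in the paper.
    """
    if y_implicit:
        implicit = sum(y_implicit) / np.sqrt(len(y_implicit))
    else:
        implicit = np.zeros_like(p_u)
    return mu + b_u + b_i + q_i.dot(p_u + implicit)
```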

The only thing I missed was a good explanation of how users and items end up positioned in the latent space. They give a notion, but I think the topic deserves a more detailed view.

In general, a great paper for learning the basics of Matrix factorization.
