Learning to Rank: BPR

Andrés Espinosa
3 min read · Nov 13, 2017

Today I’m talking about BPR: Bayesian Personalized Ranking from Implicit Feedback by Steffen Rendle, Christoph Freudenthaler, Zeno Gantner and Lars Schmidt-Thieme.

They start by mentioning a bunch of existing methods for the task of item recommendation. Nevertheless, in the words of the authors, “none of these methods directly optimize its model parameters for ranking. Instead they optimize to predict if an item is selected by a user or not”. There are some methods that do optimize for ranking, but they are non-personalized, which means that they give the same ranking to every user.

Some important things to consider about the authors’ method:

  • They assume that the user prefers an observed (positive) item over all non-observed items.
  • The training data consists of tuples of the form (u, i, j), each representing that user u prefers item i over item j (see the small sampling sketch after this list).
  • This lets the method use both the positive and the missing values, whereas other methods have trouble treating the missing values as negative ones.
  • Another thing to notice is that the training data is created for the actual objective of ranking, which is one of, if not the, most important contributions of the authors’ method.
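To make the (u, i, j) construction concrete, here is a tiny sketch of how such triples can be drawn from implicit feedback. The users, items and interactions below are made up for illustration and are not from the paper:

```python
import random

# Toy implicit-feedback data: user -> set of items the user interacted with.
# These ids and interactions are invented for illustration only.
observed = {
    "u1": {"i1", "i3"},
    "u2": {"i2"},
}
all_items = {"i1", "i2", "i3", "i4"}

def sample_triple(observed, all_items):
    """Draw one (u, i, j) triple: u interacted with i but not with j."""
    u = random.choice(list(observed))
    i = random.choice(list(observed[u]))               # positive (observed) item
    j = random.choice(list(all_items - observed[u]))   # missing item, assumed less preferred
    return u, i, j

print(sample_triple(observed, all_items))  # e.g. ('u1', 'i3', 'i2')
```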

BPR

Then they explain the BPR method itself. They start with the BPR-OPT formula, the criterion that needs to be optimized. It is formulated by maximizing the (log-)likelihood that each user prefers item i over item j across all training triples, together with a regularization term that comes from the prior over the model parameters. After that the authors draw an analogy between their criterion and optimizing AUC. Finally they explain LearnBPR, the algorithm used to optimize BPR-OPT efficiently: it is plain stochastic gradient ascent, except that the training triples are drawn randomly and with replacement (bootstrap sampling) rather than swept through in a fixed order.
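To make the criterion concrete, this is the BPR-OPT objective as I read it from the paper, where x̂_uij = x̂_ui − x̂_uj is the difference of the predicted scores, σ is the logistic sigmoid, D_S is the set of training triples and λ_Θ controls the regularization; the LearnBPR update below it is the corresponding gradient-ascent step for a single sampled triple (writing the regularization explicitly as a penalty):

```latex
\mathrm{BPR\text{-}OPT}
  := \sum_{(u,i,j)\in D_S} \ln \sigma(\hat{x}_{uij})
     \;-\; \lambda_\Theta \lVert\Theta\rVert^2,
\qquad
\hat{x}_{uij} := \hat{x}_{ui} - \hat{x}_{uj},
\qquad
\sigma(x) = \frac{1}{1+e^{-x}}

\Theta \;\leftarrow\; \Theta
  + \alpha\left(\sigma(-\hat{x}_{uij})\,
    \frac{\partial \hat{x}_{uij}}{\partial\Theta}
    \;-\; \lambda_\Theta\,\Theta\right)
```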

Application

The authors use BPR with two different existing models: MF (matrix factorization) and KNN. It’s worth noting that, because BPR is a loss function plus an optimization procedure, it can in principle be applied to any model trained by minimizing a loss (although it only makes sense for ranking tasks). The MF model (before BPR) typically uses least squares or similar criteria to make the predicted matrix as close as possible to the observed one, whose entries are the scores used to rank the items. In the same way, the KNN model uses a similarity measure C that can be learnt. In both cases, the authors replace the model’s original loss function with BPR. They also have to add some model-specific regularization parameters.
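As a rough illustration of how BPR plugs into MF, here is a minimal sketch of one LearnBPR stochastic gradient step for a BPR-MF model, where the score of a user–item pair is the dot product of their factors. The sizes, learning rate and regularization value are toy choices of mine, not the authors’ settings:

```python
import numpy as np

# Toy BPR-MF model: score of (u, i) is the dot product of user and item factors.
rng = np.random.default_rng(0)
n_users, n_items, k = 5, 8, 4
W = rng.normal(scale=0.1, size=(n_users, k))   # user factors
H = rng.normal(scale=0.1, size=(n_items, k))   # item factors
alpha, lam = 0.05, 0.01                        # learning rate, regularization

def bpr_mf_step(u, i, j):
    """One gradient-ascent update for a sampled triple (u, i, j), i preferred over j."""
    x_uij = W[u] @ (H[i] - H[j])       # difference of predicted scores: x_ui - x_uj
    g = 1.0 / (1.0 + np.exp(x_uij))    # sigma(-x_uij): how wrongly ordered the pair still is
    wu = W[u].copy()                   # keep the old user factors for the item updates
    W[u] += alpha * (g * (H[i] - H[j]) - lam * W[u])
    H[i] += alpha * (g * wu - lam * H[i])
    H[j] += alpha * (-g * wu - lam * H[j])

bpr_mf_step(0, 2, 5)  # one update for a made-up triple
```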

Other methods

Two other methods are described, along with their disadvantages compared to BPR.

  • WR-MF: It is a method for implicit feedback, but its “optimization is on instance level (one item) instead of pair level (two items) as BPR”. I have already mentioned that this is one of the major advantages of BPR. Also, its optimization is best suited for regression, while for the task of item prediction it should be suited for classification.
  • MMMF: Its disadvantages are that it is specific to MF and is designed to work with sparse data.

Evaluation and conclusion

The method was evaluated and compared to state-of-the-art methods using AUC as the metric for evaluation.
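For reference, the per-user AUC that gets averaged in this kind of evaluation is essentially the fraction of correctly ordered item pairs; roughly, with E(u) the set of evaluation pairs for user u and δ the indicator function:

```latex
\mathrm{AUC}
  = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|E(u)|}
    \sum_{(i,j) \in E(u)} \delta\!\left(\hat{x}_{ui} > \hat{x}_{uj}\right)
```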

Figure 1. AUC for the evaluated methods.

As you can see in Figure 1, BPR outperformed every other evaluated method. The biggest conclusion the authors draw is that “the prediction quality does not only depend on the model but also largely on the optimization criterion. Both our theoretical and empirical results indicate that the BPR optimization method is the right choice for the important task of personalized ranking”.

What do I say?

I liked the paper a lot. It is easy to read and very well explained. The method itself has also had a big impact on item recommendation. Thanks to the modularity of many existing recommendation methods, it is possible to combine work from different papers: in this case the authors worked only on a new loss function and optimization criterion rather than a whole model, and loss functions are used in many existing models. In my opinion, this makes BPR a very clean and useful method embedded in a great paper.
