RankNet, LambdaRank TensorFlow Implementation— part I

Published in

The Startup

5 min readFeb 2, 2021

I come across the field of Learning to Rank (LTR) and RankNet, when I was working on a recommendation project. However, it is a bit tricky to implement the model via TensorFlow and I cannot find any detail explanation on the web at all. Hence in this series of blog posts, I’ll go through the papers of both RankNet and LambdaRank in detail and implement the model in TF 2.0.

For this post, I will go through the followings

the paper which first proposed RankNet (Learning to Rank using Gradient Descent)
the paper summarised RankNet, LambdaRank (From RankNet to LambdaRank to LambdaMart)
implementation of RankNet using Keras’s Functional API

Learning to Rank

In a typical learning to rank problem setup, there is

a list of queries q1, q2, q3, ...
for each query, there are some documents d1, d2, d3, …
for each document, there is a score s1, s2, s3 … which indicate the relevance between this document and the corresponding query (e.g. score 0 means completely irrelevant while 5 means highly relevant)

For example, in the case of a search engine,

queries are search texts like “TensorFlow 2.0 doc”, “Keras api doc”, …
documents are the URLs returned by the search engine
score is the clicks received by the URL (higher clicks = more relevant)

In the example above, one could construct features as the keywords extracted from the query and the document and label as the relevance score.
Hence the most straight forward way to solve this problem using machine learning is to construct a neural network to predict a score given the keywords.

RankNet

Model Target

Instead of modelling the score of each document one by one, RankNet proposed to model the target probabilities between any two documents (di & dj) of the same query.

And the target probabilities Pij of di and dj is defined as

1 if si > sj
0.5 if si=sj
0 if si < sj

where si and sj is the score of di and dj respectively

Let say for a particular query, there are 3 documents d1, d2, d3 with scores 0, 5, 3 respectively, then there will be 3 valid pairs of documents:

d1 & d2 with P12 = 0
d1 & d3 with P13 = 0
d2 & d3 with P23 = 1

So now each pair of documents serve as one training record to RankNet.

Model Architecture

In the RankNet paper, the author used a neural network formulation.
Let’s denote the neural network as function f, the output of neural network for document i as oi, the features of document i as xi. Hence we have oi = f(xi) and oj = f(xj). Also we define oij = oi - oj = f(xi) - f(xj) = -(oj - oi) = -oji.

Note that oi (and oj) could be any real number, but as mentioned above, RankNet is only modelling the probabilities Pij which is in the range of [0,1].

In order to model the probabilities, logistic function is applied on oij as below:

And cross entropy cost function is used, so for a pair of documents di and dj, the corresponding cost Cij is computed as below:

And the gradient is computed as

At this point, you may already notice RankNet is a bit different from a typical feedforward neural network.

While a typical neural network follows these steps to update its weights:
read input features -> compute output -> compute cost -> compute gradient -> back propagation
RankNet update its weights as follows:
read input xi -> compute oi -> compute gradients doi/dWk ->
read input xj -> compute oj -> compute gradients doj/dWk ->
compute Pij -> compute gradients using equation (2) & (3) -> back propagation

So in RankNet, xi & xj serve as one training record, RankNet will pass xi & xj through the same the weights (Wk) of the network to get oi & oj before computing the gradient and update its weights.

Implementation using Keras

As described above, RankNet will take two inputs, xi & xj, pass them through the same hidden layers to compute oi & oj, apply sigmoid on oi-oj to get the final probability for a particular pair of documents, di & dj.

Thus, the model should look like this