RankNet, LambdaRank TensorFlow Implementation: Part I

Louis Kit Lung Law · Published in The Startup · Feb 2, 2021

I came across the field of Learning to Rank (LTR) and RankNet while working on a recommendation project. However, the model is a bit tricky to implement in TensorFlow, and I could not find any detailed explanation on the web. Hence, in this series of blog posts, I’ll go through the papers of both RankNet and LambdaRank in detail and implement the models in TF 2.0.

For this post, I will go through the following:

  • what Learning to Rank is
  • how RankNet models and learns pairwise preferences
  • how to implement RankNet using Keras

Learning to Rank

In a typical learning to rank problem setup, we have

  • a list of queries q1, q2, q3, ...
  • for each query, there are some documents d1, d2, d3, …
  • for each document, there is a score s1, s2, s3, … which indicates the relevance between this document and the corresponding query (e.g. a score of 0 means completely irrelevant while 5 means highly relevant)

For example, in the case of a search engine,

  • queries are search texts like “TensorFlow 2.0 doc”, “Keras api doc”, …
  • documents are the URLs returned by the search engine
  • scores are the clicks received by the URLs (more clicks = more relevant)

In the example above, one could construct the features from the keywords extracted from the query and the document, and use the relevance score as the label.
Hence the most straightforward way to solve this problem with machine learning is to build a neural network that predicts the score given the keyword features, as sketched below.
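As a concrete illustration, here is a minimal sketch of such a pointwise scoring model. The feature count, layer sizes and MSE loss are my own assumptions, not from the original post:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_FEATURES = 10  # hypothetical number of query/document keyword features

# Plain feedforward regressor: features in, predicted relevance score out.
pointwise_model = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(NUM_FEATURES,)),
    layers.Dense(1),  # predicted relevance score
])
pointwise_model.compile(optimizer="adam", loss="mse")
```

This pointwise approach treats each document independently; RankNet, introduced next, instead learns from pairs of documents.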

RankNet

Model Target

Instead of modelling the score of each document one by one, RankNet proposed to model the target probability between any two documents (di & dj) of the same query.

And the target probability P̄ij of di and dj is defined as

  • 1 if si > sj
  • 0.5 if si=sj
  • 0 if si < sj

where si and sj are the scores of di and dj respectively.

Let’s say for a particular query there are 3 documents d1, d2, d3 with scores 0, 5, 3 respectively. Then there are 3 valid pairs of documents:

  • d1 & d2 with P12 = 0
  • d1 & d3 with P13 = 0
  • d2 & d3 with P23 = 1

So now each pair of documents serves as one training record for RankNet, as the sketch below shows.
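Here is a minimal Python sketch of this pairing logic, using the document names and scores from the example above:

```python
from itertools import combinations

# Scores of the 3 documents for one query, as in the example above.
scores = {"d1": 0, "d2": 5, "d3": 3}

# Every pair of documents becomes one training record with target P̄ij.
for (di, si), (dj, sj) in combinations(scores.items(), 2):
    pij = 1.0 if si > sj else (0.5 if si == sj else 0.0)
    print(di, dj, pij)

# Output:
# d1 d2 0.0
# d1 d3 0.0
# d2 d3 1.0
```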

Model Architecture

In the RankNet paper, the authors used a neural network formulation.
Let’s denote the neural network as a function f, the features of document i as xi, and the output of the network for document i as oi. Hence we have oi = f(xi) and oj = f(xj). We also define oij = oi - oj = f(xi) - f(xj) = -(oj - oi) = -oji.

Note that oi (and oj) could be any real number, but as mentioned above, RankNet models probabilities, which lie in the range [0, 1].

In order to model the probability, the logistic function is applied on oij as below:

Pij = 1 / (1 + exp(-oij))    (1)

Here Pij is the model’s predicted probability that di is more relevant than dj, to be matched against the target probability P̄ij.

And the cross entropy cost function is used, so for a pair of documents di and dj, the corresponding cost Cij is computed as below:

Cij = -P̄ij · log(Pij) - (1 - P̄ij) · log(1 - Pij)    (2)

And the gradient with respect to a network weight Wk is computed by the chain rule as

∂Cij/∂Wk = (∂Cij/∂oi) · (∂oi/∂Wk) + (∂Cij/∂oj) · (∂oj/∂Wk)    (3)

where ∂Cij/∂oi = Pij - P̄ij = -∂Cij/∂oj.
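As a quick sanity check of equations (1)–(3), here is a tiny worked example; the network outputs o2 = 1.0 and o3 = 0.0 are made-up numbers:

```python
import math

# Pair d2 & d3 from the example above, with target P̄23 = 1.
o2, o3 = 1.0, 0.0                    # hypothetical network outputs
o23 = o2 - o3                        # oij = oi - oj = 1.0
p23 = 1.0 / (1.0 + math.exp(-o23))   # equation (1): ≈ 0.731
c23 = -1.0 * math.log(p23)           # equation (2) with P̄23 = 1: ≈ 0.313
grad_o2 = p23 - 1.0                  # ∂C23/∂o2 = P23 - P̄23 ≈ -0.269
print(p23, c23, grad_o2)
```

Since ∂C23/∂o2 is negative, gradient descent increases o2, pushing up the score of the more relevant document d2, which is exactly the behaviour we want.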

At this point, you may have already noticed that RankNet is a bit different from a typical feedforward neural network.

While a typical neural network follows these steps to update its weights:
read input features -> compute output -> compute cost -> compute gradient -> back propagation

RankNet updates its weights as follows:
read input xi -> compute oi -> compute gradients doi/dWk ->
read input xj -> compute oj -> compute gradients doj/dWk ->
compute Pij -> compute gradients using equations (2) & (3) -> back propagation

So in RankNet, xi & xj together serve as one training record: RankNet passes xi & xj through the same weights (Wk) of the network to get oi & oj before computing the gradients and updating its weights.

Implementation using Keras

As described above, RankNet takes two inputs, xi & xj, passes them through the same hidden layers to compute oi & oj, and applies a sigmoid on oi - oj to get the final probability for a particular pair of documents, di & dj.

Thus, the model has two input branches that share the same hidden layers, joined by a subtraction and a sigmoid at the output.

This could be implemented using Keras’s functional API as follows.
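Below is a minimal sketch of that model. The original post’s code is not shown here, so the layer sizes (64 and 16 units) and the helper name build_ranknet are my own assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_ranknet(num_features):
    # Shared layers: both documents are scored with the SAME weights Wk.
    hidden_1 = layers.Dense(64, activation="relu")
    hidden_2 = layers.Dense(16, activation="relu")
    score = layers.Dense(1)  # unbounded real-valued score

    xi = layers.Input(shape=(num_features,), name="xi")
    xj = layers.Input(shape=(num_features,), name="xj")

    oi = score(hidden_2(hidden_1(xi)))  # oi = f(xi)
    oj = score(hidden_2(hidden_1(xj)))  # oj = f(xj)

    # oij = oi - oj, then the sigmoid of equation (1) gives Pij in [0, 1].
    oij = layers.Subtract()([oi, oj])
    pij = layers.Activation("sigmoid")(oij)

    return Model(inputs=[xi, xj], outputs=pij)

model = build_ranknet(num_features=10)
```

Reusing the same layer objects on both inputs is what gives the two branches shared weights.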

Now let’s simulate some data and train the model
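The original simulation code is likewise not shown here, so below is one plausible way to generate training pairs: features are random, and a hidden linear function decides which document of each pair is more relevant (since the simulated scores are continuous, ties effectively never occur and the targets are just 0 or 1):

```python
import numpy as np

rng = np.random.default_rng(42)
NUM_FEATURES = 10
NUM_PAIRS = 10_000

# Hidden "true" relevance function, used only to label the pairs.
true_weights = rng.normal(size=(NUM_FEATURES, 1))

xi = rng.normal(size=(NUM_PAIRS, NUM_FEATURES)).astype("float32")
xj = rng.normal(size=(NUM_PAIRS, NUM_FEATURES)).astype("float32")

# Target P̄ij is 1 when document i is more relevant than document j.
pij = (xi @ true_weights > xj @ true_weights).astype("float32")

# Hold out 20% of the pairs as a test set.
split = int(0.8 * NUM_PAIRS)
xi_train, xi_test = xi[:split], xi[split:]
xj_train, xj_test = xj[:split], xj[split:]
pij_train, pij_test = pij[:split], pij[split:]
```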

Now we can start training the model with just two lines of code.
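Since Pij is a sigmoid output, minimizing binary cross entropy against the targets P̄ij is exactly the cost Cij of equation (2). The optimizer, epoch count and batch size below are my own choices:

```python
model.compile(optimizer="adam", loss="binary_crossentropy")
history = model.fit(
    [xi_train, xj_train], pij_train,
    validation_data=([xi_test, xj_test], pij_test),
    epochs=20,
    batch_size=32,
)
```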

[Training log: first 5 epochs]

Let’s plot the loss metric as well
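A quick way to do that with matplotlib, using the history object returned by model.fit:

```python
import matplotlib.pyplot as plt

plt.plot(history.history["loss"], label="train")
plt.plot(history.history["val_loss"], label="test")
plt.xlabel("epoch")
plt.ylabel("cross entropy loss")
plt.legend()
plt.show()
```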

[Plot: cross entropy loss on the training and test sets per epoch]

As we can see, the loss on both the training and test sets decreased over time.

Conclusion

In this post, I have gone through

  • how RankNet uses a probabilistic approach to solve the learning to rank problem
  • how to use gradient descent to train the model
  • implementation of RankNet using Keras’s functional API

In future blog posts, I will talk about

  • how to implement a custom training loop (instead of using model.compile and model.fit)
  • how to speed up training of RankNet
  • LambdaRank algorithm

Stay Tuned!
