RankNet, LambdaRank TensorFlow Implementation — part II

Louis Kit Lung Law · Published in The Startup · Feb 3, 2021

In part I, I went through RankNet, which was published by Microsoft in 2005. Two years later, Microsoft published another paper, Learning to Rank with Nonsmooth Cost Functions, which introduced a sped-up version of RankNet (which I call “Factorised RankNet”) and LambdaRank.

However, before I jump into Factorised RankNet and LambdaRank, I’d like to show you how to implement RankNet using a custom training loop in TensorFlow 2.

This is important because Factorised RankNet and LambdaRank cannot be implemented with the Keras API alone; as we will see later, they need the lower-level APIs of a framework such as TensorFlow or PyTorch.

Custom Training Loop

Recall that in part I, the model training was done with just two lines of code:

ranknet.compile(...)
ranknet.fit(...)

In order to write our own training loop, we need to understand what happens behind these two lines of code. This is outlined in the pseudo code below.

training pseudo code
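In rough outline (a sketch pieced together from the step references below, with the numbering kept consistent with them), those two calls boil down to:

1. for each epoch:
2.     for each batch (xi, xj, pij) in the training set:
3.         forward pass: compute the predicted pij for the pair (xi, xj)
4.         compute the loss between the predicted and the true pij
5.         compute the gradients of the loss w.r.t. the trainable weights
6.         update the weights with the optimizer
7.         store the loss of the batch

9.     compute the loss on the validation set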

Now that we know what needs to be done, let’s break the whole process down into a few functions:

  • an apply_gradient function, which covers lines 3 to 6
  • a train_data_for_one_epoch function, which loops through the batches (lines 2–7)
  • a compute_val_loss function, which covers line 9
  • an outer loop over epochs, which puts everything together

Let’s see how to implement these.

apply_gradient

  • line 4 computes Pij
  • line 5 computes the loss
  • line 7 gets the gradients of all trainable weights
  • line 8 applies back propagation to update the weights
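As a rough illustration of such a step, here is a minimal sketch using tf.GradientTape, assuming a simple feed-forward scoring network (the architecture and variable names are illustrative, not necessarily those from part I, and pij is assumed to arrive with shape (batch_size, 1)):

import tensorflow as tf

# Illustrative scoring model: maps a document feature vector to a single score
scorer = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

bce = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def apply_gradient(xi, xj, pij):
    with tf.GradientTape() as tape:
        si = scorer(xi, training=True)       # score of document i
        sj = scorer(xj, training=True)       # score of document j
        pij_pred = tf.nn.sigmoid(si - sj)    # Pij = P(doc i ranked above doc j)
        loss = bce(pij, pij_pred)            # binary cross-entropy loss
    # gradients of the loss w.r.t. all trainable weights
    grads = tape.gradient(loss, scorer.trainable_weights)
    # apply back propagation to update the weights
    optimizer.apply_gradients(zip(grads, scorer.trainable_weights))
    return loss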

train_data_for_one_epoch

  • line 5 converts the function apply_gradient to graph mode
  • line 7 prints a progress bar
  • line 8 loops through the training set batch by batch
  • line 10 stores the loss of the current batch
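Continuing the same sketch, the epoch-level function could look like this (train_dataset is assumed to be a tf.data.Dataset yielding (xi, xj, pij) batches):

# Run apply_gradient in graph mode for speed
apply_gradient_graph = tf.function(apply_gradient)

def train_data_for_one_epoch(train_dataset):
    losses = []
    pbar = tf.keras.utils.Progbar(target=None)     # progress bar over the batches
    for step, (xi, xj, pij) in enumerate(train_dataset):
        loss = apply_gradient_graph(xi, xj, pij)   # one gradient step on this batch
        losses.append(loss)                        # store the loss of the current batch
        pbar.update(step + 1)
    return losses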

compute_val_loss

This function simply loops through the validation set and computes the loss.
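Sketched with the same assumptions (val_dataset yields (xi, xj, pij) batches):

def compute_val_loss(val_dataset):
    losses = []
    for xi, xj, pij in val_dataset:
        si = scorer(xi, training=False)
        sj = scorer(xj, training=False)
        pij_pred = tf.nn.sigmoid(si - sj)
        losses.append(bce(pij, pij_pred))   # loss of this validation batch
    return tf.reduce_mean(losses)           # mean validation loss over all batches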

Putting all these together

  • lines 3 & 4 define the data pipeline
  • line 8 defines the loss function; we use binary cross-entropy
  • line 9 defines the optimizer
  • line 19 starts the training of RankNet
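A rough version of that outer loop, reusing the scorer, loss and optimizer defined in the sketches above (xi_train, xj_train, pij_train and their validation counterparts are assumed to be the pairwise arrays built in part I):

# Data pipeline over the pre-built pairs
train_dataset = tf.data.Dataset.from_tensor_slices(
    (xi_train, xj_train, pij_train)).shuffle(10_000).batch(32)
val_dataset = tf.data.Dataset.from_tensor_slices(
    (xi_val, xj_val, pij_val)).batch(32)

epochs = 50
for epoch in range(epochs):
    train_losses = train_data_for_one_epoch(train_dataset)
    val_loss = compute_val_loss(val_dataset)
    print(f"Epoch {epoch + 1}: "
          f"train loss = {float(tf.reduce_mean(train_losses)):.4f}, "
          f"val loss = {float(val_loss):.4f}")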

From the results below, we can see that the training results are very similar to those in part I.

first 5 epochs of training
training loss and validation loss plot

Benefit of Custom Training Loop

Now you may be wondering what the advantage of a custom training loop is. Let me show you how it lets us be more memory efficient.

Recall that in part I, when we generated the data, we first generated doc_features and then transformed it into pairs, xi and xj, before feeding them into RankNet. This means that each document is repeated several times in memory.

For example, if we have two queries, q1 and q2:

  • q1 has three documents, d1, d2, d3
  • q2 has four documents, d4, d5, d6, d7

Then we will have the following pairs:

  • q1’s pairs: d1 & d2 | d1 & d3 | d2 & d3
  • q2’s pairs: d4 & d5 | d4 & d6 | d4 & d7 | d5 & d6 | d5 & d7 | d6 & d7

To be more memory efficient, we should only fetch the documents we need on the fly during training. This can be achieved by modifying two functions, as shown below:

line 8 does the trick
line 4 does the trick
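One way to sketch the idea (illustrative, not the exact modification): keep each document only once in doc_features, materialise only integer index pairs per query, and gather the feature vectors at training time. Here doc_features, pair_idx (shape (num_pairs, 2)) and pair_labels (shape (num_pairs, 1)) are assumed inputs:

# The dataset now carries only index pairs and labels, not document features
pair_dataset = tf.data.Dataset.from_tensor_slices(
    (pair_idx, pair_labels)).shuffle(10_000).batch(32)

def apply_gradient_on_the_fly(idx, pij):
    # gather the two documents of each pair only when the batch is processed
    xi = tf.gather(doc_features, idx[:, 0])
    xj = tf.gather(doc_features, idx[:, 1])
    with tf.GradientTape() as tape:
        pij_pred = tf.nn.sigmoid(scorer(xi, training=True) -
                                 scorer(xj, training=True))
        loss = bce(pij, pij_pred)
    grads = tape.gradient(loss, scorer.trainable_weights)
    optimizer.apply_gradients(zip(grads, scorer.trainable_weights))
    return loss

# usage inside the epoch loop
for idx, pij in pair_dataset:
    loss = apply_gradient_on_the_fly(idx, pij)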

Now we have got rid of the variables xi, xj and pij completely, which means no document is repeated in memory. If you use these functions to train the model, it works exactly the same!

Conclusion

We have now implemented RankNet using a custom training loop in TensorFlow 2.

In part III, I will talk about how to speed up the training of RankNet and how to implement it.

Stay Tuned!
