Learning to Rank — Dogs

What is Learning to Rank?

Graham Horne
SEEK blog
6 min read · Sep 13, 2018

Here’s the official Wikipedia blurb:

Learning to rank[1] or machine-learned ranking (MLR) is the application of machine learning, typically in the construction of ranking models for information retrieval systems.[2] Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score, or a binary judgment (e.g. “relevant” or “not relevant”) for each item. The ranking model’s purpose is to rank, i.e. produce a permutation of items in new, unseen lists, in a way that is somewhat “similar” to rankings in the training data.

In this article, I’ll explain how Learning to Rank (LTR) works. And since everything is easier to understand with real-life examples, I’ll be using my search for a new family dog.

Searching for the perfect dog…

I want to buy a dog — but not just any dog. A dog that will be right for me, my family, and my house. One that doesn’t bark too much, one that doesn’t eat too much, and one that’s great with kids. Oh, and it needs to be low maintenance, with short hair (and the list goes on).

Source: Pets4homes

Applying Ranking to our Dog Search

This should be easy, right?

All we need to do is index all the dog breeders and their information into a search index and go from there.

But what is important?

I could manually apply different rankings to my search results using the tried-and-true method, and there is nothing wrong with this.

However, LTR has the potential to provide an easier and more accurate way for me to find the right pooch.

See below for the manual way (not necessarily the wrong way):

We search on the designated fields in our search index with a simple AND/OR query.

The fields might be:

  • Dog Size
  • Dog Weight
  • Dog Hair
  • Dog Colour
  • Dog Breed
  • Dog Eating Habits
  • Dog Chewer
  • Dog Moulter
  • Dog good with Kids
  • Dog Good with Cats
  • Dog Purebred
  • Dog OK in Car
  • Dog Cost

We might apply a base scoring algorithm such as TF-IDF.

We might apply weights to each field, for example (a sketch of such a weighted query follows the list below):

  • Dog Size: 200
  • Dog Hair: 100
  • Dog Chewer: 50
  • Dog Cost: 20
    Etc.
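To make this concrete, here is a minimal, hypothetical sketch of such a hand-weighted query in Python against an Elasticsearch-style index. The index name, field names, boost values, and query text are all invented for this example and simply mirror the weights above.

import requests  # assumes a local Elasticsearch-style engine on port 9200

# Hand-tuned field boosts: dog_size^200, dog_hair^100, dog_chewer^50, dog_cost^20
manual_query = {
    "query": {
        "multi_match": {
            "query": "small short hair good with kids",
            "lenient": True,  # ignore type mismatches on numeric fields such as dog_cost
            "fields": [
                "dog_size^200",
                "dog_hair^100",
                "dog_chewer^50",
                "dog_cost^20",
            ],
        }
    }
}

resp = requests.post("http://localhost:9200/dogs/_search", json=manual_query)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("dog_breed"))

This works, but every weight is a guess that I have to tune and maintain by hand.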

The problem is that what is important to me might not be important to others (especially people in my family).

We need a way of balancing what is important to us with the dogs and breeders available in our area.

This is where LTR comes in

For the technically minded, here are the details. One design for LTR:

  • Allows you to store features in the search engine
  • Logs feature scores (relevance scores) to create a training set for offline model development
  • Stores linear, xgboost, or ranklib ranking models in search that use features you’ve stored
  • Ranks search results using a stored model
LTR Example Flow

So how does this help me find a dog?

First we build the LTR offline model.

Training Data

We gather all of the information important to us — the dog breeders and all of the dog breeds — and index them into a search index.
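As a rough sketch of that indexing step (the index name, field names, and values below are invented for illustration), a single combined breeder-and-breed document might be indexed like this:

import requests  # assumes a local Elasticsearch-style engine on port 9200

# An invented document combining breeder and breed information,
# using the kinds of fields listed earlier.
doc = {
    "dog_breed": "Labrador Retriever",
    "dog_size": "large",
    "dog_hair": "short",
    "dog_character": "friendly, playful, good with kids",
    "dog_good_with_kids": True,
    "dog_cost": 1200,
    "breeder_rating": 4.8,
    "breeder_reviews": 132,
    "location": "Melbourne",
}

requests.put("http://localhost:9200/dogs/_doc/labrador_breeder_1", json=doc)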

Feature Sets

We then designate feature sets that are important to us (one way of storing them is sketched after the list below):

  • Feature 1: Breeder confidence and reviews
  • Feature 2: Dog character
  • Feature 3: Other fit-for-purpose characteristics, such as cost, location, etc.
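As a hedged illustration, a search engine LTR plugin such as the Elasticsearch Learning to Rank plugin stores each feature as a templated query. The feature names, fields, and parameters below are assumptions made up for our dog example, not a real configuration:

import requests  # assumes a local engine with the Elasticsearch LTR plugin installed

ES = "http://localhost:9200"

# Initialise the plugin's feature store (a one-off step).
requests.put(f"{ES}/_ltr")

# Store the three dog features as templated queries.
featureset = {
    "featureset": {
        "features": [
            {
                "name": "breeder_confidence",
                "params": [],
                "template_language": "mustache",
                # Score documents by the breeder's review rating.
                "template": {
                    "function_score": {
                        "field_value_factor": {"field": "breeder_rating", "missing": 0}
                    }
                },
            },
            {
                "name": "dog_character",
                "params": ["keywords"],
                "template_language": "mustache",
                # Text match on the character description.
                "template": {"match": {"dog_character": "{{keywords}}"}},
            },
            {
                "name": "fit_for_purpose",
                "params": ["max_cost"],
                "template_language": "mustache",
                # A simple cost filter standing in for cost, location, etc.
                "template": {"range": {"dog_cost": {"lte": "{{max_cost}}"}}},
            },
        ]
    }
}

requests.post(f"{ES}/_ltr/_featureset/dog_features", json=featureset)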

Parsing against fields

We then parse our historical doggy review data against these fields to determine the ideal algorithm and weightings.
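Assuming the same plugin and feature set as above, a feature-logging query can replay our historical judgments through the stored features and record each feature's score for every judged document. The document IDs and parameters here are invented:

import requests

ES = "http://localhost:9200"  # assumes the engine and plugin from the sketch above

log_query = {
    "query": {
        "bool": {
            "filter": [
                # The documents we have historical judgments for.
                {"terms": {"_id": ["labrador_breeder_1", "poodle_breeder_7"]}},
                # Evaluate every feature in the stored feature set against them.
                {
                    "sltr": {
                        "_name": "logged_featureset",
                        "featureset": "dog_features",
                        "params": {"keywords": "friendly good with kids", "max_cost": 1500},
                    }
                },
            ]
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {"name": "log_entry", "named_query": "logged_featureset"}
        }
    },
}

resp = requests.post(f"{ES}/dogs/_search", json=log_query)
# Each hit should now carry per-feature scores in its "fields" section,
# which we join with our relevance judgments to build the training set.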

What library do we use to do this?

We need to build our training data. Using a library, we train on our historical dog data to determine the relationship between our search queries and documents. We can specify the desired impact beforehand:

For search engine ranking, this translates to a list of results for a query and a relevance rating for each of those results with respect to the query.
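For instance, a tiny, made-up judgment list in the RankLib/SVMrank-style format (relevance grade, query id, then feature:value pairs, with the document id as a comment) might look like this; all numbers are invented:

3 qid:1 1:4.8 2:9.2 3:1.0  # labrador_breeder_1
1 qid:1 1:2.1 2:0.4 3:1.0  # chihuahua_breeder_3
4 qid:2 1:4.2 2:8.8 3:0.0  # poodle_breeder_7
0 qid:2 1:1.0 2:0.1 3:0.0  # unknown_breeder_9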

Example models include RankNet, LambdaRank, and LambdaMART.

Remember that LTR solves a ranking problem on a list of items. The aim of LTR is to come up with optimal ordering of those items. As such, LTR doesn’t care much about the exact score that each item gets. Rather, it cares more about the relative ordering among all the items.
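As a hedged sketch using XGBoost (one of the model libraries mentioned above), training a pairwise ranking model on a handful of made-up dog judgments could look like this; the feature values, grades, and parameters are purely illustrative:

import numpy as np
import xgboost as xgb

# Toy training data: each row is a (query, document) pair described by the
# three features above; y holds the relevance grades from the judgment list.
X = np.array([
    [4.8, 9.2, 1.0],   # breeder_confidence, dog_character, fit_for_purpose
    [2.1, 0.4, 1.0],
    [4.2, 8.8, 0.0],
    [1.0, 0.1, 0.0],
])
y = np.array([3, 1, 4, 0])
group = [2, 2]  # first two rows belong to query 1, the last two to query 2

dtrain = xgb.DMatrix(X, label=y)
dtrain.set_group(group)

params = {
    "objective": "rank:pairwise",  # XGBoost's LambdaMART-style ranking objective
    "eta": 0.1,
    "max_depth": 3,
}
model = xgb.train(params, dtrain, num_boost_round=50)

# The trained trees can be dumped and uploaded to the search engine's model
# store; the exact upload format depends on the plugin you use.
model_json = model.get_dump(dump_format="json")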

Once again, for the technically minded, check out this great article on Medium regarding Learning to Rank models:

Learning to Rank plugins and model kits are also prevalent on GitHub, so check these out if you would like to get your hands dirty and implement your own LTR model:

Now we sort the data

We boost our search queries with the output from our model for as many doggy documents as we need, for example the top 200 results. Note: the larger the rescore window for your model, the slower the query may become.
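Continuing the hedged Elasticsearch LTR plugin example, re-scoring the top 200 hits of a baseline query with a stored model (assumed here to have been uploaded under the name dog_model) might look like this:

import requests

ES = "http://localhost:9200"  # assumes the engine and plugin from the earlier sketches

search = {
    # Cheap baseline query to collect candidate documents.
    "query": {"match": {"dog_character": "friendly good with kids"}},
    "rescore": {
        "window_size": 200,  # larger windows re-rank more documents but run slower
        "query": {
            "rescore_query": {
                "sltr": {
                    "model": "dog_model",
                    "params": {"keywords": "friendly good with kids", "max_cost": 1500},
                }
            }
        },
    },
}

resp = requests.post(f"{ES}/dogs/_search", json=search)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("dog_breed"))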

The technique recap:

  1. Create features to train model
  2. Add Features:
    Feature 1: Breeder confidence and reviews
    Feature 2: Dog character
    Feature 3: Other fit-for-purpose characteristics, such as cost, location, etc.
  3. Train the offline model
    Note: This can go horribly wrong if you don’t have enough quality queries to train the model against your desired feature set. The feature set can be minimal; however, too few features can cause LTR issues, whilst too many may over-complicate the model.
  4. Query the model and boost based on the scores coming from the model. This is called re-scoring.
  5. End result — I find the dog that is just right for me based on all the features and criteria I have given the Learning to Rank model. Even better, if we automate our offline model, then we can allow the system to “learn” continuously and adjust rankings accordingly.

Here’s an LTR search engine plugin example:

So back to the dogs

After implementing LTR, we can now search for a friendly dog that doesn’t have a history of health complications.

The search will match all of our search index data; however, the results will be re-scored based on the features that I used to train the model, these being primarily:

  • Feature 1: Breeder confidence and reviews
  • Feature 2: Dog character
LTR Found the right pooch (Source)

We now have a shortlist of dogs just right for me!

And here they are!

One of these lucky little guys is coming home with us!

Have you been using LTR? I’d be interested to know about your experiences.

Graham Horne
SEEK blog

Search Specialist @ seek.com.au, guitarist and wannabe singer (living the dream)