Learning to Rank — Dogs

What is Learning to Rank?

Here’s the official Wikipedia blurb:

Learning to rank[1] or machine-learned ranking (MLR) is the application of machine learning, typically in the construction of ranking models for information retrieval systems.[2] Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score, or a binary judgment (e.g. “relevant” or “not relevant”) for each item. The ranking model’s purpose is to rank, i.e. produce a permutation of items in new, unseen lists, in a way that is somewhat “similar” to rankings in the training data.

In this article, I’ll explain how Learning to Rank (LTR) works. And since everything is easier to understand with real-life examples, I’ll be using the search for my new family dog.

Searching for the perfect dog…

I want to buy a dog — but not just any dog. A dog that will be right for me, my family, and my house. One that doesn’t bark too much, one that doesn’t eat too much, and one that’s great with kids. Oh, and it needs to be low maintenance, with short hair (and the list goes on).

Source: Pets4homes

Applying Ranking to our Dog Search

This should be easy, right?

All we need to do is index all the dog breeders and their information into a search index and go from there?

But what is important?

I could manually apply different rankings for my search results using the tried and true method, and there is nothing wrong with this.

However, LTR has the potential to provide an easier and more accurate way for me to find the right pooch.

See below for the manual way (not necessarily the wrong way):

We search on the designated fields in our search index with a simple AND/OR query.

The fields might be:

  • Dog Size
  • Dog Weight
  • Dog Hair
  • Dog Colour
  • Dog Breed
  • Dog Eating Habits
  • Dog Chewer
  • Dog Moulter
  • Dog Good with Kids
  • Dog Good with Cats
  • Dog Purebred
  • Dog OK in Car
  • Dog Cost

We might apply a base algorithm such as TF-IDF.

We might apply weights to each field, for example:

  • Dog Size: 200
  • Dog Hair: 100
  • Dog Chewer: 50
  • Dog Cost: 20
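
As a rough illustration of the manual approach, here is a minimal Python sketch that sums the weight of every field that matches what we are looking for. The dog records, the field names and the exact-match test are assumptions made for this example; a real search engine would combine such weights with a text score like TF-IDF rather than a simple equality check.

```python
# Hypothetical field weights, copied from the example list above.
FIELD_WEIGHTS = {"size": 200, "hair": 100, "chewer": 50, "cost": 20}


def manual_score(dog, wanted, weights=FIELD_WEIGHTS):
    """Sum the weight of every field where the dog matches what we want."""
    return sum(
        weight
        for field, weight in weights.items()
        if field in wanted and dog.get(field) == wanted[field]
    )


# Made-up listings, for illustration only.
dogs = [
    {"name": "Biscuit", "size": "small", "hair": "short", "chewer": "no", "cost": "low"},
    {"name": "Moose", "size": "large", "hair": "short", "chewer": "no", "cost": "low"},
    {"name": "Fluff", "size": "small", "hair": "long", "chewer": "yes", "cost": "high"},
]
wanted = {"size": "small", "hair": "short", "chewer": "no", "cost": "low"}

# Highest manual score first.
ranked = sorted(dogs, key=lambda d: manual_score(d, wanted), reverse=True)
for dog in ranked:
    print(dog["name"], manual_score(dog, wanted))
```

With these weights, a dog that matches only on size outranks a dog that matches on hair, chewing and cost combined. That is the kind of side effect hand-picked weights can have.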

The problem is that what is important to me might not be important to others (especially people in my family).

We need a way of balancing what is important to us with the available dogs and breeders out there in my area.

This is where LTR comes in

For the technically minded, here are the details. One design for LTR:

  • Allows you to store features in the search engine
  • Logs feature scores (relevance scores) to create a training set for offline model development
  • Stores linear, XGBoost, or RankLib ranking models in the search engine that use features you’ve stored
  • Ranks search results using a stored model

LTR Example Flow

So how does this help me find a dog?

First we build the LTR offline model.

Training Data

We gather all of the information important to us — the dog breeders and all of the dog breeds — and index them into a search index.

Feature Sets

We then designate feature sets that are important to us:

  • Feature 1 Breeder Confidence and reviews
  • Feature 2 Dog Character
  • Feature 3 Other fit for purpose characteristics such as cost, location, etc.
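
To make the idea of a feature concrete, here is a small Python sketch that turns one dog listing into a list of numbers, one or more per feature above. Every field name (`review_ratings`, `traits`, `distance_km`, `cost`) and every formula is a hypothetical choice for illustration; in practice the features would usually be defined as queries inside the search engine.

```python
def extract_features(dog, breeder):
    """Turn one dog listing into a numeric feature vector (all fields hypothetical)."""
    # Feature 1: breeder confidence, here the average review rating scaled to 0..1.
    ratings = breeder["review_ratings"]
    breeder_confidence = sum(ratings) / (5.0 * len(ratings)) if ratings else 0.0

    # Feature 2: dog character, here the share of desired traits the dog has.
    desired = {"quiet", "good_with_kids", "low_maintenance"}
    character = len(desired & set(dog["traits"])) / len(desired)

    # Feature 3: fit for purpose, here cheaper and closer both score higher.
    cost_fit = 1.0 / (1.0 + dog["cost"] / 1000.0)
    distance_fit = 1.0 / (1.0 + breeder["distance_km"] / 50.0)

    return [breeder_confidence, character, cost_fit, distance_fit]


# Made-up listing and breeder.
dog = {"traits": ["quiet", "good_with_kids"], "cost": 1000}
breeder = {"review_ratings": [5, 4], "distance_km": 50}

features = extract_features(dog, breeder)
print(features)
```

Scaling each feature into a similar range is a design choice made here so that no single feature dominates purely because of its units.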

Parsing against fields

We then run all our historical doggy review data against the fields listed earlier to work out the ideal algorithm and weightings.

What library do we use to do this?

We need to build our training data. Using a library, we train a model on our historical dog data to learn the relationship between our search queries and documents. We decide beforehand what a good result looks like:

For search engine ranking, this translates to a list of results for a query and a relevance rating for each of those results with respect to the query.
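
A list of results with a relevance rating per query is often called a judgment list. The sketch below writes one in the plain-text layout used by RankLib and similar tools: the grade, then `qid:<query id>`, then numbered feature values, then a comment. The query id, the dog ids, the 0 to 4 grading scale and the feature values are all invented for the example.

```python
def to_ranklib_line(grade, qid, features, doc_id):
    """Format one judgment as '<grade> qid:<qid> 1:<f1> 2:<f2> ... # <doc id>'."""
    feats = " ".join(f"{i}:{value:.2f}" for i, value in enumerate(features, start=1))
    return f"{grade} qid:{qid} {feats} # {doc_id}"


# (query id, document id, relevance grade, feature vector); all values are made up.
# Query 1 stands for a search such as "friendly family dog".
judgments = [
    (1, "dog_17", 4, [0.90, 0.67, 0.50]),
    (1, "dog_42", 2, [0.60, 0.33, 0.80]),
    (1, "dog_08", 0, [0.20, 0.00, 0.40]),
]

lines = [to_ranklib_line(grade, qid, feats, doc) for qid, doc, grade, feats in judgments]
print("\n".join(lines))
```

Each line ties one document to one query, so the same dog can appear again under a different query id with a different grade.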

Model examples include RankNet, LambdaRank and LambdaMART.

Remember that LTR solves a ranking problem on a list of items. The aim of LTR is to come up with optimal ordering of those items. As such, LTR doesn’t care much about the exact score that each item gets. Rather, it cares more about the relative ordering among all the items.
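
One way to see the focus on relative ordering is a pairwise approach: the model is only ever told which of two dogs should rank higher. The sketch below trains a linear scorer with a RankNet-style pairwise logistic loss in plain Python. It is a simplified stand-in, not RankNet itself (which uses a neural network) and not LambdaMART; the data, the learning rate and the number of epochs are assumptions chosen for the example.

```python
import math


def score(weights, features):
    """Linear model: weighted sum of the features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(examples, n_features, epochs=200, lr=0.1):
    """examples: (features, grade) tuples for ONE query. Returns linear weights."""
    weights = [0.0] * n_features
    # Every pair where the first item has a strictly higher grade than the second.
    pairs = [(a, b) for a, ga in examples for b, gb in examples if ga > gb]
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [x - y for x, y in zip(better, worse)]
            margin = score(weights, diff)
            # Gradient step on log(1 + exp(-margin)): push the better item upwards.
            step = lr / (1.0 + math.exp(margin))
            weights = [w + step * d for w, d in zip(weights, diff)]
    return weights


# (name, feature vector, relevance grade); made-up values.
examples = [
    ("dog_a", [0.9, 0.8], 3),
    ("dog_b", [0.6, 0.9], 2),
    ("dog_c", [0.4, 0.2], 1),
    ("dog_d", [0.1, 0.3], 0),
]

weights = train_pairwise([(x, g) for _, x, g in examples], n_features=2)
ranking = [name for name, x, _ in
           sorted(examples, key=lambda e: score(weights, e[1]), reverse=True)]

# Multiplying every score by 10 changes the scores but not the order.
scaled_ranking = [name for name, x, _ in
                  sorted(examples, key=lambda e: 10 * score(weights, e[1]), reverse=True)]
print(ranking, ranking == scaled_ranking)
```

The loss never looks at a single score on its own, only at the difference between two scores, which is why the absolute values carry no meaning.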

Once again, for the technically minded, check out this great article on Medium regarding Learning to Rank models:

Learning to Rank plugins and model kits are also prevalent on GitHub, so check these out if you would like to get your hands dirty and implement your own LTR model:

Now we sort the data

We re-score the results of our search queries with the output of our trained model, for as many doggy documents as we need, for example the top 200 reviews. Note: the larger the re-scoring window you hand to the model, the slower the query may become.
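
The window idea can be sketched in a few lines of Python: only the first `window_size` hits from the base query are re-ordered by the model, and everything after that keeps its original position. The function name `rescore`, the tuple layout of a hit and the toy scores are assumptions for illustration, not any particular search engine's API.

```python
def rescore(base_results, model_score, window_size=200):
    """Re-rank only the top `window_size` hits; leave the tail in base order."""
    head = base_results[:window_size]
    tail = base_results[window_size:]
    # sorted() is stable, so hits with equal model scores keep their base order.
    reranked_head = sorted(head, key=model_score, reverse=True)
    return reranked_head + tail


# Hits in base-query order as (document id, model score); made-up values.
base_results = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 1.0), ("e", 0.1)]

reranked = rescore(base_results, model_score=lambda hit: hit[1], window_size=3)
print([doc_id for doc_id, _ in reranked])
```

In this toy run, document `d` has the best model score but sits outside a window of 3, so it is never promoted. A larger window would catch it, at the price of running the model on more documents per query.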

The technique recap:

  1. Create features to train model
  2. Add Features:
    Feature 1 Breeder Confidence and reviews
    Feature 2 Dog Character
    Feature 3 Other fit for purpose characteristics such as cost, location, etc.
  3. Train the offline model
    Note: This can go horribly wrong if you don’t have enough quality queries to train the model against your desired feature set. The feature set can be minimal; however, too few features can cause LTR issues, whilst too many may overcomplicate the model.
  4. Query the model and boost based on the scores coming from the model. This is called re-scoring.
  5. End result — I find the dog that is just right for me based on all the features and criteria I have given the Learning to Rank model. Even better, if we automate our offline model, then we can allow the system to “learn” continuously and adjust rankings accordingly.

Here’s an LTR Search Engine plugin example:

So back to the dogs

After implementing LTR, we can now search for a friendly dog that doesn’t have a history of health complications.

The search will match all of our search index data; however, the results will be re-scored based on the features that I used to train the model. These are primarily:

  • Feature 1 Breeder Confidence and reviews
  • Feature 2 Dog Character

LTR Found the right pooch (Source)

We now have a shortlist of dogs just right for me!

And here they are!

One of these lucky little guys is coming home with us!

Have you been using LTR? I’d be interested to know about your experiences.