Precise Retrieval For Tuning Ranking
A previous post on retrieval, precision and recall discussed the distinction between these three important concepts in search. In another post, we described the two main approaches to retrieval and saw that, for faceted search, retrieval with high precision is the foundation of a good search experience. Precision asks a simple question. If we look at the results in their entirety as a set, without being distracted by the ranking, how relevant is each and every result? A highly precise retrieval contains few or no irrelevant results. By irrelevant, we mean seeing bento boxes and body fat measuring devices when we search for salad dressing.
Poor precision can result from an ill-conceived implementation of set retrieval. One example is using standard Boolean operators without considering the proximity or ordering of the words or, more generally, the intent of the query. The same can happen with the ranked retrieval approach, which is probabilistic in nature: each document is assigned a probability of relevance (between 0 and 1) against a query. Unless we know confidently how far down the ranked list to go when returning results, irrelevant documents can end up in the results. Figuring out this cut-off point is non-trivial.
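To make the Boolean case concrete, here is a minimal Python sketch over three hypothetical product descriptions. The documents `d1` to `d3` and the helpers `tokenize`, `boolean_and` and `phrase_match` are invented for this illustration. A plain AND over the query terms ignores word order and proximity, so the bento box listing matches "salad dressing". Requiring the terms to appear as a contiguous phrase filters it out.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def boolean_and(query, docs):
    """Return ids of documents containing every query term, anywhere, in any order."""
    terms = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items() if terms <= set(tokenize(text))]

def phrase_match(query, docs):
    """Return ids of documents containing the query terms as a contiguous phrase."""
    phrase = tokenize(query)
    n = len(phrase)
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        # Slide a window of n tokens over the document and compare it to the phrase.
        if any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits

# Hypothetical product descriptions, invented for this illustration.
docs = {
    "d1": "Classic salad dressing made with olive oil and lemon",
    "d2": "Bento box with a salad compartment and a leak-proof dressing pot",
    "d3": "Glass jar for storing homemade salad dressing",
}

print(boolean_and("salad dressing", docs))   # ['d1', 'd2', 'd3']
print(phrase_match("salad dressing", docs))  # ['d1', 'd3']
```

Strict phrase matching is only one illustrative way to take proximity and ordering into account, not necessarily the approach from the earlier posts. In practice it can be too strict: it would miss a query such as "dressing for salad".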
At a high level, there are two reasons why precise retrieval is important. The first has to do with the fact that faceted search allows users to slice and dice the results to interrogate them. This has been discussed in detail previously. In this scenario, irrelevant results in the retrieval are easily exposed to users once they start using the facets. The second reason is that more and more vertical search engines are considering additional signals beyond the explicit query to improve the relevance of the top results. If the retrieval is already far from precise, adjusting the ranking with implicit signals makes it more likely that irrelevant results are surfaced to users.
In this short post, we will use an example to explain the second point. We will highlight the importance of getting the precision of retrieval to a reasonable level first, especially for vertical search, before implicit signals from other sources are used to tune the ranking.
Imagine we have an index of 8 documents, each with a paragraph or two about one of the animals below, labelled A to H. Each document has 2 structured fields, colour and size, which we offer to users as facets for filtering. One day, a user submitted the keyword query “cats”. Let us assume that the retrieval is imprecise for some of the reasons discussed above. The search brought back 6 documents in descending order of initial keyword relevance (A, G, H, B, C, E); animals D and F were not retrieved. If it is not obvious yet, 3 of the 6 retrieved results, namely B, C and E, are not actually cats. In other words, the precision of the retrieval was 3/6 = 50%. The precision of the first 3 results (precision@3), on the other hand, was high at 3/3 = 100%.
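Written out, the two figures follow from the usual definitions. The relevant documents are the cats among the retrieved animals, A, G and H. D and F play no part because precision only looks at what was retrieved.

```latex
\begin{aligned}
\text{Precision} &= \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved}|}
  = \frac{|\{A, G, H\}|}{|\{A, G, H, B, C, E\}|} = \frac{3}{6} = 50\% \\
\text{P@}k &= \frac{|\text{Relevant} \cap \text{Top}_k|}{k},
  \qquad \text{P@}3 = \frac{|\{A, G, H\}|}{3} = \frac{3}{3} = 100\%
\end{aligned}
```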
To better predict the intent, the engineers decided to look at the click logs and generalise from them the kinds of results that users who perform similar searches typically go for. They observed from the logs that users who searched for “cats” often selected the small, brown ones. They were not sure why that was the case, nor did they care. Using this as an additional signal of relevance, they altered the ranking so that retrieved results with brown and small as the values of the colour and size fields get boosted. This produced a new ranking in the following order: (A, B, C, G, E, H). Since animals B and C are both rather small and more brown than the others, they received a boost that lifted them to positions #2 and #3, pushing the cats G and H further down. Similarly, animal E, which was last in the original ranking, received a boost and was preferred over H: both are similarly sized, but E is more brown.
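The boosting step can be sketched as follows. This is a minimal Python illustration, not the engineers' actual implementation. The post does not give numeric scores, so the keyword scores, the 0-to-1 `brownness` and `smallness` encodings of the colour and size fields, and the boost weights are all hypothetical values. They were chosen so that the two orderings match the example. The boost is also assumed to be additive.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    keyword_score: float  # hypothetical keyword relevance in [0, 1]
    brownness: float      # hypothetical 0..1 encoding of the colour field
    smallness: float      # hypothetical 0..1 encoding of the size field

# All numbers are invented so that the orderings reproduce the example.
DOCS = [
    Doc("A", 0.95, 0.2, 0.8),
    Doc("G", 0.85, 0.1, 0.6),
    Doc("H", 0.75, 0.0, 0.2),
    Doc("B", 0.55, 0.9, 0.9),
    Doc("C", 0.50, 0.9, 0.8),
    Doc("E", 0.40, 0.9, 0.3),
]

# Hypothetical weights standing in for the "small and brown" click-log signal.
BROWN_WEIGHT = 0.4
SMALL_WEIGHT = 0.4

def boosted_score(doc):
    """Keyword score plus an additive boost for brown and small documents."""
    return doc.keyword_score + BROWN_WEIGHT * doc.brownness + SMALL_WEIGHT * doc.smallness

keyword_ranking = [d.doc_id for d in sorted(DOCS, key=lambda d: d.keyword_score, reverse=True)]
boosted_ranking = [d.doc_id for d in sorted(DOCS, key=boosted_score, reverse=True)]

print(keyword_ranking)  # ['A', 'G', 'H', 'B', 'C', 'E']
print(boosted_ranking)  # ['A', 'B', 'C', 'G', 'E', 'H']
```

Only the order changes. The same six documents are returned either way.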
The precision of the retrieved set remained the same, because the implicit signal was only used to reorder the results, not to change which documents were retrieved. However, if we look at the first 3 results, only the first one is a cat. The second and third results were elevated to these positions because they are small and brown. Precision@3 has dropped from 100% to 1/3 ≈ 33%.
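A small Python sketch can compare precision@k for the two rankings at every cut-off k, using the cats A, G and H as the relevant documents. The helpers `precision` and `precision_at_k` are written for this illustration.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant; order is ignored."""
    if not retrieved:
        return 0.0
    return sum(1 for d in retrieved if d in relevant) / len(retrieved)

def precision_at_k(ranking, relevant, k):
    """Precision of the top-k documents of a ranking."""
    return precision(ranking[:k], relevant)

cats = {"A", "G", "H"}
keyword_ranking = ["A", "G", "H", "B", "C", "E"]
boosted_ranking = ["A", "B", "C", "G", "E", "H"]

# Walk through every cut-off from the top result to the whole retrieved set.
for k in range(1, len(keyword_ranking) + 1):
    p_kw = precision_at_k(keyword_ranking, cats, k)
    p_boost = precision_at_k(boosted_ranking, cats, k)
    print(f"k={k}  keyword={p_kw:.2f}  boosted={p_boost:.2f}")
```

At k = 6, the whole retrieved set, both rankings score 0.50. At k = 3, the boosted ranking falls from 1.00 to 0.33.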
We hear of more and more search products experimenting with implicit signals for ranking to improve the relevance of the top results. However, if this practice is not balanced correctly against the intent expressed in the explicit query, it can do more harm than good, as the drop in precision@3 in the example above shows.