Faceted Search Needs Precise Retrieval

Wilson Wong
Jun 2 · 8 min read

We have spoken about the fact that relevance is subjective. However, this does not mean that lines need not be drawn. In search, there are generally two approaches to bringing results back for a user query.

The first one, which partitions the entire collection into relevant versus irrelevant documents, is known as set retrieval. If the system subsequently decides to order the relevant documents that it returns to you, it should be for the purpose of distinguishing the more relevant results from the less relevant ones.

“relevance ranking should help distinguish more relevant results from less relevant results, rather than distinguishing relevant results from irrelevant results” [1]

On the other side of the fence, many search systems nowadays avoid making that clear-cut relevance distinction. This happens either by design, as in the highly tuned Web search systems Google and Bing, or unknowingly, as a result of blindly adopting off-the-shelf search solutions.

This second approach, known as ranked retrieval, is designed to return only an ordering of the top documents in the collection for a query. With this approach, the most relevant results appear first, followed by the less relevant ones. If one is not careful with the definition of ‘top documents’, irrelevant results can be included in the retrieval.
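The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relevance scores, the threshold of 0.5 and the function names are made up, and real systems derive scores and cut-offs in far more involved ways.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    doc_id: str
    score: float  # hypothetical relevance score in [0, 1]


def ranked_retrieval(scored, k):
    """Return the k highest-scoring documents, with no relevance cut-off."""
    return sorted(scored, key=lambda d: d.score, reverse=True)[:k]


def set_retrieval(scored, threshold):
    """Keep only documents judged relevant, then order them by score."""
    relevant = [d for d in scored if d.score >= threshold]
    return sorted(relevant, key=lambda d: d.score, reverse=True)


# Made-up candidates for the query "salad dressing".
candidates = [
    Scored("ranch-dressing", 0.92),
    Scored("caesar-dressing", 0.88),
    Scored("dressing-shaker", 0.35),
    Scored("bento-box", 0.10),
]

top_4 = ranked_retrieval(candidates, k=4)
relevant_only = set_retrieval(candidates, threshold=0.5)
```

With a loose definition of ‘top documents’ (here, k=4), the ranked list still contains the shaker and the bento box, only lower down. The set retrieval version never returns them at all.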

In this article, we explore the widely popular approach of ranked retrieval and the impact of its misapplication in vertical search products. Vertical search is different from Web search in that the former serves specific segments, media types or topics of online content. Some examples include restaurant reviews, job advertisements and recipes. We examine some e-commerce examples to highlight the importance of set retrieval in vertical search where the results are often heavily interrogated by the users through filtering, re-sorting and tiering.

When ranked retrieval is the sensible option

There is no other collection of documents like the Web. We all know that it is very large. More importantly, the structure, content, topic, authorship and quality vary greatly between webpages. Coupled with the ambiguity and the wide variety of intent that comes from users, deciding if a document is irrelevant given a query is mostly impossible.

Photo by Jonathan Ybema on Unsplash

If someone searches for “salad dressing” on the Web, what does a relevant document look like? Do we even know if that person is using the phrase in the grocery items context? If so, is the person looking for recipes, flavours, the brands or products available in the market and their prices, the nutrition facts and so on? Let’s not forget that users can also use the phrase “salad dressing” to look for local businesses offering or making the products.

If a good search experience is not predicated on a clear distinction between relevant and irrelevant, then there is no need for set retrieval. Moreover, if the distinction cannot be made, the case for ranked retrieval over set retrieval becomes even stronger. This is exactly the case for Web search.

When set retrieval is the preferred approach

Vertical search is different from Web search in two distinctive ways. First, intent is far less fuzzy and content is more specialised in vertical search. As a result, users have a less fluid definition of relevance. Unlike in Web search, the same two words “salad dressing” used for searching on an e-commerce grocery site have a very exact intent, which is the food item. You would be rather disappointed if you were shown things like microwave-safe bento boxes or condiment dispensers when you are clearly expecting the actual dressing.

Photo by Markus Winkler on Unsplash

Second, users are often given, or expect, the ability to interrogate vertical search results. For instance, an e-commerce site would offer brand, colour and price as facets that users can filter on. This need to cut and re-order the results in different ways conflicts with ranked retrieval, which depends on its default order to push the less relevant or even irrelevant results to the bottom. For vertical search systems which allow users to interfere with this order, the chances of irrelevant results coming to the surface become very high. This in turn harms the search experience.
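As a rough sketch of what a facet does, the snippet below counts and filters on a brand field. The products, brand names and helper functions are all hypothetical. The point to notice is that neither helper looks at relevance: both operate on whatever the retrieval step handed over.

```python
from collections import Counter

# Hypothetical result set returned for the query "salad dressing".
results = [
    {"name": "Ranch dressing", "brand": "BrandA", "price": 4.0},
    {"name": "Caesar dressing", "brand": "BrandB", "price": 5.5},
    {"name": "Italian dressing", "brand": "BrandA", "price": 3.5},
    {"name": "Dressing shaker", "brand": "BrandC", "price": 12.0},
]


def facet_counts(items, field):
    """Count how many retrieved items fall under each value of a facet."""
    return Counter(item[field] for item in items)


def apply_filter(items, field, value):
    """Keep only the retrieved items whose facet field equals the value."""
    return [item for item in items if item[field] == value]


brand_counts = facet_counts(results, "brand")
brand_a = apply_filter(results, "brand", "BrandA")
```

If the shaker had not been retrieved in the first place, it would appear neither in the counts nor behind any filter.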

If interrogation of search results is fundamental to the search experience, then the search system has to take a firmer stance on reducing, or eliminating altogether, irrelevant results from the retrieval. The trick of burying potentially irrelevant results in the later pages of the result set does not work in this type of application.

10 pages of salad dressing options or not

Let us look at an e-commerce scenario to illustrate the impact of not being clear-cut about the relevance of retrieved documents. We will look at how re-sorting and filtering can easily bring up irrelevant results in vertical search.

I was looking to get some fancy salad dressing for my shopping cart. I typed “salad dressing” into the keyword input. It told me that there were over 3,000 options of salad dressing. I was quite amazed, thinking to myself that I probably would not find that many varieties of salad dressing in my local supermarket. But then again, I was browsing the inventory of one of the largest e-commerce sites in the world.

Page 1 of search results for “salad dressing” with default sort order and no category filter

I started looking at the results. The first few results were already quite mixed in terms of relevance (as visible in the diagram above): mostly salad dressing shakers and some actual salad dressing.

Page 1 of search results for “salad dressing” without category filter but re-sorted by “Price High to Low”

The scrutiny aside, I focused back on my shopping task. I needed to find a fancy bottle of salad dressing. I naturally went for the “Sort by” feature and chose the “Price: High to Low” option. I was expecting to see the most expensive salad dressing in the inventory but instead, I saw bento boxes, salad tongs and mixing bowls. If the results we saw initially with the default “Sort by” were not very good, the results after re-sorting by price were even worse. The initial results, presumably sorted by relevance, at least still had some actual salad dressing products. The same cannot be said for the price re-sorted results.

I thought I was quite clear with the keywords that I used. But then again, to be fair, these items are slightly related to what I was after.
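A toy reconstruction of this effect is sketched below. The products, prices and relevance scores are invented for illustration and are not taken from the site in question. All the sketch assumes is that marginally related items were retrieved with low scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    price: float
    relevance: float  # hypothetical relevance score for "salad dressing"


# Invented results: two real dressings plus two loosely related items.
results = [
    Product("Balsamic vinaigrette", 6.50, 0.95),
    Product("Ranch dressing", 4.00, 0.90),
    Product("Salad tongs", 15.00, 0.30),
    Product("Bento box with dressing container", 25.00, 0.20),
]


def sort_by_price_desc(products):
    """Re-sort by price, discarding the relevance order entirely."""
    return sorted(products, key=lambda p: p.price, reverse=True)


by_relevance = sorted(results, key=lambda p: p.relevance, reverse=True)
by_price = sort_by_price_desc(results)
```

In the default order the two dressings come first. After the price re-sort, the bento box and the tongs take the top two spots simply because kitchenware tends to cost more than a bottle of dressing.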

Page 1 of search results for “salad dressing” with the “Pantry Food & Drinks” department as filter re-sorted by “Price: High to Low”

I paused for a bit and thought of ways to get the outcome I wanted. I was immediately drawn to the facets on the left. I thought to myself, “Silly me, I should have selected the Pantry department” to refine my search results and try to remove the non-edible items. I did exactly that and this time, I got only pantry items. The number of results dropped from over 3,000 to 234, which is a more realistic number for salad dressing. However, the number was still high and the top results were still a mixed bag in terms of relevancy. Instead of salad dressing, I got a wide variety of MCT oil products, which I knew nothing about. Out of curiosity, I looked up what MCT oil is and learned that it stands for “medium-chain triglyceride”, which apparently is good for health, but I digress.

I need to watch my weight, I was told

At this stage, I still hadn’t quite found what I wanted. I explored further and stumbled upon the department options in the left panel again. I noticed the “Salad Dressings” department and clicked on it, hoping that I would find what I needed this time. There were only 47 items in the result set after drilling down to this level. Finally, after all the digging around, most of the items at the top of the results were salad dressings, even when sorted by price.

Page 1 of search results for “salad dressing” with the “Salad Dressings” department filter applied and re-sorted by “Price: High to Low”

Despite the good relevancy of the results now, one still can’t help but notice the non-edible item that was included in the top row: a device for measuring body fat level, in a search for “salad dressing”.

What went wrong?

Clearly, the unfortunate encounters with irrelevant product items would not have occurred if they were not returned in the first place. We know that relevance is subjective. However, being presented with a body fat measuring device, reusable bento boxes and so on when I look for salad dressing is simply weird.

There are only two possibilities as to why the unrelated products were returned. The first is that the set retrieval approach used was just not good at telling the relevant apart from the irrelevant. The second is that the ranked retrieval approach was used. My guess is that the latter is true.

A good search experience using ranked retrieval is predicated on maintaining the default relevance order. This ensures that the less relevant and, more importantly, the irrelevant results are buried in the later pages, which users are less likely to reach. However, by offering facets such as the department filter, along with re-sort options, in the search interface, we allow users to interfere with the order. This is what happened in the examples we saw earlier.
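One way to picture the alternative is to draw the relevance line before any facet or re-sort is applied. The sketch below is a minimal illustration, not a description of how any particular site works. The catalogue, the department assignments, the scores and the `min_relevance` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    department: str
    price: float
    relevance: float  # hypothetical relevance score for "salad dressing"


# Invented catalogue, loosely modelled on the walkthrough above.
CATALOGUE = [
    Item("Truffle vinaigrette", "Salad Dressings", 18.00, 0.93),
    Item("Ranch dressing", "Salad Dressings", 4.00, 0.90),
    Item("Body fat monitor", "Salad Dressings", 45.00, 0.05),
    Item("MCT oil", "Pantry Food & Drinks", 30.00, 0.15),
    Item("Mixing bowl", "Kitchen", 22.00, 0.20),
]


def faceted_search(items, min_relevance, department=None, price_desc=False):
    # Step 1: set retrieval. Drop anything below the relevance cut-off.
    retrieved = [i for i in items if i.relevance >= min_relevance]
    # Step 2: apply the department facet, if the user selected one.
    if department is not None:
        retrieved = [i for i in retrieved if i.department == department]
    # Step 3: apply the user's sort order; default to relevance.
    if price_desc:
        return sorted(retrieved, key=lambda i: i.price, reverse=True)
    return sorted(retrieved, key=lambda i: i.relevance, reverse=True)


no_cutoff = faceted_search(
    CATALOGUE, min_relevance=0.0, department="Salad Dressings", price_desc=True
)
with_cutoff = faceted_search(
    CATALOGUE, min_relevance=0.5, department="Salad Dressings", price_desc=True
)
```

Without a cut-off, the price re-sort puts the body fat monitor on top. With the cut-off, users can filter and re-sort however they like and only see dressings. Choosing a sensible cut-off is the hard part, and this sketch does not attempt to solve it.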

All in all, combining faceted search with ranked retrieval can yield an undesirable search experience. It is important to note that just because successful Web search engines use a certain retrieval approach, it does not mean that it will be effective for everyone. Web search and vertical search deal with very different content, needs and tasks. In this article, we saw that ranked retrieval, which works very well for Web search engines, actually delivers very different outcomes for faceted, vertical search applications.

Pragmatic AI League

Pushing the boundary of practical approaches to content and item discovery using AI

Wilson Wong

Written by

I’m a professional data + product leader trained in comp + info science. I code, write, take photos for fun. https://wilsonwong.co

Pragmatic AI League

The how-tos, pitfalls and other considerations with using machine learning, big data and cutting-edge approaches to create smarter search and recommendation engines in real world applications
