Understanding the Meaning of Searches

Jan 16, 2017

In previous posts I’ve talked about how to automatically re-order search results by monitoring user interactions with your search system, and how this has improved the quality of our search results. Another technique we have used at Wattpad to improve search performance is to detect the type of search the user has entered and, based on that type, select which query to send to Elasticsearch.

As described in past posts, our search system lets people enter a few keywords to express their intent. Behind the scenes, these keywords are translated into a query that is sent to Elasticsearch. Most of our users search to find stories they want to read. However, people describe what they want in different ways at different times. For example, “historical romance” and “the necromancer” feel like two different types of searches (a genre versus a specific story) that should be handled differently.

To illustrate how this works, let’s start with a basic example: a “stories” index with three fields:

{
  "stories": {
    "mappings": {
      "story": {
        "properties": {
          "tags": {
            "type": "text"
          },
          "description": {
            "type": "text"
          },
          "title": {
            "type": "text"
          }
        }
      }
    }
  }
}

We can query this index in the following way:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "tags": {
              "query": "the necromancer"
            }
          }
        },
        {
          "match": {
            "description": {
              "query": "the necromancer"
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "the necromancer"
            }
          }
        }
      ]
    }
  }
}

This query finds any stories that contain the search terms anywhere in their tags, description, or title. However, it treats all fields as equally important, which intuitively seems wrong. This is easy to fix by assigning a boost parameter to each field; for example, a boost of 3 on the title, 2 on the tags, and 1 on the description seems reasonable. These numbers are guesses, but we can also tune the boosts more systematically, for example by running A/B tests on different sets of parameters or by treating the choice of boosts as an optimization problem and learning them.
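As a minimal sketch, the boosted version of the query above can be generated from a field-to-boost mapping. The helper name `build_boosted_query` and the `DEFAULT_BOOSTS` dictionary are illustrative assumptions; the boost values are the guesses from the paragraph above, attached through the `boost` option of each `match` clause.

```python
import json

# Boosts from the text: title 3, tags 2, description 1.
# They are starting guesses, not tuned values.
DEFAULT_BOOSTS = {"title": 3.0, "tags": 2.0, "description": 1.0}


def build_boosted_query(search: str, boosts: dict) -> dict:
    """Build a bool/should query with one boosted match clause per field."""
    return {
        "query": {
            "bool": {
                "should": [
                    # A per-clause "boost" scales that clause's score contribution.
                    {"match": {field: {"query": search, "boost": boost}}}
                    for field, boost in boosts.items()
                ]
            }
        }
    }


# Serialize to the JSON body that would be sent to Elasticsearch.
body = json.dumps(build_boosted_query("the necromancer", DEFAULT_BOOSTS), indent=2)
```

Keeping the boosts in a plain dictionary makes it easy to swap in different parameter sets, for example when comparing variants in an A/B test.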

Two Types of Searches

One problem remains: we are limited to a single set of parameters. This constraint assumes that all users understand and engage with the search system in the same way, all the time. Let’s return to our example queries:

“the necromancer”

“historical romance”

The first query looks like it is for a specific story. The second could also be for a specific story, but it is more likely for a genre or sub-genre of stories. No single set of weights gives the best experience for both of these queries. One option is to find out which type of query is more popular and tune the search system so that it satisfies most people most of the time; however, if you stick to a single set of parameters, you will not be able to return the best results every time.

Instead of using a single query for every search, you can choose which query to use based on the search the user enters. For example, we could have one query for searches that look like they are for specific stories, and a second query for searches that look like they are for genres or sub-genres, where the person is likely to be happy with any story in that genre rather than one specific story. In the first scenario, the person just wants to see the story as the first result, and there is one correct search result. In the second scenario, the person may want to browse through a few results, and there may be many ideal search results.

With two types of queries, we can optimize the boost factors for each type separately and, depending on what the user types, choose the best set of parameters. To do this, we still need to detect which type of search the user has entered so that we can build the best query to send to Elasticsearch.
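The selection step can be sketched as a lookup from a detected search type to its own boost profile. The `SearchType` enum, `BOOST_PROFILES`, and `build_query` names are hypothetical, and so are the boost values: in practice each profile would be tuned separately, as described earlier.

```python
from enum import Enum


class SearchType(Enum):
    SPECIFIC_STORY = "specific_story"
    GENRE = "genre"


# Hypothetical boost profiles; real values would come from tuning
# (A/B tests or optimization) for each search type separately.
BOOST_PROFILES = {
    SearchType.SPECIFIC_STORY: {"title": 5.0, "tags": 1.0, "description": 1.0},
    SearchType.GENRE: {"title": 1.0, "tags": 4.0, "description": 2.0},
}


def build_query(search: str, search_type: SearchType) -> dict:
    """Pick the boost profile for the detected search type and build
    the bool/should query that would be sent to Elasticsearch."""
    boosts = BOOST_PROFILES[search_type]
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: {"query": search, "boost": boost}}}
                    for field, boost in boosts.items()
                ]
            }
        }
    }
```

The query shape stays the same for both types; only the parameters change, which keeps the two variants easy to compare.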

Detecting Search Types

To detect the search type, we use a hybrid of two main techniques. Both use the query log, which records the searches people enter and the results they end up clicking on. The first technique looks at the positions of the search results that users typically interact with; the second looks at the content of the search results that receive a lot of interaction.
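Both techniques start from the same aggregation of the query log. Here is a minimal sketch, assuming a hypothetical log format in which each row holds the search text, the clicked story’s ID, and the position it was shown at; the `ClickEvent` fields and the normalization rule are illustrative, not Wattpad’s actual schema.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    """One row of a hypothetical query log."""
    search: str      # what the user typed
    story_id: str    # which result they clicked
    position: int    # 1-based rank of that result on the results page


def normalize(search: str) -> str:
    # Treat "The Necromancer " and "the necromancer" as the same search.
    return " ".join(search.lower().split())


def aggregate_clicks(log):
    """Group the log by normalized search, counting clicks per position
    and per story."""
    by_position = defaultdict(Counter)
    by_story = defaultdict(Counter)
    for event in log:
        key = normalize(event.search)
        by_position[key][event.position] += 1
        by_story[key][event.story_id] += 1
    return by_position, by_story
```

Counting per story as well as per position is a design choice: if results are re-ordered over time, the per-story counts stay comparable while positions shift.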

People generally make two types of searches: one where the person is looking for a very specific result, and a more generic, topical one (a genre search) where the person would be satisfied with any of several results. We can look at the positions users typically click in the search results for two different searches:

In the above chart, the majority of users are clicking on the same results, showing a high level of agreement on what the best search result is.

On the other hand, here we see that there is much less agreement on what the correct search result is, and users are clicking on many different search results.

By looking at the distribution of clicks across search results, we can detect the search type. For searches where most users click on the same few results, we send a query with parameters tuned for a specific story search. For searches where people click on many different search results, we send a query with parameters tuned for a genre search.
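One simple way to measure how concentrated the clicks are is the share of all clicks that land on the k most-clicked results. This is a sketch, not necessarily the measure used in production; `k`, `threshold`, and `min_clicks` are hypothetical values. The input `Counter` can be keyed by result position or by story ID.

```python
from collections import Counter
from typing import Optional


def top_k_click_share(clicks: Counter, k: int = 3) -> float:
    """Fraction of all clicks that land on the k most-clicked results."""
    total = sum(clicks.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in clicks.most_common(k)) / total


def classify_by_clicks(clicks: Counter, k: int = 3, threshold: float = 0.7,
                       min_clicks: int = 20) -> Optional[str]:
    """Concentrated clicks suggest a specific-story search; clicks spread
    over many results suggest a genre search. Returns None when there are
    too few clicks to judge. k, threshold and min_clicks are hypothetical."""
    if sum(clicks.values()) < min_clicks:
        return None
    if top_k_click_share(clicks, k) >= threshold:
        return "specific_story"
    return "genre"
```

For example, clicks of `Counter({"s1": 90, "s2": 5, "s3": 3, "s4": 2})` give a top-3 share of 0.98 (a specific story search), while ten results with 10 clicks each give 0.3 (a genre search). An entropy-based measure would be a reasonable alternative.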

The second technique compares the initial search with the content of the results users end up clicking on. If, for example, most people who search for “the necromancer” click on results with the search terms in the title, they probably chose that story because of its title. If instead most people click on results where the search terms appear not in the title but in the description or tags, those fields were probably what drew them in. By matching the user’s search against the content of different fields, we can detect which type of query to send to Elasticsearch: many matches on the title suggest a specific story search, while matches mostly in the other fields suggest a more general genre search.
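A minimal sketch of this content check follows, assuming each clicked document is a dictionary of plain-string fields and using a crude regex tokenizer in place of a real Elasticsearch analyzer. The function names and the “every search term must appear” rule are illustrative assumptions.

```python
import re
from typing import Optional


def terms(text: str) -> set:
    """Lower-cased word tokens; a crude stand-in for a real analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def title_vs_other_matches(search: str, clicked_docs: list) -> tuple:
    """Count clicked documents whose title contains every search term,
    versus those where the terms appear only in the tags or description."""
    wanted = terms(search)
    in_title = elsewhere = 0
    for doc in clicked_docs:
        if wanted <= terms(doc.get("title", "")):
            in_title += 1
        elif wanted <= terms(doc.get("tags", "")) or wanted <= terms(doc.get("description", "")):
            elsewhere += 1
    return in_title, elsewhere


def classify_by_content(search: str, clicked_docs: list) -> Optional[str]:
    """Mostly title matches suggest a specific-story search; mostly
    tag/description matches suggest a genre search."""
    if not terms(search):
        return None
    in_title, elsewhere = title_vs_other_matches(search, clicked_docs)
    if in_title == elsewhere == 0:
        return None  # no evidence either way
    return "specific_story" if in_title > elsewhere else "genre"
```

Ties fall through to “genre” here; whether that is the right default depends on which mistake is cheaper for your users.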

We can combine the interaction approach, which looks at the distribution of clicks, with the content similarity approach to detect the type of search more confidently and send the best query to Elasticsearch.
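The post does not specify how the two signals are combined. One simple option, shown here as an assumption, is a weighted average of two scores in [0, 1]: for instance, the top-k click share from the first technique and the fraction of clicked results that match on the title from the second. The weight, threshold, and fallback type are hypothetical.

```python
from typing import Optional


def combine_signals(click_share: Optional[float], title_rate: Optional[float],
                    w_clicks: float = 0.5, threshold: float = 0.6,
                    default: str = "genre") -> str:
    """Both inputs are in [0, 1]; higher means 'looks like a specific-story
    search'. Missing signals (None) are dropped and the remaining weights
    renormalized. All parameters here are hypothetical."""
    signals = []
    if click_share is not None:
        signals.append((w_clicks, click_share))
    if title_rate is not None:
        signals.append((1.0 - w_clicks, title_rate))
    total_weight = sum(w for w, _ in signals)
    if total_weight == 0:
        return default  # no usable evidence: fall back to a default type
    score = sum(w * s for w, s in signals) / total_weight
    return "specific_story" if score >= threshold else "genre"
```

Dropping a missing signal instead of treating it as zero matters for new or rare searches, which may have too few clicks for the first technique but still yield a usable content match.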

Beyond Stories

The example above, distinguishing general genre searches from specific story searches, is very domain specific; however, the general approach works well in any search system whose users have different search needs. For example, in a more general web search system, one way of organizing searches is to group them into informational (find an answer to a question), navigational (find a specific website), and transactional (purchase, download, or get something) types. There is an overview of this classification of searches by Bernard Jansen, Danielle Booth, and Amanda Spink.

Adriel

Originally published at engineering.wattpad.com.
