Query Understanding: A Search With Machine Learning Use Case

Jamal Zabihi
Loopio Tech
Mar 1, 2023

Machine Learning (ML) solutions are becoming more and more popular in various industries, and the search domain is no exception. Before the rise of ML in search, keyword-based search engines combined with manually defined rules were the state of the art. Over the last few years, the research community has been active in designing and developing useful ML features for search. From semantic search capabilities to Learning-to-Rank models, the goal is to take advantage of such intelligent solutions to surface the most relevant results to the end user.

As described in our previous article, Loopio is using an offline experimentation framework to evaluate the performance of different solutions including those that are based on ML models. The focus of this article is on one specific use case of applying machine learning capabilities in our search workflow: query understanding. It is the process of analyzing search queries and converting them to enhanced queries that can generate more accurate results.

To have a better understanding of the ways we can improve search results by applying machine learning models, let’s take a look at a typical search system at a high level before our deep dive into the query understanding feature.

Search Workflow

A search engine, at a high level, completes three steps:

  1. Creates an index of documents
  2. Runs a query against that index
  3. Shows ranked results to the user.

There is an opportunity for improvement in every single part of this process. The breakdown below shows how we can introduce a new feature at different stages of a search system.

  • Document Pre-processing: Before indexing the documents, we can apply a series of pre-processing operations such as text cleaning, text normalization, and even machine learning models that identify entities or Part-of-Speech (PoS) tags. This produces enriched documents that will be indexed in the next stage and can help us better match documents with searched queries.
  • Modifying Index Settings: We can modify certain settings before indexing the documents. Configuring the similarity setting of our index is one example, where we choose the method used for scoring matched documents; depending on the selected scoring method, the ranking of results will differ (a short sketch follows this list).
  • Query Understanding: This covers what happens to the query before scoring and ranking results. There are different ways to analyze queries to better represent the intent behind them. Here are some examples of query understanding solutions:
    - Synonyms: Being able to extract synonyms of searched keywords can help the search engine surface more relevant results. We can create these synonyms manually or use machine learning models to generate them for us.
    - Spelling Correction: Keyword-based search engines cannot return accurate results if the keywords in the searched query are misspelled, so having a spelling correction layer before sending the request to the search engine can be useful.
    - Query Classification: One of the most impactful forms of query understanding is query classification. By predicting the most probable class or category for the searched query, we help the search engine return more relevant results.
  • Results Re-ranking: The last stage of a search system is to re-rank the results. The re-ranking can be done using different logic, such as semantic search. Keyword-based search engines do not understand the context and conceptual meaning of indexed documents or searched queries; semantic search solutions fill this gap by applying the semantics of words and phrases to find the right content.
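As a quick illustration of the index-settings point above, here is a minimal sketch of configuring the similarity (scoring method) when creating an index. It assumes Elasticsearch (the engine used later in this article) with the official Python client's v8-style API and a hypothetical index name; the k1 and b values shown are simply BM25's defaults.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local cluster

# Choose the scoring method (similarity) applied to matched documents.
# BM25 is Elasticsearch's default; k1 and b are its tuning parameters.
es.indices.create(
    index="cooking_questions",  # hypothetical index name
    settings={
        "index": {
            "similarity": {
                "default": {"type": "BM25", "k1": 1.2, "b": 0.75}
            }
        }
    },
)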

Query Classification

The main focus of this article is on developing a query classification model and using it to improve search results. To better understand the application of this feature in a search system, let’s work through an example.

Imagine we have a lexical search system that aims to answer specific cooking questions. The input to this search system is cooking-related questions and the output is question-answer pairs that can answer the searched question. Without a query understanding layer in this search system, the end user might receive results that belong to a completely different category. For example, if I’m asking a question that is related to “baking,” I expect to see results closer to this category and not a category like “storage lifetime.” In other words, the query understanding layer helps the search engine to understand the category of the searched query and surface results that belong to the same category.

If we look at the historically asked and answered questions, we can see that each question has at least one associated tag.

We can use such a rich dataset to train a text classification model with question text as input and question tag as a label. Once we have a trained model, we can use it at query time to predict the most relevant category for each asked question and surface those documents with similar labels.

We will use the fastText package to classify cooking questions. fastText is an open-source library developed by Facebook AI Research (FAIR). Its main advantage is its ability to process large volumes of text quickly while remaining accurate: the algorithm delivers impressive classification accuracy as well as fast training and testing times.

The following sample code will walk through the model-building process, which encompasses:

  • Installing fastText
  • Downloading the Dataset
  • Preprocessing Question Text
  • Training and Evaluation
  • Performance Improvement

We then go through two methods of integrating the trained model into the search workflow.

Installing fastText

It is recommended to isolate your Python development environment by creating a virtual environment first. We can install fastText in different ways, but the preferred method is using make. In a terminal:

>> wget https://github.com/facebookresearch/fastText/archive/v0.9.2.zip
>> unzip v0.9.2.zip
>> cd fastText-0.9.2
# for command line tool
>> make
# for python bindings
>> pip install .

Downloading the Dataset

We are going to use a collection of cooking questions from the Cooking Stack Exchange site. Let’s download these questions and their associated categories:

>> wget https://dl.fbaipublicfiles.com/fasttext/data/cooking.stackexchange.tar.gz && tar xvzf cooking.stackexchange.tar.gz

The extracted cooking.stackexchange.txt file has 15,404 lines. Each line contains one or more labels followed by the question text, in this format:

__label__Baking Is it normal for key lime pie filling to be bubbly like water and overflow while baking?

The next step is to split the data into training and testing sets.

>> head -n 12404 cooking.stackexchange.txt > train.txt
>> tail -n 3000 cooking.stackexchange.txt > test.txt

Preprocessing Question Text

The next step is to use Python to apply simple text preprocessing techniques that improve the performance of our classifier. The code below implements lowercasing, punctuation removal, tokenization, and stemming of the input query. Once that is done, we can write the normalized content back into separate train and test files.

import string

import nltk
from nltk.stem import SnowballStemmer

nltk.download('punkt')
stemmer = SnowballStemmer("english")


def transform_query(query):
    # Lowercase the query and strip punctuation characters
    cleaned = ''.join([i for i in query.lower() if i not in string.punctuation])
    # Tokenize the cleaned text and stem each token
    tokens = nltk.word_tokenize(cleaned)
    return " ".join([stemmer.stem(token) for token in tokens])
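Because each line stores its __label__ tokens next to the question text, the normalization should only touch the text and leave the labels intact (their underscores would otherwise be stripped as punctuation). Here is a minimal sketch of that write-back step, using hypothetical output file names; the later training commands reference train.txt and test.txt, so in practice you would either overwrite those files or point the training step at the normalized ones.

def normalize_file(input_path, output_path):
    # Keep __label__ tokens as-is and normalize only the question text
    with open(input_path) as fin, open(output_path, 'w') as fout:
        for line in fin:
            tokens = line.strip().split()
            labels = [t for t in tokens if t.startswith('__label__')]
            text = " ".join(t for t in tokens if not t.startswith('__label__'))
            fout.write(" ".join(labels) + " " + transform_query(text) + "\n")


normalize_file('train.txt', 'train_normalized.txt')
normalize_file('test.txt', 'test_normalized.txt')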

Training and Evaluation

Now that we have our normalized train and test files ready, we can train our first classifier using the default settings, evaluate its performance on the test data, and save the trained model. The output of the test method is a tuple containing the number of test samples, the precision at one (P@1), and the recall at one (R@1).

import fasttext

# Training the fastText classifier with default setting
model = fasttext.train_supervised(input='train.txt')

# Evaluating performance on the test set
model.test('test.txt')

# Saving the trained model
model.save_model('model.bin')

We can do the same using the fastText command line:

# train
>> ./fasttext supervised -input train.txt -output cooking_model
# test
>> ./fasttext test cooking_model.bin test.txt
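Once a model is trained and saved, using it at query time is straightforward. Here is a small sketch; the example query and its predicted label are only illustrative, and the incoming query goes through the same transform_query normalization defined earlier.

import fasttext

# Load the saved classifier and predict a category for an incoming query
model = fasttext.load_model('model.bin')

# Normalize the query the same way the training data was normalized
query = transform_query("Best way to bake a breakfast casserole")

# k=1 returns only the single most probable label and its probability
labels, probabilities = model.predict(query, k=1)
print(labels, probabilities)  # e.g. (('__label__baking',), array([...]))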

Performance Improvement

The classifier we have built so far uses fastText's default classification settings. Modifying these defaults can improve the accuracy metrics of our classifier. Here are a few parameters we can experiment with:

  • Epoch: By default, fastText processes each training example 5 times (the number of epochs). Given the size of this dataset, increasing this parameter can help train a better classifier (-epoch)
  • Learning Rate: Changing this parameter changes the learning speed of the algorithm. Good values of the learning rate are in the range of 0.1 to 1.0 (-lr)
  • Word n-grams: Another way of improving performance is to use bigrams or trigrams instead of just unigrams (-wordNgrams). An example combining all three options follows this list.
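For instance, the values below are the ones suggested in the fastText text classification tutorial; treat them as a starting point rather than tuned numbers.

import fasttext

# Re-train with more epochs, a higher learning rate, and word bigrams
model = fasttext.train_supervised(
    input='train.txt',
    epoch=25,       # fastText default is 5
    lr=1.0,         # default is 0.1
    wordNgrams=2,   # default is 1 (unigrams only)
)
print(model.test('test.txt'))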

Another useful feature of fastText is model quantization. It is really important to train an accurate classifier, but the real value comes when integrating this model into production. If the memory footprint of our trained model is large, it could be a blocker to using it in production. The main goal of quantization is to reduce the original size of the model without sacrificing a lot of accuracy.

This command creates a .ftz version of the original model.bin with a smaller memory footprint. All other fastText functionality, such as test, works with the quantized model. Note that quantize expects the model name without its extension:

>> ./fasttext quantize -output model
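The Python API exposes the same capability through model.quantize. Here is a sketch, reusing the model and file names from earlier; retraining during quantization is optional but usually helps limit the accuracy drop.

import fasttext

# Load the previously saved model and compress it;
# retraining on the training data limits the accuracy loss from quantization
model = fasttext.load_model('model.bin')
model.quantize(input='train.txt', retrain=True)
model.save_model('model.ftz')

# Evaluation works the same way on the quantized model
print(model.test('test.txt'))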

Integration in Search

The next step after building a text classifier is to integrate this model into a search workflow. Before sending the search request to the search engine, we need to enhance our query with the extra information that we have. In this case, the extra information is the predicted category for each query text.

There are two different ways of influencing search results based on this query classification model: Boosting and Filtering.

Boosting

By using Elasticsearch's boosting functionality, you can mathematically prioritize certain documents as being more relevant than others. We are going to use the output of our trained model as the signal to increase the relevancy of some documents. For example, if the predicted category of the search query "Best way to bake a breakfast casserole" is "baking," we can increase the relevancy of those indexed documents that have "baking" as a category. By doing this, the final ranked documents are more relevant to baking-related questions, which increases the probability of providing the end user with more relevant results. Here is a very simple example of how we can integrate boosting into a search query.

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "query_text": "Best way to bake a breakfast casserole"
          }
        }
      ],
      "should": [
        {
          "term": {
            "category": {
              "value": "baking",
              "boost": 10
            }
          }
        }
      ]
    }
  }
}

In this example, the score of the should clause (scaled by its boost of 10) is added to the match score for documents labeled baking, so those documents move toward the top of the results while documents with other labels can still appear. Please note that this query structure is a simple example of boosting predicted categories and may not be production ready. You need to explore a combination of different query clauses to find the optimal structure for your use case, and run multiple offline experiments with different boosting values to find the optimal value.
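Putting the pieces together, a minimal sketch of the boosting flow might look like the following. It assumes the elasticsearch Python client with its v8-style keyword arguments, a hypothetical index called cooking_questions, and the transform_query function defined in the preprocessing section.

import fasttext
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumes a local cluster
model = fasttext.load_model('model.bin')      # the classifier trained earlier


def search_with_boosting(query_text, index="cooking_questions"):
    # 1. Query understanding: predict the most probable category for the query,
    #    reusing the transform_query normalization from the preprocessing step
    labels, _ = model.predict(transform_query(query_text), k=1)
    category = labels[0].replace("__label__", "")

    # 2. Query enhancement: boost documents whose category matches the prediction
    return es.search(
        index=index,
        query={
            "bool": {
                "must": [{"match": {"query_text": query_text}}],
                "should": [{"term": {"category": {"value": category, "boost": 10}}}],
            }
        },
    )


results = search_with_boosting("Best way to bake a breakfast casserole")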

Filtering

Filtering restricts the search results to a subset of documents based on a defined filter. This is a more restrictive option compared to boosting. We use the identified category of a searched query as the criterion to filter out documents. So, in our example, those documents that are not labeled as baking will be filtered out in the final list of results. Here is an example of how to apply a filter.

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "query_text": "Best way to bake a breakfast casserole"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "category": "baking"
          }
        }
      ]
    }
  }
}

Depending on the use case and the accuracy level of trained models, you can use either boosting or filtering approaches to integrate this query classification feature into your search workflow. It would be good to run experiments with both methods and find out which one generates more accurate results based on the evaluation metrics discussed in our previous search article.

Conclusion

In this article, I introduced query classification as one use case for applying machine learning capabilities to a search system. By understanding the intent behind a searched query, we can surface more relevant results.

The quality of a machine learning model depends highly on the quality of labeled data and the pre-processing steps before training the model. So, it would be ideal to spend enough time and resources on those steps to be able to train an ML model with a high level of accuracy before integrating it into the search system.

Stay tuned for the third and final post in this search series, which will be about building a faster indexing pipeline in Elasticsearch using AWS Kinesis and Lambda.
