Category Prediction for Search Query Understanding

Published in Myntra Engineering, Apr 21, 2024

Navigating through online shopping platforms can sometimes feel like finding your way through a maze. Take the search bar, for example. You type in “winter upper wear,” hoping to find the perfect jacket or cozy sweatshirt. But here’s the tricky part: the search engine has to decipher what you mean. Is it jackets you’re after? Or maybe sweatshirts? Or both?

It gets even more confusing when you consider the overlapping categories. Kurtas can be standalone articles or part of kurta sets. And loafers? They could belong to formal shoes or casual shoes and certainly not sports shoes. See the challenge?

To tackle this, Myntra uses a multi-label classification model that maps search queries to product categories. It’s like having an assistant that can understand the possible intents behind your search query. So when you type in something like “whey,” the model knows you might be looking for protein or health supplements.

But here’s the catch: search queries can be short and vague, and they often use words that don’t directly match category names. People might search using different terms or even regional variations. So, the model needs to be clever enough to map those words to the right categories internally.

The goal is to capture all possible intents without cluttering your search results with irrelevant stuff. After all, nobody likes sifting through pages of irrelevant products. It’s a delicate balance between covering all bases and keeping things tidy.

Solution

The solution has 2 major components.

I. Data preparation
We prepare ( search query : categories ) data points to be consumed in training by the neural classifier. Ex. ( ethnic wear : kurta, sarees )

II. Training a neural model
We train a neural multi-label text classifier on the prepared training data. The trained model is used to predict categories for search queries live.

I. Data Preparation

We generate the supervised text classification training data in the form of a search query with its product categories as labels.
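A multi-label classifier usually consumes such labels as a multi-hot vector, with one position per category. Here is a minimal sketch of that encoding. The category list and the helper name `to_multi_hot` are assumptions made for illustration, not Myntra's actual label set or code.

```python
# Hypothetical category vocabulary; the real label set is much larger.
CATEGORIES = ["Jackets", "Sweatshirts", "Kurtas", "Sarees"]
CATEGORY_INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def to_multi_hot(labels):
    """Return a 0/1 vector with a 1 at the position of every label."""
    vector = [0] * len(CATEGORIES)
    for label in labels:
        vector[CATEGORY_INDEX[label]] = 1
    return vector


# (search query : categories) data points, as described in the text.
training_rows = [
    ("ethnic wear", ["Kurtas", "Sarees"]),
    ("winter upper wear", ["Jackets", "Sweatshirts"]),
]

encoded = [(query, to_multi_hot(labels)) for query, labels in training_rows]
for query, target in encoded:
    print(query, target)
```

Because a query can switch on several positions at once, the same row can teach the model both intents of “winter upper wear”.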

This set is enriched from multiple sources, such as:

A. Reformulation

Users fire multiple queries with converging intents in a single session. We stitch these queries to the products interacted with in the same search session, which helps create training examples for tail queries. Ex. in a single session, a user fires 2 queries:

User Query 1 : “winter upper wear” > Irrelevant results, low user interaction

User Query 2: “Fleece jackets” > User interacts with jackets in search results.

We also stitch “winter upper wear” with jackets in our training set.

< Query : winter upper wear || Label : <Jackets> >

Solving queries using reformulation signals
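The stitching step can be sketched roughly as follows. This is a simplified illustration and not the production pipeline. The session structure, the field names and the `stitch_session` helper are hypothetical, and the sketch assumes every query in a session shares one intent, which a real pipeline would likely check before stitching.

```python
# Hypothetical session log: each query with the categories of the products
# the user interacted with (empty when the results drew no interaction).
session = [
    {"query": "winter upper wear", "interacted_categories": []},
    {"query": "fleece jackets", "interacted_categories": ["Jackets"]},
]


def stitch_session(events):
    """Label every query in a session with all categories that were
    interacted with anywhere in that session."""
    session_labels = set()
    for event in events:
        session_labels.update(event["interacted_categories"])
    return {event["query"]: sorted(session_labels) for event in events}


stitched = stitch_session(session)
for query, labels in stitched.items():
    print(f"< Query : {query} || Label : <{', '.join(labels)}> >")
```

The vague first query inherits the label earned by the more specific second query, which is how tail queries with little direct interaction still get training signal.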

B. Catalog

We create multiple combinations from products’ attributes and couple them with the products’ categories. Ex.

1. Categories of products whose hygienic feature attribute has the value “Almond” are stitched together: “Hair Oil”, “Food Oil”, “Chocolates”. This gives the training data:
< Query : Almond || Label : <Hair Oil, Food Oil, Chocolates> >
2. Using brand + category + gender combinations, we generate instances like:
a. < Query : Nike sports shoes men || Label : <Sports Shoes> >
b. < Query : Garnier facewash men || Label : <Facewash> >

This helps us inject data for low traffic categories where reformulation lacks coverage.
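A rough sketch of how such catalog-derived rows could be generated is shown below. The catalog rows, the field names (`brand`, `gender`, `category`, `feature`), the placeholder brands and the `build_rows` helper are all assumptions for illustration; the article does not describe the actual catalog schema.

```python
from collections import defaultdict

# Hypothetical catalog rows; field names and brands are illustrative only.
catalog = [
    {"brand": "Nike", "gender": "men", "category": "Sports Shoes", "feature": None},
    {"brand": "Garnier", "gender": "men", "category": "Facewash", "feature": None},
    {"brand": "BrandA", "gender": "women", "category": "Hair Oil", "feature": "Almond"},
    {"brand": "BrandB", "gender": "unisex", "category": "Chocolates", "feature": "Almond"},
]


def build_rows(products):
    """Map synthetic queries to the set of categories they were seen with."""
    labels = defaultdict(set)
    for p in products:
        # brand + category + gender combination
        combo = f'{p["brand"]} {p["category"]} {p["gender"]}'.lower()
        labels[combo].add(p["category"])
        # attribute value on its own, collecting every category that uses it
        if p["feature"]:
            labels[p["feature"].lower()].add(p["category"])
    return {query: sorted(cats) for query, cats in labels.items()}


rows = build_rows(catalog)
for query, cats in rows.items():
    print(f"< Query : {query} || Label : <{', '.join(cats)}> >")
```

Collecting categories into a set per query is what turns a shared attribute value such as “almond” into a multi-label example.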

II. Neural Classifier Architecture

The model has two parts.

The first part, called the word embedding module, turns words into numerical representations called word vectors. Representing words as vectors in a multi-dimensional space lets the model capture their meaning and the relationships between them.

The second part is a text classification model that uses neural networks to understand the order of words in a sentence and make predictions based on that.

A. Word Embeddings

We create word-level embeddings using unsupervised representation learning in fastText on a custom e-commerce corpus built from catalog products, clickstream and commercially available e-commerce data sources. The embedding model learns high-dimensional word-level representations.

The fastText model has the following features, which make it suitable for our application:

1. Subword Information

fastText breaks words into character n-grams, which helps with out-of-vocabulary or noisily spelled queries such as “kurta” / “kurti”. The shared subword “kurt” might imply generic ethnic topwear, while “rta” or “rti” maps the word to the longer “kurtas” or the shorter “kurtis” respectively.

This further allows representation of vernacular, unseen or rare words by composing their subword embeddings, enhancing robustness. Ex. “necklace” and “necklas”.
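To make the subword idea concrete, here is a small sketch that extracts character n-grams the way fastText does, with `<` and `>` as word boundary markers and n-gram lengths from 3 to 6 (fastText's default `minn` and `maxn`). The `char_ngrams` helper is written for this post; the n-gram range actually used in our model is not stated here, so treat the defaults as an assumption.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of a word, with boundary markers."""
    wrapped = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.add(wrapped[start:start + n])
    return grams


correct = char_ngrams("necklace")
misspelt = char_ngrams("necklas")
shared = correct & misspelt

print(sorted(shared))
print(f"{len(shared)} of {len(misspelt)} n-grams of the misspelling are shared")
```

Since a word's vector is composed from its n-gram vectors, the large overlap means the misspelt “necklas” lands close to “necklace” even if it never appeared in the training corpus.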

2. Skip-gram Model

The skip-gram model represents words using character n-grams. By predicting context words (the words around the current word) from the current word during training and adjusting input vector weights through backpropagation, the model learns semantic associations efficiently. For more details, check out this detailed guide on fastText.

Learning representation for “loves” by predicting context words “the man — his son”. src
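The training pairs behind that figure can be sketched as below: for every word, skip-gram pairs it with each neighbour inside a context window. The `skipgram_pairs` helper and the window size of 2 are assumptions for illustration; real fastText training additionally uses sampling tricks that are omitted here.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(["the", "man", "loves", "his", "son"])
loves_contexts = [context for center, context in pairs if center == "loves"]
print(loves_contexts)
```

Each pair becomes one prediction task: given the center word, predict the context word.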

3. Advantages for OOV and Noisy Spelling

Subword embeddings accommodate unseen or misspelled words, providing meaningful representations even for out-of-vocabulary or noisy spelling queries.

The skip-gram model’s ability to capture semantic information enhances the robustness and relevance of the embeddings, further aiding in handling variations in vocabulary.

B. Neural Network Classifier

We create a custom neural network text classifier inspired by EXAM.

The model utilises an interaction mechanism to incorporate word-level matching signals into the text classification task.

User query “black drop shoulder” goes through the following transformations:

1. Generate word embeddings using fastText.
2. Pass them through a bidirectional Gated Recurrent Unit (Bi-GRU) sequence-to-sequence layer to create contextual word embeddings.
3. Calculate the interaction of each contextual word embedding with the class-level embeddings.
4. Aggregate the interactions of each contextual word embedding across classes.
5. Transform the interaction-space vector into the classification vector.

The interaction used here is cosine similarity, chosen for its simplicity and efficiency. We use sigmoid as the output activation because it suits multi-label scenarios: each class is scored independently, so the output probabilities are not squashed when more than one class is a suitable output.
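The difference is easy to see on a toy example that compares sigmoid with softmax, the usual single-label alternative. The logits and the three category names in the comment are made-up values for illustration.

```python
import math


def sigmoid(scores):
    """Score every class independently."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


def softmax(scores):
    """Make classes compete: the outputs always sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for ("Jackets", "Sweatshirts", "Sarees").
logits = [3.0, 3.0, -4.0]
sig = sigmoid(logits)
soft = softmax(logits)

print("sigmoid:", [round(p, 3) for p in sig])
print("softmax:", [round(p, 3) for p in soft])
```

With sigmoid, both suitable classes keep a high probability. With softmax they have to share the probability mass, so neither can rise above one half.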

Interaction mechanism as described in the src: EXAM.

The figure above demonstrates the interaction mechanism between the learnt class representations and the text embeddings. Each word-level representation from the Bi-GRU interacts with the class representation vectors.
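A heavily simplified sketch of this interaction step follows. It is not the production model: the 3-dimensional vectors stand in for Bi-GRU outputs and learnt class embeddings, the category names are hypothetical, max-pooling over words is an assumed stand-in for the aggregation layer, and the fixed `scale` and `bias` replace the learnt transformation to the classification vector.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(word_vectors, class_vectors, scale=5.0, bias=-2.5):
    """Interact every word with every class, pool per class, apply sigmoid."""
    scores = {}
    for name, class_vec in class_vectors.items():
        interactions = [cosine(word, class_vec) for word in word_vectors]
        pooled = max(interactions)  # assumed aggregation: best-matching word
        scores[name] = 1.0 / (1.0 + math.exp(-(scale * pooled + bias)))
    return scores


# Toy stand-ins for the contextual embeddings of "black drop shoulder".
words = [[0.1, 0.9, 0.0], [0.8, 0.2, 0.1], [0.9, 0.1, 0.2]]
# Toy stand-ins for learnt class embeddings (hypothetical categories).
classes = {"Tshirts": [1.0, 0.1, 0.1], "Sarees": [-0.2, -0.1, 1.0]}

scores = predict(words, classes)
for name, probability in scores.items():
    print(name, round(probability, 3))
```

A class scores highly as soon as at least one word in the query aligns with its embedding, which is the word-level matching signal the interaction mechanism is meant to provide.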

Radial decision boundaries / clusters (labelled in magenta) in product category representations learnt in training demonstrating semantic coherence amongst categories

Conclusions

The model is integrated into search query understanding, and an A/B experiment was run against the existing behaviour. Results show a decrease in query abandonment and improvements in click-through rate and revenue per user.

The following head queries show improvements in production.

Approaching general language understanding on traffic

Improved discoverability of athletic wear; the previous experience showed sneakers and casual shoes

Showing only protein powder for “whey” and not makeup and beauty products

There is room to explore modern language models, which leverage knowledge from internet-scale pre-training and the transformer architecture. Proofs of concept that fine-tune pre-trained RoBERTa and DeBERTa-v3 show tremendous promise.