Decoding user queries in E-Commerce

Kritika Jain
Published in Myntra Engineering
Aug 20, 2020

Search at Myntra aims to understand every term in the user query and return the most relevant results. It is an amalgamation of AI-driven models, sophisticated tools, manually curated fashion expertise and a well-structured catalogue.

The query processing pipeline follows a modular, layered design: the input to the system is the user query and the output is a structured query. Each term in the input query is annotated with its most relevant tag.
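To make the idea of a structured query concrete, here is a minimal sketch of one possible representation. The class names, field names and the example query are assumptions made for illustration, not the actual Myntra schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AnnotatedTerm:
    text: str                      # the term as typed by the user
    tag: Optional[str] = None      # e.g. "brand", "color"; None marks an un-tagged word
    value: Optional[str] = None    # the catalogue value the term maps to


@dataclass
class StructuredQuery:
    raw: str
    terms: List[AnnotatedTerm] = field(default_factory=list)

    def filters(self) -> Dict[str, str]:
        """Tagged terms, usable as attribute filters on the catalogue."""
        return {t.tag: t.value for t in self.terms if t.tag is not None}

    def untagged(self) -> List[str]:
        """Words no tag could be assigned to."""
        return [t.text for t in self.terms if t.tag is None]


# Invented example query.
query = StructuredQuery(
    raw="blue shirts comfy",
    terms=[
        AnnotatedTerm("blue", "color", "Blue"),
        AnnotatedTerm("shirts", "article_type", "Shirts"),
        AnnotatedTerm("comfy"),
    ],
)
print(query.filters(), query.untagged())
```

Keeping the un-tagged words next to the tags lets later stages decide whether to match them as free text or drop them.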

Query understanding Pipeline

Query Processing Components:

1. Sanity

This component identifies misspelled terms in the query and finds the closest match in the sanity corpus by considering QWERTY edit distance, phoneme edit distance and a word-break algorithm. It uses the spellcheck component in Solr to generate possible candidates and a noisy channel model to disambiguate among them.
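The combination of a keyboard-aware edit distance and a noisy channel model can be sketched as below. This is only an illustration under several assumptions: the candidate list and its counts are invented (in the pipeline they would come from Solr's spellcheck component), key adjacency is approximated from row and column positions, the `error_weight` parameter is hypothetical, and phoneme distance and word breaking are left out.

```python
import math

# Approximate QWERTY layout, used to decide whether two keys are neighbours.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(QWERTY_ROWS) for c, ch in enumerate(row)}


def substitution_cost(a, b):
    """Substituting a neighbouring key is cheaper than a distant one."""
    if a == b:
        return 0.0
    if a in KEY_POS and b in KEY_POS:
        (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
        if abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1:
            return 0.5
    return 1.0


def qwerty_edit_distance(source, target):
    """Levenshtein distance with keyboard-aware substitution costs."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [float(i)]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,                          # delete s
                curr[j - 1] + 1.0,                      # insert t
                prev[j - 1] + substitution_cost(s, t),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def best_correction(typed, candidate_counts, error_weight=2.0):
    """Noisy channel: maximise log P(candidate) + log P(typed | candidate)."""
    total = sum(candidate_counts.values())

    def score(candidate):
        log_prior = math.log(candidate_counts[candidate] / total)
        log_channel = -error_weight * qwerty_edit_distance(typed, candidate)
        return log_prior + log_channel

    return max(candidate_counts, key=score)


# Invented corpus counts for three candidate corrections.
candidates = {"shirt": 900, "short": 400, "skirt": 300}
print(best_correction("shurt", candidates))
```

Here "shirt" wins both because it is the most frequent candidate and because "u" and "i" are adjacent keys, which makes that typo cheaper than the alternatives.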

2. Synonym Substitution

Synonym substitution performs dictionary-based corrections to the query, mapping colloquial terms, geographical and language influences, and brand name acronyms to the corresponding synonymous words in the Myntra catalogue. It uses SynonymGraphFilterFactory in Solr.
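The dictionary-based mapping can be illustrated in plain Python. This sketch does not reproduce how SynonymGraphFilterFactory works internally; it only shows the longest-match-first idea for single and multi-word entries, and every dictionary entry here is invented.

```python
# Hypothetical synonym dictionary: token sequences as typed by users,
# mapped to the token sequences used in the catalogue.
SYNONYMS = {
    ("tee",): ("tshirts",),
    ("specs",): ("spectacles",),
    ("night", "dress"): ("nightdress",),
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in SYNONYMS)


def substitute_synonyms(query):
    """Replace known phrases with catalogue terms, longest phrase first."""
    tokens = query.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in SYNONYMS:
                out.extend(SYNONYMS[phrase])
                i += n
                break
        else:
            # No dictionary entry starts at this token: keep it unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(substitute_synonyms("Red tee"))
print(substitute_synonyms("night dress for women"))
```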

3. Query Annotation

Query annotation identifies keywords in the user query such as article types, brands, colors, product categories and article-type-specific attributes, for example collar type for shirts or fastening type for shoes. Every term in the query is annotated with a predefined tag. The FST-based SolrTextTagger generates all possible annotations for the words, and disambiguation then picks the most relevant annotation by applying heuristics and using product counts in the catalogue as a measure of relevance. If SolrTextTagger tags no product category (aka Article Type), a classification model trained on click-stream data predicts the article type the user is looking for. After disambiguation, any other possible annotations for a term in the query are presented to the user as alternate suggestions. The output of this component is an unambiguous sequence of tags and un-tagged words (if any).
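Disambiguation by product count can be sketched as follows. The catalogue, the candidate annotations and the function names are all invented for illustration; in the pipeline the candidates would come from SolrTextTagger and the counts from Solr, and further heuristics would apply.

```python
from itertools import product

# Toy catalogue, invented for illustration.
CATALOGUE = [
    {"article_type": "shirts", "color": "orange", "brand": "acme"},
    {"article_type": "shirts", "color": "orange", "brand": "zenith"},
    {"article_type": "shirts", "color": "blue", "brand": "orange"},
    {"article_type": "jeans", "color": "blue", "brand": "acme"},
]

# Stand-in for tagger output: every (tag, value) pair a term could carry.
CANDIDATES = {
    "orange": [("color", "orange"), ("brand", "orange")],
    "shirts": [("article_type", "shirts")],
}


def count_products(annotation):
    """Number of catalogue items matching every (tag, value) pair."""
    return sum(
        all(item.get(tag) == value for tag, value in annotation)
        for item in CATALOGUE
    )


def disambiguate(terms):
    """Return (best, alternates), ranking tag combinations by product count."""
    options = [CANDIDATES[term] for term in terms]  # assumes every term was tagged
    ranked = sorted(product(*options), key=count_products, reverse=True)
    ranked = [combo for combo in ranked if count_products(combo) > 0]
    if not ranked:
        return None, []
    return ranked[0], ranked[1:]


best, alternates = disambiguate(["orange", "shirts"])
print(best)
print(alternates)
```

In this toy data "orange" is read as a color because more shirts match that reading, while the brand reading survives as an alternate suggestion.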

4. Graceful Degradation

Graceful degradation aims to rectify tagging errors and handle the un-tagged terms. The query understanding established during query annotation is degraded if no matching results are found in the catalogue. It is a step-by-step process in which the next step is executed only if the annotated query from the previous step has no matching results. In each step we drop either one of the annotations or the un-tagged words, trading precision for higher recall.
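The step-by-step relaxation might look like the sketch below. The order in which things are dropped (un-tagged words first, then annotations in a caller-supplied `drop_order`) is an assumption, as are the toy catalogue and the `search` stand-in; the post does not specify the actual order.

```python
# Toy catalogue, invented for illustration.
CATALOGUE = [
    {"id": 1, "article_type": "shirts", "color": "blue", "collar": "mandarin",
     "text": "blue mandarin collar shirt"},
    {"id": 2, "article_type": "shirts", "color": "red", "collar": "spread",
     "text": "red formal shirt"},
]


def search(filters, words):
    """Stand-in for a catalogue lookup: ids matching all filters and words."""
    return [
        item["id"] for item in CATALOGUE
        if all(item.get(tag) == value for tag, value in filters.items())
        and all(word in item["text"] for word in words)
    ]


def degradation_steps(annotations, untagged, drop_order):
    """Yield progressively looser versions of the annotated query."""
    yield dict(annotations), list(untagged)   # the query as annotated
    if untagged:
        yield dict(annotations), []           # drop the un-tagged words
    remaining = dict(annotations)
    for tag in drop_order:                    # then drop annotations one at a time
        if tag in remaining:
            del remaining[tag]
            yield dict(remaining), []


def degrade(annotations, untagged, drop_order):
    """Run each step only if the previous one returned nothing."""
    for filters, words in degradation_steps(annotations, untagged, drop_order):
        results = search(filters, words)
        if results:
            return filters, words, results
    return None


result = degrade(
    {"article_type": "shirts", "color": "red", "collar": "mandarin"},
    ["xyz"],
    drop_order=["collar", "color"],
)
print(result)
```

Because the steps are generated lazily, no looser query is built or executed once a step has produced results.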

5. Query Substitution

Query substitution transforms the user query by replacing or removing one of the annotations in the annotated query; a replaced attribute is swapped for the attribute most similar to it. It is powered by a graph of all fashion entities in Myntra, connected by weighted relationships that represent the affinities between the corresponding entities. The graph is built from click-stream data generated by users browsing the Myntra portal.
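A minimal sketch of substitution over a weighted affinity graph is shown below. The graph contents, the weights and the `has_results` callback are invented for illustration; the real graph is derived from click-stream data and is far richer.

```python
# Hypothetical affinity graph: (tag, value) -> {similar value: weight}.
AFFINITY = {
    ("color", "teal"): {"turquoise": 0.8, "green": 0.6},
    ("collar", "mandarin"): {"band": 0.7},
}


def substitutions(annotations):
    """All single-annotation rewrites, strongest affinity first."""
    candidates = []
    for tag, value in annotations.items():
        for neighbour, weight in AFFINITY.get((tag, value), {}).items():
            candidates.append((weight, tag, neighbour))
    candidates.sort(reverse=True)
    return [{**annotations, tag: neighbour} for _, tag, neighbour in candidates]


def substitute(annotations, has_results):
    """Return the first rewritten query that has results, or None."""
    for rewritten in substitutions(annotations):
        if has_results(rewritten):
            return rewritten
    return None


# Pretend only green shirts exist in the catalogue.
available = [{"article_type": "shirts", "color": "green"}]
rewritten = substitute(
    {"article_type": "shirts", "color": "teal"},
    has_results=lambda query: query in available,
)
print(rewritten)
```

Turquoise is tried first because it has the stronger affinity to teal; since it yields nothing, the sketch moves on to green.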

6. Fallback Strategy

The fallback strategy is applied in case no results have been found up to this stage. All the annotations are dropped and a free-text match is performed across all the attributes.
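One way to express a free-text match across several attributes in Solr is the eDisMax query parser with a `qf` field list. The post does not say which parser is used, so treat this as an assumption; the field names below are hypothetical too.

```python
from urllib.parse import urlencode


def fallback_params(user_query, fields):
    """Request parameters for a free-text match over the given fields."""
    return {
        "defType": "edismax",     # assumption: eDisMax is used for the fallback
        "q": user_query,          # raw query text, all annotations dropped
        "qf": " ".join(fields),   # fields to match the text against
    }


# Hypothetical field names.
params = fallback_params("comfy blue shirts", ["brand", "color", "article_type"])
print(urlencode(params))
```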

The complete search pipeline depends heavily on Solr, be it for spell-check, tagging, synonym substitution or checking product counts after the degradation and substitution steps. Hence the query understanding, i.e. the mapping from user query to structured query, is cached in a refresh-ahead cache.
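A refresh-ahead cache serves the cached value immediately and reloads it in the background once the entry passes a refresh threshold, before it actually expires. The sketch below shows the idea under stated assumptions: the class, its parameters and the `understand_query` loader are hypothetical, and a production cache would also need eviction and error handling.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RefreshAheadCache:
    def __init__(self, loader, ttl, refresh_after, clock=time.monotonic):
        self._loader = loader                # expensive function, e.g. the full pipeline
        self._ttl = ttl                      # age at which an entry expires
        self._refresh_after = refresh_after  # age at which a background reload starts
        self._clock = clock
        self._entries = {}                   # key -> (value, loaded_at)
        self._pending = {}                   # key -> Future of an in-flight refresh
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=2)

    def get(self, key):
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and now - entry[1] < self._ttl:
                value, loaded_at = entry
                if now - loaded_at >= self._refresh_after and key not in self._pending:
                    # Old enough to refresh: reload in the background, serve the old value.
                    self._pending[key] = self._executor.submit(self._refresh, key)
                return value
        # Miss or expired entry: load synchronously.
        value = self._loader(key)
        with self._lock:
            self._entries[key] = (value, now)
        return value

    def _refresh(self, key):
        try:
            value = self._loader(key)
            with self._lock:
                self._entries[key] = (value, self._clock())
        finally:
            with self._lock:
                self._pending.pop(key, None)

    def wait_for_refreshes(self):
        """Block until in-flight refreshes finish (handy in tests)."""
        with self._lock:
            futures = list(self._pending.values())
        for future in futures:
            future.result()


def understand_query(query):
    """Hypothetical stand-in for the full query understanding pipeline."""
    return {"raw": query, "tags": query.split()}


cache = RefreshAheadCache(understand_query, ttl=600, refresh_after=300)
print(cache.get("blue shirts"))
```

Popular queries therefore rarely pay the cost of the Solr round trips at request time, since their entries are reloaded before they expire.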

The query understanding pipeline resulted in 80% recall without impacting precision or scalability.

We’ll publish deeper insights into the individual components of query understanding in future posts. Watch this space!

#myntralife #myntraengineering
