Search in fashion e-commerce

Ankul Batra
5 min read · Jun 15, 2017

Fashion is a unique e-commerce domain, led more by art than by science. Trends in fashion continuously evolve, subside and resurface at a rapid pace. This poses many site-search challenges, which can be classified under (but are not limited to) the following heads:

  1. User’s intent expressed by colloquial terms
  2. Store inventory lags trend evolution
  3. Ambiguity among terms

Let's take a stab at each of these with examples of user searches, citing the complexity of each issue and a potential solution to it.

User’s intent

Customers can use different words, keyboard types and languages to express the same intent. Allow me to give you some context: what do you understand by a user query like Nike Free? (Nike Free is a sub-brand of Nike for a type of running shoe.) Add to that spelling mistakes (Nuke = Nike), colloquial terms (free = free size) and transliterations (free = मुफ्त). Moreover, intents keep varying with new seasons, new brands and language evolution. Search has to stay ahead of users to understand them.

As a first step, the search engine should try to correct the query: fix spelling mistakes, substitute synonyms, make language corrections and so on. There are several algorithms that handle these problems separately; the trick is to strike a balance by using the right approach. You may end up with several ambiguous forms, and the system may have to pursue candidate interpretations along parallel lines until it can classify the primary intent of the search term with high confidence. I will talk about this more under the third challenge.
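
The correction step could be sketched roughly as below. This is a minimal illustration, not a production approach: the vocabulary, the synonym table and the function name normalise_query are all made up for the example, and Python's standard difflib stands in for whatever spell-correction algorithm a real system would use.

```python
import difflib

# Hypothetical vocabulary and synonym table; a real store would derive
# these from its catalogue and query logs.
VOCABULARY = ["nike", "free", "adidas", "running", "shoes", "jacket"]
SYNONYMS = {"sneakers": "shoes", "trainers": "shoes"}


def normalise_query(query, cutoff=0.7):
    """Lower-case the query, map synonyms, then spell-correct unknown tokens."""
    corrected = []
    for token in query.lower().split():
        token = SYNONYMS.get(token, token)
        if token not in VOCABULARY:
            # Pick the closest vocabulary word, if any is similar enough.
            matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
            if matches:
                token = matches[0]
        corrected.append(token)
    return " ".join(corrected)


print(normalise_query("Nuke Free"))      # nike free
print(normalise_query("Nike sneakers"))  # nike shoes
```

Tokens with no sufficiently close match are left untouched, so an unfamiliar but valid trend term is not "corrected" into something else.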

As a second step, the search system should be able to completely deconstruct the entities in the typed keywords, e.g. Nike = brand, Free = sub-brand, intent = sports shoes, attribute = running type. As an extension, find the associated intent as well, e.g. associated brands = Adidas, Reebok, Puma etc. This helps to surface related results (running shoes from Adidas) in case the actual results are not present.
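
A dictionary-based sketch of this deconstruction follows. The lookup tables and the deconstruct function are assumptions made for illustration; a real system would build them from the catalogue and would likely use a learned tagger rather than exact lookups.

```python
# Hypothetical lookup tables, for illustration only.
BRANDS = {"nike", "adidas", "reebok", "puma"}
SUB_BRANDS = {
    ("nike", "free"): {"intent": "sports shoes", "attribute": "running"},
}
ASSOCIATED_BRANDS = {"nike": ["adidas", "reebok", "puma"]}


def deconstruct(query):
    """Tag the brand, sub-brand, intent and attribute found in a query."""
    tokens = query.lower().split()
    entities = {
        "brand": None,
        "sub_brand": None,
        "intent": None,
        "attribute": None,
        "associated_brands": [],
    }
    for i, token in enumerate(tokens):
        if token in BRANDS and entities["brand"] is None:
            entities["brand"] = token
            entities["associated_brands"] = ASSOCIATED_BRANDS.get(token, [])
            # A sub-brand is assumed to directly follow its brand.
            if i + 1 < len(tokens):
                info = SUB_BRANDS.get((token, tokens[i + 1]))
                if info:
                    entities["sub_brand"] = tokens[i + 1]
                    entities["intent"] = info["intent"]
                    entities["attribute"] = info["attribute"]
    return entities


print(deconstruct("Nike Free"))
```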

As the last step, we should substitute results from related brands and categories in case an exact match is not found in the store. However, this is a tricky step, as one can lose relevance as part of the substitution. So we should never let go of the primary intent: as a last resort, it is better to show no results than to show irrelevant products.
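
One way to picture this fallback is the sketch below. The tiny catalogue, the brand associations and search_with_fallback are invented for the example. The point it tries to show is that the category (the primary intent) is never relaxed, only the brand.

```python
# Hypothetical catalogue and brand associations, for illustration only.
CATALOG = [
    {"brand": "adidas", "category": "running shoes", "title": "Adidas running shoe"},
    {"brand": "puma", "category": "t-shirts", "title": "Puma tee"},
]
ASSOCIATED_BRANDS = {"nike": ["adidas", "reebok", "puma"]}


def search_with_fallback(brand, category, catalog=CATALOG):
    """Return (products, match_type); the category is never relaxed."""
    exact = [p for p in catalog
             if p["brand"] == brand and p["category"] == category]
    if exact:
        return exact, "exact"
    # Substitute only the brand, keeping the primary intent (the category).
    related = ASSOCIATED_BRANDS.get(brand, [])
    substitutes = [p for p in catalog
                   if p["brand"] in related and p["category"] == category]
    if substitutes:
        return substitutes, "substitute"
    # Better to show nothing than irrelevant products.
    return [], "none"


products, match_type = search_with_fallback("nike", "running shoes")
print(match_type, [p["title"] for p in products])
```

Here the Puma tee is never returned for a running-shoes query, even though Puma is an associated brand.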

Stores lag trends

Retail businesses are traditionally heavy in operations and inventory. Optimising storage space, on-boarding sellers and getting the right set of products are critical for cost optimisation. Striking the right balance requires reading the trends, completing the designs, manufacturing and efficient in-warding of products. This process, unfortunately, is not as swift as the evolution of trends in the fashion space. There will be user searches for early trends that do not find a match in the store.

A reactive approach is to keep a keen eye on zero-result searches. We can analyse users' subsequent searches to establish a connection between a zero-result query and a related search that returned products, and later work with category managers to identify the right substitutes that we can offer from our inventory. E.g. “vest jackets” starts trending in search and gives zero results, following which many users type “sleeveless jackets” in our store. A savvy category manager will immediately map this in the catalog. This is similar to offline behaviour, where a store manager will guide you to the next best options available in the store. A self-serve tool for such validated editorial input can enable the store manager to handle this seamlessly.
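
The mining of zero-result follow-ups can be sketched as below. The session format (a time-ordered list of query and result-count pairs), the min_support threshold and the function name are assumptions for the example. The output is meant as a list of candidates for a category manager to review, not as an automatic mapping.

```python
from collections import Counter, defaultdict


def suggest_mappings(sessions, min_support=2):
    """Map each zero-result query to its most common successful follow-up.

    `sessions` is a list of sessions; each session is a time-ordered list
    of (query, result_count) pairs. This format is hypothetical.
    """
    follow_ups = defaultdict(Counter)
    for session in sessions:
        # Walk over consecutive pairs of searches within one session.
        for (query, count), (next_query, next_count) in zip(session, session[1:]):
            if count == 0 and next_count > 0:
                follow_ups[query][next_query] += 1

    suggestions = {}
    for query, counter in follow_ups.items():
        best, support = counter.most_common(1)[0]
        if support >= min_support:
            suggestions[query] = best
    return suggestions


sessions = [
    [("vest jackets", 0), ("sleeveless jackets", 12)],
    [("vest jackets", 0), ("sleeveless jackets", 12)],
    [("vest jackets", 0), ("gilet", 3)],
]
print(suggest_mappings(sessions))  # {'vest jackets': 'sleeveless jackets'}
```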

Another useful implication is that when a brand is missing from the store, users can find substitute brands that they associate with it. It's a win-win strategy that improves both customer experience and store discovery.

A more structured approach is to reach out to internet content on fashion and lifestyle: interpret the trending content and establish relationships between its terms and store products by reading the co-occurrence of terms and images of new trends. This requires search to have an exhaustive glossary of fashion terms and to map them to store listings. This is a big-data problem, but it is a huge asset for any search company that aspires to be a go-to place for all user needs.
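
For the text side of this idea, a toy co-occurrence count might look like the sketch below. The glossary, the sample documents and the function name are hypothetical, and the sketch ignores images and any real text processing (stemming, phrase handling, weighting).

```python
from collections import Counter

# Hypothetical glossary of terms that already map to store listings.
STORE_TERMS = {"sleeveless", "jacket", "denim", "kurta"}


def cooccurring_store_terms(trend_term, documents, top_n=3):
    """Rank store terms by how often they share a document with the trend term."""
    counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        if trend_term in words:
            counts.update(words & STORE_TERMS)
    return [term for term, _ in counts.most_common(top_n)]


documents = [
    "the gilet is a sleeveless jacket",
    "gilet sleeveless looks",
    "denim kurta",
]
print(cooccurring_store_terms("gilet", documents))  # ['sleeveless', 'jacket']
```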

Ambiguity

Many search engines tokenise (split on spaces) the user query before processing it, which risks losing the context of the search. E.g. “Football shoes”: tokenisation here would leave you with two keywords, “Football” + “Shoes”. Each of these has standalone significance as a category. A simple search would lead users to a pool of all available shoes and footballs in the store, which is not optimal. Another form of ambiguity can arise when a user search has many grammatically correct readings.

One simple approach that helps resolve this is lexical or span detection. If the terms under consideration often appear together or signify a common intent, then the tokenisation step should not separate them. The pool of phrases for lexical or span detection can be generated from product titles, user queries or the store's product descriptions.
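
A phrase-aware tokeniser along these lines could be sketched as follows. The phrase pool here is hard-coded and hypothetical; as noted above, it would in practice be generated from product titles, user queries or product descriptions.

```python
# Hypothetical phrase pool, for illustration only.
PHRASES = {("football", "shoes"), ("nike", "free"), ("sleeveless", "jackets")}
MAX_PHRASE_LEN = 2


def tokenise(query, phrases=PHRASES):
    """Split on spaces, but keep known multi-word phrases as one token."""
    words = query.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, down to two words.
        for size in range(min(MAX_PHRASE_LEN, len(words) - i), 1, -1):
            span = tuple(words[i:i + size])
            if span in phrases:
                tokens.append(" ".join(span))
                i += size
                break
        else:
            # No known phrase starts here: emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens


print(tokenise("red football shoes"))  # ['red', 'football shoes']
```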

A second way to resolve ambiguity is to first diverge, getting all the possible meanings of the query, and then converge on the right meaning. Applying these steps in the current context we would see:

a. Diverge: (“Football” = category 1 + “Shoes” = category 2) OR (“Football” = type of sport + “Shoes” = category). Get all the possible combinations.

b. Converge: Choose (“Football” = type of sport + “Shoes” = category) because the context is strong here and this reading got more clicks from users in the past.
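
The two steps can be sketched as below, restricted to two-word queries to keep the example short. The category and sport sets, the click counts and both function names are invented for illustration; a real system would score interpretations with far richer signals than a single click counter.

```python
# Hypothetical dictionaries and click history, for illustration only.
CATEGORIES = {"football", "shoes"}
SPORTS = {"football"}
CLICKS = {
    (("category", "football"), ("category", "shoes")): 40,
    (("sport", "football"), ("category", "shoes")): 950,
}


def diverge(query):
    """List every reading of a two-word query that the dictionaries allow."""
    first, second = query.lower().split()
    interpretations = []
    if first in CATEGORIES and second in CATEGORIES:
        interpretations.append((("category", first), ("category", second)))
    if first in SPORTS and second in CATEGORIES:
        interpretations.append((("sport", first), ("category", second)))
    return interpretations


def converge(interpretations, clicks=CLICKS):
    """Pick the reading with the most historical clicks, or None if no reading exists."""
    if not interpretations:
        return None
    return max(interpretations, key=lambda interp: clicks.get(interp, 0))


options = diverge("Football shoes")
print(converge(options))  # (('sport', 'football'), ('category', 'shoes'))
```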

As an additional step, it always helps to surface the conflict to the user as well: “Did you mean shoes for the sport football?”. This prompts the user to select the right intent in case the machine got it wrong. The user's click then goes back to the system as feedback for future reference, until confidence in the results is high.

The biggest challenge is achieving all these features in real time and at scale, and prioritising the right action points, especially when you are the largest fashion e-tailer in the Indian market. Each of the aforementioned steps costs multiple CPU cycles and potential network calls. Striking this fine balance requires expert and dedicated engineers who support the vision:

“Create an always available content discovery mechanism basis users’ intentions”.

We are well on our way to achieving this blissful state and, luckily, Myntra has a team that understands the criticality of search and the contributions required to make the desired progress!
