How deep learning can transform e-commerce search

Alexander Schlegel
LF1.io
Sep 26, 2019

Whether on Amazon or your local skateboard shoe store’s website, on virtually every online shop you’ll find a little ol’ text-box. These text-boxes let you enter a wish and take you to a list of “matching” products; these “matches” may or may not include what you are looking for. For retailers, this box is very important: almost 20 per cent of e-commerce revenue comes directly from on-site search, and shoppers who use search are almost twice as likely to convert.

Amazon has understood this — to make sure users use the search box, they have stretched the size of the box out to pretty much the max.

Amazon’s very long search box.

Behind the scenes, scores of tech employees, AI practitioners and researchers work to improve the ranking delivered to the Amazon search results page. At most other shops, however, the results are very poor, even though text search concerns every online e-commerce business: on-site product search solutions mostly require users to type exactly the same words that are displayed on the product pages. Even then, this text matching often leads to very misleading results (take the color “baby-blue” as an example). This is in stark contrast to what Google offers us. On Google we’re used to entering long queries with typos and weird wording, and somehow we still get what we’re looking for. What’s the difference?

Not to put too fine a point on it:

The technology behind current on-site product search is fundamentally flawed.

Companies such as Amazon know this and have already moved to a new paradigm based on deep learning. Now it’s possible for all companies to switch to this breakthrough technology: implementing the new paradigm isn’t out of reach for even the smallest of companies. In this post we’ll take a look at what deep learning has to offer on-site product search.

Product search — under the hood

In almost all cases, product search works by matching the query text with the entries in the shop’s product database — enter ‘adidas shoe’ and you’ll get products that have both ‘adidas’ and ‘shoe’ somewhere in their database entries. Often, this is accomplished using a variant of Apache’s Lucene technology, Elasticsearch.
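To make this concrete, here is a minimal sketch of surface-level matching in Python. The toy catalog and the `text_match` helper are illustrative assumptions, not any engine’s actual API; real engines such as Elasticsearch layer analyzers, inverted indices and relevance scoring on top of the same basic idea.

```python
# Toy product catalog standing in for a shop's product database.
products = [
    {"id": 1, "title": "adidas running shoe"},
    {"id": 2, "title": "nike flyknit shoe"},
    {"id": 3, "title": "adidas track jacket"},
]

def text_match(query, catalog):
    """Return products whose title contains every query token."""
    tokens = query.lower().split()
    return [p for p in catalog
            if all(t in p["title"].lower().split() for t in tokens)]

matches = text_match("adidas shoe", products)
print([p["id"] for p in matches])  # → [1]
```

Note how literal this is: the query “adidas sneaker” would return nothing, because “sneaker” never appears verbatim in a title.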

Although hugely popular, text matching for product search comes with a range of issues which make it nigh-on impossible to implement properly. To name a few:

  • spell correction should be performed to account for users’ sloppy typing
  • synonyms need to be matched so that products can be found even if there’s no exact match
  • grammatical inflections of words need to be taken into account — and ignored when necessary
  • rich structured textual representations of all products need to be created and maintained
  • the best ordering of products cannot be easily determined by the amount of matching text

The list goes on. To complicate things further, the solutions to all of these problems may influence one another. This makes product search via text matching a very complex task, and the resulting systems brittle, hard to maintain and poor performers as measured by recall.
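To give a flavour of just one of these hurdles, spell correction is commonly approached with edit distance. A minimal sketch (the `vocab` list and `correct` helper are purely illustrative):

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(word, vocabulary, max_dist=2):
    """Snap a possibly misspelled word to the closest vocabulary entry."""
    best = min(vocabulary, key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_dist else word

vocab = ["adidas", "shoe", "dress", "lace"]
print(correct("addidas", vocab))  # → "adidas"
```

And this is only one component — it still has to interact sensibly with synonym expansion, inflection handling and ranking.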

What’s necessary to “hack together” classical text-search

The problem is that text matching does not solve product search directly. It operates at the surface: the text is being matched, not the meaning: text matching is the wrong approach. That’s where deep learning comes into play.

With deep learning, product search can be solved directly, in a principled manner, both simple and elegant.

In essence, deep learning is a technology which allows computers to learn complex mappings (“deep neural networks”) from examples. These mappings can be configured to transform inputs from one arbitrary domain to another: the inputs can be images, labels, text, speech and so on; the mapping can be text to images, images to text, text to speech, video to text and more. Since the first algorithms allowing successful training of deep neural networks to recognize items in images were published in 2012, the field has been enormously successful, and applications already permeate every aspect of our daily lives. Using deep learning, DeepMind’s AlphaZero learned to play better chess than any human or machine — in less than a day, just by playing against itself. Deep learning has elevated machine translation from something quite funny but not particularly useful to a serious tool. The tech can also be used to add snow to a summertime photo, generate scarily coherent text or transform one person’s voice into that of someone else.

How can deep learning help with product search?

The core issue with text matching is that it operates at the surface: text is being matched, not content.

In some cases this is what we want: if someone searches for a brand or an exact product name (“nike flyknit”, “bose E41200 headphones”), a search engine should return results that match on the surface level. However, when it comes to more descriptive queries such as “red dress with lace”, it really should not matter if the exact words “red”, “dress”, “with”, “lace” appear anywhere in our database, as long as the results include red dresses with lace.
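As a toy illustration of the difference, suppose we already had an encoder that maps product texts to vectors of meaning. Here the vectors are hand-made and three-dimensional purely for illustration; a real encoder would be a trained neural network producing much larger vectors. Ranking by cosine similarity then surfaces a lace gown for a lace-dress query despite zero word overlap:

```python
import math

# Hand-made toy "embeddings"; dimensions loosely read as
# (redness, dressiness, laciness). A trained encoder is assumed.
embed = {
    "red dress with lace": [0.9, 0.9, 0.8],
    "crimson lace gown":   [0.8, 0.9, 0.9],   # no words shared with the query
    "blue denim jacket":   [0.1, 0.1, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

query = embed["red dress with lace"]
ranked = sorted((t for t in embed if t != "red dress with lace"),
                key=lambda t: cosine(query, embed[t]), reverse=True)
print(ranked[0])  # → "crimson lace gown"
```

A pure text matcher would score “crimson lace gown” on the single shared word “lace” at best — the embedding view scores it on meaning.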

A distinctive feature of deep neural networks is that they learn to extract what’s invariant about data with respect to a given task. For instance: a neural network that should recognize cats on photographs can learn what pictures with cats all have in common — the “cattiness” of an image. Check this visualization tool out for a nice illustration.

Using neural networks, we can take product data and extract the information which is relevant to a buyer — the dressiness of a dress, its redness, its laciness, its style and so on — and use this to guide search. For that, the superficial form of the product data and the query, the precise words being used, does not matter. Text matching matches text and nothing else, which often leaves out important available information. In fashion, for instance, the product pictures are often far more informative than the product description: What is the exact fit of the garment? Is it open in the back? How far does the slit in the leg extend? How dressy is it? None of this is typically contained explicitly in the manufacturer’s meta-data.

Decisive information is contained in the product’s images not present in meta-data

Since neural networks can transform data from one domain to another, they offer a principled way to combine all available product data — images, description texts, tags, attributes, user comments — into a unified embedding which can be used to power search.
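A minimal sketch of such a fusion step, assuming per-modality encoders already exist — the random vectors below merely stand in for their outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
image_vec = rng.normal(size=128)  # stand-in for an image encoder's output
text_vec = rng.normal(size=64)    # stand-in for a text encoder's output

def fuse(*vectors):
    """L2-normalize each modality, concatenate, then normalize again
    so that cosine similarity weights the modalities comparably."""
    parts = [v / np.linalg.norm(v) for v in vectors]
    unified = np.concatenate(parts)
    return unified / np.linalg.norm(unified)

product_embedding = fuse(image_vec, text_vec)
print(product_embedding.shape)  # → (192,)
```

Simple concatenation is only one fusion strategy; in practice a further network layer is often trained on top to weigh the modalities, but the principle — one unified vector per product — is the same.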

Finally, there is the problem of ordering. Even if a product text matches a query perfectly, this does not mean that the customer actually wants to buy it — maybe it’s an unattractive product. Ideally, on-site product search technology should be able to learn from users’ decisions what they want to buy. Deep neural networks learn from data; so product search based on deep learning can be incrementally improved each time a shopper makes a decision, enters a search query, clicks on a product or makes a purchase. In this way, product search can be directly improved to optimize a shop’s user experience and sales.
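As a hypothetical sketch of this feedback loop: when a shopper clicks a product for a query, nudge the clicked product’s embedding toward the query’s embedding so it ranks higher for similar queries next time. A production system would instead backpropagate through the encoder networks; `click_update` is an illustrative simplification.

```python
import numpy as np

def click_update(product_vec, query_vec, lr=0.1):
    """Move the product embedding a small step toward the query
    embedding, then re-normalize to unit length."""
    step = product_vec + lr * (query_vec - product_vec)
    return step / np.linalg.norm(step)

q = np.array([1.0, 0.0])  # query embedding
p = np.array([0.0, 1.0])  # product embedding before the click
before = float(p @ q)
after = float(click_update(p, q) @ q)
assert after > before  # the clicked product now matches the query better
```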

Does it work?

All well and good, you might say, but does it really work?

At Aleph Search we have developed algorithms which train deep neural networks to power product search. This works very well, allowing for precise and complex queries using colorful language — just as you’re used to on Google. On top of this, features such as search by image, product recommendation and error-correcting autocomplete come naturally. Check out Duncan Blythe’s post for more details.

In the next few weeks, we’re going to post some live demos of Aleph Search’s tech in action, so stay tuned!

Alexander Schlegel is an AI researcher and serial entrepreneur based in Berlin. He is the co-founder of Aleph Search.
