Video: Using Product Images to Achieve Over 90% Accuracy in Matching E-Commerce Products

Kumar Shubham
Published in DataWeave, Aug 9, 2017

Matching images is hard!

Images, intrinsically, are complex forms of information, with varying backgrounds, orientations, and noise. Developing a reliable system that achieves human-like accuracy in identifying, interpreting, and comparing images, without investing in expensive resources, is no mean task.

For DataWeave, however, the ability to accurately match images is fundamental to the value we provide to retailers and consumer brands.

Why Match Images?

Our customers rely on us for timely and actionable insights on their competitors’ pricing, assortment, promotions, etc. compared to their own. To enable this, we need to identify and match products across multiple websites, at very large scale.

One might hope to easily match products using just the product titles and descriptions on websites. However, therein lies the rub. Text-based fields are typically unstructured, and lack consistency or standardization across websites (especially for fashion products). For example, the same Adidas jacket is listed as “Tiro Warm-Up Jacket, Big Boys (8–20)” on Macy’s and “Youth Soccer Tiro 15 Training Jacket” on Amazon.
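
To make the mismatch concrete, here is a minimal sketch that measures the word overlap between the two titles above using Jaccard similarity. The tokenizer and the `jaccard` helper are illustrative assumptions, not part of DataWeave's system.

```python
import re

def tokenize(title):
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def jaccard(a, b):
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

macys_title = "Tiro Warm-Up Jacket, Big Boys (8–20)"
amazon_title = "Youth Soccer Tiro 15 Training Jacket"

score = jaccard(macys_title, amazon_title)
print(f"Title overlap: {score:.2f}")
```

Only two tokens, "tiro" and "jacket", are shared out of twelve distinct ones, so the score comes out at roughly 0.17 even though both listings describe the same product.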

Hence, instead of using text-based information, we considered using deep-learning techniques to match the images of products listed on e-commerce websites. This, though, requires massive GPU resources and large volumes of training data for the deep-learning model, which makes it an expensive proposition.

The solution we arrived at was to complement our image-matching system with the text-based information available in product titles and descriptions. Analyzing this combination of text- and image-based information enabled us to efficiently match products at greater than 90% accuracy.
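
As an illustration of the general idea, the sketch below blends a text-relevance score with the cosine similarity of two image feature vectors. It is a toy under stated assumptions, not DataWeave's implementation: the feature vectors, the `text_score` values, and the `alpha` weight are all hypothetical, and in practice the vectors would come from a trained deep-learning model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def combined_score(text_score, query_vec, cand_vec, alpha=0.5):
    """Weighted blend of text relevance and image similarity.

    alpha is an assumed weight, chosen here only for illustration.
    """
    return alpha * text_score + (1 - alpha) * cosine(query_vec, cand_vec)

# Hypothetical data: made-up image features and text scores in [0, 1].
query_features = [0.9, 0.1, 0.4]
candidates = [
    # Different title wording, near-identical image.
    {"id": "A", "text_score": 0.2, "features": [0.88, 0.12, 0.41]},
    # Similar title wording, clearly different image.
    {"id": "B", "text_score": 0.6, "features": [0.1, 0.9, 0.2]},
]

# Rank candidates by the blended score, best match first.
ranked = sorted(
    candidates,
    key=lambda c: combined_score(c["text_score"], query_features, c["features"]),
    reverse=True,
)
print([c["id"] for c in ranked])
```

With these made-up numbers, candidate A ranks first despite its weaker text score, because its image features almost coincide with the query's. Neither signal decides the match alone, which is the point of combining them.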

How We Did It

A couple of weeks ago, I gave a talk at Fifth Elephant, one of India’s renowned data science conferences. In the talk, I demonstrated DataWeave’s innovation of augmenting the NLP capabilities of Solr (a popular text search engine) with deep-learning features to match images with high accuracy.

Check out the video of the presentation for a detailed account of the system we built:

Read the entire article at www.dataweave.com/blogs.

Kumar Shubham
DataWeave

Working with deep-learning algorithms to derive meaning from massive data-sets.