Praveen S

Nov 25, 2019

4 min read

E-Commerce Image Attribute Extraction using Machine Learning

The average visit to an e-commerce website in the US lasts 5 minutes¹, yet the conversion rate for visitors is just 4.4%². What can retailers do to boost their conversion rates?

One of the main reasons that users drop off is that they fail to find the products they’re looking for. Surprisingly, this often happens even when the products they want are actually in the catalog.

There are a host of approaches to building out better search, but in this article, we’re going to focus on improving the quality of “structured data”.

Specifically, we’ll look at the extraction of structured data from product images (we’ve covered text attribute extraction in previous discussions). Here, “structured data” refers to key-value pairs of information about the product where the keys, and possibly even values, conform to a standard taxonomy. Here are some examples of why structured attributes matter for search.
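As a minimal sketch, a structured record of this kind might look like the following. The product, its identifier, and the helper function are all hypothetical, made up purely to illustrate why normalized key-value attributes make exact-match search and filtering reliable:

```python
# A hypothetical structured-data record for a dress: attribute keys and
# values are both drawn from a fixed taxonomy rather than free text.
product = {
    "id": "SKU-12345",  # hypothetical identifier
    "title": "Summer Floral Midi Dress",
    "attributes": {
        "apparel_type": "dress",
        "pattern": "floral",
        "sleeve_length": "short_sleeve",
        "dress_length": "midi",
        "color": "blue",
    },
}

# Because values are normalized, exact-match filtering works reliably.
def matches(product, key, value):
    return product["attributes"].get(key) == value

print(matches(product, "pattern", "floral"))             # True
print(matches(product, "sleeve_length", "long_sleeve"))  # False
```

With free-text attributes (“Floral!”, “flowers”, “short sleeves”), the same equality check would silently miss matching products, which is exactly the failure mode in the examples below.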

Searching for floral dress returns non-floral dresses in the top results, even though the catalog contains multiple floral dresses.

Selecting the filter Sleeve Length = Short Sleeve shows long sleeve and sleeveless dresses on the very first page.

A pair of shorts turns up as a top result for Category = Pants, even though there’s a separate category for Shorts. This happens because the product has an incorrect Apparel Type.

The Remedy

To remedy this, we’ve built machine learning algorithms that can extract tags/labels from product images in an automated fashion. This usually yields high-quality features, since images are rarely prone to human data-entry error. Features that we commonly tackle include apparel type, dress style, color, sleeve type, sleeve length, pattern, neck type, neck features, dress length, closure type and gender.

How We Do It

Semantics3 houses hundreds of millions of products (the Universal Product Catalog) with normalized attributes, sourced from thousands of sites. Our taxonomist selects the top attributes and sets a clear, strict definition of what constitutes each attribute, what its permissible values are, and what each of those values represents.

This is harder than it may seem, because many sites apply attribute definitions leniently. For instance, a shirt may be classified under tops & tees, or a cardigan may be classified as a sweatshirt, depending on the level of strictness of the site’s taxonomy.
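One way to picture this normalization step is as an explicit mapping from each site’s loose buckets onto the taxonomist’s strict vocabulary. The sketch below is an assumption about how such a mapping could be structured; the site names, labels, and helper function are all illustrative, not real data:

```python
# Hypothetical sketch: normalizing lenient site-level labels onto a strict
# canonical taxonomy with an explicit set of permissible values.
PERMISSIBLE_APPAREL_TYPES = {"shirt", "t_shirt", "cardigan", "sweatshirt", "dress"}

# Each site uses its own loose buckets; the taxonomist decides where
# each one lands in the canonical vocabulary.
SITE_LABEL_MAP = {
    ("site_a", "tops & tees"): "shirt",
    ("site_b", "sweatshirts"): "cardigan",  # site_b files cardigans under sweatshirts
    ("site_b", "dresses"): "dress",
}

def normalize(site, raw_label):
    canonical = SITE_LABEL_MAP.get((site, raw_label.lower()))
    if canonical not in PERMISSIBLE_APPAREL_TYPES:
        return None  # flag for human review rather than guessing
    return canonical

print(normalize("site_a", "Tops & Tees"))     # shirt
print(normalize("site_a", "mystery bucket"))  # None
```

Returning None for unmapped labels, rather than passing the raw string through, keeps undefined values out of the strict taxonomy and routes them to a human instead.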

Image Attribute Extraction Training Pipeline

Once a high-quality image dataset has been created, we train a CNN-based model to predict the attributes. Since the dataset is created from thousands of sites, the model is able to generalize and learn nuances across images from different sites. After this, our human annotators review the model’s predictions and re-label the images based on the definitions provided by the taxonomist. The model is re-trained and the process is repeated until it attains production-ready standards.
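The train / annotate / re-train cycle above can be sketched as a loop. Everything here is a toy stand-in for illustration: the “model” simply memorizes labels instead of running a real CNN, and the accuracy threshold, helper names, and data are assumptions, not the production system:

```python
# Illustrative sketch of the human-in-the-loop retraining cycle:
# train a model, have annotators correct its mistakes, retrain, repeat
# until accuracy clears a (hypothetical) production-readiness bar.

PRODUCTION_ACCURACY = 0.95  # assumed threshold for illustration

def train(dataset):
    # Toy stand-in for CNN training: memorize the labels seen so far.
    memo = dict(dataset)
    return lambda image: memo.get(image)

def accuracy(model, eval_set):
    correct = sum(model(img) == label for img, label in eval_set)
    return correct / len(eval_set)

def annotate(dataset, true_labels):
    # Human annotators re-label images following the taxonomist's
    # strict definitions (here, a lookup into ground truth).
    return [(img, true_labels[img]) for img, _ in dataset]

# Toy data: image ids and their true sleeve_length labels.
truth = {"img1": "short_sleeve", "img2": "short_sleeve", "img3": "long_sleeve"}
# Initial dataset contains one mislabeled image (img2).
dataset = [("img1", "short_sleeve"), ("img2", "long_sleeve"), ("img3", "long_sleeve")]
eval_set = [(img, label) for img, label in truth.items()]

model = train(dataset)
while accuracy(model, eval_set) < PRODUCTION_ACCURACY:
    dataset = annotate(dataset, truth)  # annotators fix mislabeled images
    model = train(dataset)              # re-train on the corrected dataset
```

The point of the sketch is the control flow, not the model: each pass tightens label quality against the taxonomist’s definitions, so the retrained model converges toward the production bar.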

We believe that our approach to solving this problem is unique: we begin with the Universal Product Catalog as a substrate, which helps our models generalize really well, and layer on a nuanced application of taxonomy definitions to remove subjectivity from the process, which helps build a standardized search experience.

This article was originally published on the Semantics3 Blog