Govind Chandrasekhar

Jan 28, 2020


Introducing Attribute Extraction from User-Generated Content

Attribute Extraction from ecommerce data, the generation of structured fields from unstructured text, is a popular product offering of ours. Our customers use it to improve the quality of their search catalogs and, in turn, their search relevance, faceted search and ad targeting.

Thus far, the scope of this product offering has been limited to catalog data, primarily product titles, descriptions and specifications. Now, we’re extending this capability to process user-generated content, including customer questions & answers.

By distilling factual information from customer content, these algorithms can help boost the number and relevancy of structured attributes on popular ecommerce product pages. For brands, sellers and other content creators, this means that the more customers interact with your product listing, the better the quality of the listing gets.

What’s more, since these attributes come from data that customers have volunteered themselves, the importance of these attributes in influencing future purchase decisions is likely to be high.

Consider the example of a Fogg Analog Watch on which this algorithm was applied. The following Q&A entry was detected on the page and run through the Semantics3 Attribute Engine:

This particular site enables faceted search for water resistant watches. Prior to the addition of IS_WATER_RESISTANT to the attribute list, this particular product did not turn up during faceted searches, and hence had limited visibility for relevant searches. Moreover, since the catalog listing didn’t carry explicit information about the fact that the watch is water resistant, some potential customers, unsure about the product’s characteristics, may have decided against making the purchase.
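To make the visibility effect concrete, here is a minimal sketch of a facet filter. The catalog structure, the `faceted_search` helper and the "Other Watch" entry are assumptions made for illustration; only the IS_WATER_RESISTANT attribute name and the Fogg Analog Watch come from the example above.

```python
def faceted_search(catalog, facet, value=True):
    """Return names of products whose attributes carry facet == value."""
    return [p["name"] for p in catalog if p["attributes"].get(facet) == value]


# Hypothetical catalog: the Fogg listing starts without the attribute.
catalog = [
    {"name": "Other Watch", "attributes": {"IS_WATER_RESISTANT": True}},
    {"name": "Fogg Analog Watch", "attributes": {}},
]

# Before extraction, the facet filter silently skips the Fogg listing.
before = faceted_search(catalog, "IS_WATER_RESISTANT")

# After the attribute is extracted from the Q&A entry and added,
# the same filter returns the listing.
catalog[1]["attributes"]["IS_WATER_RESISTANT"] = True
after = faceted_search(catalog, "IS_WATER_RESISTANT")
```

A product missing the attribute is indistinguishable from one that lacks the feature, which is why filling the gap changes search results.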

So how does this algorithm work? It relies heavily on our core TAE (Text Attribute Extraction) engine, and is layered with decision engines that parse the intent and meaning behind the input text. At a high level, it involves three distinct steps:

  1. Intent inference: Understanding what the user is talking about
  2. TAE: Text Attribute Extraction of meaningful values from the input
  3. Conflation: Combining the inferred intent with the extracted attribute values to understand the semantics of what the user is looking to communicate.
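
The three steps above can be sketched as a toy pipeline. This is not the actual Semantics3 engine: the keyword table, the regular expressions and the function names (`infer_intent`, `extract_value`, `conflate`) are all hypothetical stand-ins, and a production system would likely use learned models rather than hand-written rules.

```python
import re
from typing import Dict, Optional

# Hypothetical mapping from attribute name to phrases that signal it.
INTENT_KEYWORDS = {
    "IS_WATER_RESISTANT": ("water resistant", "waterproof", "water proof"),
}

# Simplistic yes/no cues; assumed for illustration only.
NEGATIVE = re.compile(r"\b(no|not|nope)\b", re.IGNORECASE)
AFFIRMATIVE = re.compile(r"\b(yes|yeah|yep)\b", re.IGNORECASE)


def infer_intent(question: str) -> Optional[str]:
    """Step 1: decide which attribute the question is about."""
    text = question.lower()
    for attribute, phrases in INTENT_KEYWORDS.items():
        if any(phrase in text for phrase in phrases):
            return attribute
    return None


def extract_value(answer: str) -> Optional[bool]:
    """Step 2: pull a boolean value out of the answer text."""
    if NEGATIVE.search(answer):  # check negation first: "yes ... not" is a no
        return False
    if AFFIRMATIVE.search(answer):
        return True
    return None


def conflate(question: str, answer: str) -> Dict[str, bool]:
    """Step 3: combine intent and value into a structured attribute."""
    attribute = infer_intent(question)
    value = extract_value(answer)
    if attribute is None or value is None:
        return {}  # not enough signal to assert anything
    return {attribute: value}
```

For instance, `conflate("Is this watch water resistant?", "Yes, it is.")` yields a single IS_WATER_RESISTANT entry set to true, while a question about an unknown topic yields an empty result rather than a guess.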

Here are some examples of the algorithm in action:

Interested in using this to boost your product listings? Book a call with us, or drop us an email at

This article was originally published on the Semantics3 Blog
