Over the past 7 years, we’ve built an extensive Universal Product Catalog by curating and understanding data from across the public e-commerce web. It includes information about hundreds of millions of products, ~1000 standardized attribute types, billions of attribute values and tens of billions of pricing and ranking signals.
Now, as part of our latest research initiative, we’ve built an Ecommerce Knowledge Graph to harness the value of the relationships between the entities in our datasets. At the core of this graph is the set of relationships between the structured attributes that describe products in the catalog; the graph is also layered with the billions of relationships between products themselves through characteristics like shoppability, browsability and compatibility.
What kind of opportunities for value creation does this open up? Here are some interesting ways in which we’ve been using the graph:
By analyzing, at scale, which kinds of products are most often purchased or viewed together, we can build an understanding of consumers’ design preferences. This can in turn be used to suggest which items are most compatible with each other.
For example, if a consumer is looking for pants to go with his or her bright blue jacket, the graph may surface espresso pants that have previously been purchased in tandem with blue jackets. Combined with Semantics3’s solutions for extracting attributes from images, this could be used to help retailers and advertisers build and seed suggestion engines.
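The co-purchase idea can be illustrated with a minimal Python sketch. The toy orders, the `(category, color)` item representation and the helper names `co_purchase_counts` and `suggest` are all hypothetical, made up for this example; this is one simple way to count pairings, not a description of how the graph is actually computed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy orders: each order is a list of (category, color) items.
orders = [
    [("jacket", "blue"), ("pants", "espresso")],
    [("jacket", "blue"), ("pants", "espresso")],
    [("jacket", "blue"), ("pants", "grey")],
    [("jacket", "black"), ("pants", "grey")],
]

def co_purchase_counts(orders):
    """Count how often each unordered pair of distinct items shares an order."""
    counts = Counter()
    for order in orders:
        # sorted(set(...)) de-duplicates items and gives each pair a stable order
        for a, b in combinations(sorted(set(order)), 2):
            counts[(a, b)] += 1
    return counts

def suggest(counts, anchor, category):
    """Rank items of `category` by how often they were bought with `anchor`."""
    ranked = Counter()
    for (a, b), n in counts.items():
        if a == anchor and b[0] == category:
            ranked[b] += n
        elif b == anchor and a[0] == category:
            ranked[a] += n
    return ranked.most_common()

counts = co_purchase_counts(orders)
suggestions = suggest(counts, ("jacket", "blue"), "pants")
for item, n in suggestions:
    print(item, n)
```

With this toy data, espresso pants rank first for a blue jacket because they appear with it in two orders, against one for grey pants. Raw counts favor popular items, so a real system would likely normalize them in some way.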
Supply Chain Insights
Imported Nike products are most likely to be manufactured in either China or Vietnam. But how does this split change when you look at just synthetic shoes? Or at shoes that retail in a particular price range? Answering such questions using the Ecommerce Graph is straightforward: all you have to do is ask the right questions that can help solve your, or your customers’, business needs.
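As a rough illustration of this kind of question, the sketch below filters a handful of product records by attribute and price range, then computes the country-of-origin split. The records, the field names (`brand`, `category`, `material`, `origin`, `price`) and the function `origin_split` are assumptions invented for the example, and the numbers are not real data.

```python
from collections import Counter

# Hypothetical product records; values are illustrative, not real data.
products = [
    {"brand": "Nike", "category": "shoes", "material": "synthetic", "origin": "Vietnam", "price": 90.0},
    {"brand": "Nike", "category": "shoes", "material": "synthetic", "origin": "Vietnam", "price": 120.0},
    {"brand": "Nike", "category": "shoes", "material": "synthetic", "origin": "China", "price": 60.0},
    {"brand": "Nike", "category": "shoes", "material": "leather", "origin": "China", "price": 150.0},
    {"brand": "Nike", "category": "jacket", "material": "synthetic", "origin": "China", "price": 80.0},
]

def origin_split(products, price_range=None, **filters):
    """Share of each country of origin among products matching every filter."""
    matched = []
    for p in products:
        # Skip products that disagree with any attribute filter
        if any(p.get(key) != value for key, value in filters.items()):
            continue
        # Optionally keep only prices inside the inclusive (low, high) range
        if price_range is not None:
            low, high = price_range
            if not (low <= p["price"] <= high):
                continue
        matched.append(p)
    counts = Counter(p["origin"] for p in matched)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()} if total else {}

synthetic_shoes = origin_split(products, brand="Nike", category="shoes", material="synthetic")
mid_priced = origin_split(products, price_range=(50, 100), brand="Nike", category="shoes", material="synthetic")
print(synthetic_shoes)
print(mid_priced)
```

Narrowing the question is just a matter of adding filters: the same function answers both the "synthetic shoes" and the "particular price range" variants.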
We can build detailed profiles for brands including:
- How their pricing strategies vary depending on attributes like material, country of origin, gender or sport/occasion.
- What designers’ individual style preferences are, and how they vary across the attribute spectrum.
- Which other brands they have the most synergies with (for example, are Levi’s jeans more likely to be purchased with Nike shoes than with shoes from Adidas?), and whether this lines up with the list of brands they are most likely to be marketed or co-sponsored with.
Profiles like this can help retailers overcome the cold-start problem for new brands for which they have no prior data.
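One ingredient of such a profile, how a brand's prices vary with an attribute, can be sketched in a few lines of Python. The listings, the placeholder brand names `BrandA` and `BrandB`, and the function `price_profile` are hypothetical; the median is used here simply as one reasonable summary statistic.

```python
from collections import defaultdict
from statistics import median

# Hypothetical listings; brand names and prices are placeholders.
listings = [
    {"brand": "BrandA", "material": "leather", "price": 150.0},
    {"brand": "BrandA", "material": "leather", "price": 170.0},
    {"brand": "BrandA", "material": "synthetic", "price": 80.0},
    {"brand": "BrandA", "material": "synthetic", "price": 100.0},
    {"brand": "BrandA", "material": "synthetic", "price": 90.0},
    {"brand": "BrandB", "material": "leather", "price": 60.0},
]

def price_profile(listings, brand, attribute):
    """Median price of one brand's listings, grouped by an attribute's value."""
    prices = defaultdict(list)
    for item in listings:
        if item["brand"] == brand and attribute in item:
            prices[item[attribute]].append(item["price"])
    return {value: median(vals) for value, vals in prices.items()}

profile = price_profile(listings, "BrandA", "material")
print(profile)
```

The same grouping could be repeated for country of origin, gender or sport/occasion by passing a different attribute name.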
Paired with our Universal Product Catalog, we envision that the Ecommerce Graph can provide additional tools for our retail and marketplace partners to navigate their catalog and content challenges.
Under the Hood
A look at the technology that we use to build such graphs:
1. Crawling Technology: An in-house crawler built ground-up, which gathers data from several million ecommerce web pages each day, both to discover new products and to refresh existing entries in the catalog.
2. Content Extraction: Supervised and (patent pending) unsupervised extraction engines parse out intricate details from webpages, including variation drop-downs, price & availability and recommended links.
3. Attribute Extraction and Normalization: Algorithmic and heuristic-based engines that extract and normalize structured data from unstructured HTML and text, generating the attributes modelled in the graph.
4. Classification Engine: Network of classifiers that group products into meta classes, providing the foundation on which a graph can be built.
5. Graph Engine: Graph database that stores and powers the Ecommerce Graph.
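To give a feel for what the final layer holds, here is a tiny in-memory stand-in for a graph store with typed, weighted edges. It is only a sketch: the class `TinyGraph`, the relation names `has_attribute` and `bought_with`, and the node identifiers are invented for illustration, and a production graph database would look quite different.

```python
from collections import defaultdict

class TinyGraph:
    """In-memory stand-in for a graph store: typed, weighted, directed edges."""

    def __init__(self):
        # (source, relation) -> {target: accumulated weight}
        self._edges = defaultdict(dict)

    def add_edge(self, source, relation, target, weight=1):
        """Add an edge, accumulating weight if it already exists."""
        bucket = self._edges[(source, relation)]
        bucket[target] = bucket.get(target, 0) + weight

    def neighbors(self, source, relation):
        """Targets reachable via `relation`, heaviest first, ties by name."""
        bucket = self._edges.get((source, relation), {})
        return sorted(bucket.items(), key=lambda kv: (-kv[1], kv[0]))

graph = TinyGraph()
# Product-to-attribute edge (hypothetical identifiers)
graph.add_edge("product:jacket-1", "has_attribute", "color:blue")
# Product-to-product edges, weighted by number of co-purchases
graph.add_edge("product:jacket-1", "bought_with", "product:pants-7", weight=3)
graph.add_edge("product:jacket-1", "bought_with", "product:pants-9")
graph.add_edge("product:jacket-1", "bought_with", "product:pants-9")

print(graph.neighbors("product:jacket-1", "bought_with"))
```

Keying edges by relation type keeps attribute relationships and product-to-product relationships in the same structure while letting each be queried separately.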
In future efforts, we intend to augment the graph with information about sellers on third-party marketplaces, add in relationships between retailers themselves and track the evolution of these relationships over time by fleshing out historical datasets.
To learn more, send us an email at govind [at] semantics3 [dot] com
This article was originally published on the Semantics3 Blog