Product categorization API

SeniorQuant
Product categorization
9 min read · Jan 29, 2022

In this article, we introduce our product categorization tool, which classifies products and other eCommerce texts into 5574 eCommerce categories (6 tiers deep) with high accuracy. It supports the Google Shopping taxonomy as well as other taxonomies.

You can try it out for free at:

https://www.productcategorization.com

Our state-of-the-art classifier is used by a large number of customers, including unicorns, multinationals, online stores, eCommerce analytics companies, SaaS companies and individuals.

Here is an example classification of “fitness equipment - stair climber”:

Classification of “Wakeboard for advanced riders”:

The AI classifier is highly accurate. Here is recent feedback from one of its users: “I am surprised how accurate your product is. I am really shocked. I love it.” (David)

One of its useful features is explainability: the classifier identifies the words that contributed most to the resulting categorization.

The screenshot below shows an example of a product description that the classifier assigned to Laundry Appliances.

In addition to the category, the classifier also provided an explanation by colouring the words that contributed most to this categorization: “washer”, “dryer”, “clothes”, “wash”, “combo”, “capacity”.

Here is another example of explainability, this time for the categorization of the website www.cnn.com:

“international”, “news”, “politics”, “world”, “health”, “times” are the words that contributed most to cnn.com being classified as “News and Politics”.
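
One common way to obtain word-level explanations of this kind is a linear bag-of-words model, where each word's contribution to a category score is its count multiplied by a learned weight. The sketch below only illustrates that idea: the weights are made-up numbers, the function name is hypothetical, and it is not a description of how the classifier above computes its highlights.

```python
from collections import Counter

# Hypothetical per-word weights of a linear model for the category
# "Laundry Appliances" (made-up numbers, for illustration only).
WEIGHTS = {
    "washer": 2.1, "dryer": 1.9, "clothes": 0.8,
    "wash": 1.2, "combo": 0.5, "capacity": 0.4,
}

def word_contributions(text, weights):
    """Return (word, contribution) pairs sorted by decreasing contribution.

    In a linear bag-of-words model the category score is the sum of
    count * weight over all words, so each term is that word's contribution.
    """
    counts = Counter(text.lower().split())
    contributions = {
        word: count * weights.get(word, 0.0) for word, count in counts.items()
    }
    return sorted(contributions.items(), key=lambda item: item[1], reverse=True)

description = "Washer dryer combo with large capacity to wash clothes"
top_words = word_contributions(description, WEIGHTS)[:3]
print(top_words)
```

The words with the largest contributions are the ones a user interface would highlight.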

Why the need for product categorization?

When we enter a store looking for a new item to purchase, it helps when the various sections of the store carry appropriate labels, e.g. Electronics, Household Appliances, etc.

We expect online stores to provide visitors with the same kind of categorization. It improves search, discoverability and filtering on the shop's website and ultimately leads to better conversion.

Product categorization is the task of classifying products as belonging to one or more categories from a given taxonomy.

Product classification and website categorization taxonomies

There is no single way to categorize products or websites. Many eCommerce companies have their own sets of rules for categorizing products. These rules or definitions are known as taxonomies.

For website classification, a well-known standard is the IAB taxonomy. It is especially suitable for categorization in marketing. For example, an advertiser generally wants to advertise only on publisher websites from specific categories, so an industrial company would not want to place an ad on the website of a fashion magazine.

Note that there have been several revisions of the IAB classification over the last decade, and it is better to use the latest one, which at the time of writing is the revision from September 2021. Revisions are necessary because new verticals and categories constantly arise and become popular.

Google and Facebook product categorization taxonomies

If you are more interested in eCommerce product classification, two of the best taxonomies are those from Google and Facebook.

The Google product taxonomy has several tiers; you can explore it in more detail at https://www.google.com/basepages/producttype/taxonomy.en-US.txt.

A selection of categories from Google Taxonomy:

Of particular interest is building product categorization models for the Tier 3 level, because it is quite detailed, with over 1000 categories, including many micro niches such as “Bird Cage Food & Water Dishes”, “Baby & Toddler Outerwear”, “Snow Pants & Suits”, etc.

Why is the depth and large number of categories important?

Because, as we discuss later, the more categories you have, the better the discoverability and filtering on your website, and the more you benefit from increased visits from search engines.

Another product taxonomy is the one from Facebook: https://developers.facebook.com/docs/marketing-api/catalog/guides/product-categories/

A specific classification such as:

Apparel & Accessories > Clothing > Underwear & Socks > Shapewear

is also known as a taxonomy path.

A machine learning model can have as its objective either predicting a class at a given tier level or predicting the complete taxonomy path.
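
The relation between a taxonomy path and a single-tier label can be sketched in a few lines. The helper names below are hypothetical; the only assumption is that tiers are separated by “>”, as in the path shown above.

```python
def parse_taxonomy_path(path):
    """Split a taxonomy path such as 'A > B > C' into a list of tier names."""
    return [tier.strip() for tier in path.split(">")]

def category_at_tier(path, tier):
    """Return the category at the given 1-based tier, or None if the path is shallower."""
    tiers = parse_taxonomy_path(path)
    return tiers[tier - 1] if 1 <= tier <= len(tiers) else None

path = "Apparel & Accessories > Clothing > Underwear & Socks > Shapewear"
print(parse_taxonomy_path(path))   # the full path, one entry per tier
print(category_at_tier(path, 3))   # the label a Tier 3 model would predict
```

A model that predicts the complete path uses the whole string as its target, while a single-tier model uses only the entry at the chosen depth.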

The latter was, for example, the objective of the Rakuten 2018 eCommerce challenge: https://sigir-ecom.github.io/ecom18DCPapers/ecom18DC_paper_13.pdf

At our company we have built machine learning models of both kinds: predicting taxonomy paths and predicting single-tier categories.

Which one you need depends mostly on your specific use case.

Text and product classification models

In practice, product categorization for online stores is usually performed automatically, using machine learning models from the family of text classification models.

There are many ML models available for text classification. Here is a list of possible models, from simpler to more complex ones (the list is by no means exhaustive):

  • Naive Bayes classifier
  • K-Nearest Neighbors
  • Support Vector Machines (SVM)
  • Logistic regression
  • Decision Trees
  • Random Forests
  • Deep Neural Networks
  • Recurrent Neural Networks (RNN)
  • Convolutional Neural Network (CNN)
  • Ensembles of neural nets

At our company, we have used all of the above in the past: some as baselines for comparison, others for testing, and some deployed in production.

Which ML model is best suited often depends on the problem. For example, an SVM works well for smaller data sets, but because its training time increases rapidly with the size of the data set, it is not the best choice for problems with very large training sets.
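
To make the simplest entry on the list concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written in plain Python. The class name and the tiny training set are made up for illustration; it is a toy baseline, not one of our production models.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Tiny made-up training set, for illustration only.
train_texts = [
    "washer dryer combo", "front load washer",
    "bead necklace handmade", "silver pendant necklace",
]
train_labels = ["Laundry Appliances", "Laundry Appliances", "Jewelry", "Jewelry"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("handmade silver necklace"))
```

A baseline like this is cheap to train and gives a reference point against which more complex models can be compared.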

Vectorization of texts for product classifier

An important part of a text classification task is the pre-processing of texts and their conversion into a numerical form that the product classifier can work with.

Pre-processing often includes, but is not limited to, the following steps:

  • removal of stop words
  • lowercasing
  • spelling corrections
  • stemming
  • lemmatization
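
A few of the steps above can be sketched in plain Python. The stop-word list and the suffix-stripping rule are deliberately tiny stand-ins chosen for this example; a real pipeline would typically use an established stemmer or lemmatizer, and spelling correction is omitted here.

```python
import re

# A tiny, illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "with"}

def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("Wakeboards for Advanced Riders"))
```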

Once the texts are pre-processed, there are several methods for converting them into a numerical format.

TF-IDF is often used and works well in many text classification problems.
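
As a rough illustration of TF-IDF, the sketch below computes the weights by hand for three short product names. It assumes one common variant (term frequency divided by document length, and idf = log(N / document frequency)); libraries often apply additional smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / document frequency),
    one of several common TF-IDF variants.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["bead necklace", "gold necklace", "stair climber"]
vectors = tf_idf(docs)
print(vectors[0])
```

In the first document, “bead” receives a higher weight than “necklace” because it appears in only one of the three documents and is therefore more distinctive.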

Where semantic meaning plays an important role, it is useful to explore embedding techniques, including Word2Vec, GloVe, ELMo and other methods.

A useful library for converting text to vectors is fastText, especially when dealing with non-English languages. It provides pre-trained word vectors for over 150 languages: https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md.

FastText differs from purely word-level embeddings such as Word2Vec in that it also operates at the character level, using character n-grams.

For example, for the word “that”, the character 2-grams (with “<” and “>” marking the word boundaries) are:

  • <t
  • th
  • ha
  • at
  • t>
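
The n-grams listed above can be reproduced with a short helper. This is only a sketch of the idea, not fastText's internal code; the function name is hypothetical, and the original fastText paper uses longer n-grams (3 to 6 characters) rather than the 2-grams of this example.

```python
def char_ngrams(word, n):
    """Return the character n-grams of a word wrapped in '<' and '>' markers."""
    wrapped = f"<{word}>"
    return [wrapped[i : i + n] for i in range(len(wrapped) - n + 1)]

print(char_ngrams("that", 2))
print(char_ngrams("that", 3))

# A misspelled word still shares many n-grams with the correct spelling.
shared = set(char_ngrams("necklace", 3)) & set(char_ngrams("neclace", 3))
print(sorted(shared))
```

Because a misspelled or rare word shares most of its n-grams with related words, its vector can still be composed from pieces the model has already seen.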

FastText is therefore also useful when dealing with rare words. We have used it with great success in text classification platforms for social media data, where words are more often misspelled, shortened or otherwise modified.

What are the benefits of using product categorization or product tagging?

One of the key benefits of automatic product categorization in your online shop is that your customers can find relevant products more easily, especially if you sell products in several different verticals.

Another very important benefit is that implementing it:

  • allows you to generate more webpages that are indexed and available on search engines (with correspondingly more opportunities for users to find your website via search)
  • leads to higher rankings due to more relevant keywords on your webpages

Both mean more free visits from the organic rankings of your webpages on search engines.

How can product categorization and product tagging improve your rankings on search engines?

If you want a webpage to rank higher for a given keyword, e.g. “bead necklace”, it should contain words that are semantically and topically related to that keyword.

This is where product categorization and tagging can help: you can add Tier 2, 3 and 4 categories to each of your product pages to make them more relevant for google.com and other search engines. For “bead necklace”, for example, you could add: Jewellery, Necklaces / Jewellery Sets.

You can also go a step further and add not only categories but also highly relevant tags to each of your product pages.

For this purpose, we have built an ML-based product tagging solution that automatically produces highly relevant tags from product names.

You can use it for free at https://www.producttagging.io/demo_dashboard/

For “bead necklace” it produces the following ideas for tags:

necklace, jewelry, beads, bracelet, handmade, beaded, necklaces, jewellery, pendant, necklace set, handmade jewelry, chain, fashion jewelry, long necklace, handcrafted jewelry

What is the product category for a given item?

Here is another set of tag ideas, this time for the phrase “sewing pattern”, produced with our product tagging tool (“value” denotes the relevance):

{
"language": "en",
"classification": [
{
"category": "pattern",
"value": 0.42340898513793945
},
{
"category": "sewing",
"value": 0.18840937316417694
},
{
"category": "sew",
"value": 0.10562954097986221
},
{
"category": "cotton",
"value": 0.07833655178546906
},
{
"category": "quilting",
"value": 0.05662618204951286
},
{
"category": "quilt",
"value": 0.05658867955207825
},
{
"category": "vintage",
"value": 0.054514266550540924
},
{
"category": "embroidery",
"value": 0.05450379103422165
},
{
"category": "handmade",
"value": 0.052232421934604645
},
{
"category": "crochet",
"value": 0.04206467792391777
},
{
"category": "applique",
"value": 0.03805079311132431
},
{
"category": "weaving",
"value": 0.03781686723232269
},
{
"category": "wool",
"value": 0.03761012479662895
},
{
"category": "knitting",
"value": 0.03648590296506882
},
{
"category": "fabric",
"value": 0.0340331606566906
},
{
"category": "cross stitch",
"value": 0.031439878046512604
},
{
"category": "thread",
"value": 0.027697991579771042
},
{
"category": "patterns",
"value": 0.01848462037742138
},
{
"category": "rug",
"value": 0.0134653989225626
},
{
"category": "yarn",
"value": 0.012413100339472294
},
{
"category": "cotton fabric",
"value": 0.012060941196978092
},
{
"category": "textile",
"value": 0.012012960389256477
},
{
"category": "textiles",
"value": 0.010599330067634583
},
{
"category": "stitch",
"value": 0.009857836179435253
},
{
"category": "accessories",
"value": 0.00966225378215313
},
{
"category": "stitched",
"value": 0.009236618876457214
},
{
"category": "batik",
"value": 0.009234720841050148
},
{
"category": "winter",
"value": 0.009045564569532871
},
{
"category": "upholstery",
"value": 0.009000061079859734
},
{
"category": "download",
"value": 0.008843736723065376
},
{
"category": "patterned",
"value": 0.008658943697810173
},
{
"category": "patchwork",
"value": 0.00858057290315628
},
{
"category": "neutrals",
"value": 0.008539760485291481
},
{
"category": "sofa",
"value": 0.007974304258823395
},
{
"category": "handwoven",
"value": 0.007709966041147709
},
{
"category": "antique",
"value": 0.0076023912988603115
},
{
"category": "machine washable",
"value": 0.00661444291472435
},
{
"category": "fall",
"value": 0.006400591693818569
},
{
"category": "summer",
"value": 0.006314214318990707
},
{
"category": "girls",
"value": 0.006126525811851025
},
{
"category": "shawl",
"value": 0.006045812275260687
},
{
"category": "baby",
"value": 0.005956804845482111
},
{
"category": "girl",
"value": 0.005812007002532482
},
{
"category": "colours",
"value": 0.005611352622509003
},
{
"category": "india",
"value": 0.005135116167366505
},
{
"category": "geometric",
"value": 0.005095324013382196
}
]
}
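
A response like the one above is easy to post-process. The sketch below parses a shortened copy of it and keeps only the tags above a relevance threshold; the function name and the threshold of 0.1 are arbitrary choices for this example, not recommendations of the tool.

```python
import json

# A shortened copy of the response shown above.
response_text = """
{
  "language": "en",
  "classification": [
    {"category": "pattern", "value": 0.42340898513793945},
    {"category": "sewing", "value": 0.18840937316417694},
    {"category": "sew", "value": 0.10562954097986221},
    {"category": "cotton", "value": 0.07833655178546906},
    {"category": "quilting", "value": 0.05662618204951286}
  ]
}
"""

def select_tags(response, min_value=0.1):
    """Return tags whose relevance is at least min_value, most relevant first."""
    items = sorted(
        response["classification"], key=lambda item: item["value"], reverse=True
    )
    return [item["category"] for item in items if item["value"] >= min_value]

tags = select_tags(json.loads(response_text))
print(tags)
```

Lowering the threshold keeps more of the long tail of tags, at the cost of including less relevant ones.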

Free product categorization and product tagging classifiers

Product categorization and tagging are by no means limited to online stores; they are useful in many other use cases.

You can try the classifiers for product categorization (using a modified Google Product Taxonomy) and website classification (using the IAB taxonomy) for free on our website: https://www.productcategorization.com/.

The product tagging solution can be found at https://www.producttagging.io/demo_dashboard/

If you are more interested in a solution that categorizes websites, check out the free website categorization tool at: https://www.websitecategorizationapi.com

Using this web classification service, we have categorized millions of domains into 440+ possible categories based on the IAB taxonomy.

We offer the results as an offline URL database for web content filtering, which is ideal if your app or service needs low-latency category lookups for millions of domains.
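
The low-latency argument comes down to replacing a network call with a local lookup. The sketch below assumes, purely for illustration, that the database has been loaded into an in-memory dictionary keyed by domain; the actual format of the offline database is not described here, and the function name is hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical in-memory excerpt of a domain -> category database.
CATEGORY_DB = {"cnn.com": "News and Politics"}

def lookup_category(url, database):
    """Return the category for the URL's domain, or None if it is unknown."""
    # urlparse only fills in the hostname when the URL has a scheme or "//".
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return database.get(host)

print(lookup_category("https://www.cnn.com/world", CATEGORY_DB))
print(lookup_category("example.org", CATEGORY_DB))
```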

In our next article we will go into more detail on how to build a product categorization and website classification model with over 90% accuracy.

SeniorQuant
Product categorization

Ph.D. in Theoretical Physics, Senior Data Scientist