Scaling Offer Categorization with Hierarchical Weak Labels
How we improved offer categorization with weak supervision
Sometimes categorization projects are rigid: the final set of classes is fixed and does not allow easy changes over time. A concrete instance of this problem is product offer titles and the many categories each of those products can belong to. Here we address a text categorization problem, but the approach can be adapted to other types of data.
Within some companies, the categorization process is critical because its result feeds every subsequent step of the data processing pipeline: other product characteristics are predicted from the category information.
As time goes by, new product categories become part of the company’s interests, the number of classes grows, and a new model has to be trained all over again.
So let’s create a scalable and flexible solution!
In summary, the problems are a fixed, limited set of categories and the difficulty of adding new categories to the task. To overcome both, we propose a pipeline of three stages: a machine learning model, a mapping, and title filtering/matching.
Categorization Model
First, a model is trained on a set of categories covering the greatest possible diversity within the scope of the problem, so building a dataset that guarantees this coverage is essential. The dataset for this project was created with a technique known as Weak Supervision: with support from problem experts, we build rules that assign a category to unlabeled examples.
One rule applied to our problem selects the most frequent word in the titles. To handle the many inflections of similar words, each word is reduced to its root: for example, car and cars are both mapped to the single lemma car. Low-frequency categories were cut, and the number of examples in frequent categories was capped to avoid dataset imbalance. Unimportant words were also removed, some manually (such as product brands) and others in pre-processing (such as stopword removal).
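The labeling rule above can be sketched as follows. The lemma table and stopword list are illustrative stand-ins for a real lemmatizer and curated word lists, not the actual rules we used:

```python
from collections import Counter

# Hypothetical lemma table standing in for a real lemmatizer (e.g. spaCy).
LEMMAS = {"cars": "car", "notebooks": "notebook", "phones": "phone"}
# Hypothetical stopword/brand list; the real lists were built with expert help.
STOPWORDS = {"new", "original", "for", "the", "with"}

def weak_label(title: str) -> str:
    """Weakly label an offer title with its most frequent lemma."""
    tokens = [LEMMAS.get(t, t) for t in title.lower().split() if t not in STOPWORDS]
    if not tokens:
        return "unknown"
    return Counter(tokens).most_common(1)[0][0]
```

For instance, `weak_label("Car cars for the garage")` lemmatizes both `car` and `cars` to `car`, making it the most frequent token and thus the weak label.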
We then train a model, tuning it in search of better parameters. With the model in production, we take its output not as a single class that decides the category at once, but as the k most likely categories with their respective confidences. Any algorithm can be used as long as it returns a confidence for its output classes. This lets us, at some point, look at more than one output label and build new categories.
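The top-k selection can be sketched in a model-agnostic way; the probabilities below would come from any classifier that exposes per-class confidences (labels and values here are illustrative):

```python
def top_k_predictions(probs, labels, k=3):
    """Return the k most confident (label, confidence) pairs.

    `probs` is one row of predicted class probabilities, aligned with `labels`.
    """
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Downstream steps consume this list instead of a single hard prediction, which is what makes the mapping stage possible.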
Mapping
The second step is the mechanism that allows broader management of the categories that matter now and those that may matter in the future. Here, the k categories and their respective confidences are analyzed. We define a confidence threshold to decide whether to accept the label given by the model. A mapping between the model’s categories and a second set of categories, created by the stakeholders of the problem, yields a category for the next step; if no confidence is good enough, the offer is marked as unknown.
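A minimal sketch of this mapping step, assuming a hypothetical category table and threshold (the real ones are defined with the stakeholders):

```python
# Hypothetical mapping from model labels to the business category set.
CATEGORY_MAP = {"cellphone": "electronics", "notebook": "electronics", "sofa": "furniture"}
THRESHOLD = 0.5  # assumed confidence cutoff

def map_category(top_k):
    """Map the first confident model label to a business category.

    `top_k` is a list of (label, confidence) pairs from the model; offers
    with no confident, mapped label fall through as "unknown".
    """
    for label, confidence in top_k:
        if confidence >= THRESHOLD and label in CATEGORY_MAP:
            return CATEGORY_MAP[label]
    return "unknown"
```

Adding a new business category then only requires new entries in the mapping table, not retraining the model.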
Matching / Filter
The third step searches for a keyword in the title that either determines the category or excludes the offer from all categories. It works as a finer filter: it removes examples that belong to the category but are not of interest, and it corrects cases where the model cannot capture the right category and places the offer elsewhere. If no keyword matches in the title, the alternative is to look for keywords in the breadcrumb, the navigation path through the sections of the site where the offer is hosted.
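The matching/filter step could look like the sketch below; the keyword sets are hypothetical placeholders for the expert-curated lists:

```python
# Hypothetical keyword sets per business category.
KEYWORDS = {
    "electronics": {"smartphone", "notebook", "tv"},
    "furniture": {"sofa", "table", "wardrobe"},
}

def match_category(title, breadcrumb=""):
    """Match a category keyword in the title, then fall back to the breadcrumb.

    Returns None when nothing matches, so the offer can be excluded.
    """
    for text in (title.lower(), breadcrumb.lower()):
        words = set(text.split())
        for category, keywords in KEYWORDS.items():
            if words & keywords:
                return category
    return None
```

The title is always checked first; the breadcrumb only decides when the title gives no signal.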
In the last two steps, mapping and matching/filtering, we rely on expert support to check labels and keywords. Keyword discovery can also be assisted by machine learning, for example with a Decision Tree and/or a feature selector.
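As a lightweight stand-in for a Decision Tree or feature selector, candidate keywords can be suggested by how concentrated a word is in one category; this sketch (data and scoring are illustrative) proposes words for the experts to review:

```python
from collections import Counter

def discover_keywords(labeled_titles, top_n=2):
    """Suggest candidate keywords per category by relative word frequency.

    Words that appear in one category far more often than elsewhere
    score close to 1.0 and are surfaced as candidates.
    """
    per_category, overall = {}, Counter()
    for title, category in labeled_titles:
        words = title.lower().split()
        per_category.setdefault(category, Counter()).update(words)
        overall.update(words)
    suggestions = {}
    for category, counts in per_category.items():
        scored = {w: c / overall[w] for w, c in counts.items()}
        ranked = sorted(scored.items(), key=lambda x: (-x[1], x[0]))
        suggestions[category] = [w for w, _ in ranked[:top_n]]
    return suggestions
```

Expert review remains the last word: the function only shortens the list of words the experts need to look at.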
Putting all the pieces together, at the end of the process we have a diagram with another example of an offer going through these steps. Note that this time the misclassification is corrected by the matching step.
That’s it! We now have a flexible and scalable implementation that uses machine learning to quickly adapt to new categories and fine-tune the categorization process. Any machine learning model that outputs confidence values can be used.
References
Weak Labels: https://dl.acm.org/doi/abs/10.1145/3292500.3330773
Thanks to the whole Birdie team for all the help!