Scaling Offer Categorization with Hierarchical Weak Labels
How we improved offer categorization with weak supervision
Sometimes categorization projects are rigid: the final set of classes is fixed and does not allow easy changes over time. A concrete instance of this problem is product offer titles and the many categories each of those products can belong to. Here we address a text categorization problem, but the approach can be adapted to other types of data.
Within some companies, the categorization process is critical because its result feeds every subsequent step of the data processing pipeline: other product characteristics are predicted from the category information.
As time goes by, new product categories become part of the company’s interests, the number of classes grows, and a new model has to be trained all over again.
So let’s create a scalable and flexible solution!
In summary, the problems are a fixed, limited set of categories and the difficulty of adding new categories to the task. To overcome both, we propose a pipeline of three stages: a machine learning model, a mapping, and title filtering/matching.
Categorization Model
First, a model is trained on a set of categories covering the greatest possible diversity within the scope of the problem, so building a dataset that guarantees this coverage is essential. The dataset for this project was created with a technique known as Weak Supervision: with support from problem experts, we build rules that assign a category to unlabeled examples.
One rule applied to our problem selects the most frequent word in the titles. To handle the many inflections of similar words, each word is reduced to its root: for example, car and cars are both mapped to the single lemma car. Low-frequency categories were cut, and the number of examples in frequent categories was capped to avoid dataset imbalance. Unimportant words were also removed, some manually (such as product brands) and others in pre-processing (such as stopword removal).
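The labeling rule above can be sketched as follows. The lemma table and stopword list are illustrative stand-ins for a real lemmatizer and curated word lists, not the actual rules we used:

```python
from collections import Counter

# Hypothetical lemma table standing in for a real lemmatizer (e.g. spaCy).
LEMMAS = {"cars": "car", "notebooks": "notebook", "phones": "phone"}
# Hypothetical stopword/brand list; the real lists were built with expert help.
STOPWORDS = {"new", "original", "for", "the", "with"}

def weak_label(title: str) -> str:
    """Weakly label an offer title with its most frequent lemma."""
    tokens = [LEMMAS.get(t, t) for t in title.lower().split() if t not in STOPWORDS]
    if not tokens:
        return "unknown"
    return Counter(tokens).most_common(1)[0][0]
```

For instance, `weak_label("Car cars for the garage")` lemmatizes both `car` and `cars` to `car`, making it the most frequent token and thus the weak label.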
We then train a model, tuning it in search of better parameters. With the model in production, we take its output not as a single class that decides the category at once, but as the k most likely categories with their respective confidences. Any algorithm can be used as long as it returns a confidence for its output classes. This lets us, at some point, look at more than one output label and build new categories.
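The top-k selection can be sketched in a model-agnostic way; the probabilities below would come from any classifier that exposes per-class confidences (labels and values here are illustrative):

```python
def top_k_predictions(probs, labels, k=3):
    """Return the k most confident (label, confidence) pairs.

    `probs` is one row of predicted class probabilities, aligned with `labels`.
    """
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Downstream steps consume this list instead of a single hard prediction, which is what makes the mapping stage possible.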
Mapping
The second step is the mechanism that allows broader management of the categories that matter now and those that may matter in the future. Here, the k categories and their respective confidences are analyzed. We define a confidence threshold to decide whether to accept the label given by the model. A mapping between the model’s categories and a second set of categories, created by the stakeholders of the problem, yields a category for the next step; if no confidence is good enough, the offer is marked as unknown.
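A minimal sketch of this mapping step, assuming a hypothetical category table and threshold (the real ones are defined with the stakeholders):

```python
# Hypothetical mapping from model labels to the business category set.
CATEGORY_MAP = {"cellphone": "electronics", "notebook": "electronics", "sofa": "furniture"}
THRESHOLD = 0.5  # assumed confidence cutoff

def map_category(top_k):
    """Map the first confident model label to a business category.

    `top_k` is a list of (label, confidence) pairs from the model; offers
    with no confident, mapped label fall through as "unknown".
    """
    for label, confidence in top_k:
        if confidence >= THRESHOLD and label in CATEGORY_MAP:
            return CATEGORY_MAP[label]
    return "unknown"
```

Adding a new business category then only requires new entries in the mapping table, not retraining the model.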
Matching / Filter
The third step searches for a keyword in the title that either determines the category or excludes the offer from all categories. It works as a finer filter: it removes examples that belong to the category but are not of interest, and it corrects cases where the model cannot capture the right category and places the offer elsewhere. If no keyword matches in the title, the alternative is to look for keywords in the breadcrumb, the navigation path through the sections of the site where the offer is hosted.
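The matching/filter step could look like the sketch below; the keyword sets are hypothetical placeholders for the expert-curated lists:

```python
# Hypothetical keyword sets per business category.
KEYWORDS = {
    "electronics": {"smartphone", "notebook", "tv"},
    "furniture": {"sofa", "table", "wardrobe"},
}

def match_category(title, breadcrumb=""):
    """Match a category keyword in the title, then fall back to the breadcrumb.

    Returns None when nothing matches, so the offer can be excluded.
    """
    for text in (title.lower(), breadcrumb.lower()):
        words = set(text.split())
        for category, keywords in KEYWORDS.items():
            if words & keywords:
                return category
    return None
```

The title is always checked first; the breadcrumb only decides when the title gives no signal.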
In the last two steps, mapping and matching/filtering, we rely on expert support to check labels and keywords. Keyword discovery can also be assisted by machine learning, for example with a Decision Tree and/or a feature selector.
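As a lightweight stand-in for a Decision Tree or feature selector, candidate keywords can be suggested by how concentrated a word is in one category; this sketch (data and scoring are illustrative) proposes words for the experts to review:

```python
from collections import Counter

def discover_keywords(labeled_titles, top_n=2):
    """Suggest candidate keywords per category by relative word frequency.

    Words that appear in one category far more often than elsewhere
    score close to 1.0 and are surfaced as candidates.
    """
    per_category, overall = {}, Counter()
    for title, category in labeled_titles:
        words = title.lower().split()
        per_category.setdefault(category, Counter()).update(words)
        overall.update(words)
    suggestions = {}
    for category, counts in per_category.items():
        scored = {w: c / overall[w] for w, c in counts.items()}
        ranked = sorted(scored.items(), key=lambda x: (-x[1], x[0]))
        suggestions[category] = [w for w, _ in ranked[:top_n]]
    return suggestions
```

Expert review remains the last word: the function only shortens the list of words the experts need to look at.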
Putting all the pieces together, at the end of the process we have a diagram with another example of an offer going through these steps. Note that this time the misclassification is corrected by the matching step.
That’s it! We now have a flexible and scalable implementation that uses machine learning to quickly adapt to new categories and fine-tune the categorization process. Any machine learning model that outputs confidence values can be used.
References
Weak Labels: https://dl.acm.org/doi/abs/10.1145/3292500.3330773
Thanks to the whole Birdie team for all the help!