We achieved our highest accuracies, above 90%, for stores in the fashion and jewelry industries, while our lowest accuracies, 70–80%, came from stores in the grocery and home supplies industries. This gap mainly stems from the latter industries having significantly larger category sets and far more diverse data.
Like the class probabilities, the category similarities are scaled to the range between 0 and 1, and we count every value above 0.6 as “similar enough” to match a model category to a store-specific category. To account for the variance in similarity among these matches, we multiply the probability of each class prediction by the corresponding similarity score, which quantifies our confidence in the final category prediction.
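A minimal sketch of this scoring step, assuming both inputs are dictionaries; the function and variable names here are illustrative, not taken from our codebase:

```python
# Combine class probabilities with model-to-store category similarities.
# Both inputs are assumed to be scaled to [0, 1], as described above.

SIMILARITY_THRESHOLD = 0.6  # values above this count as "similar enough"

def match_categories(class_probs, similarities):
    """Return a confidence score per store-specific category.

    class_probs:  {model_category: probability}
    similarities: {(model_category, store_category): similarity}
    """
    confidences = {}
    for (model_cat, store_cat), sim in similarities.items():
        if sim > SIMILARITY_THRESHOLD:
            # confidence = class probability * category similarity;
            # keep the best-scoring match per store category
            score = class_probs.get(model_cat, 0.0) * sim
            confidences[store_cat] = max(confidences.get(store_cat, 0.0), score)
    return confidences

# Hypothetical example: the model is fairly sure the product is a bracelet,
# and the store's own "bangles & bracelets" category is a close match.
probs = {"bracelets": 0.9, "necklaces": 0.05}
sims = {
    ("bracelets", "bangles & bracelets"): 0.8,
    ("bracelets", "garden tools"): 0.1,   # below threshold, ignored
    ("necklaces", "bangles & bracelets"): 0.65,
}
print(match_categories(probs, sims))
```

Taking the maximum over competing matches is one reasonable design choice; averaging or summing the scores would be plausible alternatives.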
We can get general category predictions now, but we still need a mechanism to match these categories to store-specific categories, so that we do not bombard our customers (i.e. the different online shops that use our API) with category recommendations they have not defined. If our model says a product belongs to the category “bracelets”, we need to know which of the requesting store's own categories that maps to, since category names vary considerably between stores. For this task, we used the gensim library to train a Word2Vec model, a technique commonly used to estimate word similarity, on a large corpus of Google News articles. Since computing the similarity between all model categories and all store categories with this model can take a while, we precompute these similarities and store them in a database that is updated every night.
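The nightly precomputation can be sketched as the loop below. To keep the example self-contained and runnable, the Word2Vec model is stood in for by a toy embedding table with made-up vectors and a plain cosine similarity; with gensim, the similarity call would be `model.wv.similarity(a, b)` instead, and the exact scaling to [0, 1] used in our system is not shown here:

```python
import itertools
import math

# Toy embedding table standing in for the trained Word2Vec vectors
# (hypothetical values, for illustration only).
VECTORS = {
    "bracelets": [0.9, 0.1, 0.2],
    "jewelry":   [0.8, 0.2, 0.3],
    "groceries": [0.1, 0.9, 0.4],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def precompute_similarities(model_categories, store_categories):
    """Compute the similarity for every (model, store) category pair.

    Returns {(model_category, store_category): similarity in [0, 1]};
    the (cos + 1) / 2 rescaling is one possible way to map cosine
    similarity from [-1, 1] onto [0, 1].
    """
    return {
        (m, s): (cosine_similarity(VECTORS[m], VECTORS[s]) + 1) / 2
        for m, s in itertools.product(model_categories, store_categories)
    }

# The resulting table is what would be written to the database each night.
table = precompute_similarities(["bracelets"], ["jewelry", "groceries"])
```

At request time, a lookup into this table replaces the expensive model call.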