Published in

Walmart Global Tech Blog

8 min read · Jul 19, 2024

Using Predictive and Gen AI to Improve Product Categorization at Walmart

Motivation

With over 400 million SKUs, Walmart.com must streamline the online shopping process to enhance customer satisfaction. In our physical stores, customers can navigate through a variety of departments or aisles. Our digital platforms (website and mobile apps) also closely mirror the layout of a traditional Walmart store, offering customers a familiar and convenient shopping experience.

To optimally use the limited display space and enable customers to discover the most pertinent and appealing products within each department, we developed Ghotok [1], a cutting-edge AI technique which can effectively analyze and understand the relationships between different products and categories. The ultimate goal of this project is to save customers’ time by making their online shopping experience more efficient and fun.

Problem Statement

Picture yourself strolling into a Walmart supercenter, a physical space meticulously designed with rows of organized shelves, each packed with items in their respective categories. From beauty products clustered in one corner to electronics displayed in another, everything is arranged for your convenience. Now, translate this experience to Walmart’s digital storefront. It’s like walking through an invisible, yet perfectly arranged warehouse of goods, with each product neatly tucked into its own niche category to make your online shopping journey smooth and effortless. Imagine this categorization as an interactive tree chart. Each main category branches out into subcategories, which in turn split into further subcategories, and so on. The deeper you navigate into this Walmart category tree, the more specific your search results become, saving you precious time and bringing you closer to your search goal.

Consider the example of the following category tree:

→ Toys

— → Outdoor Toys

— — → Pool Toys

— — → Swing Sets

— → Dolls & Dollhouses

— — → Fashion Dolls
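As a minimal sketch, the example tree above could be held in a nested dictionary and walked to list every root-to-node path. The `CATEGORY_TREE` layout and the `root_to_node_paths` helper are illustrative assumptions, not Walmart's actual catalog schema.

```python
# Hypothetical nested-dict encoding of the example tree (not the real schema).
CATEGORY_TREE = {
    "Toys": {
        "Outdoor Toys": {
            "Pool Toys": {},
            "Swing Sets": {},
        },
        "Dolls & Dollhouses": {
            "Fashion Dolls": {},
        },
    }
}


def root_to_node_paths(tree, prefix=()):
    """Yield every root-to-node path in the tree as a tuple of category names."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        # Recurse into the subcategories, carrying the path built so far.
        yield from root_to_node_paths(children, path)


paths = [" > ".join(p) for p in root_to_node_paths(CATEGORY_TREE)]
for p in paths:
    print(p)
```

Each printed line is one node together with its ancestors, for example `Toys > Outdoor Toys > Pool Toys`; the deeper the path, the narrower the set of items it describes.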

Another hierarchy in Walmart’s product catalog is the ‘Product Type’. This classification captures the intended use of each item and is not displayed directly on the website. Walmart’s catalog boasts a wide range of products, each having unique characteristics and thus requiring distinctive descriptions. For instance, a screwdriver may be useful to both electronics engineers and dentists; however, the nature of products used in the medical and electronics fields is significantly different. It is therefore imperative to assign the correct product type to help customers find the appropriate item: a screwdriver used by a dentist should be classified under the product type “Oral Care Accessories”, while one used in electronics should fall under “Screwdriver Tool”.

Given the large number of categories and SKUs, items are sometimes miscategorized. To avoid showing less pertinent items, we do the following:

1. Fetch items based on the category they fall into.

2. In parallel, identify the relevant product types for the category.

3. Filter the items fetched in step 1 based on the product types identified in step 2.
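The three steps above can be sketched in a few lines. This is only an illustration under assumptions: the `Item` record, the in-memory `catalog` list, and the `mapping` dictionary are hypothetical stand-ins for the real services, and the sketch runs steps 1 and 2 sequentially rather than in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    product_type: str


def fetch_items(catalog, category):
    """Step 1: fetch items by the category they were assigned to."""
    return [item for item in catalog if item.category == category]


def relevant_product_types(mapping, category):
    """Step 2: look up the product types considered relevant for the category."""
    return mapping.get(category, set())


def filter_items(catalog, mapping, category):
    """Step 3: keep only the fetched items whose product type is relevant."""
    allowed = relevant_product_types(mapping, category)
    return [
        item for item in fetch_items(catalog, category)
        if item.product_type in allowed
    ]


# Made-up sample data; "a3" plays the role of a miscategorized item.
catalog = [
    Item("a1", "Pool Toys", "Pool Floats"),
    Item("a2", "Pool Toys", "Water Blasters"),
    Item("a3", "Pool Toys", "Screwdriver Tool"),
    Item("a4", "Swing Sets", "Swing Sets"),
]
mapping = {"Pool Toys": {"Water Slides", "Pool Floats", "Water Blasters"}}

shown = filter_items(catalog, mapping, "Pool Toys")
print([item.item_id for item in shown])
```

In this toy run the item tagged with an unrelated product type is dropped even though it was filed under the category, which is the effect the filter is meant to have.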

To make certain that the most relevant product types are identified in step 2 for each category or subcategory, Walmart employs a state-of-the-art AI technique named Ghotok. The mission of Ghotok is to group the products as uniformly as possible, enabling one or more product types to be displayed in each category. For instance, in the ‘Pool Toys’ subcategory, product types could include ‘Water Slides’, ‘Pool Floats’, ‘Water Blasters’, and more. Ghotok harnesses this relationship between a category and a product type.

Ghotok’s Approach

Ghotok’s objective is to use domain-specific contextual information to understand the many-to-many relationships between pairs drawn from two different product hierarchies (that is, Category and Product Type). To achieve this, Ghotok incorporates advances in both Predictive [2] and Generative [3] AI techniques to find the most relevant product types for each category. Instead of choosing one single model for both Predictive and Generative AI, we use an ensemble of models. Ensemble modeling [7] is a machine learning approach that combines the outputs of diverse ML models to produce a more precise prediction than any single model. One benefit of this approach is that it dispenses with the requirement for customer engagement data, which is noisy because customers sometimes click on items by mistake or out of curiosity, and relies instead on a limited amount of human-labeled data. This makes the model effective for both frequently and rarely visited parts of our product hierarchies.

Here are the steps we follow:

1. First, we train predictive AI models on domain-specific features. During training, we choose the best hyperparameters based on a human-labeled set, using multiple metrics such as precision, recall, F1, true positive rate (TPR), and false positive rate (FPR). We do not need to train Generative AI models on the domain-specific features, as they understand our context when given chain-of-thought prompts [4, 5]. Here is the difference between predictive and generative AI models with examples:

Fig: Differences between predictive and generative AI models

2. As ML inference using Generative AI technology is costly, we do not run Generative AI models on all of the millions of candidate pairs. For each Predictive AI model, we first learn a confidence threshold by fixing a certain false positive rate (FPR); we then score each <Category, ProductType> pair and filter out the candidate pairs that do not satisfy this threshold. This reduces the number of candidate pairs from millions to thousands.

3. We then use Generative AI methodologies and their learned relevance thresholds to filter out more irrelevant pairs. On a validation set, this ensemble of Predictive and Generative AI models showed the best performance.
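Steps 2 and 3 can be sketched as a two-stage cascade. This is a simplified illustration, not Ghotok's implementation: it uses a single predictive model instead of an ensemble, the scores and labels are made up, and `generative_score` is a hypothetical stand-in that returns a canned relevance score instead of calling a real generative model.

```python
import math


def threshold_at_fpr(scores, labels, target_fpr):
    """Return the lowest threshold whose FPR on labeled data is <= target_fpr.

    A pair counts as predicted-relevant when its score is >= the threshold.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    allowed_fp = math.floor(target_fpr * len(negatives))
    if allowed_fp >= len(negatives):
        return min(scores)  # every negative may pass, so accept everything
    # The threshold must sit strictly above the (allowed_fp + 1)-th highest
    # negative score, so at most allowed_fp negatives survive.
    cutoff = negatives[allowed_fp]
    candidates = [s for s in scores if s > cutoff]
    return min(candidates) if candidates else math.inf


def cascade_filter(pairs, predictive_score, pred_threshold, generative_score, gen_threshold):
    """Run the cheap predictive filter first, then the costly generative check."""
    survivors = [p for p in pairs if predictive_score(p) >= pred_threshold]
    return [p for p in survivors if generative_score(p) >= gen_threshold]


# Made-up labeled validation data (1 = relevant pair, 0 = irrelevant pair).
val_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
val_labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
threshold = threshold_at_fpr(val_scores, val_labels, target_fpr=0.2)

# Made-up candidate pairs with canned scores.
predictive = {
    ("Pool Toys", "Pool Floats"): 0.92,
    ("Pool Toys", "Water Blasters"): 0.81,
    ("Pool Toys", "Pool Chemicals"): 0.74,
    ("Pool Toys", "Screwdriver Tool"): 0.12,
}
generative = {
    ("Pool Toys", "Pool Floats"): 0.97,
    ("Pool Toys", "Water Blasters"): 0.90,
    ("Pool Toys", "Pool Chemicals"): 0.20,
}
generative_calls = []


def generative_score(pair):
    generative_calls.append(pair)  # track how often the costly model is used
    return generative[pair]


kept = cascade_filter(list(predictive), predictive.get, threshold, generative_score, 0.5)
print(threshold, kept, len(generative_calls))
```

In this toy run the generative stand-in is only consulted for the pairs that survive the predictive threshold, which is the cost-saving idea behind running the predictive models first.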

Integration of Ghotok to our Backend System

Our product catalog contains several thousand categories, each of which can be connected to hundreds of different ProductType nodes. Together, these mappings amount to several million rows of offline data, so our backend system must organize, retrieve, and analyze a very large and complex data set. To meet typical service level agreement (SLA) requirements, which are usually on the order of a few milliseconds, we need an effective mechanism for filtering against these millions of entries in response to user requests.

Fig: Schematic Diagram of Ghotok’s integration to the backend system

To minimize the time taken to access entries from our primary storage, we use a two-tier caching system with a least recently used (LRU) eviction policy. We store a fixed number of mappings of Category to Set<ProductType> in a two-level cache. The first-level (L1) cache is small enough to provide an access time of one or two cycles. The second-level (L2) cache is larger and therefore a bit slower than the L1 cache. A lookup first checks the L1 cache. If the entry is not there, it checks the L2 cache. If neither cache contains the entry, we query the primary storage.
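The lookup order described above can be sketched with two small LRU caches in front of a storage loader. The class names, the capacities, and the choice to promote an L2 hit into L1 are assumptions made for illustration; the production system's cache technology and promotion policy are not described here.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def __contains__(self, key):
        return key in self._data

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


class TwoTierCache:
    """Check L1, then L2, then fall back to primary storage."""

    def __init__(self, l1_capacity, l2_capacity, load_from_storage):
        self.l1 = LRUCache(l1_capacity)
        self.l2 = LRUCache(l2_capacity)
        self.load_from_storage = load_from_storage

    def get(self, category):
        value = self.l1.get(category)
        if value is not None:
            return value
        value = self.l2.get(category)
        if value is None:
            value = self.load_from_storage(category)  # slowest path
            self.l2.put(category, value)
        self.l1.put(category, value)  # assumed policy: promote into L1
        return value


# Made-up primary storage: Category -> set of ProductType names.
storage = {
    "Pool Toys": frozenset({"Water Slides", "Pool Floats", "Water Blasters"}),
    "Swing Sets": frozenset({"Swing Sets"}),
}
storage_reads = []


def load(category):
    storage_reads.append(category)
    return storage.get(category, frozenset())


cache = TwoTierCache(l1_capacity=1, l2_capacity=2, load_from_storage=load)
cache.get("Pool Toys")   # misses both levels, read from storage
cache.get("Swing Sets")  # misses both levels, evicts "Pool Toys" from L1
cache.get("Pool Toys")   # served from L2, no storage read
print(storage_reads)
```

Keeping L1 much smaller than L2 means the hottest categories are answered from the fastest tier, while L2 absorbs most of the remaining requests before they reach primary storage.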

Lessons Learned

We highlight the key lessons learned from our experience in improving the performance of our generative model, managing hallucination in AI, and the importance of exception handling in deployment.

Chain-of-thought [4, 5] reasoning links ideas in a logical sequence to form a coherent argument or explanation. In the context of AI, this means the model traces a logical path from the initial prompt to the generated output, so that the output is both contextually relevant and logically consistent. This method proved instrumental in improving our generative model’s adherence to system prompts, and thereby the quality of its output.

Symbol tuning [6], on the other hand, fine-tunes a language model on in-context examples whose natural-language labels are replaced with arbitrary symbols, so that the model must learn from the input-label mappings rather than from the label names. In our case, we made use of symbol tuning to adjust the importance given by the model to different parts of the input. Utilizing the entire path representation (root to node) as a string, rather than just the context node names, proved essential. Specifically, we found that instructing GenAI to give higher importance to the leaf node during relevance assessment led to a marked improvement in the quality of the generated reasoning.

Hallucination in GenAI does not pose a significant issue for us, because we use GenAI (billion-parameter models) selectively: we rely on its advanced semantic comprehension of various terminologies, encoded in its knowledge base, to eliminate false positives produced by the predictive AI (million-parameter models) methodologies.

Although the combination of predictive and generative AI lowered the false positive rate (FPR) to a satisfactory level, some edge cases can still surface after the system is deployed in production. To handle them, we developed an exception handling tool, powered by both machine learning and human intervention, which facilitates swift and seamless resolution of these issues.
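Returning to the prompting lessons above, the sketch below shows one way a relevance prompt could carry the full root-to-node path and ask the model to weight the leaf node. The prompt wording, the `build_relevance_prompt` helper, and the example product type path are hypothetical; the actual Ghotok prompts are not reproduced here.

```python
def build_relevance_prompt(category_path, product_type_path):
    """Build a chain-of-thought style prompt from two root-to-node paths."""
    category = " > ".join(category_path)
    product_type = " > ".join(product_type_path)
    return (
        "You are assessing whether a product type belongs in a category.\n"
        f"Category path (root to node): {category}\n"
        f"Product type path (root to node): {product_type}\n"
        # Ask the model to weight the leaf nodes most heavily.
        f"Give the most weight to the leaf nodes: '{category_path[-1]}' "
        f"and '{product_type_path[-1]}'.\n"
        "Think step by step, then answer on the last line with "
        "'relevant' or 'irrelevant'."
    )


prompt = build_relevance_prompt(
    ("Toys", "Outdoor Toys", "Pool Toys"),
    ("Toys", "Pool Floats"),  # made-up product type path
)
print(prompt)
```

Passing the whole path gives the model the ancestors as context, while the explicit instruction keeps its attention on the most specific node.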

Conclusion

Much like physical stores where items are smartly arranged in rows and shelves, Walmart’s digital storefront is designed with a clear and efficient categorization system that ensures a smooth and effortless online shopping experience for customers. Our innovative AI tool Ghotok uniquely combines the strengths of both predictive and generative AI models to create a framework that can map the relationship between categories and product types. Through this sophisticated categorization system, whether you are shopping for beauty products, electronics, or toys, Walmart’s digital platform makes your search quick, easy, and precise, bringing you a step closer to your desired product.

References

[1] A ghotok is a person who acts as a matchmaker in the Bengali community. https://medium.com/@taufiquehossain/the-role-of-a-ghotok-in-marriage-in-bangladesh-bf359b43d58d

[2] “What is Predictive AI?” Cloudflare. https://www.cloudflare.com/learning/ai/what-is-predictive-ai/. [Accessed 7 May 2024].

[3] “What is Generative AI?” Cloudflare. https://www.cloudflare.com/learning/ai/what-is-generative-ai/. [Accessed 7 May 2024].

[4] Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, Le QV, Zhou D. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems. 2022 Dec 6;35:24824–37.

[5] Kojima T, Gu SS, Reid M, Matsuo Y, Iwasawa Y. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems. 2022 Dec 6;35:22199–213.

[6] Wei J, Hou L, Lampinen A, Chen X, Huang D, Tay Y, Chen X, Lu Y, Zhou D, Ma T, Le QV. Symbol tuning improves in-context learning in language models. arXiv preprint arXiv:2305.08298. 2023 May 15.

[7] John Elder, “The Apparent Paradox of Complexity in Ensemble Modeling” in Handbook of Statistical Analysis and Data Mining Applications (Second Edition), Robert Nisbet, Gary Miner, Ken Yale, Eds. London: Academic Press, 2018.