Random Forests for Store Forecasting at Walmart Scale
Written by Sucheta Jawalkar, John Bowman
The SMART Forecasting team at Walmart Labs is tasked with providing demand forecasts for over 70 million store-item combinations every week! For example, we need to determine just how much of every type of ginger should go to every Walmart store in the U.S., every week for the next 52 weeks, with the goal of improving in-stock rates and reducing food waste.
Our algorithm strategy was to build a suite of machine learning models and deploy them at scale to generate bespoke solutions for (oh so many!) store-item-week combinations. Random Forests would be part of this suite.
We went through the traditional model development workflow of data discovery, identifying demand drivers, feature engineering, training, cross validation and testing.
Feature Engineering and Proof of Concept
The model inputs are drawn from domain expertise and include
- Store level features like geographical location, size of the store, days since the store was opened
- Date features like month, and day of the week
- Event uplift features like Christmas, Thanksgiving and the like
- Uplift features coming from the Supplemental Nutrition Assistance Program (SNAP) for low-income families
- Lag features such as the average sales over the last two weeks
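As a rough illustration of the date and lag features listed above, here is a minimal pandas sketch. The column names `date` and `units`, the function name `build_features`, and the 14-day window on daily data are all assumptions made for the example, not the team's actual schema.

```python
import numpy as np
import pandas as pd


def build_features(daily: pd.DataFrame) -> pd.DataFrame:
    """Add date and lag features to a daily sales table for one store-item.

    Expects a datetime column `date` and a numeric column `units`
    (both names are hypothetical).
    """
    out = daily.sort_values("date").reset_index(drop=True).copy()
    out["month"] = out["date"].dt.month
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday = 0
    # Average sales over the previous 14 days. shift(1) excludes the current
    # day, so the feature only uses information known before that day.
    out["lag_avg_14d"] = out["units"].shift(1).rolling(window=14).mean()
    return out


# Toy data: 21 days of steadily increasing sales.
dates = pd.date_range("2019-01-01", periods=21, freq="D")
demo = pd.DataFrame({"date": dates, "units": np.arange(21, dtype=float)})
features = build_features(demo)
```

The first 14 rows have no lag value because a full two-week history is not yet available; how to handle that warm-up period is a modeling choice.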
We established the proof of concept for bananas, which are among Walmart’s highest-velocity items, and used the Symmetric Mean Absolute Percent Error (sMAPE) as our forecast accuracy metric. We found that the SNAP features are among the most important ones for bananas.
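For readers who have not met the metric before, here is a small sketch of one common definition of the Symmetric Mean Absolute Percent Error. Several variants exist and the post does not say which one was used, so treat the formula and the zero-demand convention below as assumptions.

```python
def smape(actual, forecast):
    """sMAPE in percent: mean of 2*|f - a| / (|a| + |f|), times 100."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one observation")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Assumed convention: forecasting zero when demand is zero adds no error.
        if denom > 0:
            total += 2.0 * abs(f - a) / denom
    return 100.0 * total / len(actual)


score = smape([100, 200], [110, 180])
```

With this definition the score is bounded between 0 and 200, which keeps low-volume items from dominating the average the way they can with a plain percentage error.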
We then looked at the results for
- packaged salads
- spare ribs and back ribs
The three categories each come with unique issues for demand forecasting and tend to be good tests for any model entering the algorithm suite. Random Forests outperformed the existing Gradient Boosting Machine and state-space model for apples, indicating a viable solution with large-scale business impact.
We then ran the Random Forests model for all the categories in meat and produce and evaluated our metric on the test set, which is a period of 52 consecutive weeks held back from the training data.
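A time-based holdout like this can be sketched in a few lines of pandas. The example assumes the held-back period is the most recent 52 weeks and that each row carries a `week_start` column; both are assumptions for illustration, as is the helper name `split_last_n_weeks`.

```python
import pandas as pd


def split_last_n_weeks(df: pd.DataFrame, week_col: str = "week_start", n_weeks: int = 52):
    """Hold back the most recent `n_weeks` distinct weeks as the test set."""
    weeks = df[week_col].drop_duplicates().sort_values()
    if len(weeks) <= n_weeks:
        raise ValueError("not enough history to hold back that many weeks")
    cutoff = weeks.iloc[-n_weeks]  # first week of the test period
    is_test = df[week_col] >= cutoff
    return df[~is_test], df[is_test]


# Toy data: 60 weeks of history for two stores.
weeks = pd.date_range("2018-01-01", periods=60, freq="W-MON")
demo = pd.DataFrame(
    {
        "week_start": list(weeks) * 2,
        "store": [1] * 60 + [2] * 60,
        "units": range(120),
    }
)
train, test = split_last_n_weeks(demo)
```

Splitting on time rather than at random keeps future sales out of the training data, which matters when lag features are involved.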
Our models are trained on features built from seven years of sales history. The training happens mid week and the prediction happens after the Friday sales have come in. The prediction has to happen in a short time window and all forecasts have to go to replenishment over the weekend.
The Random Forests model was pickled, zipped and stored in an object store in the middle of the week. The model was then pulled out of the object store on Friday and used to predict demand.
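The pickle, zip, store, and reload cycle can be sketched with the standard library alone. In this sketch a plain dictionary stands in for the object store and a small dict stands in for the fitted model; the key name and helper names are hypothetical.

```python
import gzip
import pickle


def pack_model(model) -> bytes:
    """Pickle a model object and gzip the resulting bytes."""
    return gzip.compress(pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL))


def unpack_model(blob: bytes):
    """Reverse of pack_model.

    Only unpickle blobs you produced yourself: pickle can execute code on load.
    """
    return pickle.loads(gzip.decompress(blob))


object_store = {}  # hypothetical stand-in for a real object store
stand_in_model = {"category": "bananas", "thresholds": [0.5] * 10_000}

# Mid week: serialize and upload.
object_store["rf/bananas.pkl.gz"] = pack_model(stand_in_model)

# Friday: download and restore before scoring.
restored = unpack_model(object_store["rf/bananas.pkl.gz"])
```

A fitted scikit-learn estimator can be passed through `pickle` in the same way, although the resulting file is far larger than this toy example.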
Research Matters (bad pun alert!)
We evaluated three options to get around the adoption blocker:
The first was to see if we could reduce the number of features and throw ourselves down the rabbit hole of hyperparameter tuning. Turns out there is a super cool paper by S. Bernard, L. Heutte and S. Adam which has many valuable insights about tuning the number of features selected at each node. Instead of a grid search, we sampled the multidimensional hyperparameter space with a Sobol’ sequence, with promising early results. Hyperparameter tuning is a slow process. Hyperparameter tuning at Walmart scale is a very slow process.
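To give a flavor of Sobol' sampling versus a grid, here is a minimal sketch using `scipy.stats.qmc`. The three hyperparameters are real `RandomForestRegressor` arguments, but the choice of which ones to tune and the ranges shown are assumptions for the example, not the search space we actually used.

```python
from scipy.stats import qmc


def sobol_candidates(n_features: int, m: int = 4):
    """Return 2**m hyperparameter candidates drawn from a Sobol' sequence."""
    # Unscrambled, so the points are deterministic; scramble=True is also common.
    sampler = qmc.Sobol(d=3, scramble=False)
    unit = sampler.random_base2(m=m)  # shape (2**m, 3), values in [0, 1)
    # Upper bounds are exclusive, so add 1 to make each maximum reachable.
    lower = [50, 1, 1]
    upper = [501, n_features + 1, 21]
    scaled = qmc.scale(unit, lower, upper)
    return [
        {
            "n_estimators": int(row[0]),
            "max_features": int(row[1]),  # features considered at each split
            "min_samples_leaf": int(row[2]),
        }
        for row in scaled
    ]


candidates = sobol_candidates(n_features=30)
```

Each candidate dictionary could then be passed to a model constructor and scored on a validation set. The appeal over a grid is that every additional point fills a gap in the space, so the search can be stopped early and still cover it evenly.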
The second was to evaluate alternate implementations of the Random Forests. Scikit-learn gave the best metric performance so we planned to stick with it.
The third solution was a High Performance Computing (HPC) option. Python stores all conceivably useful information in its model file. A minimal model file can be constructed by parsing the model object before writing it out and saving only the relevant data. We can also store fields using the smallest suitable representation: for example, thresholds and values as floats (half the size of doubles), feature IDs as short integers (half the size of integers), and node IDs as integers. We also used multithreaded zstd compression instead of gzip compression (everywhere!).
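The size savings from narrower field types are easy to see in a small NumPy sketch. This is an illustration in Python rather than the C++ used for the real work, and the tree arrays are random stand-ins that start as 64-bit values; the field names and the helper `compact_tree` are hypothetical.

```python
import numpy as np


def compact_tree(threshold, value, feature, left, right):
    """Re-encode one tree's node arrays with smaller dtypes."""
    if feature.max() > np.iinfo(np.int16).max:
        raise ValueError("feature IDs do not fit in a 16-bit integer")
    return {
        "threshold": threshold.astype(np.float32),  # float instead of double
        "value": value.astype(np.float32),
        "feature": feature.astype(np.int16),  # short integer
        "left": left.astype(np.int32),  # child node IDs as integers
        "right": right.astype(np.int32),
    }


def total_bytes(tree) -> int:
    return sum(arr.nbytes for arr in tree.values())


rng = np.random.default_rng(0)
n_nodes = 1000
full = {
    "threshold": rng.random(n_nodes),  # float64
    "value": rng.random(n_nodes),  # float64
    "feature": rng.integers(0, 40, n_nodes, dtype=np.int64),
    "left": rng.integers(0, n_nodes, n_nodes, dtype=np.int64),
    "right": rng.integers(0, n_nodes, n_nodes, dtype=np.int64),
}
small = compact_tree(**full)
```

In this toy case the per-node cost drops from 40 bytes to 18 bytes before any compression is applied. The trade-off is a small loss of precision in thresholds and values, which is worth checking against forecast accuracy.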
We tested the HPC solution and found an order of magnitude improvement. Concretely, for mangoes, the model size went from 12.58 GB in Python to 1.02 GB in C++ (about 12 times smaller), and the scoring time went from 30 minutes to 27 seconds (roughly 67 times faster).
Pause for quiet reflection.
We concluded that Random Forests are a viable, scalable option to forecast demand for high-velocity items with broad seasonal trends and little spiky behavior, such as apples, mangoes, ginger and garlic. We are currently in the process of extending our work to categories in produce and testing the model on several other departments!
Thanks to John Bowman for technical leadership and HPC work; Ritesh Agarwal for HPC work; Amir Motaei for technical feedback; Anton Bubna-Litic for feature engineering work; Abhishek Kumar for ML engineering support; and Abhinav Prateek for product support.
I have a Ph.D. in experimental nuclear physics from The College of William and Mary, was a postdoctoral associate at Duke University and Physics faculty at Santa Clara University where I modeled petabytes of scattering data to understand the structure of the tiny building blocks of the known universe. I am an Insight Data Science Fellow, an Aspen scholar and love working on machine learning at scale!