Building a Machine Learning based demand forecasting platform

Published in Walmart Global Tech Blog

Oct 8, 2020

Authors: Sasanka Katta, Abhinav Prateek

This write-up recounts how our team at Walmart Global Tech built an innovative, scalable platform to improve Walmart’s ability to predict customer demand and drive operational efficiencies.

1. Introduction

“Save people money so they can live better.”

Walmart’s mission statement drives our business practices. In 2018, with this goal in mind, we set out to tackle one of our most critical problems: reducing perishable waste and optimizing our inventory to serve our customers better.

To achieve this result, we needed to enhance our ability to forecast our customers’ demands and ensure that the right products are at the right places at the right time. We did this by:

a) Improving accuracy of our forecasting models

b) Providing the right visibility and flexibility to our demand management stakeholders. Equipped with informed data, this group was able to eliminate biases, minimize manual touches, and make better decisions for item replenishment.

2. Understanding our legacy system and identifying areas of opportunity

Our product team began by interviewing our stakeholders and analyzing data to identify areas where the legacy system was underserving the world’s largest supply chain. We identified multiple use cases that weren’t working under our legacy forecasting system. Outlined below are a few that were particularly interesting:

I. First, the legacy system was built on exponential smoothing models. These models work by replicating seasonal patterns from previous years, predominantly from the year before. This approach is flawed because demand drivers shift from year to year. For example, the date of Easter moves by up to a few weeks from one year to the next, following the lunar (Metonic) cycle. Even for fixed-date events such as Christmas and July 4, demand fluctuates depending on the day of the week on which they fall. Faulty predictions for these events often result in waste. The picture below demonstrates this.

Picture 1: The Walmart week starts on a Saturday. Peak demand for Russet Potatoes shifts across the weeks around Christmas depending on the day of the week on which the holiday falls.
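To make the calendar effect concrete, here is a minimal Python sketch of two event features a model could use instead of copying last year's pattern: the date of Easter in a given year, and how many days into the Saturday-start week a fixed-date holiday falls. This is an illustration under stated assumptions, not the platform's actual feature code. The function names are hypothetical, and the Easter computation is the standard anonymous Gregorian algorithm.

```python
from datetime import date


def easter_sunday(year: int) -> date:
    """Gregorian Easter Sunday (anonymous / Meeus-Jones-Butcher algorithm)."""
    a = year % 19                      # position in the 19-year Metonic cycle
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def days_into_walmart_week(d: date) -> int:
    """0 for Saturday (week start) through 6 for Friday (week end).

    Assumes only what the caption states: the week starts on a Saturday.
    Python's weekday() returns Monday=0 ... Saturday=5, Sunday=6.
    """
    return (d.weekday() - 5) % 7


if __name__ == "__main__":
    for year in (2018, 2019, 2020, 2021):
        christmas = date(year, 12, 25)
        print(year, easter_sunday(year), days_into_walmart_week(christmas))
```

Christmas fell on a Tuesday in 2018 and on a Wednesday in 2019, so the holiday sits 3 and then 4 days into the week. A model that simply replays the previous year's weekly curve has no way to see that shift.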

II. Macro and micro factors affect demand. Payroll calendars, Supplemental Nutrition Assistance Program (SNAP) payouts, price and promotions, and weather all play a role.

Picture 2: The chart below shows the demand pattern for ‘Similac Advanced Baby Formula’. Demand spikes every 4–5 weeks, depending on payroll calendars and SNAP payouts.
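A small sketch shows why a monthly payout produces a spike every 4–5 weeks once demand is bucketed into weeks. It flags each Saturday-start week that contains a given day of the month. The day of month used here (the 1st) is a placeholder assumption, since real payroll and SNAP schedules vary, and the function names are hypothetical.

```python
from datetime import date, timedelta


def week_contains_day_of_month(week_start: date, day_of_month: int = 1) -> bool:
    """True if any of the 7 days starting at week_start has the given day number."""
    return any(
        (week_start + timedelta(days=offset)).day == day_of_month
        for offset in range(7)
    )


def flagged_weeks(first_saturday: date, n_weeks: int, day_of_month: int = 1) -> list[int]:
    """Indices of the weeks (0-based) that contain the placeholder payout day."""
    return [
        week
        for week in range(n_weeks)
        if week_contains_day_of_month(first_saturday + timedelta(weeks=week), day_of_month)
    ]


if __name__ == "__main__":
    weeks = flagged_weeks(date(2019, 1, 5), n_weeks=52)  # Jan 5, 2019 is a Saturday
    gaps = [later - earlier for earlier, later in zip(weeks, weeks[1:])]
    print(weeks)
    print(gaps)  # every gap is 4 or 5 weeks
```

Because months are 28 to 31 days long, the flagged weeks are always 4 or 5 weeks apart. A binary feature like this lets a model learn the spike directly rather than inferring it from last year's curve.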

III. The legacy system forecasted aggregate demand across all U.S. stores and allocated it to individual stores using market shares calculated from each store’s sales in recent weeks. This process breaks down because local events and preferences are mostly seasonal, so recent-week shares do not capture them. For example, Chayote squash is extremely popular in New Orleans during Thanksgiving but shows no spike in demand in other parts of the country. Hence, the legacy system under-forecasted in New Orleans and over-forecasted for the rest of the U.S., causing shortages in New Orleans and waste elsewhere.

Picture 3: The chart below shows the relative demand spikes during Thanksgiving for Chayote Squash in New Orleans compared to the rest of the U.S.
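The allocation problem can be reproduced with a few lines of Python. The sketch below splits an aggregate forecast across stores in proportion to recent sales, which is the mechanism described above. The store groups and all numbers are made up for illustration, and the function name is hypothetical.

```python
def allocate_by_recent_share(
    total_forecast: float, recent_sales: dict[str, float]
) -> dict[str, float]:
    """Split an aggregate forecast across stores in proportion to recent sales."""
    total_recent = sum(recent_sales.values())
    if total_recent == 0:
        # No recent sales anywhere: fall back to an even split.
        even = total_forecast / len(recent_sales)
        return {store: even for store in recent_sales}
    return {
        store: total_forecast * sales / total_recent
        for store, sales in recent_sales.items()
    }


if __name__ == "__main__":
    # Made-up numbers: recent (pre-Thanksgiving) weekly sales by store group.
    recent = {"new_orleans": 10.0, "rest_of_us": 90.0}
    # Made-up actual Thanksgiving-week demand: a spike only in New Orleans.
    actual = {"new_orleans": 60.0, "rest_of_us": 90.0}

    # Even a perfect aggregate forecast gets allocated badly.
    allocation = allocate_by_recent_share(sum(actual.values()), recent)
    for store in recent:
        print(store, allocation[store], allocation[store] - actual[store])
```

In this toy case the aggregate forecast is exactly right, yet New Orleans receives 15 units against a demand of 60 while the rest of the country receives 135 against a demand of 90. Forecasting at the item and store level avoids this by letting each store carry its own seasonality.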

3. Determining success metrics

As described above, our goal was to reduce waste while keeping in-stock levels high to drive operational efficiencies. Better prediction of customer demand should reduce waste and keep the inventory position optimal. However, to see whether we were actually successful, we needed success metrics that went beyond tracking our inventory position. For instance, our prediction models could have improved while our replenishment practices kept waste high. Alternatively, our replenishment practices could have improved and brought down waste while our customer demand predictions had not improved.

Therefore, we decided to track waste and in-stock as primary metrics and forecast accuracy as a secondary metric.

Primary Goal Metrics:

Waste (in $) was measured, and trends were compared to last year’s.

In-stock % at a weekly level was measured by capturing in-stock levels for every item and store combination at different snapshots over the week. In-stock trends were also compared to last year’s to track improvement.

Proxy Metric: Forecast accuracy. Measured by the closeness of the forecast to the actual demand realized for every item and store combination, and aggregated to higher levels of the hierarchy using sales-based weights.
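As a rough illustration of these metrics, the sketch below computes a weekly in-stock percentage from snapshots and a sales-weighted forecast accuracy across item and store combinations. The exact formulas are not given in this write-up, so the per-row definition used here (1 minus absolute error divided by actual demand, floored at 0) is an assumption, as are all names and numbers.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class ItemStoreWeek:
    item: str
    store: str
    forecast: float
    actual: float
    sales_weight: float  # assumed: a sales-based weight, e.g. sales in $


def in_stock_pct(snapshots: Sequence[bool]) -> float:
    """Share of (item, store, snapshot) observations that were in stock, in %."""
    if not snapshots:
        raise ValueError("need at least one snapshot")
    return 100.0 * sum(snapshots) / len(snapshots)


def row_accuracy(forecast: float, actual: float) -> float:
    """Assumed definition: 1 - |forecast - actual| / actual, floored at 0."""
    if actual == 0:
        return 1.0 if forecast == 0 else 0.0
    return max(0.0, 1.0 - abs(forecast - actual) / actual)


def weighted_accuracy(rows: Iterable[ItemStoreWeek]) -> float:
    """Aggregate row-level accuracy to a higher level using sales weights."""
    rows = list(rows)
    total_weight = sum(r.sales_weight for r in rows)
    if total_weight == 0:
        raise ValueError("total sales weight is zero")
    weighted = sum(r.sales_weight * row_accuracy(r.forecast, r.actual) for r in rows)
    return weighted / total_weight


if __name__ == "__main__":
    rows = [
        ItemStoreWeek("russet_potatoes", "store_1", forecast=90, actual=100, sales_weight=100),
        ItemStoreWeek("chayote_squash", "store_1", forecast=30, actual=20, sales_weight=20),
    ]
    print(weighted_accuracy(rows))
    print(in_stock_pct([True, True, False, True]))
```

The weighting means a large miss on a slow seller moves the aggregate less than a small miss on a high-volume item, which matches the intent of rolling accuracy up by sales.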

4. Algorithm development and validation

After exploratory data analysis and presenting the use cases to our Data Science team, we realized that no single model could accurately forecast demand for Walmart’s wide assortment. We created separate tracks to develop models for specific use cases: fast-moving items, long-tail items, event-specific seasonality, new items, and other very specialized use cases. We worked with the Data Science team to help them understand the use cases in further detail, provided them with specific examples to solve for, and ensured they had access to pertinent data. Our Data Science team developed algorithms based on Gradient Boosting Machines, State Space models, Random Forests, and hierarchical techniques (for long-tail items). There were also a few special-purpose algorithms based on GLM and regression techniques.

We used backtesting to evaluate the performance of all models. This process helped us to a) give our Data Science team feedback on further areas of opportunity and b) establish that our models predicted demand better than the legacy system. By running the backtesting process over 52 weeks, we saw ~300 bps (about 3 percentage points) of improvement in accuracy, a significant improvement in our ability to predict customer demand. We packaged the results into clear visualizations to keep our stakeholders engaged. That way, we ensured our algorithms leveraged our stakeholders’ deep supply chain understanding.
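A backtest of this kind can be sketched as a rolling-origin loop: for each held-out week, forecast using only the history before it, then score against what actually happened. The sketch below is an assumption-laden illustration rather than our evaluation harness. The two toy forecasters, the accuracy definition (1 minus total absolute error over total actual demand), and all names are hypothetical stand-ins.

```python
from typing import Callable, Sequence

Forecaster = Callable[[Sequence[float]], float]


def last_value(history: Sequence[float]) -> float:
    """Toy baseline: next week equals the most recent week."""
    return history[-1]


def seasonal_naive(history: Sequence[float], season: int = 52) -> float:
    """Toy candidate: next week equals the same week one season ago."""
    return history[-season] if len(history) >= season else history[-1]


def backtest_accuracy(
    series: Sequence[float], forecaster: Forecaster, n_test_weeks: int = 52
) -> float:
    """Rolling-origin backtest over the last n_test_weeks of the series."""
    if n_test_weeks < 1 or len(series) <= n_test_weeks:
        raise ValueError("series must be longer than the test window")
    abs_error = 0.0
    actual_total = 0.0
    for t in range(len(series) - n_test_weeks, len(series)):
        prediction = forecaster(series[:t])  # only data before week t is visible
        abs_error += abs(prediction - series[t])
        actual_total += series[t]
    if actual_total == 0:
        raise ValueError("no demand in the test window")
    return 1.0 - abs_error / actual_total


def improvement_bps(candidate_accuracy: float, baseline_accuracy: float) -> float:
    """Accuracy gain in basis points (100 bps = 1 percentage point)."""
    return (candidate_accuracy - baseline_accuracy) * 10_000


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0] * 3  # made-up demand with a 4-week pattern
    baseline = backtest_accuracy(series, last_value, n_test_weeks=4)
    candidate = backtest_accuracy(series, lambda h: seasonal_naive(h, season=4), n_test_weeks=4)
    print(baseline, candidate, improvement_bps(candidate, baseline))
```

The key property is that each forecast sees only data available before the week being predicted, so the comparison between a candidate and the legacy baseline reflects how both would have performed in production.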

Picture 4: A typical model development lifecycle we followed

5. Application UI for our stakeholders

The Demand Planning team at Walmart is responsible for demand forecasts and works with suppliers and buyers to maintain optimal item availability for our customers. While improving our ability to predict customer demand is a priority, it is equally important to provide an efficient UI for our Demand Planning team. Below are some of the key considerations we had while building the UI:

I. An application interface with all the necessary metrics and information for the demand planning team to drive data-driven conversations and actions.

II. An interface that keeps track of our metrics (waste, in-stock, and forecast accuracy).

III. Appropriate diagnostic tools that help our demand planning team identify areas of concern and deep dive to understand the root cause.

IV. An interface that provides an appropriate ability (not too high or too low) to modify forecasts, and that also measures their touches to determine whether they are value-added or not.
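One simple way to judge whether a manual touch is value-added is to compare the error of the machine forecast with the error of the adjusted forecast once actual demand is known. The sketch below illustrates that idea only. The scoring rule, the names, and the numbers are assumptions, not a description of how the application measures touches.

```python
from typing import Iterable, NamedTuple


class Touch(NamedTuple):
    system_forecast: float    # machine-generated forecast
    adjusted_forecast: float  # forecast after the planner's manual override
    actual: float             # demand actually realized


def touch_value_added(touch: Touch) -> float:
    """Reduction in absolute error due to the override (negative = it hurt)."""
    system_error = abs(touch.system_forecast - touch.actual)
    adjusted_error = abs(touch.adjusted_forecast - touch.actual)
    return system_error - adjusted_error


def summarize_touches(touches: Iterable[Touch]) -> dict[str, int]:
    """Count touches that improved, degraded, or did not change the forecast."""
    summary = {"value_added": 0, "degrading": 0, "neutral": 0}
    for touch in touches:
        delta = touch_value_added(touch)
        if delta > 0:
            summary["value_added"] += 1
        elif delta < 0:
            summary["degrading"] += 1
        else:
            summary["neutral"] += 1
    return summary


if __name__ == "__main__":
    touches = [
        Touch(100, 120, 125),  # override moved the forecast closer to actual
        Touch(100, 80, 110),   # override moved it further away
        Touch(100, 100, 90),   # no effective change
    ]
    print(summarize_touches(touches))
```

Tracking these counts by planner, department, or item type gives a factual basis for deciding which kinds of overrides to keep and which to retire.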

6. Tech stack and other considerations along the adoption

Tech stack: As our engineering team productionized our forecasting solution, we agreed on a few key considerations to ensure we had a flexible platform:

I. The platform should be horizontally scalable: if future requirements call for new models or an increased workload, we can meet them by adding more compute power and storage capacity rather than upgrading infrastructure on a routine basis.

II. Provide flexibility for our Data Science team to select any runtime environment, such as R, Python, or C++. Our engineering team chose container-based orchestration (Docker + Kubernetes) for our ML pipelines, eliminating runtime constraints for our data scientists.

III. Flexibility to expand to other Walmart markets with ease and limited maintenance. Our engineering team developed a multi-tenant infrastructure model, which helped us expand to other markets without increased maintenance and with an affordable cost structure.

More details about our tech stack in this engineering blog from our architect — https://medium.com/walmartglobaltech/demand-forecasting-tech-stack-walmart-539d17f385db

Launch departments: We narrowed the initial launch down to the Meat and Produce departments across U.S. stores. Below are the factors that drove the decision:

I. The two departments would benefit the most by optimizing inventory position (waste savings and in-stock levels).

II. Customer demand for items in the two departments depends on complex factors that could be modeled accurately by our Machine Learning models.

III. Stakeholders for these departments spent significant time managing inaccurate forecasts, so improvements here offered a high return on effort.

When it came to optimizing for the MVP, we deprioritized several use cases:

I. Price and promotions features were not a consideration for the Meat and Produce departments since these items are not frequently promoted. In general, Walmart U.S. operates on an EDLP (every-day-low-pricing) model, making price and promotions less important. We eventually included price and promotions features in Aug 2019 to solve for our high-low international markets.

II. We deprioritized models specifically designed for long-tail items for the initial launch since our initial departments were fast moving. We eventually launched these models in Apr 2019 to help departments with long-tail items.

III. Cloud-native vs. on-prem infrastructure: Cloud-native infrastructure would provide the scalable platform we were looking for, but we went with on-prem for the initial launch because of our familiarity with those systems. Also, a quick and successful POC on-prem would get us the support and resources to set up the scalable platform in the cloud. The successful POC did indeed help us migrate our infrastructure to the cloud in 2019, enabling us to further expand our scope.

7. Results and product expansion

I. In June 2018, we launched the product for two departments in U.S. stores. The accuracy improvements were consistent with those we observed in backtesting.

II. By July 2020, all key Walmart U.S. departments had adopted our forecasting solution.

III. Our success in the U.S. market raised interest among international supply chain teams. We extended our solution to the Canadian market in Feb 2019 and subsequently to Mexico and the UK. We plan to extend our solution to other Walmart-operated international markets.

a. International markets have seen ~500 bps of accuracy improvement, a larger gain than in the U.S. market.

IV. In the application UI, we measured manual touches by stakeholders to the machine-generated forecasts and worked with them to eliminate touches that degraded forecasts, making the forecast management process at Walmart even more data-driven.

8. Key learnings for our product team

Develop intuition for algorithms & technology: A good understanding of and intuition for the algorithms is an invaluable asset for Product Managers, especially those working on algorithm-driven products. Many times, our team was able to listen to a potential data science solution and gauge whether it could solve a specific use case. Similarly, we were able to foresee potential issues with the solution and judge whether the algorithm could be scaled or put into production easily. This intuition helped our team have rich conversations with our Data Science team and increased our ability to collaborate. A similar understanding of engineering solutions helps Product Managers collaborate more efficiently with the engineering team.

Create a transparent feedback loop with management & stakeholders: While our goal was to reduce waste by improving forecast accuracy, deconstructing the problem qualitatively into presentable use cases ensured a great feedback loop with business stakeholders and management. We were able to help them visualize why accuracy increased, as opposed to making the process a ‘Machine Learning Blackbox’.

Start narrow and grow wide: It is essential for Product teams to start by focusing on a strong product-market fit, in our case the Meat and Produce departments. This ensures that the team is solving a well-defined problem to attract a small set of initial users. Eventually, as the product gains traction, PMs should think about expansion to more user groups, new geographies, or functional areas. Good prioritization of the problems to solve and the user groups to focus on is all-important. Another consideration when building the initial platform is future scalability, which means accounting for potential expansion of the product. We were able to expand our product to different markets and functional areas of Walmart quickly because our initial tech stack was built to scale.

Data-driven approach to override biases: Data makes conversations fair and logical, and eliminates biases. We were able to eliminate many established demand management practices that were degrading forecast quality, only because our teams’ narratives were firmly backed by data.

I am a Group Product Manager at Walmart Global Tech and lead product teams for Demand forecasting and Flow forecasting. I have ~10 years of experience in shipping Data Science based solutions and an MBA from Tuck School of Business at Dartmouth.

Abhinav Prateek is a Senior Product Manager at Walmart Global Tech and leads the efforts for Demand forecasting. He has ~6 years of experience in Product Management and an MBA from the Wharton School of the University of Pennsylvania.

Written by Sasanka Katta