Data Science In Walmart Supply Chain Technology

Mingang Fu
Nov 1, 2018 · 8 min read

What is Data Science?

The generally agreed-upon definition of data science is that it is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract useful knowledge and valuable insights from data in various forms, both structured and unstructured. As in the rest of the retail industry, this definition mostly applies at Walmart. However, at Walmart we go a little further: data science methodology is being developed into a competitive advantage in the marketplace. At Walmart, data science is a game changer.

How is data science applied at Walmart Labs?

Walmart is experiencing tremendous digital growth as the world’s largest retailer and now, more than ever, requires the incisive application of data science. At Walmart Labs, data scientists are focused on building algorithms that power the efficiency and effectiveness of complex supply chain management processes. We are successfully solving both classic and new problems.

In which specific areas of supply chain management is data science applied?

At Walmart Labs, data science methodology is being actively applied in various supply chain arenas including, but not limited to, sourcing, order/shipment preparation, transportation, last mile routing/scheduling, and last mile order pickup.


Sourcing

A primary area where data science is applied at Walmart is sourcing, specifically delivery promising and order sourcing.

Delivery Promising
Every day, millions of customers view millions of items on Walmart's site. For each item viewed, Walmart provides the customer with a real-time estimated delivery date, valid if the item is purchased by a given future time stamp. A backend algorithm computes the estimate from many factors, including:

· The distance between the customer and fulfillment centers (FC)

· Inventory levels of the item at the FC

· Available shipping methods and capacity (to some extent shipping costs are also taken into consideration)

The major challenge in delivery promising is the required response time, which is on the order of milliseconds.
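As a rough illustration of how these factors combine, here is a minimal sketch. The class, field names, cutoff rule, and transit times are all assumptions made for the example, not Walmart's actual system, and a production service would need precomputed data to answer in milliseconds.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from typing import Optional


@dataclass
class FulfillmentOption:
    fc_id: str           # hypothetical fulfillment center identifier
    units_on_hand: int   # inventory of the viewed item at this FC
    cutoff: time         # assumed latest order time that still ships the same day
    transit_days: int    # assumed carrier transit time to the customer


def promise_date(options: list[FulfillmentOption], now: datetime) -> Optional[date]:
    """Return the earliest estimated delivery date, or None if no FC has stock."""
    candidates = []
    for opt in options:
        if opt.units_on_hand <= 0:
            continue  # this FC cannot fulfill the item
        # Orders placed after the cutoff are assumed to ship the next day.
        if now.time() <= opt.cutoff:
            ship_day = now.date()
        else:
            ship_day = now.date() + timedelta(days=1)
        candidates.append(ship_day + timedelta(days=opt.transit_days))
    return min(candidates, default=None)
```

Here distance is folded into the assumed `transit_days` value; shipping capacity and cost are left out to keep the sketch short.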

Order Sourcing
Whenever an order is placed, the supply chain management system needs to determine the following:

· Which FC is the optimal one from which to fulfill the order or a portion of the order

· Which carrier and shipping method to choose to minimize transportation cost while still meeting the promised delivery date (the algorithms take past on-time delivery performance into consideration)

Opportunity costs may be incurred if we rely solely on a near-sighted strategy focused on the bottom line.
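To make the trade-off concrete, the sketch below picks the cheapest option that still meets the promise. All names, fields, and the 0.9 on-time threshold are hypothetical. Note that this is exactly the kind of near-sighted, order-by-order rule described above: it ignores what the chosen inventory might have been worth to a later order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourcingOption:
    fc_id: str            # hypothetical fulfillment center identifier
    carrier_method: str   # hypothetical carrier/ship method label
    cost: float           # assumed transportation cost for this order
    transit_days: int     # assumed transit time
    on_time_rate: float   # historical share of on-time deliveries, 0..1


def choose_option(
    options: list[SourcingOption],
    days_until_promise: int,
    min_on_time: float = 0.9,
) -> Optional[SourcingOption]:
    """Greedy rule: cheapest option that arrives in time and is reliable enough."""
    feasible = [
        o for o in options
        if o.transit_days <= days_until_promise and o.on_time_rate >= min_on_time
    ]
    return min(feasible, key=lambda o: o.cost, default=None)
```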

Order/Shipment Preparation

In order/shipment preparation, there are two significant problems being addressed using data science. These are Picking Optimization and Packing Optimization.

Picking Optimization — Picker Routes Optimization (VRP or VRPTW)
Whenever an order or a portion of an order is designated to be fulfilled by a specified FC (or Store for grocery and Ship From Store (SFS)) with a given promised delivery, the items in the order will need to be picked from shelves in a timely manner. For each ordered item, the SCM system needs to determine the following:

· Which picker the item should be assigned to

· For a given picker, the optimal sequence of items to be picked in order to minimize total walking distance and to maximize the pickers’ productivity

This problem will be familiar to experienced data scientists. The item-to-picker assignment, together with the picking sequence of the items assigned to each picker, naturally forms the Vehicle Routing Problem (VRP) or, when deadlines apply, the Vehicle Routing Problem with Time Windows (VRPTW), a classic NP-hard problem. In computational complexity theory, NP-hardness (non-deterministic polynomial-time hardness) is the defining property of a class of problems that are, informally, at least as hard as the hardest problems in NP.
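Because exact solutions are out of reach at scale, heuristics are the usual starting point. The following is a minimal sketch of one of the simplest, the nearest-neighbor heuristic, for a single picker. The grid coordinates and Manhattan distance are assumptions for illustration; picker assignment and time windows are left out.

```python
Point = tuple[int, int]  # hypothetical (aisle, bay) grid coordinates


def manhattan(a: Point, b: Point) -> int:
    """Walking distance on a grid of aisles (an assumed distance model)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_neighbor_route(start: Point, picks: list[Point]) -> list[Point]:
    """Visit the closest remaining pick location first, repeatedly."""
    remaining = list(picks)
    route, current = [], start
    while remaining:
        nxt = min(remaining, key=lambda p: manhattan(current, p))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route


def route_length(start: Point, route: list[Point]) -> int:
    """Total walking distance, including the return to the start point."""
    total, current = 0, start
    for stop in route:
        total += manhattan(current, stop)
        current = stop
    return total + manhattan(current, start)
```

Nearest neighbor is fast but can be far from optimal; it is typically used as a starting solution that local search then improves.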

Packing Optimization — Box Recommendation (Bin Packing Problem)
Whenever the items of an order, or of multiple orders placed by the same customer, have been picked from the shelf and are ready for packing, a box recommendation system developed by Walmart determines the best-sized box: one that can hold all the ordered items with a minimum of wasted in-box space. This problem is naturally modeled as the Bin Packing Problem, another classic NP-hard problem familiar to data scientists.

A set of state-of-the-art heuristics is implemented to programmatically determine the best box for each order within a predetermined amount of time. Depending on the attributes of an order, some heuristics perform significantly better than others. Going further, Walmart applies a deep learning approach, training a neural network-based classifier to determine the best heuristic to use for each order's box determination.
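The heuristics themselves are not described here. As a deliberately simple stand-in, the sketch below screens boxes by volume and by whether each item fits on its own, then returns the smallest survivor. The box names and dimensions are made up, and passing this screen does not guarantee that a valid 3D arrangement exists.

```python
Dims = tuple[float, float, float]  # (length, width, height), any consistent unit


def volume(d: Dims) -> float:
    return d[0] * d[1] * d[2]


def fits_alone(item: Dims, box: Dims) -> bool:
    """An axis-aligned item fits if its sorted sides fit the box's sorted sides."""
    return all(i <= b for i, b in zip(sorted(item), sorted(box)))


def recommend_box(items: list[Dims], boxes: dict[str, Dims]):
    """Smallest box that passes a volume screen and a per-item fit check.

    This is a necessary-condition filter only, not a full packing algorithm.
    """
    total = sum(volume(it) for it in items)
    candidates = [
        name for name, dims in boxes.items()
        if volume(dims) >= total and all(fits_alone(it, dims) for it in items)
    ]
    return min(candidates, key=lambda name: volume(boxes[name]), default=None)
```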


Transportation

Another supply chain management area where Walmart has successfully applied data science methodology is transportation.

Lane Planning (Mixed Integer Programming Problem)
After a shipment has been picked and packed, a shipping label (as determined by Sourcing) is generated. Based on the information on the label, associates sort the package and place it close to the outbound dock door.

A lane is defined by the source facility, carrier, ship method, and sometimes destination facility. Millions of packages are processed daily, each with its own promised delivery date and delivery address.

The challenge is how to design the transportation network in terms of connecting the different types of facilities and assigning carriers (lane planning). The key is to minimize transportation costs while simultaneously maintaining satisfactory on-time delivery levels.

This problem is modeled as a classic mixed integer programming problem. Walmart has successfully solved it in partnership with a third party, Gurobi Optimization.
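To show the shape of such a model without a solver, the sketch below enumerates every carrier choice for a handful of lanes, which is what the binary decision variables of a MIP encode. It is a toy: the lane names, costs, on-time rates, and the volume-weighted on-time constraint are all assumptions, and exhaustive search only works at this tiny scale.

```python
from itertools import product


def plan_lanes(volumes, cost, on_time, min_on_time):
    """Pick one carrier per lane, minimizing cost under an on-time constraint.

    volumes[lane]          -> packages on the lane (assumed input)
    cost[lane][carrier]    -> cost per package (assumed input)
    on_time[lane][carrier] -> historical on-time rate, 0..1 (assumed input)
    Returns (plan, total_cost), or (None, inf) if nothing is feasible.
    """
    lanes = list(volumes)
    total_volume = sum(volumes.values())
    best_plan, best_cost = None, float("inf")
    # Each tuple in the product is one full assignment of carriers to lanes.
    for choice in product(*(list(cost[lane]) for lane in lanes)):
        plan = dict(zip(lanes, choice))
        total_cost = sum(volumes[l] * cost[l][c] for l, c in plan.items())
        weighted = sum(volumes[l] * on_time[l][c] for l, c in plan.items())
        if weighted / total_volume >= min_on_time and total_cost < best_cost:
            best_plan, best_cost = plan, total_cost
    return best_plan, best_cost
```

A real lane-planning model would hand the same objective and constraints to a MIP solver, which prunes the search instead of enumerating it.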

Continuous Moves (Combinatorial Optimization Problem)
Empty/deadhead mileage is a major source of waste in line-haul transportation. To reduce it, a common practice is to have a single driver and truck pick up and deliver multiple truckloads in an optimized sequence.

Major challenges lie in the following:

· Complex business rules that need to be satisfied along with applicable Department of Transportation (DOT) and Department of Labor (DOL) rules and regulations.

· Limited or no time for optimization. Earlier loads in a move might have already been completed while later loads planned in a move have not yet been prepared or materialized.

At its core, this is a classic combinatorial optimization problem.
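The combinatorial core can be sketched as follows: given a few loads, find the order in which one truck should run them so that the empty miles between a drop-off and the next pickup are minimal. The coordinates and straight-line distances are assumptions, and the business, DOT, and DOL rules mentioned above are deliberately ignored.

```python
from itertools import permutations
from math import dist

# A load is (origin, destination), each an (x, y) point in miles (hypothetical).


def deadhead_miles(sequence) -> float:
    """Empty distance driven between each drop-off and the next pickup."""
    return sum(dist(prev[1], nxt[0]) for prev, nxt in zip(sequence, sequence[1:]))


def best_sequence(loads):
    """Try every ordering of the loads; only viable for a handful of loads."""
    return min(permutations(loads), key=deadhead_miles)
```

The number of orderings grows factorially with the number of loads, which is why practical systems rely on heuristics rather than enumeration.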

DC to Store Delivery Routing and Scheduling (VRPTW)
Store grocery inventory needs to be replenished daily from a predetermined Distribution Center (DC) or Regional Distribution Center (RDC). A good milk run for this replenishment minimizes total driving distance as well as total empty distance.

Last Mile Routing and Scheduling

Another SCM area where data science methodology is currently being applied at Walmart is last mile routing and scheduling. Specific areas of application include grocery delivery routing, general merchandise delivery routing, associate delivery routing, map routing, and supply shaping.

Grocery Delivery Routing and Scheduling (VRPTW and Assignment Problem)
Grocery orders are delivered within appointment time windows specified by customers. Delivering these orders on time with a fleet of specialized vehicles is usually very expensive. To reduce costs and improve delivery capacity, ROVR (Resource Optimization and Vehicle Routing) implements a set of state-of-the-art and proprietary meta-heuristics and local search algorithms.

These are used to optimize routes in real time, anticipating new orders as well as updates to and cancellations of existing orders.

A lower-cost option for grocery delivery in medium- and low-density areas is to partner with third-party carriers such as Uber and Lyft. In these cases, detailed routes are not of primary interest. Instead, order-to-vehicle assignment becomes the primary focus.
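For the assignment view, a minimal sketch is shown below. It assumes a small square cost matrix (one order per vehicle) with made-up values and simply tries every assignment; at realistic sizes one would use a polynomial-time method such as the Hungarian algorithm instead.

```python
from itertools import permutations


def assign_orders(cost):
    """Minimum-cost one-to-one assignment of orders to vehicles.

    cost[i][j] is the assumed cost of vehicle j serving order i.
    Returns (assignment, total), where assignment[i] is the vehicle for order i.
    """
    n = len(cost)

    def total(perm):
        return sum(cost[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)
```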

GM Delivery Routing — Associate Delivery (VRP)

Beyond grocery delivery, Walmart is also looking at General Merchandise (GM) delivery. Two significant areas where data science principles are being applied are GM delivery routing and the implementation of associate delivery systems.

Map Routing (Graph Theory — Shortest Path Problem)
Walmart has determined that the classic Dijkstra algorithm, even with various heap implementations, does not scale to batch location-to-location routing problems in a large road network such as the entire United Kingdom, or the even larger United States. In response, we have developed and deployed Resource Optimization and Vehicle Routing (ROVR). ROVR employs the state-of-the-art Contraction Hierarchies technique to shift the heavy computation from real time to pre-computation.
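For reference, the baseline being replaced looks roughly like the sketch below: Dijkstra's algorithm with a binary heap from Python's standard library. The graph format is an assumption for the example. Each query explores the network from scratch, which is what becomes too slow for batch routing on a country-sized road graph; Contraction Hierarchies itself is not shown here.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source to every reachable node.

    graph maps a node to a list of (neighbor, non-negative edge weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```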

Supply Shaping

Demand shaping is an operational supply chain management (SCM) strategy where a company uses tactics such as price incentives, cost modifications and product substitutions to entice customers to purchase specific items.

Last Mile Order Pickup

Check In Notifications Engine (CINE) is one of several parts of the overall Walmart Last Mile Systems initiative. It predictively detects when a customer is approaching a physical destination (access point) and takes preparatory action in response.


The ability to accurately detect where the customer is without draining their smartphone battery, and to predict when the customer is about to arrive, sets CINE apart from other approaches to proximity and location. CINE opens up a broad range of uses that would not otherwise work due to the low accuracy and poor reliability of other solutions.

CINE is the lynchpin of Last Mile Systems’ online-to-offline pickup and drop-off experience. CINE alerts Online Grocery (OG) store associates when a customer is near so that the order will be ready the moment the customer pulls up.

Customer Waiting Time Estimate and Control (Queuing Theory)
As an alternative to home delivery, customers can choose to pick up their online grocery orders from stores. Customers' arrival patterns and associates' processing times, along with their distributions and predictions, all contribute to customers' waiting time in the queue. Walmart is developing a queuing-theory-based model to help estimate each customer's waiting time and to determine the staffing level needed to control customer waiting in the queue.
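The model itself is not described here. As one textbook starting point, the sketch below uses the M/M/c queue (Poisson arrivals, exponential service times, c associates) and the Erlang C formula to estimate mean waiting time and the smallest staffing level that meets a target. The choice of M/M/c and all function names are assumptions for illustration.

```python
from math import factorial


def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Probability that an arriving customer has to wait (M/M/c queue)."""
    load = arrival_rate / service_rate   # offered load in Erlangs
    rho = load / servers                 # utilization per associate
    if rho >= 1:
        raise ValueError("queue is unstable: add servers or reduce arrivals")
    top = load ** servers / factorial(servers) / (1 - rho)
    bottom = sum(load ** k / factorial(k) for k in range(servers)) + top
    return top / bottom


def mean_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Expected time in queue before service starts, in the rates' time unit."""
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    return p_wait / (servers * service_rate - arrival_rate)


def min_staff(arrival_rate: float, service_rate: float, target_wait: float) -> int:
    """Smallest number of associates whose mean wait meets the target."""
    servers = int(arrival_rate // service_rate) + 1  # smallest stable level
    while mean_wait(arrival_rate, service_rate, servers) > target_wait:
        servers += 1
    return servers
```

Rates must share a time unit, for example customers per minute for arrivals and orders per minute per associate for service.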
