Demand Forecasting in Retail — a Probabilistic, Multi-product and Multi-horizon Problem

Published in JD Technology Blog, Dec 14, 2018

Demand forecasting is critical to the success of an e-commerce company. Because JD operates its own supply chain, demand forecasts at different levels of granularity drive decision-making along the entire chain. For example, to live up to our fast-delivery promise, we use our forecast of local demand to stock popular products closer to customers (read more in this previous post). A good forecast directly improves decisions such as which products to sell, how many to buy from suppliers, when to buy, and where to store them. In this post, we introduce one approach to demand forecasting that is specifically designed to support inventory management.

The challenges are threefold. First, in inventory management, we need to balance the risk of running out of stock against the risk of holding too much inventory. A point estimate of the mean demand is insufficient because the opportunity cost of a lost sale is much higher than the cost of holding extra inventory. We need more information about demand uncertainty when placing an inventory order, so we use quantile regression to capture the distribution of the demand.

For a true demand $y$, a predicted demand $\hat{y}$ and a desired quantile $q$, the quantile loss is $L_q(y, \hat{y}) = q \max(y - \hat{y}, 0) + (1 - q) \max(\hat{y} - y, 0)$. The loss function consists of two parts: the penalty when the predicted demand is smaller than the true demand (first term), and the penalty when the predicted demand is greater than the true demand (second term). The larger the desired quantile q, the greater the penalty the loss function places on underestimating the true demand, which reflects a higher opportunity cost of losing sales.
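To make the asymmetry concrete, here is a minimal sketch of the quantile (pinball) loss in plain Python. The function names and the demand numbers are illustrative assumptions, not part of our production system. The sketch also searches for the constant forecast that minimizes the average loss, which shows how a higher q pushes the forecast upward.

```python
def quantile_loss(y_true, y_pred, q):
    """Quantile (pinball) loss for a single observation."""
    under = max(y_true - y_pred, 0.0)  # forecast too low: lost sales
    over = max(y_pred - y_true, 0.0)   # forecast too high: extra inventory
    return q * under + (1.0 - q) * over


def mean_quantile_loss(demands, y_pred, q):
    """Average quantile loss of one constant forecast over many observations."""
    return sum(quantile_loss(y, y_pred, q) for y in demands) / len(demands)


def best_constant_forecast(demands, q):
    """Observed demand value that minimizes the mean quantile loss."""
    candidates = sorted(set(demands))
    return min(candidates, key=lambda c: mean_quantile_loss(demands, c, q))


# Hypothetical weekly demand observations for one product.
demands = [2, 3, 3, 4, 5, 5, 6, 8, 9, 12, 20]

median_forecast = best_constant_forecast(demands, 0.5)
high_forecast = best_constant_forecast(demands, 0.9)
```

With q = 0.5 the minimizer is the median of the observations. With q = 0.9 it moves into the upper tail, which is the behavior we want when lost sales cost more than extra stock.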

The second challenge arises from the need to quickly learn from ever-changing market dynamics and separate trends from noise. We leverage JD’s gigantic and comprehensive data pool to train a multi-product forecast model in order to identify complicated relationships across products, locations, product categories, etc.

The third challenge stems from uncertainty in vendor delivery time. When placing an inventory order, we assume the inventory will arrive at our warehouse after a certain amount of time, which can vary across vendors. We refer to this as the vendor lead time. When determining how much inventory to order from a vendor, we need to consider the cumulative demand over the vendor lead time. Our demand forecast therefore needs to be flexible enough to predict the demand over a given future period, i.e., we need a multi-horizon forecast model.
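As a rough illustration of the multi-horizon target, the sketch below sums daily demand over several look-ahead windows starting from the same order date. The helper name, the daily numbers and the window lengths are assumptions made for this example only.

```python
def cumulative_demand(daily_demand, start, lead_time):
    """Total demand over `lead_time` days beginning at index `start`."""
    window = daily_demand[start:start + lead_time]
    if lead_time <= 0 or len(window) < lead_time:
        raise ValueError("not enough data to cover the lead time")
    return sum(window)


# Hypothetical daily demand for one product at one location.
daily_demand = [4, 6, 5, 7, 3, 8, 6, 5, 9, 4]

# One target per look-ahead window, all measured from the same order date.
order_day = 2
targets = {L: cumulative_demand(daily_demand, order_day, L) for L in (3, 5, 7)}
```

Each entry of `targets` is the quantity a model would be asked to predict for a vendor with that lead time.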

At a high level, we treat demand forecasting as a supervised learning problem in which we use a feature vector X_i to predict the product demand y_i for a given quantile q and a desired look-ahead window L for each product i. The feature vector X_i is composed of trend-related features derived from historical sales and static features related to the product, such as its vendor, brand, store location, etc. Next, we introduce the four key modules of our demand forecasting system, which generates probabilistic, multi-product and multi-horizon predictions.

Figure 1: Overview of the Demand Forecasting System

Module 1: Product Clustering

We first cluster products by their sales patterns for better modeling accuracy. For example, common sales patterns may include: new products with few sales data points available, seasonal products with strong seasonal variability, and slow-moving products with a sporadic sales pattern.

Module 2: Feature Processing

For each cluster, we extract tailored features from time series data such as sales, inventory, and price, and concatenate them with static data, such as product attributes and geographic location, to build a feature pool. Feature extraction from the time series is essential because it helps the forecasting system absorb useful information from the demand patterns. For example, to describe a potential sales peak, we may calculate the increase in average sales, the standard deviation of sales, and the gap between the maximum and minimum sales during a period of time, and then add them to the feature pool.
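The peak-related features mentioned above could be computed along the following lines. This is only a sketch: the window length, the feature names and the static attributes are hypothetical, and the actual feature pool is much larger.

```python
from statistics import mean, pstdev


def trend_features(sales, window):
    """Peak-related features from the most recent `window` sales points."""
    if len(sales) < 2 * window:
        raise ValueError("need at least two windows of history")
    recent = sales[-window:]
    previous = sales[-2 * window:-window]
    return {
        # Increase of the average sales versus the preceding window.
        "mean_increase": mean(recent) - mean(previous),
        # Standard deviation of sales within the recent window.
        "sales_std": pstdev(recent),
        # Gap between the max and min sales within the recent window.
        "sales_range": max(recent) - min(recent),
    }


def build_feature_vector(sales, static, window):
    """Concatenate time-series features with static product attributes."""
    features = trend_features(sales, window)
    features.update(static)
    return features


# Hypothetical sales history and static attributes for one product.
sales = [1, 2, 3, 4, 5, 7, 9, 11]
static = {"brand": "brand_a", "warehouse": "beijing"}
feature_vector = build_feature_vector(sales, static, window=4)
```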

Module 3: Data Augmentation

Data augmentation is a common way to improve forecast accuracy in model training. In our forecasting system, we enrich our data by 1) selecting multiple focal time points at which to collect data and 2) jumping around the focal time points to generate more training samples. For example, as shown in Figure 2, if we want to predict the total sales during New Year's week (January 1st to 7th) based on our observations in December, we can select as a focal time point either the same period in the previous year or a more recent period, say one month ago. Then, for each focal time point, we slide a window around it to build several training samples (Figure 3).

Figure 2: Selecting Multiple Focal Time Points for Collecting Training Samples

Figure 3: Jumping Around Focal Time Points to Generate More Training Samples
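The two augmentation steps can be sketched as a pair of nested loops: the outer loop visits each focal time point, and the inner loop shifts the forecast origin by a few days around it. The function name, the offsets and the window sizes below are illustrative assumptions rather than the settings we use in practice.

```python
def make_training_samples(sales, focal_points, offsets, history, horizon):
    """Build (origin, history window, future total) samples around focal points."""
    samples = []
    for focal in focal_points:
        for offset in offsets:
            origin = focal + offset  # shifted forecast origin
            # Skip origins without enough history or enough future data.
            if origin - history < 0 or origin + horizon > len(sales):
                continue
            x = sales[origin - history:origin]      # observed past sales
            y = sum(sales[origin:origin + horizon])  # demand over the horizon
            samples.append((origin, x, y))
    return samples


# Hypothetical daily sales; two focal points, each shifted by -1, 0 and +1 day.
sales = list(range(30))
samples = make_training_samples(
    sales, focal_points=[10, 20], offsets=[-1, 0, 1], history=3, horizon=2
)
```

Two focal points with three offsets each give six samples here, where a single focal point without shifting would give only one.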

Module 4: Model Training

Now we are ready to fit a regression model to the training data. The regression model can be a linear regressor, a tree-based model such as random forest or gradient boosting trees, or a feedforward neural network. Some models readily support the quantile loss as an objective function (e.g., GradientBoostingRegressor in scikit-learn with loss="quantile", or LightGBM with objective="quantile"). Depending on which model is adopted, hyperparameter tuning is conducted to improve the fit of the model to the data, commonly with methods such as grid search or Bayesian optimization. Our forecasting system ultimately generates a lookup table for each product at each location for a range of predetermined look-ahead windows and quantiles. The forecast results are then fed into various systems to support better decision making for inventory management, marketing campaigns, labor planning, etc.
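To give a feel for the shape of the final output, the sketch below builds a small lookup table keyed by look-ahead window and quantile. Note the substitution: empirical quantiles of historical window totals stand in for the trained regression model, purely so the example stays self-contained. All names and numbers are hypothetical.

```python
import math


def empirical_quantile(values, q):
    """Nearest-rank quantile: smallest value with at least a fraction q at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def build_lookup_table(daily_sales, windows, quantiles):
    """Map (look-ahead window, quantile) to a forecast of cumulative demand."""
    table = {}
    for window in windows:
        # Historical totals over every run of `window` consecutive days.
        totals = [
            sum(daily_sales[t:t + window])
            for t in range(len(daily_sales) - window + 1)
        ]
        for q in quantiles:
            # Stand-in for a trained quantile regression model's prediction.
            table[(window, q)] = empirical_quantile(totals, q)
    return table


# Hypothetical daily sales for one product at one location.
daily_sales = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
lookup = build_lookup_table(daily_sales, windows=[1, 3], quantiles=[0.5, 0.9])
```

A downstream inventory system could then read, for example, `lookup[(3, 0.9)]` to get the 90% quantile of demand over a three-day lead time.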

We hope this post gives you an overall idea of how demand forecasting is done in a typical e-commerce retail setting. There are many related topics that we do not have the space to cover here, such as how to fill in missing sales data. We will discuss them in future posts, so stay tuned.

We are open to all kinds of academic research collaborations. Please contact jd_tech_blog@jd.com if you are interested.

Written by JD Smart Supply Chain