Predicting the Demand of Products Sold Online

Govind Chandrasekhar
The Ecommerce Intelligencer
Feb 6, 2020


Can the demand for an item be predicted before its market launch?

How about 1 month into launch, once the initial numbers are in?

Which signals best prophesy future demand? Can these be controlled?

We set out to answer the questions above using the ~10 billion price & demand signals in our database, and our Universal Product Catalog. Our approach was to build a forecasting model tuned to predict publicly available sales metrics such as sales rank.

Here are the signals that we used:

  • Descriptive product features, such as size, color, brand, weight, occasion and country of origin, generated by our Attribute Extraction & Normalization Engine.
  • Price change features, capturing how price changes over time and how those changes correlate with demand.
  • Performance- and supply-centric features, including the number of sellers, the evolution of star ratings over time and the number of reviews.

On average, the model predicted salesrank percentile with an error rate of <1% when historical data was provided, and ~5% for newly launched products with no prior history. In other words, even before a product is launched, its sales rank can be forecast from metadata alone with roughly 95% accuracy.
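To make the metric concrete, here is a minimal sketch of how a salesrank percentile and its prediction error could be computed. It assumes the error is the mean absolute gap in percentile points, which the article does not spell out; the function names and all numbers are made up for illustration.

```python
from bisect import bisect_left


def salesrank_percentile(rank, category_ranks):
    """Share (0-100) of products in the category with a strictly better (lower) rank."""
    ordered = sorted(category_ranks)
    return 100.0 * bisect_left(ordered, rank) / len(ordered)


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predictions and actuals, in percentile points."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up sales ranks for ten products in one category (assumption, not real data).
category_ranks = [120, 45, 9800, 560, 15, 2300, 77, 410, 5100, 900]

# Actual percentiles for three of those products.
actual = [salesrank_percentile(r, category_ranks) for r in (45, 560, 5100)]

# Hypothetical model outputs for the same three products.
predicted = [12.0, 58.0, 83.0]

error = mean_absolute_error(predicted, actual)
print(actual, round(error, 2))
```

In this toy case the three products sit at the 10th, 50th and 80th percentiles, and the hypothetical predictions are off by about 4.3 percentile points on average.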

How was the model able to make such predictions? And what did it learn in the process? We probed the model to understand the patterns that it had discovered, with the hope of unearthing hidden insights. With a focus on the footwear category, here’s what we found.

Brand Sensitivity vs Price Sensitivity

First up, we looked at which categorical and continuous product features have the largest impact on the model's output.

Most prominently, brand is a significant factor in determining sales, much more so than price. In other words, ecommerce footwear consumers are more brand sensitive than price sensitive: footwear sold online is most certainly not a commodity good.
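One common way to rank features by magnitude of impact is the mean absolute SHAP value per feature. The sketch below assumes that approach, which may not be the exact one used here; the function name is hypothetical and the numbers are toy values, not outputs of the model described in this post.

```python
def rank_features_by_mean_abs_shap(shap_rows):
    """Rank features by mean absolute SHAP value, largest first.

    shap_rows: one dict per product, mapping feature name -> SHAP value.
    """
    totals = {}
    for row in shap_rows:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    count = len(shap_rows)
    averages = [(feature, total / count) for feature, total in totals.items()]
    return sorted(averages, key=lambda item: item[1], reverse=True)


# Toy SHAP values for three products (made up for illustration only).
toy_shap_rows = [
    {"brand": 0.40, "price": 0.10, "country_of_origin": 0.00},
    {"brand": -0.50, "price": -0.20, "country_of_origin": 0.05},
    {"brand": 0.30, "price": 0.00, "country_of_origin": -0.01},
]

ranking = rank_features_by_mean_abs_shap(toy_shap_rows)
for feature, importance in ranking:
    print(f"{feature}: {importance:.3f}")
```

Taking the absolute value before averaging matters: a feature that pushes some products up and others down would otherwise cancel out and look unimportant.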

Social Proof Swings Both Ways

Next, we looked at the interplay between siterating (star rating, scaled to a range of 0–10) and reviews_number (number of reviews of the product).

In the graph below, a positive SHAP value (see the y-axis on the left) pushes the predicted salesrank up, which means a negative impact on sales, since a higher salesrank corresponds to fewer sales.
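The sign convention is easier to see on a tiny example. The sketch below computes exact Shapley values for a hypothetical two-feature model of salesrank percentile by averaging marginal contributions over all feature orderings. The model, its coefficients and the baseline product are assumptions for illustration; in practice a library would approximate these values for a real model.

```python
from itertools import permutations


def exact_shapley(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings.

    Features not yet "switched on" keep their baseline value.
    """
    features = list(instance)
    contributions = {feature: 0.0 for feature in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for feature in order:
            current[feature] = instance[feature]
            value = model(current)
            contributions[feature] += value - previous
            previous = value
    return {f: total / len(orderings) for f, total in contributions.items()}


def toy_salesrank_percentile(x):
    """Hypothetical model: higher output means a worse sales rank."""
    return 90.0 - 5.0 * x["siterating"] - 0.01 * x["reviews_number"]


baseline = {"siterating": 8.0, "reviews_number": 100}  # assumed reference product
product = {"siterating": 7.0, "reviews_number": 400}   # assumed product to explain

phi = exact_shapley(toy_salesrank_percentile, product, baseline)
print(phi)
```

Here the lower rating gets a positive value (it pushes salesrank up, so it hurts sales), while the higher review count gets a negative value (it helps sales). The two values sum to the difference between the product's prediction and the baseline prediction.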

The key insight here is that for a product with a medium rating (between 7.2 and 8.5, marked by the light blue box), having a large number of reviews (indicated by red tips at the top) has a negative impact on sales (indicated by SHAP values >0). However, for a product with a high rating (>8.5, marked by the light green box), this pattern inverts (the red tips are now at the bottom): having a large number of reviews has a positive impact on sales.

The interpretation here is that as long as there isn’t a large amount of social proof (low number of reviews), a product’s rating doesn’t have an overwhelming effect on sales … with the exception of the products with a perfect rating (represented by the rightmost bar, where the blue section stretches deep into negative SHAP territory). Once some amount of social proof builds up though (large number of reviews), the rating begins to influence purchase behavior, either negatively or positively depending on which side of the 8.5 rating threshold the product falls.
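A toy interaction model shows this pattern in miniature. In the sketch below, the review count amplifies whichever direction the rating points in, flipping at the 8.5 threshold. The functional form and the 0.02 coefficient are assumptions made up for illustration, and the perfect-rating exception is deliberately left out.

```python
def toy_salesrank_percentile(siterating, reviews_number):
    """Hypothetical model: higher output means a worse sales rank.

    Reviews help products rated above 8.5 and hurt products rated at or below it.
    """
    direction = -1.0 if siterating > 8.5 else 1.0
    return 50.0 + direction * 0.02 * reviews_number


def rating_gap(reviews_number, medium_rating=8.0, high_rating=9.0):
    """How many percentile points worse the medium-rated product ranks."""
    medium = toy_salesrank_percentile(medium_rating, reviews_number)
    high = toy_salesrank_percentile(high_rating, reviews_number)
    return medium - high


gap_few_reviews = rating_gap(10)     # little social proof
gap_many_reviews = rating_gap(1000)  # lots of social proof
print(round(gap_few_reviews, 2), round(gap_many_reviews, 2))
```

With 10 reviews the two products are only about 0.4 percentile points apart, so the rating barely matters. With 1,000 reviews the gap grows to 40 points, with the medium-rated product ranking worse.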

Supply Constraints

Similarly, we looked at the interplay between recentoffers_count (number of sellers of a product on marketplaces) and reviews_number. Once again, note that a positive SHAP value (look to the y-axis on the left side) indicates a negative impact on sales.

In this graph, the band with recentoffers_count between 2.5 and 10.0 (marked by the light blue box) reveals an interesting pattern. Intuitively, we’d expect a large number of reviews (red tips) to have a positive impact on sales (negative SHAP value). For the band highlighted though, the effect is the opposite — highly reviewed products have poorer sales!

A possible explanation for this phenomenon is that products with red tips in the anomalous band are supply constrained (low recentoffers_count), possibly artificially if the market is controlled by a few players. When the supply curve shifts to the left, the equilibrium quantity sold drops, even if underlying demand is unchanged.
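The supply argument can be worked through with textbook linear curves. The sketch below assumes a demand curve Q = a - b*P and a supply curve Q = c + d*P with made-up coefficients; a leftward supply shift is modelled as a lower c. It illustrates the economic reasoning only and is not part of the forecasting model.

```python
def equilibrium(demand_intercept, demand_slope, supply_intercept, supply_slope):
    """Solve a - b*P == c + d*P for price, then read quantity off the demand curve."""
    price = (demand_intercept - supply_intercept) / (demand_slope + supply_slope)
    quantity = demand_intercept - demand_slope * price
    return price, quantity


# Same demand curve in both cases; only the supply intercept changes (toy numbers).
price_before, quantity_before = equilibrium(100.0, 2.0, 10.0, 1.0)
price_after, quantity_after = equilibrium(100.0, 2.0, -20.0, 1.0)  # supply shifted left

print(price_before, quantity_before)
print(price_after, quantity_after)
```

With demand held fixed, the constrained market clears at a higher price (40 instead of 30) and a lower quantity (20 instead of 40), which is the drop in sales described above.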

Effect of Manufacturing Source

We also looked into the impact of specific attribute values, using country of origin as an example.

In magnitude terms, the impact of country of origin on sales is limited. But where there is an effect, products made domestically or sourced from high GDP per capita countries such as Andorra and Australia tended to see a more favorable impact on sales.

These are just some of the insights generated by our demand forecasting models for one broad niche, footwear. Given a more specific niche, including subcategories and brand groups, these models can provide fine-tuned insights on which products are likely to do well, and how their sales performance can be optimized.

Demand Forecasting is a component of our beta suite of data science products for brands and logistics companies. If you’d like to learn more, click here!