How to Apply Machine Learning in Demand Forecasting for Retail?

Apr 8 · 10 min read

Author: Liudmyla Taranenko,
Data Science Engineer at MobiDev

My university professor once asked: “Who would agree with the statement that the only thing math can’t calculate… is human behavior?” I don’t remember what his scientific answer was. What I know for sure now is that human behavior can be predicted with data science and machine learning. In demand forecasting, we look at human behavior not from a human perspective, but through sales data.

In this article, I want to show how machine learning approaches can help with customer demand forecasting. Since I have experience in building forecasting models for retail products, I’ll use a retail business as an example.

What Is Demand Forecasting in Machine Learning?

Machine learning techniques make it possible to predict the quantity of products or services that will be purchased during a defined future period. In this case, a software system can learn from data for improved analysis. Compared to traditional demand forecasting methods, machine learning:

With the arrival of artificial intelligence and machine learning, most businesses are looking to automate processes and use big data to implement AI. According to a 2019 report by Research and Markets, the AI in retail market was valued at $720.0 million in 2018 and is predicted to grow at a CAGR of 35.4% during 2019–2024.

As for technology trends in the retail sphere, demand forecasting is often aimed at improving the following processes:

Design Algorithm for ML-Based Demand Forecasting Solutions

When starting the development of a demand forecasting feature, it’s recommended to understand the workflow of ML modeling first. This offers a data-driven roadmap for optimizing the development process.

Let’s review the process of how we approach ML demand forecasting tasks.

Step 1. Brief Data Review

The first task when initiating the demand forecasting project is to provide the client with meaningful insights. The process includes the following steps:

In my experience, a few days is enough to understand the current situation and outline possible solutions.

Step 2. Setting Business Goals and Success Metrics

This stage establishes the client’s business aims and any additional conditions to be taken into account. Our team provides data science consulting and combines it with the client’s business vision. The goal is to arrive at a statement similar to:

“I want to integrate a demand forecasting feature so that I can forecast sales and plan marketing campaigns.”

Success metrics offer a clear definition of what is “valuable” within demand forecasting. A typical message might state:

“I need a machine learning solution that predicts demand for […] products, for the next [week/month/half-year/year], with […]% accuracy.”

These points will help you to identify what your success metrics look like. You will want to consider the following:

Product Type/Categories
What types of products/product categories will you forecast? Different products/services have different demand forecasting outputs. For example, the demand forecast for perishable products and subscription services coming at the same time each month will likely be different.

Time Frame
What is the length of time for the demand forecast?

Short-term forecasts are commonly done for periods of less than 12 months: 1 week, 1 month, or 6 months. These forecasts may have the following purposes:

Long-term forecasts are completed for periods longer than a year. The purpose of long-term forecasts may include the following:

Accuracy
What is the minimum required percentage of demand forecast accuracy for making informed decisions? In our retail software development projects, we were able to reach an average accuracy level of 95.96% for positions with enough data. The minimum required forecast accuracy level is set depending on your business goals.

Examples of metrics for measuring forecast accuracy are MAPE (Mean Absolute Percentage Error), MAE (Mean Absolute Error), or custom metrics.
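
To make the two named metrics concrete, here is a minimal sketch in plain Python. The sales numbers are hypothetical, and skipping periods with zero actual demand in MAPE is an assumption of this sketch, not a rule from our projects.

```python
def mae(actual, forecast):
    """Mean Absolute Error: average size of the errors, in sales units."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Periods with zero actual demand are skipped because the percentage
    error is undefined for them (an assumption of this sketch).
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Hypothetical weekly unit sales and the forecast for the same weeks.
actual = [100, 120, 80, 100]
forecast = [90, 126, 88, 100]

print(mae(actual, forecast))   # 6.0 units per week
print(mape(actual, forecast))  # approximately 6.25 (percent)
```

MAE is expressed in sales units, so it is easy to read for a single product. MAPE is relative, which makes it easier to compare products that sell in very different volumes.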

Step 3. Data Preparation & Understanding

Regardless of what we’d like to predict, data quality is a critical component of an accurate demand forecast. The following data could be used for building forecasting models:

[Image: data that could be used for building forecasting models]

Data Quality Parameters
When building a forecasting model, the data is evaluated according to the following parameters:

In reality, the data collected by companies often isn’t ideal. This data usually needs to be cleaned, analyzed for gaps and anomalies, checked for relevance, and restored. When developing POS applications for our retail clients, we use data preparation techniques that allow us to achieve higher data quality.
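
As a rough illustration of checking for gaps and anomalies, the sketch below fills missing days in a daily sales record and flags values that sit far from the median. The data, the function names, and the threshold of 3 median absolute deviations are all assumptions made for this example; real preparation pipelines are more involved.

```python
from datetime import date, timedelta
from statistics import median


def fill_gaps(daily_sales):
    """Return a list of (day, units) covering every day in the range.

    Days missing from the input get None, so that a gap is not
    mistaken for a day with zero sales.
    """
    first, last = min(daily_sales), max(daily_sales)
    n_days = (last - first).days + 1
    return [(first + timedelta(days=d), daily_sales.get(first + timedelta(days=d)))
            for d in range(n_days)]


def flag_anomalies(values, threshold=3.0):
    """Flag values far from the median, measured in units of the
    median absolute deviation (MAD). The threshold is an assumption."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [abs(v - med) / mad > threshold for v in values]


# Hypothetical POS export: 3 March is missing and 210 looks like a typo.
sales = {
    date(2020, 3, 1): 20,
    date(2020, 3, 2): 22,
    date(2020, 3, 4): 19,
    date(2020, 3, 5): 210,
    date(2020, 3, 6): 21,
}

filled = fill_gaps(sales)
known = [units for _, units in filled if units is not None]
flags = flag_anomalies(known)
print(filled[2])  # (datetime.date(2020, 3, 3), None)
print(flags)      # [False, False, False, True, False]
```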

Once the data has been cleaned, generated, and checked for relevance, we structure it into a comprehensive form. Below, you can see an example of the minimum required processed data set for demand forecasting:

[Image: example of a processed data set for demand forecasting]
[Image: data visualization]

Data understanding is the next task once preparation and structuring are completed. It’s not modeling yet but an excellent way to understand data by visualization. Above you can see how we visualized the data understanding process.

Step 4. Machine Learning Models Development

There are no “one-size-fits-all” forecasting algorithms. Often, demand forecasting features consist of several machine learning approaches. The choice of machine learning models depends on several factors, such as business goal, data type, data amount and quality, forecasting period, etc.

Here I describe the machine learning approaches that we have applied for our retail clients. If you have already read some articles about demand forecasting, you might notice that these approaches work for most demand forecasting cases.

Time Series Approach
A time series is a sequence of data points taken at successive, equally spaced points in time. The time series approach uses these processed historical data points to predict future values. The major components to analyze are trend, seasonality, irregularity, and cyclicity.

The analysis algorithm involves the use of historical data to forecast future demand. That historical data includes trends, cyclical fluctuations, seasonality, and behavior patterns.

In the retail field, the most applicable time series models are the following:

1. ARIMA (auto-regressive integrated moving average) models aim to describe the auto-correlations in the time series data. When planning short-term forecasts, ARIMA can make accurate predictions. By providing forecasted values for user-specified periods, it clearly shows results for demand, sales, planning, and production.

2. SARIMA (Seasonal Autoregressive Integrated Moving Average) models are an extension of the ARIMA model that supports univariate time series data with a seasonal component, by adding terms that involve backshifts of the seasonal period.

3. Exponential Smoothing models generate forecasts by using weighted averages of past observations to predict new values. The essence of these models is in combining Error, Trend, and Seasonal components into a smooth calculation.
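
To show what “weighted averages of past observations” means in practice, here is a minimal sketch of simple exponential smoothing in plain Python. It models only the level of the series, without the trend and seasonal components mentioned above. The function name, the data, and the choice of `alpha` are assumptions for this example. In a real project, a statistics library would normally be used for ARIMA, SARIMA, and full exponential smoothing models.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after the last observation.

    alpha in (0, 1]: higher values give recent observations more weight.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for value in series[1:]:
        # New level = weighted average of the newest value and the old level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical weekly unit sales.
weekly_units = [100, 110, 105, 120]
forecast = simple_exponential_smoothing(weekly_units, alpha=0.5)
print(forecast)  # 112.5
```

With `alpha=1` the forecast is simply the last observation. Smaller values smooth out more of the week-to-week noise.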


Let’s say you want to forecast demand for vegetables in the next month. For a time series approach, you require historical sales transaction data for at least the previous three months. If you have historical data about seasonal products (vegetables in our case), the best choice will be the SARIMA model. The forecast error, in that case, may be around 10–15%.

Linear Regression Approach
Linear regression is a statistical method for predicting future values from past values. It can help determine underlying trends and deal with cases involving overstated prices.


This regression type allows you to:

Let’s say you want to calculate the demand for tomatoes based on their cost. Assuming that tomatoes grow in the summer and the price is lower because of high tomato quantity, the demand indicator will increase by July and decrease by December.

The information required for this type of forecasting is historical transaction data, additional information about specific products (tomatoes in our case), discounts, average market cost, the amount in stock, etc. The forecast error may be 5–15%.
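
The tomato example can be sketched as a one-variable linear regression of demand on price. The numbers below are made up so that the relationship is exactly linear, and `fit_line` is a hypothetical helper written for this example. A real model would also use the other inputs listed above.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical monthly observations: price per kg and kg sold.
price = [1.0, 1.5, 2.0, 2.5, 3.0]
demand = [500, 450, 400, 350, 300]

intercept, slope = fit_line(price, demand)
predicted = intercept + slope * 1.8  # expected demand at a price of 1.8

print(intercept, slope)  # 600.0 -100.0
print(predicted)         # about 420
```

The negative slope matches the intuition from the example: when the price drops in summer, the predicted demand goes up.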

Feature Engineering
Feature engineering is the use of domain knowledge to create features that make machine learning models predict more accurately. It enables a deeper understanding of data and more valuable insights.


Since feature engineering is creating new features according to business goals, this approach is applicable in any situation where standard methods fail to add value. In ML modeling, a data scientist builds new features from existing ones to achieve higher forecast accuracy or to get new data.
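
As an illustration of building new features from existing ones, the sketch below derives calendar and lag features from a plain list of daily sales. The feature names, the 7-day lag, and the data are assumptions for this example. Which features actually help depends on the product and the business goal.

```python
from datetime import date


def build_features(days, units, lag=7):
    """Build one feature row per day, starting where the lag is available."""
    rows = []
    for i in range(lag, len(days)):
        rows.append({
            "day_of_week": days[i].weekday(),      # 0 = Monday
            "is_weekend": days[i].weekday() >= 5,
            "month": days[i].month,
            "units_lag": units[i - lag],           # sales `lag` days earlier
            "units_rolling_mean": sum(units[i - lag:i]) / lag,
            "target": units[i],                    # value the model should predict
        })
    return rows


# Hypothetical daily unit sales, starting on Monday, 1 June 2020.
days = [date(2020, 6, 1 + i) for i in range(10)]
units = [10, 12, 11, 13, 15, 30, 28, 11, 12, 12]

rows = build_features(days, units)
print(len(rows))  # 3
print(rows[0])
```

Each row can then be fed to a regression or tree-based model, with `target` as the value to predict.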

Random Forest
The basic idea behind the random forest model is a decision tree. The decision tree approach is a data mining technique used for data forecasting and classification. The decision tree method itself does not have any conceptual understanding of the problem. It learns from the data we provide it.

Random forest is a more advanced approach that builds multiple decision trees and merges them together. By taking an average of all individual decision tree estimates, the random forest model produces more reliable forecasts.

Random forest can be used for both classification and regression tasks, but it also has limitations. The model may be too slow for real-time predictions when it uses a large number of trees.

If you have no information other than the quantity data about product sales, this method may not be as valuable. In such cases, the time series approach is superior.
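
The averaging idea can be shown with a deliberately tiny stand-in: one-split regression trees trained on bootstrap samples of a single feature, with their estimates averaged. This is a toy sketch under those assumptions, not a full random forest, which grows deeper trees and also samples features at each split. All names and data here are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single numeric feature."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: always predict the mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Average the estimates of all individual trees."""
    return sum(predict_stump(stump, x) for stump in forest) / len(forest)


# Hypothetical data: price per kg and units sold.
prices = [1.0, 1.2, 1.4, 1.6, 2.4, 2.6, 2.8, 3.0]
sold = [500, 480, 510, 490, 300, 310, 290, 305]

forest = fit_forest(prices, sold)
low_price_pred = predict_forest(forest, 1.5)
high_price_pred = predict_forest(forest, 3.0)
print(low_price_pred, high_price_pred)
```

Because each tree sees a slightly different sample, the individual estimates differ, and averaging them smooths out the quirks of any single tree.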


Step 5. Training & Deployment

Once the forecasting models are developed, it’s time to start the training process. When training forecasting models, data scientists usually use historical data. By processing this data, algorithms provide ready-to-use trained model(s).

This step requires the optimization of the forecasting model parameters to achieve high performance. By using a cross-validation tuning method where the training dataset is split into ten equal parts, data scientists train forecasting models with different sets of hyper-parameters. The goal of this method is to figure out which model has the most accurate forecast.
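
The ten-part cross-validation described above can be sketched as follows. To keep the example self-contained, the model being tuned is a simple nearest-neighbor average rather than one of the forecasting models discussed earlier, and its hyper-parameter is the number of neighbors. The data and all function names are assumptions for this example.

```python
def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train, x, k):
    """Predict demand as the mean of the k training points nearest in price."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_validate(data, k_neighbors, n_folds=10):
    """Mean absolute error over all held-out points."""
    errors = []
    for fold in k_fold_indices(len(data), n_folds):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        for i in fold:
            x, y = data[i]
            errors.append(abs(y - knn_predict(train, x, k_neighbors)))
    return sum(errors) / len(errors)


# Hypothetical (price, units sold) pairs.
data = [(1.0 + 0.1 * i, 600 - 100 * (1.0 + 0.1 * i)) for i in range(20)]

# Try several hyper-parameter values and keep the one with the lowest error.
scores = {k: cross_validate(data, k) for k in (1, 2, 5)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Every data point is held out exactly once, so each candidate setting is scored on data it was not trained on.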

When researching the best business solutions, data scientists usually develop several machine learning models. Since models show different levels of accuracy, the scientists choose the ones that cover their business needs the best.

The improvement step involves the optimization of analytic results. For example, using model ensemble techniques, it’s possible to reach a more accurate forecast. In that case, the final forecast is calculated by combining the results of multiple forecasting models.
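
One simple way to combine the results of several models is a (weighted) average of their forecasts. The sketch below assumes this averaging scheme. The model names and numbers are hypothetical, and other ensemble techniques exist.

```python
def ensemble_forecast(forecasts, weights=None):
    """Combine per-model forecasts into one by (weighted) averaging.

    forecasts: dict of model name -> list of forecast values per period.
    weights:   dict of model name -> non-negative weight; equal if omitted.
    """
    names = list(forecasts)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    horizon = len(forecasts[names[0]])
    return [sum(weights[n] * forecasts[n][t] for n in names) / total
            for t in range(horizon)]


# Hypothetical three-period forecasts from three models.
forecasts = {
    "time_series": [100, 110, 120],
    "regression": [90, 100, 130],
    "random_forest": [110, 120, 110],
}

combined = ensemble_forecast(forecasts)
print(combined)  # [100.0, 110.0, 120.0]
```

Weights could, for instance, be set from each model’s validation error, so that more accurate models contribute more.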

The deployment stage involves integrating the forecasting model(s) into production use. We also recommend setting up a pipeline that aggregates new data for use in your next AI features. This can save you a lot of data preparation work in future projects. It also increases the accuracy and variety of what you will be able to forecast.

Anomalies in Demand Forecasting Systems

When integrating demand forecasting systems, it’s important to understand that they are vulnerable to anomalies. A real example of such an anomaly is the coronavirus pandemic.

As the demand forecasting model processes historical data, it can’t know that the demand has radically changed. For example, if last year we had one demand indicator for medical face masks and antiviral drugs, this year it would be completely different.

In that case, there might be a few ways to get an accurate forecast:

1. Wait until enough data about the new market behavior has been gathered, and only then develop a demand forecasting model from scratch.

2. Apply a feature engineering approach by adding data such as news, the current market state, price indexes, exchange rates, and other economic factors.

So, what did we learn? Machine learning is not limited to demand forecasting. The future potential of this technology depends on how well we take advantage of it.

As a 17-year-old student, I never knew that math and statistics applied to so many complex solutions. Today, I work on demand forecasting technology and understand what added value it can deliver to modern businesses as one of the emerging ML trends.

Full article originally published at

The Startup

Medium's largest active publication, followed by +717K people. Follow to join our community.


Written by


We create complex business-driven solutions, with a focus on innovation. We write about trends and expertise in AI, IoT, AR & more.
