Demand Forecasting using R

Ashwarya Ashiya
Brillio Data Science
Nov 19, 2019

Demand forecasting is the process of predicting future demand for a company’s products across its channels so that it can serve customers effectively.

The ever-changing world is characterized by risk and uncertainty, and most business decisions are made under these conditions. Demand forecasting helps companies mitigate that risk by planning for scenarios such as excess inventory and lost sales opportunities.

Predicting the future demand for a product helps the organization make decisions in areas such as:

  • Planning and scheduling production, and changing strategies dynamically.
  • Formulating a pricing and promotional strategy.
  • Planning advertising and its implementation.
  • Hiring employees to meet customer demand.

Demand forecasting is especially significant in businesses that involve large-scale production. Because large-scale production requires a long gestation period, a good deal of forward planning is needed.

Business Case

Let’s look at a real-world demand forecasting scenario: forecasting weekly website visits for an ecommerce retailer. To drive customers to the website, the retailer’s marketing team invests in certain marketing channels. Historical investment data is available to us; for future weeks where the data isn’t available, it needs to be extrapolated.

Data with External Variables (captured internally)

Data Dictionary

Forecasting Method: ARIMAX

The ARIMA (auto-regressive integrated moving average) model makes forecasts based only on the historical values of the forecast variable. It assumes that future values of a variable depend linearly on its past values and on past (stochastic) shocks. An Auto-regressive Integrated Moving Average with Explanatory Variable (ARIMAX) model can be viewed as a multiple regression model with one or more auto-regressive (AR) terms and/or one or more moving average (MA) terms. The method can be used whether the data is stationary or non-stationary (non-stationary series are differenced first), and it handles multivariate data with level, trend, seasonal, or cyclical patterns.

ARIMAX is related to the ARIMA technique, but while ARIMA is suitable for univariate datasets, ARIMAX is suitable for analyses that include additional exogenous variables, usually in numeric format.
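To make the idea concrete, here is a minimal sketch that simulates a weekly series driven by one exogenous regressor plus AR(1) noise, then fits it with base R's `arima()` and its `xreg` argument. The simulated data, the names `spend` and `visits`, and the chosen order are illustrative assumptions, not the article's data. Note that `arima()` fits this as a linear regression with ARMA errors, which is also how `auto.arima()` handles `xreg`.

```r
set.seed(42)
n <- 120
spend  <- runif(n, min = 50, max = 150)              # hypothetical weekly marketing spend
noise  <- arima.sim(model = list(ar = 0.6), n = n)   # AR(1) error process
visits <- 200 + 3 * spend + as.numeric(noise)        # visits depend linearly on spend

# Fit on the first 100 weeks, with spend as the exogenous regressor
train_idx  <- 1:100
xreg_train <- cbind(spend = spend[train_idx])
fit <- arima(visits[train_idx], order = c(1, 0, 0), xreg = xreg_train)
print(coef(fit))   # the "spend" coefficient should be close to 3

# Forecast the remaining 20 weeks; future regressor values must be supplied
xreg_future <- cbind(spend = spend[-train_idx])
fc <- predict(fit, n.ahead = 20, newxreg = xreg_future)
head(fc$pred)
```

The key practical point is visible in the last step: forecasting with exogenous variables requires their future values, which is why the spend data has to be extrapolated when it is not yet known.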

In this article, we gauge customer demand through website visits, so a time series of the website visits variable is created and forecasted. The daily data is aggregated to the week level, and forecasting is then done at the week level for the weeks of the upcoming quarter.
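The daily-to-weekly aggregation could look roughly like the sketch below. It uses base R only, and the data frame `daily`, its columns, and the Monday-based week definition are assumptions made for illustration; the same result can be obtained with `dplyr`'s `group_by()` and `summarise()`.

```r
set.seed(1)
daily <- data.frame(
  date   = seq(as.Date("2019-01-07"), by = "day", length.out = 28),  # starts on a Monday
  visits = rpois(28, lambda = 1000)                                  # hypothetical daily visits
)

# Label each day with the Monday that starts its week ("%u": Monday = 1 ... Sunday = 7)
daily$week_start <- daily$date - (as.integer(format(daily$date, "%u")) - 1)

# Sum daily visits within each week
weekly <- aggregate(visits ~ week_start, data = daily, FUN = sum)

# Weekly time series; frequency = 52 treats one year as the seasonal period
weekly_ts <- ts(weekly$visits, frequency = 52)
print(weekly)
```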

The available external variables are the investments made in marketing channels such as online advertising and social media, which lead customers to visit the website and buy products. The promotional heat variable captures the relative intensity of promotions at the week level.

Dummy variables are created to capture the spikes and troughs that occur around seasonal events such as Black Friday and Christmas.
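One possible way to build such dummy variables is sketched below in base R. The data frame `weeks`, the helper `in_week()`, and the Monday-to-Sunday week definition are hypothetical choices for illustration; the event dates are those of 2019.

```r
weeks <- data.frame(
  week_start = seq(as.Date("2019-11-04"), by = "week", length.out = 10)  # Mondays
)

black_friday <- as.Date("2019-11-29")
christmas    <- as.Date("2019-12-25")

# Return 1 for the week (Monday to Sunday) that contains the event, else 0
in_week <- function(week_start, event) {
  as.integer(event >= week_start & event <= week_start + 6)
}

weeks$black_friday <- in_week(weeks$week_start, black_friday)
weeks$christmas    <- in_week(weeks$week_start, christmas)
print(weeks)
```

Each dummy column can then be passed to the model alongside the spend variables as an additional regressor.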

Data Preparation

Assumption: If data on the investments made is available, we use it; otherwise, the future data is extrapolated through phasing, using the steps below.

Budget/spend calculation for forecast period.

  1. To calculate the budget distribution, we take two inputs into consideration.
  2. The first is the actual spend distribution of the same quarter last FY.
  3. The second is the total budget allocation for the upcoming quarter.
  4. Calculate each spend variable’s share of the total spend for that quarter of last FY.
  5. To do so, take the total sum of the individual spends.
  6. Then convert each spend into a percentage by dividing it by the total spend for that quarter.
  7. Calculate the weekly percentage distribution of spend for each spend variable for the same quarter of last FY.
  8. As there are 13 weekly data points in one quarter, the spend for each variable is split into 13 parts, one for each week.
  9. Use the same percentage distribution to find the numerical values of weekly spend for the upcoming quarter.
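The steps above can be sketched as follows. All numbers and the channel names `online_ads` and `social_media` are made up for illustration, and the sketch reads steps 7 and 8 as "reuse last year's week-by-week percentages", which is one interpretation of the procedure.

```r
# Hypothetical weekly spend for the same quarter last FY (13 weeks, two channels)
last_fy <- data.frame(
  online_ads   = c(10, 12, 11, 13, 15, 14, 12, 11, 16, 18, 20, 25, 23),
  social_media = c( 5,  6,  5,  7,  8,  6,  5,  6,  9, 10, 12, 11, 10)
)
total_budget <- 450  # hypothetical total budget for the upcoming quarter

# Steps 4-6: each channel's share of last year's total quarterly spend
channel_totals <- colSums(last_fy)
channel_share  <- channel_totals / sum(channel_totals)

# Steps 7-8: weekly percentage distribution within each channel (columns sum to 1)
weekly_share <- sweep(as.matrix(last_fy), 2, channel_totals, "/")

# Step 9: split the new budget by channel, then by week
channel_budget <- total_budget * channel_share
future_spend   <- sweep(weekly_share, 2, channel_budget, "*")
round(future_spend, 2)
```

By construction, the extrapolated weekly spends add up to the total budget, and each channel keeps both its share of the budget and its week-by-week shape from last year.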

Modelling Steps

Include the necessary forecast and data manipulation libraries

library(forecast)
library(dplyr)
library(tseries)  # provides adf.test(), used below

Creation of train and test data splits for the model

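mydatafull <- mydata                 # assumption: mydata initially holds all 169 weekly rows
mydata <- mydatafull[1:143, ]        # training set: first 143 weeks
mytest <- mydatafull[144:169, ]      # test set: remaining 26 weeks
Visitss <- ts(mydata$Visits.TT, frequency = 52)  # weekly series with a yearly seasonal period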

Plotting of ACF and PACF graph of the time series

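withoutblank <- Visitss[complete.cases(Visitss)]  # drop missing values before testing
ns <- ndiffs(withoutblank, max.d = 2)             # estimated number of differences needed
tsdisplay(Visitss)                                # series plot with ACF and PACF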

ADF test for the stationarity of the time series

adf.test(Visitss)

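Standardization of the budget and other predictor variables (regressors) to make them unit free

reg_data <- scale(mydata)                          # assumes every column is numeric
train_center <- attr(reg_data, "scaled:center")    # training means
train_scale <- attr(reg_data, "scaled:scale")      # training standard deviations
reg_data[is.na(reg_data)] <- 0
reg_data <- as.data.frame(reg_data)
# Scale the test set with the training statistics so both sets share one scale
reg_test <- scale(mytest, center = train_center, scale = train_scale)
reg_test[!is.finite(reg_test)] <- 0
reg_test <- as.data.frame(reg_test)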

Fitting the ARIMAX model with the regressors that have a significant impact on the time series

# reg_list is assumed to be a character vector of the chosen regressor column names
fit <- auto.arima(Visitss, xreg = as.matrix(reg_data[reg_list]))
predictc <- forecast(fit, xreg = as.matrix(reg_test[reg_list]))
predicted <- as.data.frame(predictc)

MAPE calculation and finding the best fitted model

# mape() is defined in the Accuracy Calculation section
mapecal <- mape(mytest$Visits.TT[1:7], predicted$`Point Forecast`[1:7])
mapecal

ACF and PACF graphs creation of the residuals

arima_Con_residule <- residuals(fit)
tsdisplay(arima_Con_residule)

Ljung-Box test for autocorrelation in the residuals

Box.test(arima_Con_residule, lag = 16, fitdf = 1, type = "Ljung")
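As a small self-contained illustration of how to read this test, the sketch below fits an AR(1) model to simulated data and tests its residuals. The simulated series and the object names are assumptions for illustration. The `fitdf` argument is meant to equal the number of estimated AR and MA coefficients (1 here), so the test uses `lag - fitdf` degrees of freedom.

```r
set.seed(7)
x <- arima.sim(model = list(ar = 0.5), n = 200)   # simulated AR(1) series
fit_ar1 <- arima(x, order = c(1, 0, 0))
res <- residuals(fit_ar1)

# Null hypothesis: the residuals show no autocorrelation up to lag 16
lb <- Box.test(res, lag = 16, fitdf = 1, type = "Ljung-Box")
lb$parameter   # degrees of freedom: 16 - 1 = 15
lb$p.value     # a large p-value gives no evidence of leftover autocorrelation
```

A small p-value would suggest that the model has left structure in the residuals and that a different order or additional regressors may be needed.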

Forecast Result

Forecasts compared with last year actuals in same time frame

Accuracy Calculation

The metric used for measuring accuracy is the mean absolute percentage error (MAPE); accuracy is then (100 - MAPE). Below is the mape function:

mape <- function(actual, predicted) {
  return(mean(abs((actual - predicted) / actual) * 100))
}
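A short worked example may help here. The function is repeated so the snippet runs on its own, and the numbers are made up for illustration.

```r
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual) * 100)
}

actual    <- c(100, 200, 400)   # hypothetical weekly visits
predicted <- c(110, 180, 400)   # hypothetical forecasts

# Absolute percentage errors are 10%, 10% and 0%, so MAPE is 20/3
mape_value <- mape(actual, predicted)
accuracy   <- 100 - mape_value
round(c(mape = mape_value, accuracy = accuracy), 2)
```

Because each error is divided by the actual value, MAPE is undefined when any actual value is zero, which is worth checking before using it on low-traffic weeks.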

Complete Code

Other Forecasting Techniques

Some other forecasting techniques that can be used as alternatives are XGBoost and RNNs.

RNNs: They require a large amount of data and training time to learn well, and the results are not fully inferential in terms of confidence intervals.

XGBoost: It tends to give smoothed results, which reduces error, but it doesn’t capture peaks and troughs convincingly.

References

https://www.r-bloggers.com/forecasting-arimax-model-exercises-part-5/
