Demand Forecasting using R
Demand forecasting is the process of predicting future demand for a company's products and channels so that it can serve its customers effectively.
The ever-changing business world is characterized by risk and uncertainty, and most business decisions are made under these conditions. Through demand forecasting, companies mitigate risk by planning effectively for scenarios such as excess inventory and lost opportunities.
Predicting future demand for a product helps the organization make decisions in areas such as the following:
- Planning and scheduling production, and changing strategies dynamically.
- Formulating a pricing and promotional strategy.
- Planning advertising and its implementation.
- Hiring employees to meet customer demand.
Demand forecasting is especially significant in businesses that involve large-scale production. Because large-scale production has a long gestation period, it requires a good deal of forward planning.
Business Case
Let's look at a real-world demand forecasting scenario, using the business case of forecasting weekly website visits for an e-commerce retailer. To drive customers to its website, the retailer's marketing team invests in certain marketing channels. Historical investment data is available to us; for future weeks where this data isn't available, it needs to be extrapolated.
Data with External Variables (captured internally)
Forecasting Method: ARIMAX
The ARIMA (autoregressive integrated moving average) model makes forecasts based only on the historical values of the variable being forecast. The model assumes that the future values of a variable depend linearly on its past values, as well as on the values of past (stochastic) shocks. An autoregressive integrated moving average with explanatory variables (ARIMAX) model can be viewed as a multiple regression model with one or more autoregressive (AR) terms and/or one or more moving average (MA) terms. Because differencing (the "integrated" part) handles non-stationarity, the method is suitable for both stationary and non-stationary data, and for multivariate data showing patterns such as level, trend, seasonality, or cyclicity.
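For reference, the model that `forecast::auto.arima()` estimates when `xreg` is supplied is usually written as a regression with ARIMA errors, a close relative of the textbook ARIMAX form. In the sketch below, $y_t$ is weekly visits, $x_{k,t}$ is the $k$-th regressor (for example a standardized channel spend, promotional heat, or an event dummy), $B$ is the backshift operator ($B y_t = y_{t-1}$), and $\varepsilon_t$ is white noise; seasonal AR/MA terms are omitted for brevity:

```latex
\begin{aligned}
y_t &= \beta_0 + \sum_{k=1}^{K} \beta_k x_{k,t} + \eta_t, \\
(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - B)^d \, \eta_t &= (1 + \theta_1 B + \cdots + \theta_q B^q)\, \varepsilon_t .
\end{aligned}
```

One practical consequence of this form is that each $\beta_k$ reads directly as the effect of a one-unit change in $x_{k,t}$ on $y_t$ (one standard deviation when the regressors are standardized), while the ARIMA part models whatever autocorrelation the regressors leave behind.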
ARIMAX is related to ARIMA, but whereas ARIMA is suited to univariate datasets, ARIMAX is suited to analyses that include additional exogenous variables, usually in numeric format.
In this article, we gauge customer demand through website visits, so a time series of the website-visits variable is created and forecast. The daily data is aggregated to weekly level, and forecasting is then done weekly for the weeks of the upcoming quarter.
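A minimal base-R sketch of the daily-to-weekly aggregation, assuming a hypothetical daily data frame `daily` with columns `date` and `visits` (the real column names in the retailer's data are not shown) and assuming weeks start on Monday:

```r
# Hypothetical daily data: 28 days (4 full Monday-to-Sunday weeks) of website visits
daily <- data.frame(
  date   = seq(as.Date("2021-01-04"), by = "day", length.out = 28),
  visits = rep(c(100, 120, 110, 130, 150, 90, 80), times = 4)
)

# Label each day with the Monday that starts its week
daily$week <- as.Date(cut(daily$date, breaks = "week", start.on.monday = TRUE))

# Sum the daily visits into weekly totals
weekly <- aggregate(visits ~ week, data = daily, FUN = sum)
print(weekly)
```

The weekly totals can then be turned into a forecasting series with `ts(weekly$visits, frequency = 52)`.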
The available external variables are the investments made in marketing channels such as online advertising and social media, which drive customers to visit the website and buy products. The promotional heat variable captures the relative intensity of promotions in each week.
Dummy variables are created to capture the spikes and troughs that occur around seasonal events such as Black Friday and Christmas.
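A minimal sketch of such event dummies in base R, assuming a hypothetical weekly index of Monday week-start dates; the helper names `monday_of()` and `black_friday()` are illustrative, not part of any package. Black Friday is computed as the day after the fourth Thursday of November:

```r
# Hypothetical weekly index: Monday week-start dates covering 2019 and 2020
week_start <- seq(as.Date("2018-12-31"), as.Date("2020-12-28"), by = "week")

# Monday that starts the week containing each date (wday: 0 = Sunday, 1 = Monday)
monday_of <- function(d) d - (as.POSIXlt(d)$wday + 6) %% 7

# Black Friday: the day after the fourth Thursday of November (vectorized over years)
black_friday <- function(year) {
  nov1 <- as.Date(sprintf("%d-11-01", year))
  first_thu <- nov1 + (4 - as.POSIXlt(nov1)$wday) %% 7  # 4 = Thursday
  first_thu + 22                                         # fourth Thursday (+21) plus one day
}

years      <- 2019:2020
bf_weeks   <- monday_of(black_friday(years))
xmas_weeks <- monday_of(as.Date(sprintf("%d-12-25", years)))

# 0/1 dummy columns aligned with the weekly series
dummies <- data.frame(
  week         = week_start,
  black_friday = as.integer(week_start %in% bf_weeks),
  christmas    = as.integer(week_start %in% xmas_weeks)
)
dummies[dummies$black_friday == 1 | dummies$christmas == 1, ]
```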
Data Preparation
Assumption: If investment data is available for the forecast weeks, we use it; otherwise, the future data is extrapolated by phasing, following the steps below.
Budget/spend calculation for the forecast period:
- To calculate the budget distribution, we take two series into consideration:
  - The actual spend distribution for the same quarter of the last FY.
  - The total budget allocation for the upcoming quarter.
- Calculate each variable's share of spend for the entire quarter, for the last quarter:
  - Take the total of the individual spends.
  - Convert it to a percentage by dividing it by the total spend for that quarter.
- Calculate the weekly percentage distribution of spend for each spend variable for the same quarter of the last FY:
  - As a quarter has 13 weekly data points, the spend for each variable is divided into 13 weekly parts.
  - Use the same percentage distribution to find the numerical values of weekly spend for the upcoming quarter.
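A minimal sketch of these phasing steps, assuming hypothetical last-FY weekly spend for two channels (`online_ads`, `social`) and a hypothetical total budget. It follows one reading of the steps, in which each channel's quarterly share and its within-quarter weekly pattern from last FY are both carried forward to the new budget:

```r
# Hypothetical weekly spend for the same quarter of the last FY (13 weeks, 2 channels)
last_fy <- data.frame(
  online_ads = c(10, 12, 11, 9, 10, 14, 15, 13, 12, 11, 10, 16, 17),
  social     = c(5, 5, 6, 6, 7, 7, 8, 8, 7, 6, 6, 9, 10)
)
total_budget <- 400  # hypothetical total budget for the upcoming quarter

# Step 1: each channel's share of the quarter's total spend
channel_total <- colSums(last_fy)
channel_share <- channel_total / sum(channel_total)

# Step 2: weekly percentage distribution within each channel (each column sums to 1)
weekly_pct <- sweep(as.matrix(last_fy), 2, channel_total, "/")

# Step 3: apply both distributions to the new budget to get weekly spend per channel
future_spend <- sweep(weekly_pct, 2, total_budget * channel_share, "*")
round(future_spend, 2)
```

By construction, the 13 weekly values for each channel add up to that channel's share of the new budget, and the whole matrix adds up to `total_budget`.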
Modelling Steps
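Load the necessary forecasting, stationarity-testing, and data manipulation libraries
library(forecast)  # auto.arima(), forecast(), ndiffs(), tsdisplay()
library(tseries)   # adf.test()
library(dplyr)     # data manipulation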
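Create the train and test splits for the model
# mydatafull: the full weekly data frame (assumed to be loaded already)
mydata <- mydatafull[1:143, ]    # training weeks
mytest <- mydatafull[144:169, ]  # hold-out weeks
Visitss <- ts(mydata$Visits.TT, frequency = 52)  # weekly series, 52 weeks per year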
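Estimate the number of differences needed, and plot the ACF and PACF of the time series
# Drop missing weeks; ndiffs() needs the numeric values, not the logical mask from complete.cases()
withoutblank <- Visitss[!is.na(Visitss)]
ns <- ndiffs(withoutblank, max.d = 2)  # differences suggested by the default KPSS test
tsdisplay(Visitss)                     # series plot with its ACF and PACF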
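Run the augmented Dickey-Fuller (ADF) test for stationarity of the time series
adf.test(withoutblank)  # adf.test() stops on NAs, so use the series without missing weeks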
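Standardize the budget and other predictor variables (regressors) to make them unit-free
# Standardize the training columns (assumes all columns of mydata are numeric)
reg_data <- scale(mydata)
# Scale the test columns with the TRAINING mean and sd so both share one scale
reg_test <- scale(mytest, center = attr(reg_data, "scaled:center"),
                  scale = attr(reg_data, "scaled:scale"))
# Missing values and zero-variance columns give NA/NaN/Inf; replace them with 0
reg_data[!is.finite(reg_data)] <- 0
reg_test[!is.finite(reg_test)] <- 0
reg_data <- as.data.frame(reg_data)
reg_test <- as.data.frame(reg_test)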
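Fit the ARIMAX model with auto.arima(), passing as regressors the variables that have a significant impact on the time series
# reg_list: character vector naming the selected regressor columns (chosen by the analyst)
# xreg must be a numeric matrix, so convert the selected data frame columns
fit <- auto.arima(Visitss, xreg = as.matrix(reg_data[reg_list]))
predictc <- forecast(fit, xreg = as.matrix(reg_test[reg_list]))
predicted <- as.data.frame(predictc)  # columns: Point Forecast, Lo 80, Hi 80, Lo 95, Hi 95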
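Calculate the MAPE over the first seven test weeks to compare candidate models and pick the best fit (run the mape() function from the Accuracy Calculation section first)
mapecal <- mape(mytest$Visits.TT[1:7], predicted$`Point Forecast`[1:7])
mapecal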
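To make this workflow reproducible without the retailer's data, here is a self-contained base-R sketch on simulated data (all names and numbers are hypothetical). It swaps `stats::arima()` with a fixed AR(1) order in for `auto.arima()`, standardizes the regressor with the training mean and standard deviation, forecasts 13 hold-out weeks with `predict()`, and scores them with MAPE:

```r
set.seed(42)
n_train <- 143; n_test <- 13; n <- n_train + n_test

# Simulated data: weekly spend drives visits (3 visits per unit of spend) plus AR(1) noise
spend  <- rnorm(n, mean = 100, sd = 20)
visits <- 500 + 3 * spend + as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 5))

train <- 1:n_train
test  <- (n_train + 1):n

# Standardize with the TRAINING mean and sd, then reuse them for the test weeks
mu <- mean(spend[train]); s <- sd(spend[train])
x_train <- cbind(spend = (spend[train] - mu) / s)
x_test  <- cbind(spend = (spend[test]  - mu) / s)

# Regression with AR(1) errors (forecast::auto.arima() would instead search for the order)
y_train <- ts(visits[train], frequency = 52)
fit  <- arima(y_train, order = c(1, 0, 0), xreg = x_train)
pred <- predict(fit, n.ahead = n_test, newxreg = x_test)

# MAPE on the hold-out weeks
mape <- function(actual, predicted) mean(abs((actual - predicted) / actual)) * 100
test_mape <- mape(visits[test], as.numeric(pred$pred))
coef(fit)["spend"]  # change in weekly visits for a one-sd change in spend
test_mape
```

Because the regressor is standardized, the `spend` coefficient should land near 3 times the training standard deviation of spend, which is how standardized coefficients relate back to the original units.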
Plot the ACF and PACF of the residuals
arima_Con_residule <- residuals(fit)
tsdisplay(arima_Con_residule)
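Ljung-Box test for autocorrelation in the residuals (a large p-value is consistent with white-noise residuals)
# fitdf should equal the number of estimated ARMA coefficients (p + q); 1 is assumed here
Box.test(arima_Con_residule, lag = 16, fitdf = 1, type = "Ljung-Box")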
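A small base-R sketch on a simulated series (hypothetical data) showing how `fitdf` can be read from the fitted model instead of being hard-coded: the `arma` component of a `stats::arima()` result stores the numbers of AR, MA, seasonal AR, and seasonal MA coefficients in its first four entries. The `auto.arima()` fit above should carry the same component, since forecast builds on `stats::arima()`:

```r
set.seed(1)
y   <- arima.sim(list(ar = 0.6), n = 300)  # hypothetical autocorrelated series
fit <- arima(y, order = c(1, 0, 0))

# fitdf = p + q + P + Q, taken from the fitted model
fitdf <- sum(fit$arma[1:4])

raw_test   <- Box.test(y, lag = 16, type = "Ljung-Box")
resid_test <- Box.test(residuals(fit), lag = 16, fitdf = fitdf, type = "Ljung-Box")
raw_test$p.value    # very small: the raw series is strongly autocorrelated
resid_test$p.value  # a large value is consistent with white-noise residuals
```

Setting `fitdf` reduces the test's degrees of freedom to `lag - fitdf`, accounting for the coefficients already estimated from the same data.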
Forecast Result
Accuracy Calculation
The metric used to measure accuracy is the mean absolute percentage error (MAPE), and accuracy is taken as 100 - MAPE. Below is the mape function:
mape <- function(actual, predicted) {
  # Mean absolute percentage error, in percent (undefined if any actual value is 0)
  mean(abs((actual - predicted) / actual)) * 100
}
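A small worked example of the MAPE and accuracy arithmetic, using hypothetical actual and forecast values for three weeks; the function is repeated so the snippet runs on its own:

```r
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}

actual    <- c(100, 200, 400)  # hypothetical weekly visits
predicted <- c(110, 190, 400)  # hypothetical forecasts
m <- mape(actual, predicted)   # (10% + 5% + 0%) / 3 = 5
accuracy <- 100 - m            # 95
```

Because each error is divided by the actual value, weeks with zero visits make MAPE undefined and would need to be excluded before computing it.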
Complete Code
Other Forecasting Techniques
Other forecasting techniques that can be used as alternatives include XGBoost and recurrent neural networks (RNNs).
RNNs: They require large amounts of data and training time to learn well, and they do not readily provide inferential results such as confidence intervals.
XGBoost: It tends to give smoothed results, which reduces error, but it does not capture peaks and troughs convincingly.