Forecasting Demand with seasonality.

Gaurang Mehra
Sep 18, 2022

Part 2: Time Series Forecasting with SARIMA/SARIMAX

Series scope: How would you forecast a variable into the future, given data over the past n periods? This problem arises in business when we try to predict demand for a product from past data that has some trend or seasonality. We will look at the following methods and their pitfalls:

  • Part 1: Holt-Winters and moving averages (link)
  • Part 2: SARIMA/SARIMAX models (the focus of this article)

Use case

Forecasting demand (units, visitors, etc.) is the first step in any planning process. It is typically followed by adding price, revenue and costs to arrive at an EBITDA number. You can also vary price and costs over a range of values to develop a 95% confidence interval for the EBITDA.

In this article we look at the very first step of the process: forecasting the demand for beer. We use a time series model that takes past trend and seasonality into account to forecast into the future. The model uses the previous n observations to forecast the (n+1)th value. We look at how to search for the model parameters, fit the model and make forecasts.

Current Article Scope

  • Short Explanation of ARMA/ARIMA and extension to SARIMA/SARIMAX models
  • Understand how beer sales vary over time. Decompose it into trends and seasonality
  • Build a predictive model to predict the beer sales in the future
  • Calculate and visualize the accuracy of the predictions

Key steps

  1. Explain ARIMA/SARIMA
  2. Check whether the data is stationary, i.e. whether it is free of trend and seasonality. In this case the dataset was chosen because it has both
  3. Run pmdarima's auto_arima to get the optimal values of the model parameters (p,d,q) for ARIMA and the seasonal adjustments (P,D,Q). There are manual methods to do the same, but in practical settings we would use a function like auto_arima, which searches for the optimal parameters automatically
  4. Fit the model on training data. Check and visualize the fit of the model
  5. Forecast into the future.

What are ARIMA/SARIMA models

ARIMA models use only the past values of a time series to forecast its future values.

Advantages of ARIMA Models

  • Use only the past values of timeseries data to predict future values
  • Generally make accurate short term forecasts.

Disadvantages of ARIMA models

  • Difficult to predict turning points
  • The risk of large forecasting errors increases as the forecast horizon gets longer
  • Determining the model parameters is a subjective process
  • Holt Winters and exponential smoothing are easier to explain

ARIMA explained

ARIMA models are described by 3 components and their associated parameters

  1. Auto Regressive (AR) (p): This simply means that the current value of the series depends on its own past values. The order (p) of this AR component is the number of past observations that the model takes into account. We have to specify this at the time of fitting the model.
  2. Moving Average (MA) (q): This component is a weighted combination of the past prediction errors. The order (q) determines how many past prediction errors the model takes into account. We have to specify this at the time of fitting the model.
  3. Integrated (I) (d): This component is used when the data is non-stationary. Data with any trend or seasonality is not stationary. To make the data stationary we difference it (y(t)-y(t-1)); the order (d) is the number of times this is done. Sometimes the operation has to be repeated 2 or 3 times to remove the trend. Seasonality is usually removed with seasonal differencing, which is the D term of the SARIMA model.
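
To make the differencing step concrete, here is a minimal sketch in plain Python. The `difference` helper and the toy series are illustrative assumptions, not code from the original article.

```python
def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t >= lag."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


# A straight-line trend becomes constant after one difference (d = 1).
linear_trend = [10 + 2 * t for t in range(8)]
first_diff = difference(linear_trend)

# A curved (quadratic) trend needs the operation twice (d = 2).
curved_trend = [t * t for t in range(6)]
second_diff = difference(difference(curved_trend))

# A spike that repeats every 4 periods is removed by differencing at lag 4.
# This is the seasonal analogue of the operation above.
seasonal = [5 if t % 4 == 3 else 0 for t in range(12)]
seasonal_diff = difference(seasonal, lag=4)

print(first_diff)
print(second_diff)
print(seasonal_diff)
```

Each differencing pass shortens the series by `lag` observations, which is one reason not to difference more often than the data requires.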

These orders can be determined visually using autocorrelation function (ACF) and partial autocorrelation function (PACF) plots, but in most practical cases we use built-in algorithms like auto_arima to find the optimum values of p,d,q for our model. For seasonal data there are additional seasonal orders P,D,Q, and the model is known as SARIMA.
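
As a rough illustration of what an ACF plot shows, the sketch below computes the sample autocorrelation by hand with NumPy for a synthetic series that repeats every 12 periods. The `sample_acf` helper and the sine-wave data are assumptions made for this example; in practice a plotting helper such as statsmodels' `plot_acf` would normally be used.

```python
import numpy as np


def sample_acf(values, max_lag):
    """Sample autocorrelation of a series for lags 0..max_lag."""
    y = np.asarray(values, dtype=float)
    centered = y - y.mean()
    denom = np.sum(centered ** 2)
    n = len(y)
    return np.array([
        np.sum(centered[lag:] * centered[:n - lag]) / denom
        for lag in range(max_lag + 1)
    ])


# Synthetic stand-in for monthly data with a 12-period cycle (not real sales).
t = np.arange(120)
monthly_cycle = np.sin(2 * np.pi * t / 12)

acf = sample_acf(monthly_cycle, max_lag=24)
print(np.round(acf[[0, 6, 12]], 2))
```

A strong positive value at lag 12 and a strong negative value at lag 6 are the kind of pattern that points to annual seasonality in monthly data.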

Now on to working with the data…

Understand if the data is stationary

The first step is to load the data and plot it to visually inspect for trend and seasonality.

Exhibit 1.1 Beer unit sales over time

The data looks like it has an increasing trend over time, as well as seasonality, as evidenced by the periodic spikes.

We further explore the seasonality below by plotting 2 years of data from 2000 to 2002.

Exhibit 1.2 Beer sales zoomed in (2000 to 2002)

We clearly see the seasonal pattern: sales go up in November and December. This ties in with conventional wisdom, since we would expect sales of beer to increase in the holiday season.
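
One quick numerical check of this kind of pattern is to average the series by calendar month. The sketch below uses pandas on a synthetic series; the `monthly_profile` helper and the demo values are assumptions for illustration, not the article's data.

```python
import numpy as np
import pandas as pd


def monthly_profile(series):
    """Average value per calendar month (1 = January, ..., 12 = December)."""
    return series.groupby(series.index.month).mean()


# Synthetic three-year monthly series with higher values in Nov and Dec.
index = pd.date_range("2000-01-01", periods=36, freq="MS")
values = np.where(index.month >= 11, 130.0, 100.0)
demo_sales = pd.Series(values, index=index)

profile = monthly_profile(demo_sales)
peak_month = int(profile.idxmax())
print(profile)
print("First peak month:", peak_month)
```

On data with a strong upward trend the later months of each year are slightly inflated by the trend itself, so this check is best read as a rough guide rather than a formal decomposition.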

Code for this section is in the gist below

Exhibit 1.3 Code for Loading and inspecting data
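
The original gist is not reproduced here. As a hedged sketch of what loading a monthly series with pandas can look like, the example below reads a CSV and returns a series with a monthly index. The column names `date` and `sales`, the helper name and the inline sample rows are all hypothetical; the real file in the linked repository may be laid out differently.

```python
import io

import pandas as pd


def load_monthly_series(source, date_col="date", value_col="sales"):
    """Read a CSV (path or file-like object) into a month-start indexed Series."""
    frame = pd.read_csv(source, parse_dates=[date_col], index_col=date_col)
    series = frame[value_col].sort_index()
    # Declare the monthly frequency explicitly. Months missing from the file
    # show up as NaN, which makes gaps easy to spot. This assumes the dates
    # in the file fall on the first day of each month.
    return series.asfreq("MS")


# Tiny in-memory stand-in for the real file.
sample_csv = io.StringIO(
    "date,sales\n"
    "2000-01-01,105\n"
    "2000-02-01,98\n"
    "2000-03-01,110\n"
)
beer = load_monthly_series(sample_csv)

print(beer.head())
print("Missing months:", int(beer.isna().sum()))
```

Setting the frequency up front also lets the forecasting libraries label future periods with proper dates.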

Determine the optimum ARIMA parameters

We first need to figure out the p,d,q parameters of the ARIMA model. There are manual methods for this step, but in most practical settings you would use a search algorithm instead. Here we use the pmdarima package's auto_arima function.

The optimal parameters are reported below. In this first instance we used an ARIMA model and set the seasonal parameter in auto_arima to False. We also provided start and max values for p and q to limit the number of iterations and preserve computing resources.

Exhibit 1.4 Auto Arima report

The report above shows the optimal order of the ARIMA model to be (0,1,1). This means that the autoregressive order is 0, i.e. the model has no autoregressive terms; the data needs to be differenced once to be made stationary; and the moving average order is 1, so the forecast depends on the prediction error of the previous period.
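
Written out, and ignoring any constant term the fitted model may include, an ARIMA(0,1,1) model has the form below. The symbols are chosen here for illustration: y_t is sales in month t, ε_t is the prediction error in month t and θ_1 is the fitted moving average coefficient.

```latex
y_t - y_{t-1} = \varepsilon_t + \theta_1 \varepsilon_{t-1}
\quad\Longrightarrow\quad
\hat{y}_{t+1} = y_t + \theta_1 \varepsilon_t
```

Beyond one step ahead the future errors are unknown and are replaced by zero, so the forecasts stay flat at that level (or follow a straight line if a constant is included). A model of this form therefore has no way of reproducing recurring seasonal peaks.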

Now we run another instance of auto_arima, this time with the seasonal parameter set to True. Since the seasonality is annual (every November and December, as shown in Exhibit 1.2) and the data is monthly, the seasonal period m is set to 12.

Exhibit 1.5 Auto ARIMA seasonal
Exhibit 1.6 Auto Arima report with seasonal

Here the optimal parameters are ARIMA (0,1,1) with seasonal adjustments (0,0,2,12): no seasonal autoregressive terms, no seasonal differencing and a seasonal moving average of order 2. The 12 at the end is the seasonal period: the data is monthly and the seasonal pattern repeats every year.
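
For readers who want the model in equation form, an order of (0,1,1) with seasonal order (0,0,2,12) can be written as follows, ignoring any constant term. The notation is an assumption for illustration: B is the backshift operator (B y_t = y_{t-1}), θ_1 is the non-seasonal moving average coefficient and Θ_1, Θ_2 are the seasonal ones.

```latex
\begin{align}
(1 - B)\, y_t &= (1 + \theta_1 B)\,(1 + \Theta_1 B^{12} + \Theta_2 B^{24})\, \varepsilon_t \\
y_t - y_{t-1} &= \varepsilon_t + \theta_1 \varepsilon_{t-1}
  + \Theta_1 \varepsilon_{t-12} + \theta_1 \Theta_1 \varepsilon_{t-13}
  + \Theta_2 \varepsilon_{t-24} + \theta_1 \Theta_2 \varepsilon_{t-25}
\end{align}
```

The extra terms at lags 12, 13, 24 and 25 are what allow the forecast to react to errors made in the same month one and two years earlier.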

We will fit the data to both models and assess performance. We would expect that the SARIMA model performs better given the obvious seasonality in the data.

Fit and evaluate the Models

  • Let's first split the data into training and test sets. The train set is used to train the model and the test set is used to test how well the model performs on unseen data. The first 300 months of data are used for training, and the remaining months form the unseen test set on which we evaluate the model's predictions.
Exhibit 1.7 Train test split code
  • We now fit the ARIMA model to the training data using order (0,1,1) (see Exhibit 1.4), make predictions on the test data and compare those predictions against actuals. We use mean absolute error as the metric to evaluate the accuracy of the predictions.
Exhibit 1.8 Fitting to ARIMA model
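
The split described above is chronological: the earliest observations train the model and the most recent ones are held out, with no shuffling. A minimal sketch, assuming a hypothetical `chronological_split` helper and a placeholder series length (the article does not state the total number of months):

```python
def chronological_split(series, n_train):
    """Split an ordered series into the first n_train points and the rest."""
    if not 0 < n_train < len(series):
        raise ValueError("n_train must leave at least one observation on each side")
    # Plain slicing keeps the time order intact; it works the same way on
    # lists, NumPy arrays and pandas Series (positional slicing).
    return series[:n_train], series[n_train:]


# Placeholder data: 340 ordered observations standing in for monthly sales.
observations = list(range(340))
train, test = chronological_split(observations, n_train=300)

print(len(train), len(test))
```

Shuffling before splitting, as is common for non-temporal data, would leak future information into the training set.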

The output in this case gives an error of 8.57% on average on the test data. Let's also visualize this. We see below that the model captures the trend but not the seasonal peaks and valleys.
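
The article names mean absolute error as the metric but reports it as a percentage, and does not show how the percentage is derived. The sketch below, with hypothetical helper names and toy numbers, shows the plain MAE alongside two common ways of expressing it as a percentage; which one (if either) produced the figure above is not stated.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actuals and predictions, in data units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mae_as_share_of_mean(actual, predicted):
    """MAE divided by the average actual value."""
    return mean_absolute_error(actual, predicted) / (sum(actual) / len(actual))


def mean_absolute_percentage_error(actual, predicted):
    """Average of the per-period absolute percentage errors (MAPE)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Toy numbers, not the beer data.
actual = [100, 200, 300]
predicted = [110, 180, 300]

mae = mean_absolute_error(actual, predicted)
share = mae_as_share_of_mean(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(mae, round(share * 100, 2), round(mape * 100, 2))
```

The two percentage versions generally differ, so it is worth stating which one is used when comparing models.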

Exhibit 1.9 Code for visualizing Actuals vs predictions
Exhibit 2.0 Visualizing Arima Model fit

Now let's see if we do better by fitting the training data to the SARIMA model using the parameters we found earlier (see Exhibit 1.6).

Exhibit 2.1 Code for SARIMA Model

In this case the error is 5.59% on average on the test data, substantially lower than the 8.57% we saw in the previous case. Let's visualize the predictions vs actuals for both models.

Exhibit 2.2 Code for visualizing SARIMA model
Exhibit 2.3 Visualizing SARIMA model

In this case the yellow line of seasonally adjusted SARIMA predictions follows the blue line more closely, and visually we can see that the SARIMA model is a better fit.

Recap: Based on this section, the SARIMA model is the better fit and performs better on unseen test data.

Forecast into the future

Now that we have a best-fit model, we will forecast into the future. First we use the optimal model parameters to fit a model on the entire dataset (train and test), and then forecast a year into the future.

Exhibit 2.4 Code for generating and visualizing forecasts
Exhibit 2.5 Visualizing model fit and forecast

GitHub link below

https://github.com/gmehra123/data_science_projs/tree/main/Time_Series_arima

Gaurang Mehra

Deeply interested in Data Science, AI and using these tools to solve business problems.