Time Series Forecasting using ARIMA Model

By Nilay Chauhan
Published in Data Stash, Jun 2, 2021 (4 min read)

The ARIMA model is one of the most widely used statistical algorithms for analyzing and forecasting time series data.

ARIMA stands for Autoregressive Integrated Moving Average. It combines two models, an autoregressive (AR) model and a moving average (MA) model, with a differencing step (the "integrated" part) that helps make the series stationary.

An ARIMA model has three parameters:

  1. p: it is the number of autoregressive lags.
  2. d: it is the order of differencing required to make the series stationary.
  3. q: it is the number of moving average lags.
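
To make the role of d more concrete, here is a minimal sketch of differencing using NumPy. The two series are made up for illustration and are not the article's data: one has a linear trend and the other a quadratic trend.

```python
import numpy as np

t = np.arange(10)

# Hypothetical series with a linear trend: the mean rises over time.
linear = 5.0 + 2.0 * t
# Hypothetical series with a quadratic trend.
quadratic = t ** 2

# d = 1: replace each value with its change from the previous value.
first_diff = np.diff(linear, n=1)
# d = 2: difference the already differenced series once more.
second_diff = np.diff(quadratic, n=2)

print(first_diff)   # the linear trend is gone after one difference
print(second_diff)  # the quadratic trend needs two differences
```

Each round of differencing shortens the series by one observation, which is one reason to prefer the smallest d that works.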

Types of ARIMA Model

  • ARIMA: It is a non-seasonal Autoregressive Integrated Moving Average model.
  • SARIMA: It is a seasonal ARIMA model.
  • SARIMAX: It is a seasonal ARIMA with exogenous variables.
  • Pyramid Auto-ARIMA: The ‘auto_arima’ function from the pmdarima library searches for the best parameters for an ARIMA model and returns a fitted ARIMA model.

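If you want to use the ARIMA model, your time series should be stationary, which means that its statistical properties do not change over time. If the series is not stationary, differencing (the d parameter) is used to make it so. A stationary series satisfies three conditions:

  1. The time series should have a constant mean.
  2. The time series should have a constant variance (and therefore a constant standard deviation).
  3. The auto-covariance of the time series should depend only on the lag between observations, not on time.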

Tests to check if a time series is stationary or not:

  • Rolling Statistics: Rolling statistics is a visualization technique in which you plot the moving average and the moving standard deviation to see whether they vary over time.
  • ADF Test: ADF stands for the Augmented Dickey-Fuller test, which is a statistical unit root test. It returns a test statistic, a p-value, and critical values at several significance levels. The null hypothesis is that the time series is non-stationary. If the test statistic is less than the critical value at a given level, we can reject the null hypothesis and say that the series is stationary. In terms of the p-value, a value below the chosen significance level (commonly 0.05) leads to the same conclusion, and the lower the p-value, the stronger the evidence against non-stationarity.
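
The rolling statistics check can be sketched with pandas as follows. This is only an illustration on synthetic data: the helper name `rolling_stats`, the 12-period window, and both series are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 120

# Hypothetical data: white noise (stationary) and the same noise plus a trend.
noise = pd.Series(rng.normal(0.0, 1.0, n))
trended = noise + 0.5 * np.arange(n)

def rolling_stats(series, window=12):
    """Return the rolling mean and rolling standard deviation of a series."""
    return series.rolling(window).mean(), series.rolling(window).std()

noise_mean, noise_std = rolling_stats(noise)
trend_mean, trend_std = rolling_stats(trended)

# Compare the first and last complete windows of each rolling mean.
print(noise_mean.dropna().iloc[[0, -1]].round(2).tolist())
print(trend_mean.dropna().iloc[[0, -1]].round(2).tolist())
```

The rolling mean of the white noise stays close to zero, while the rolling mean of the trended series keeps climbing. Calling `.plot()` on these series shows the same thing visually.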

Let’s understand everything in detail with the help of an example.

Reading the data

We will be using a dataset that contains the number of passengers of an airline per month from 1949 to 1960. You can get this dataset here.

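import pandas as pd

# Load the data and use the 'Month' column as a datetime index
df = pd.read_csv('AirPassengers.csv',
                 index_col='Month',
                 parse_dates=True)
df = df.dropna()  # drop rows with missing values
print('Shape of data', df.shape)
df.head()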

Plotting the data

Now, we will visualize the dataset to detect patterns. This gives you an idea of whether the data is stationary or not.

df['#Passengers'].plot(figsize=(12,5));

Split Your Dataset

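We need to split our dataset into training and testing sets. Because this is a time series, the training set should contain the earlier observations and the testing set the later ones.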

train = df.iloc[:100]  # first 100 months for training
test = df.iloc[100:]   # remaining months for testing
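
For a time series, the split is usually chronological, so that the model is evaluated on observations that come after the ones it was trained on. Here is a minimal sketch on a synthetic monthly frame. The helper `chronological_split` and the demo data are hypothetical and only stand in for the passenger dataset.

```python
import pandas as pd

def chronological_split(frame, n_train):
    """Use the first n_train rows for training and the rest for testing."""
    return frame.iloc[:n_train], frame.iloc[n_train:]

# Hypothetical monthly data with the same length as 12 years of observations.
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
demo = pd.DataFrame({"value": range(144)}, index=idx)

train_demo, test_demo = chronological_split(demo, 100)

print(len(train_demo), len(test_demo))
# Every training date comes before every testing date.
print(train_demo.index.max() < test_demo.index.min())
```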

Now, we will perform the Augmented Dickey-Fuller Test:

from statsmodels.tsa.stattools import adfuller

def ad_test(dataset):
    # Run the Augmented Dickey-Fuller test; the lag length is chosen by AIC
    dftest = adfuller(dataset, autolag='AIC')
    print("1. ADF : ", dftest[0])
    print("2. P-Value : ", dftest[1])
    print("3. Num Of Lags : ", dftest[2])
    print("4. Num Of Observations : ", dftest[3])
    print("5. Critical Values :")
    for key, val in dftest[4].items():
        print("\t", key, ": ", val)

ad_test(train['#Passengers'])

Here, the p-value is 0.99, which is greater than our significance level of 0.05. This means we fail to reject the null hypothesis, so we treat the data as non-stationary.

Training the model

We will use the auto_arima function from pmdarima, which automatically discovers the best order for an ARIMA model. In simple terms, it will automatically determine the parameters of the ARIMA model.

The function uses the AIC score to judge how good a particular order is, and it tries to minimize that score. In our run, the best ARIMA model was of order (4,1,3), which had the minimum AIC score.

import pmdarima as pmd

def arimamodel(timeseriesarray):
    # Search over candidate orders; the ADF test is used to choose d
    autoarima_model = pmd.auto_arima(timeseriesarray,
                                     start_p=1,
                                     start_q=1,
                                     test="adf",
                                     trace=True)
    return autoarima_model

arima_model = arimamodel(train['#Passengers'])
arima_model.summary()
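
To illustrate how an AIC-based choice works, here is a small sketch using the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The candidate orders, log-likelihoods, and parameter counts below are invented for the example. They are not output from auto_arima or results from this dataset.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * num_params - 2 * log_likelihood

# Hypothetical candidates: order -> (log-likelihood, number of parameters).
candidates = {
    (1, 1, 0): (-480.0, 2),
    (1, 1, 1): (-470.0, 3),
    (2, 1, 2): (-469.5, 5),
}

scores = {order: aic(ll, k) for order, (ll, k) in candidates.items()}
best_order = min(scores, key=scores.get)

for order, score in scores.items():
    print(order, score)
print("Lowest AIC:", best_order)
```

In this made-up example, the largest model fits slightly better but is penalized for its extra parameters, so the middle candidate ends up with the lowest AIC.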

Making predictions on test data

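import numpy as np
test = test.copy()  # work on a copy so that adding a column is safe
test['ARIMA'] = np.asarray(arima_model.predict(n_periods=len(test)))
test.head(5)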

Evaluate the model

We will be using mean absolute percentage error as the evaluation metric for this model.

from sklearn.metrics import mean_absolute_percentage_error
mean_absolute_percentage_error(test['#Passengers'], test['ARIMA'])
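
For intuition, the mean absolute percentage error can be written out by hand with NumPy. This is a simplified sketch that assumes no actual value is zero. The function name `mape` and the sample numbers are made up for the example.

```python
import numpy as np

def mape(actual, predicted):
    """Mean of |actual - predicted| / |actual|, returned as a fraction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual))

# Hypothetical values: errors of 10%, 10% and 0%.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

error = mape(actual, predicted)
print(round(error * 100, 2), "%")
```

Like the scikit-learn function, this returns a fraction rather than a percentage, so a result of 0.1 means the forecast is off by 10% on average.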
