Time Series: ARIMA Model

Eugine Kang
3 min read · Aug 26, 2017


ARIMA stands for AutoRegressive Integrated Moving Average.

AR (Autoregression): A model that uses the dependent relationship between an observation and some number of lagged observations. The parameter p sets how many lagged observations are included.

I (Integrated): A model that uses the differencing of raw observations (e.g. subtracting the previous time step's observation from the current one). Differencing is a transformation applied to time-series data to make it stationary, meaning that its statistical properties do not depend on the time of observation. It removes trend (and, when applied at the seasonal lag, seasonality), which stabilizes the mean of the time series.
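As a minimal sketch of what differencing does, the snippet below applies it to a made-up series with a linear trend. The helper name `difference` and the data are illustrative assumptions, not part of the original post.

```python
import numpy as np

def difference(series, d=1):
    """Apply first-order differencing d times: y'[t] = y[t] - y[t-1]."""
    values = np.asarray(series, dtype=float)
    for _ in range(d):
        values = values[1:] - values[:-1]
    return values

# Hypothetical series with a linear trend: 10, 13, 16, ...
trend = 10.0 + 3.0 * np.arange(8)

diffed = difference(trend, d=1)
print(diffed)  # every value equals the slope, so the trend is gone
```

Each round of differencing shortens the series by one observation, which is worth keeping in mind on small datasets.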

MA (Moving Average): A model that uses the dependency between an observation and the residual errors from a moving average model applied to lagged observations. The parameter q sets how many lagged residual errors are included. Unlike the AR model, a finite MA model is always stationary.

Parameters of the ARIMA model

p (lag order): number of lag observations included in the model

d (degree of differencing): number of times that the raw observations are differenced

q (order of moving average): size of the moving average window, i.e. the number of lagged forecast errors included in the model
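Putting the three parameters together, an ARIMA(p, d, q) model can be written as below. The notation is the conventional textbook one rather than anything from the original post: c is a constant, the φ terms are AR coefficients, the θ terms are MA coefficients, ε is the error term, and B is the backshift operator.

```latex
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t,
\qquad y'_t = (1 - B)^d \, y_t, \quad B \, y_t = y_{t-1}
```

Here y' is the series after differencing d times, so p counts the lagged values of y' and q counts the lagged errors.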

ARIMA models with Python

dataset

Let’s find out how strongly each observation correlates with its lagged values using an autocorrelation plot.

The autocorrelation is positive and above 0.50 for the first 5 lags, so an AR parameter p (lag order) of 5 might be a good starting point.
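The numbers behind an autocorrelation plot can be computed directly. This is a rough sketch on synthetic data, since the original dataset is not reproduced here; the helper `sample_acf` and the series are assumptions made for illustration.

```python
import numpy as np

def sample_acf(series, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )

# Hypothetical data: upward trend plus noise (a stand-in for the dataset)
rng = np.random.default_rng(0)
series = 0.5 * np.arange(100) + rng.normal(scale=5.0, size=100)

acf = sample_acf(series, nlags=10)

# Count how many leading lags (starting at lag 1) stay above 0.50
strong = 0
for value in acf[1:]:
    if value <= 0.5:
        break
    strong += 1

print("lags above 0.50:", strong)
```

The count of leading lags above the threshold is one simple way to arrive at a starting value for p, in the same spirit as reading it off the plot.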

Residual Plot

The residuals do not seem to be stationary: there seems to be an overall increase as time goes by. Prediction performance will therefore depend on the time of observation.

Residual KDE Plot

The residuals seem to be Gaussian but slightly skewed to the left.

The mean of the residuals is non-zero, suggesting that the model's predictions are biased.

Rolling Forecast

Predict the next outcome by fitting a model on all observations up to the most recent one, then repeat as each new observation comes in.

The forecast could use more work, such as further tuning of the p, d, and q parameters.

Configuring an ARIMA model

The classical approach to fitting an ARIMA model is to follow the Box-Jenkins methodology.

  1. Model Identification: Use plots and summary statistics to identify trends and seasonality, to get an idea of the amount of differencing (d: degree of differencing) and the size of the lag (p: lag order).
  2. Model Estimation: Estimate the coefficients of the regression model, typically by maximum likelihood.
  3. Model Diagnostics: Use plots and statistical tests of the residual errors to determine the amount and type of temporal structure not captured by the model.
