Tuning ARIMA for Forecasting: An Easy Approach in Python

Sandeep Singh Sandha
4 min read · Dec 31, 2022

This post presents a straightforward, automated way to estimate ARIMA parameters that come close to those found by the state-of-the-art manual approach. The accompanying code is on GitHub.
We will use Bayesian optimization (via the Mango library) to search for the best parameters among 108,000 possible combinations in just 200 iterations.

The ARIMA time-series forecasting model can work well for series with trend and seasonality. It is a widely adopted classical model that often serves as a baseline for benchmarking modern deep-learning approaches. However, estimating good parameters for it is challenging. Researchers and developers often resort to trial and error guided by visual inspection of plots.

What is the ARIMA model?

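ARIMA, short for ‘autoregressive integrated moving average’, is a class of models that use past values of a series to predict future values. An ARIMA model is defined by three parameters: p (the number of autoregressive lags), d (the number of times the series is differenced), and q (the number of moving-average terms).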
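To make the "past values predict future values" idea concrete, here is a minimal pure-Python sketch of the autoregressive part alone. The function name `ar_one_step` and the coefficient values are illustrative assumptions; they are not fitted values and not part of statsmodels, which estimates the coefficients for you.

```python
def ar_one_step(history, coeffs, const=0.0):
    """One-step AR(p) forecast: const + sum(coeffs[i] * y[t-1-i])."""
    p = len(coeffs)  # p is the number of past values (lags) used
    if len(history) < p:
        raise ValueError("need at least p past values")
    recent = history[::-1][:p]  # most recent value first
    return const + sum(c * y for c, y in zip(coeffs, recent))

history = [10, 12, 13]
forecast = ar_one_step(history, coeffs=[0.5, 0.25])  # hypothetical AR(2) weights
print(forecast)  # 9.5
```

Here p = 2, so the forecast is 0.5 * 13 + 0.25 * 12. A full ARIMA model adds differencing (d) and moving-average terms (q) on top of this.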

The ARIMA model has several variants studied in the literature. In this post, we use the implementation from the statsmodels library.

The entire notebook, which shows a simple implementation, is available here. You can adapt it to your own dataset and create separate train-test splits as desired. I have kept it simple to outline the main tuning steps.
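If you do want a train-test split, a time series should be split chronologically rather than shuffled. The helper below is a minimal sketch; the name `chronological_split` and the 12-point holdout are assumptions for illustration, not part of the notebook.

```python
def chronological_split(values, test_size=12):
    """Hold out the last `test_size` observations as the test set."""
    if not 0 < test_size < len(values):
        raise ValueError("test_size must be between 1 and len(values) - 1")
    split_at = len(values) - test_size
    return values[:split_at], values[split_at:]

# Placeholder series with 144 points (the AirPassengers series also has 144).
series = list(range(144))
train, test = chronological_split(series, test_size=12)
print(len(train), len(test))  # 132 12
```

The model would then be fitted on `train` only and scored on `test`, which gives a more honest error estimate than the in-sample fit used in the rest of this post.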

Complete Code: Using Mango to Automate Tuning

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
from mango import Tuner

df = pd.read_csv('https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv')

def arima_objective_function(args_list):
    # Mango passes a batch of parameter dicts; we return the evaluated
    # parameters and their losses in matching order.
    global data_values

    params_evaluated = []
    results = []

    for params in args_list:
        try:
            p, d, q = params['p'], params['d'], params['q']
            trend = params['trend']

            model = ARIMA(data_values, order=(p, d, q), trend=trend)
            predictions = model.fit()
            mse = mean_squared_error(data_values, predictions.fittedvalues)
            params_evaluated.append(params)
            results.append(mse)
        except Exception:
            # The fit can fail for some combinations; record a large
            # penalty so the optimizer steers away from them.
            params_evaluated.append(params)
            results.append(1e5)

    return params_evaluated, results

param_space = dict(p=range(0, 30),
                   d=range(0, 30),
                   q=range(0, 30),
                   trend=['n', 'c', 't', 'ct'])

conf_Dict = dict()
conf_Dict['num_iteration'] = 200
data_values = list(df['#Passengers'])
tuner = Tuner(param_space, arima_objective_function, conf_Dict)
results = tuner.minimize()
print('best parameters:', results['best_params'])
print('best loss:', results['best_objective'])

Output:

best parameters: {'d': 0, 'p': 17, 'q': 23, 'trend': 'ct'}
best loss: 112.06886739549542

Tuning Steps

Dataset: We will use the simple AirPassengers dataset, which records the monthly number of airline passengers.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv')
df.head()

Plot the series to see the trend and seasonality

from matplotlib import pyplot as plt
f = plt.figure()
f.set_figwidth(15)
f.set_figheight(6)
plt.plot(df['#Passengers'], linewidth = 4, label = "original Series")
plt.legend(fontsize=25)
plt.xlabel('Months', fontsize = 25)
plt.ylabel('Count', fontsize = 25)
plt.show()

This data set has an upward trend and a seasonality of 12 months.

Traditionally, one would use domain knowledge to remove the trend and seasonality from the original series and then model the residual series to forecast the future. Here, however, we take a more straightforward automated approach.
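For intuition about the manual route, differencing is one common way to remove trend and seasonality. The sketch below uses a synthetic series (an assumption, not the AirPassengers data) with a linear trend and a repeating 12-month pattern; `lagged_difference` is a hypothetical helper written for this illustration.

```python
def lagged_difference(series, lag=1):
    """Subtract from each value the value observed `lag` steps earlier."""
    return [curr - prev for prev, curr in zip(series, series[lag:])]

# Synthetic monthly series: +2 per month trend plus a 12-month pattern.
pattern = [0, 1, 3, 6, 8, 12, 15, 14, 9, 5, 2, 0]
series = [2 * t + pattern[t % 12] for t in range(48)]

deseasonalized = lagged_difference(series, lag=12)     # removes the yearly pattern
stationary = lagged_difference(deseasonalized, lag=1)  # removes what is left of the trend

print(set(deseasonalized))  # {24}
print(set(stationary))      # {0}
```

On real data the result is noisy rather than constant, and choosing the lags is where the domain knowledge and plotting come in. The automated search sidesteps that step.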

How to tune parameters automatically?

We will use Mango, a state-of-the-art optimization library, to find the best parameters for our dataset. We first define the range of possible values for each parameter. This range can be very large and doesn’t need to be precise. The parameter names and meanings follow the statsmodels ARIMA implementation.

param_space = dict(p=range(0, 30),
                   d=range(0, 30),
                   q=range(0, 30),
                   trend=['n', 'c', 't', 'ct'])

The parameter space is defined using Python constructs: range and list. The total number of possible parameter combinations is 30*30*30*4 = 108,000, so an exhaustive grid search would be extremely time-consuming. Instead, we will use a Bayesian optimizer to automate the search within 200 iterations. Note: Depending on your dataset, the ranges and the resulting search space may differ. Defining a large search space is fine; let the optimizer do the hard work for you.
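The size of the search space can be checked directly from the parameter definitions. This small sketch recomputes the count and shows what fraction of the grid a 200-iteration budget covers; the variable names `total_combinations` and `budget` are chosen for illustration.

```python
from math import prod

param_space = dict(
    p=range(0, 30),
    d=range(0, 30),
    q=range(0, 30),
    trend=['n', 'c', 't', 'ct'],
)

# Multiply the number of choices available for each parameter.
total_combinations = prod(len(values) for values in param_space.values())
budget = 200

print(total_combinations)                    # 108000
print(f"{budget / total_combinations:.2%}")  # 0.19%
```

In other words, the optimizer evaluates well under 1% of the grid, which is why a guided search matters here.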

Train the ARIMA model

To use Mango, we have to define an objective function that returns the ARIMA model error for a given set of parameters.

from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
from mango import Tuner

def arima_objective_function(args_list):
    # Mango passes a batch of parameter dicts; we return the evaluated
    # parameters and their losses in matching order.
    global data_values

    params_evaluated = []
    results = []

    for params in args_list:
        try:
            p, d, q = params['p'], params['d'], params['q']
            trend = params['trend']

            model = ARIMA(data_values, order=(p, d, q), trend=trend)
            predictions = model.fit()

            mse = mean_squared_error(data_values, predictions.fittedvalues)
            params_evaluated.append(params)
            results.append(mse)
        except Exception:
            # The fit can fail for some combinations; record a large
            # penalty so the optimizer steers away from them.
            params_evaluated.append(params)
            results.append(1e5)

    return params_evaluated, results

Mango passes the objective function a batch of parameter sets, and the function returns the evaluated parameters together with their results. Each result is the error of the fitted ARIMA model, here the mean squared error between the series and the model's in-sample fitted values. We wrap the fit in a try/except block because ARIMA may fail to fit or converge for some parameter combinations; in that case, we record a large penalty (1e5) so the optimizer avoids those regions. Mango uses these results to find a good model within very few iterations (200 in this example). Our goal is to find the parameters that minimize the error function.
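The batch-in, batch-out contract of the objective can be tried without fitting any model. In the sketch below, `toy_score` is a hypothetical stand-in for "fit ARIMA and compute the MSE" and deliberately fails for some inputs; only the shape of the inputs and outputs mirrors the objective above.

```python
PENALTY = 1e5  # same penalty value as in the ARIMA objective

def toy_score(params):
    """Hypothetical stand-in for fitting ARIMA and computing the MSE."""
    if params['d'] > 2:
        raise ValueError("pretend the model failed to fit")
    return (params['p'] - 3) ** 2 + params['q']

def batch_objective(args_list, score_fn=toy_score):
    """Evaluate a batch of parameter dicts, penalizing failures."""
    params_evaluated, results = [], []
    for params in args_list:
        try:
            loss = score_fn(params)
        except Exception:
            loss = PENALTY  # failed fits get a large loss instead of crashing
        params_evaluated.append(params)
        results.append(loss)
    return params_evaluated, results

batch = [{'p': 3, 'd': 1, 'q': 0}, {'p': 5, 'd': 4, 'q': 1}]
evaluated, losses = batch_objective(batch)
print(losses)  # [0, 100000.0]
```

The two returned lists stay aligned, so the optimizer always knows which loss belongs to which parameter set, including the penalized ones.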

Control iterations of Mango: Config parameter.

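from mango import Tuner

conf_Dict = dict()
conf_Dict['num_iteration'] = 200  # number of optimizer iterations

data_values = list(df['#Passengers'])
tuner = Tuner(param_space, arima_objective_function, conf_Dict)
results = tuner.minimize()
print('best parameters:', results['best_params'])
print('best loss:', results['best_objective'])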

Visualize the Best Model Predictions

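Although the total number of possible parameter combinations is very large (108,000), the search returns a good configuration after only 200 iterations. We can now refit the model with the best parameters and plot its fitted values against the original series.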

from matplotlib import pyplot as plt

def plot_arima(data_values, order=(1, 1, 1), trend='c'):
    print('final model:', order, trend)
    model = ARIMA(data_values, order=order, trend=trend)
    results = model.fit()

    error = mean_squared_error(data_values, results.fittedvalues)
    print('MSE error is:', error)

    f = plt.figure()
    f.set_figwidth(15)
    f.set_figheight(6)
    plt.plot(data_values, label="original Series", linewidth=4)
    plt.plot(results.fittedvalues, color='red', label="Predictions", linestyle='dashed', linewidth=3)
    plt.legend(fontsize=25)
    plt.xlabel('Months', fontsize=25)
    plt.ylabel('Count', fontsize=25)
    plt.show()

print(results['best_params'])

order = (results['best_params']['p'], results['best_params']['d'], results['best_params']['q'])
plot_arima(data_values, order=order, trend=results['best_params']['trend'])

As seen, the fitted values closely track the ground truth. If you are interested in learning more about Mango, check out its GitHub repo, which includes a diverse set of examples.
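To read the reported loss in the same units as the series, one can take its square root. The snippet below does this for the best loss printed by the tuning run earlier in this post; the variable names are illustrative.

```python
import math

best_mse = 112.06886739549542  # 'best loss' printed by the tuning run in this post
rmse = math.sqrt(best_mse)     # back to the units of the '#Passengers' column
print(round(rmse, 2))  # 10.59
```

Keep in mind that this error is measured on the same data the model was fitted to, so it describes the quality of the fit rather than out-of-sample forecast accuracy.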

Sandeep Singh Sandha

PhD; Senior Machine Learning Engineer