AutoML for Time Series forecasting using AutoTS with example

Time Series forecasting made easy

Mehul Gupta
Data Science in your pocket
5 min read · Aug 25, 2022

Time series forecasting is complex! And since much real-world data is time series, you will likely deal with it a lot in your job. Recently I discovered AutoML, and it has made my life a lot easier.

The real question is

Do we have AutoML for Time Series?

Though we don’t have as many options as we do for classification problems, there is a pretty good library called AutoTS. It is more than capable of forecasting time series and offers a lot of customization options as well. So let’s get started and forecast some synthetic data using AutoTS.

First of all, let’s generate a synthetic time series & introduce a weekly seasonal element.

import numpy as np
import pandas as pd
np.random.seed(42)
N = 100
rng = pd.date_range('2019-01-01', freq='D', periods=N)
df = pd.DataFrame(np.random.rand(N, 1), columns=['value'], index=rng)
df.iloc[::7, 0] = 10  # set every 7th day to 10 (avoids pandas chained assignment)

The above code snippet creates a daily time series with 100 samples starting from 1st Jan’19. The values are drawn at random from the range 0 to 1, and every 7th value is then set to 10 to introduce an artificial weekly seasonality.
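Before handing the data to any library, it can help to confirm the weekly pattern is really there. The snippet below is a minimal sketch of such a check: it rebuilds the same series, averages the values per weekday, and compares the autocorrelation at a 7-day lag with a 3-day lag. The variable name weekday_mean is my own and not part of AutoTS.

```python
import numpy as np
import pandas as pd

# Rebuild the same synthetic series as above.
np.random.seed(42)
N = 100
rng = pd.date_range('2019-01-01', freq='D', periods=N)
df = pd.DataFrame(np.random.rand(N, 1), columns=['value'], index=rng)
df.iloc[::7, 0] = 10  # every 7th day is a spike

# Average value per weekday: the spike weekday should stand out.
weekday_mean = df['value'].groupby(df.index.day_name()).mean()
print(weekday_mean.round(2))

# Autocorrelation at the seasonal lag (7 days) versus a non-seasonal lag (3 days).
print("lag 7:", df['value'].autocorr(lag=7))
print("lag 3:", df['value'].autocorr(lag=3))
```

Since 1st Jan’19 was a Tuesday, all the spikes land on Tuesdays, and the lag-7 autocorrelation comes out far higher than the lag-3 one.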

Next, we will install AutoTS using pip

pip install AutoTS

Now, let’s create an AutoTS object & get the hang of the hyperparameters used

from autots import AutoTS

model = AutoTS(
    forecast_length=15,
    frequency='D',
    prediction_interval=0.95,
    ensemble=None,
    models_mode='deep',
    model_list='univariate',  # or ['ARIMA', 'ETS']
    max_generations=10,
    num_validations=3,
    no_negatives=True,
    n_jobs='auto',
)

That’s a lot of parameters, and we have already left out many more!

  • forecast_length = How far into the future do you wish to forecast? The longer the horizon, the less reliable the results tend to be for the later timestamps
  • frequency = The frequency of the training data, given as a pandas offset alias such as 'D' for daily
  • prediction_interval = The probability range for the upper & lower forecast bounds, similar in spirit to the confidence intervals we use in stats. 0.95 asks for bounds expected to contain about 95% of future values
  • ensemble = Ensembling refers to merging two or more models, trained over some common data, to get to the final result. Ensembling can be done in multiple ways. Different values this hyperparameter can take are None, default, simple, etc.
  • model_list = The most important hyperparameter. It selects the pool of models to train & compare, and a smaller pool means faster training. The available presets can be listed using the below code snippet
from autots.models.model_list import model_lists
print(list(model_lists.keys()))

The output

['all', 'default', 'fast', 'superfast', 'parallel', 'fast_parallel', 'probabilistic', 'multivariate', 'univariate', 'no_params', 'recombination_approved', 'no_shared', 'no_shared_fast', 'experimental', 'slow', 'gpu', 'regressor', 'colin']

As you can see, there are some broad categories, some of which are self-explanatory, like

Univariate: No other features except the time series historic data for training

Multivariate: Multiple features

Regressor: Regression models

Now, we can further deep dive to know the specific model names under these broad categories

Let’s see the models under the Univariate section

print(model_lists['univariate'])

['ZeroesNaive',
'ETS',
'UnobservedComponents',
'Greykite',
'GLM',
'DatepartRegression',
'NeuralProphet',
'SeasonalNaive',
'LastValueNaive',
'ARDL',
'AverageValueNaive',
'ARIMA',
'GLS',
'UnivariateMotif',
'Theta',
'UnivariateRegression',
'FBProphet']

To be honest, I did try reading about these models, but for most of them there are hardly any resources available.

This is how difficult it is to understand time series! You don’t even have resources to read

You can pass a value to model_list in 2 ways

A category name (like univariate, multivariate). In this case, all models falling in that category will be tried and tested

A list of model names. If you don’t wish to run a few models in a particular category, or want to try a hybrid of 2 categories, you can go with a custom list of models.
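As a small illustration of the second option, here is a sketch that builds a custom list by dropping a few names from the univariate category printed above. Which models to skip is purely a hypothetical choice of mine (for example, ones that may need extra packages installed), and custom_model_list is just an illustrative variable name.

```python
# Model names copied from the printed 'univariate' list above.
univariate = [
    'ZeroesNaive', 'ETS', 'UnobservedComponents', 'Greykite', 'GLM',
    'DatepartRegression', 'NeuralProphet', 'SeasonalNaive', 'LastValueNaive',
    'ARDL', 'AverageValueNaive', 'ARIMA', 'GLS', 'UnivariateMotif', 'Theta',
    'UnivariateRegression', 'FBProphet',
]

# Hypothetical choice: models we do not want AutoTS to try.
skip = {'Greykite', 'NeuralProphet', 'FBProphet'}

# Keep the original order, minus the skipped names.
custom_model_list = [name for name in univariate if name not in skip]
print(len(custom_model_list), custom_model_list)
```

The resulting list can then be passed as model_list=custom_model_list when creating the AutoTS object.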

Back to AutoTS hyperparameters

  • max_generations: number of generations the genetic search runs. Each generation creates & tests new candidate models based on the best ones found so far. The bigger this number, the better the results usually are, but training also takes longer as you would be training more models
  • num_validations: number of cross validations to perform, on top of the initial evaluation
  • no_negatives: True if you are expecting no negative values in your forecast
  • n_jobs: number of CPU cores to use for parallel processing ('auto' lets AutoTS decide)
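To get a feel for what num_validations means for a time series, here is a rough sketch of how evaluation windows can step backwards from the end of the data. It assumes each window holds out forecast_length points and that every extra validation shifts the window back by the same amount. The function backward_splits is hypothetical and only illustrates the idea; it is not AutoTS’s internal code.

```python
def backward_splits(n_samples, forecast_length, num_validations):
    """Return (train_end, test_end) index pairs, most recent window first.

    Window 0 holds out the last `forecast_length` points; each further
    validation shifts the hold-out window back by `forecast_length`.
    """
    splits = []
    for k in range(num_validations + 1):  # +1 for the initial evaluation
        test_end = n_samples - k * forecast_length
        train_end = test_end - forecast_length
        if train_end <= 0:
            raise ValueError("Not enough data for this many validations")
        splits.append((train_end, test_end))
    return splits


splits = backward_splits(n_samples=100, forecast_length=15, num_validations=3)
for train_end, test_end in splits:
    print(f"train on [0, {train_end}), evaluate on [{train_end}, {test_end})")
```

With 100 samples, forecast_length=15 and num_validations=3, this gives four evaluation windows. That seems consistent with the four values per metric in the fitted model summary further down, though that reading is my own assumption.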

Once all the configs are set, the next step is to train the model on our dataset & forecast future values. Notice that forecast_length can be changed at prediction time as well, i.e. you can forecast for any number of days you wish

model.fit(df['value'])
prediction = model.predict(forecast_length=30)

Also, the model object looks something like this after fitting

Initiated AutoTS object with best model: 
UnobservedComponents
{'fillna': 'mean', 'transformations': {'0': 'SeasonalDifference', '1': 'RobustScaler'}, 'transformation_params': {'0': {'lag_1': 7, 'method': 'Mean'}, '1': {}}}
{'level': 'deterministic constant', 'maxiter': 50, 'cov_type': 'opg', 'method': 'lbfgs', 'autoregressive': None, 'regression_type': None}
SMAPE: 41.257722928849894, 56.70603313065603, 62.48281078842214, 44.31499811205176
MAE: 0.18885943156919346, 0.2211075282132544, 0.25974351337189994, 0.23844137549079086
SPL: 0.015394577575686953, 0.019096814320936354, 0.016756496389216037, 0.018219146038529945

A few crucial things one can observe are

  • The best model was chosen with its hyperparameters
  • Transformations required before training the model
  • Different metrics on validation
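If SMAPE and MAE are new to you, the sketch below shows one common way to compute them in plain Python. The SMAPE here uses the 0 to 200 scale variant, which I am assuming for illustration; AutoTS’s exact implementation may differ in details. The sample numbers are made up.

```python
def mae(actual, forecast):
    """Mean absolute error: average size of the forecast miss."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE on a 0-200 scale.

    Pairs where both values are 0 are skipped to avoid dividing by zero.
    """
    terms = [
        200 * abs(f - a) / (abs(a) + abs(f))
        for a, f in zip(actual, forecast)
        if abs(a) + abs(f) > 0
    ]
    return sum(terms) / len(terms) if terms else 0.0


# Made-up values: one spike plus a few small noisy points.
actual = [10.0, 0.5, 0.2, 0.8]
forecast = [9.0, 0.4, 0.3, 0.8]
print("MAE:", mae(actual, forecast))
print("SMAPE:", smape(actual, forecast))
```

Notice how the small values dominate SMAPE: a miss of 0.1 on a value of 0.2 costs far more than a miss of 1 on a value of 10. This is one reason SMAPE can look high on data like ours even when MAE is small.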

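The prediction object exposes several attributes, including

  • The actual forecast (prediction.forecast)
  • Upper & lower bounds for the forecast (prediction.upper_forecast & prediction.lower_forecast)
  • Transformation & model parameters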

We have trained our model. Now what? How can we save it for future use?

Now here comes the real catch

You can’t save your fitted model

The author of the library believes that time series models should be retrained regularly on recent data to avoid problems due to data drift, which to some extent I agree with. What you can save is the configuration of the best model. When you wish to forecast again, you re-fit using only that saved configuration (hence max_generations can be set to 0, so the saved model runs just once with the saved configs and no further search happens). So, how do we save the configs of the best model & reload them?

Once you are done with model training

model.export_template(
    "model.csv",
    models="best",
    max_per_model_class=1,
    include_results=True,
)
  • The first argument is the name of the output file. models="best" exports only the top-ranked models out of everything that was trained, and max_per_model_class=1 keeps at most one configuration per model type
  • include_results=True also saves the validation & training metrics within the CSV

To load the template & reuse it

# declare the AutoTS object "model" first, then load the saved template
model = model.import_template(
    "model.csv",
    method="only",
    enforce_model_list=True,
)
model.fit(data)  # data = your latest time series, e.g. df['value']
prediction = model.predict(forecast_length=15)
  • import_template() takes the name of the template file
  • enforce_model_list=True ensures only models in the template (the CSV file) are considered
  • method="only" considers just the template configs & no other configs while re-fitting
  • Observe that we are re-fitting before making the final forecast

Before ending, let’s see the forecast done by AutoTS for the dummy data we considered

# %matplotlib inline is a Jupyter magic; leave it out when running outside a notebook
%matplotlib inline
import matplotlib.pyplot as plt
fig,ax = plt.subplots(figsize=(20,3))
ax.plot(df['value'])
ax.plot(prediction.forecast)

Where the blue line represents the actual data while the yellow line represents the forecasted values.

With this, it’s a wrap for the day!
