AutoML for Time Series forecasting using AutoTS with example
Time Series forecasting made easy
Time series forecasting is complex! And since most real-world data is time series, you will likely have to deal with it a lot in your working life. I recently discovered the concept of AutoML, and it has made my life a lot easier.
The real question is
Do we have AutoML for Time Series?
We don’t have as many options as we do for classification problems, but there is a pretty good library called AutoTS. It is more than powerful enough for time series forecasting and offers a lot of options for customization as well. So let’s get started by forecasting some synthetic data with AutoTS.
First of all, let’s generate a synthetic time series & introduce a weekly seasonal element.
import numpy as np
import pandas as pd
np.random.seed(42)
N = 100
rng = pd.date_range('2019-01-01', freq='D', periods=N)
df = pd.DataFrame(np.random.rand(N, 1), columns=['value'], index=rng)
df.loc[df.index[::7], 'value'] = 10  # every 7th day; .loc avoids chained assignment
The above code snippet creates a daily time series with 100 samples starting from 1st Jan’19, fills it with uniform random values between 0 and 1, & then sets every 7th value to 10 to introduce an artificial weekly seasonality.
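A quick sanity check, sketched below, is to average the values per weekday and confirm that exactly one weekday carries the spike. The variable name weekday_mean is my own; the data generation simply repeats the snippet above.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic series from the snippet above
np.random.seed(42)
N = 100
rng = pd.date_range('2019-01-01', freq='D', periods=N)
df = pd.DataFrame(np.random.rand(N, 1), columns=['value'], index=rng)
df.loc[df.index[::7], 'value'] = 10  # every 7th day gets the spike

# Mean value per weekday: the weekday of the start date should stand out
weekday_mean = df['value'].groupby(df.index.day_name()).mean()
print(weekday_mean.sort_values(ascending=False))
```

Since the series starts on a Tuesday, every 7th sample falls on a Tuesday, so that weekday averages 10 while all the others stay below 1.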
Next, we will install AutoTS using pip
pip install AutoTS
Now, let’s create an AutoTS object & get the hang of the hyperparameters used
from autots import AutoTS
model = AutoTS(
    forecast_length=15,
    frequency='D',
    prediction_interval=0.95,
    ensemble=None,
    models_mode='deep',
    model_list='univariate',  # or ['ARIMA','ETS']
    max_generations=10,
    num_validations=3,
    no_negatives=True,
    n_jobs='auto',
)
That’s a lot of parameters to have!!
And we have already skipped many
- forecast_length = How far into the future do you wish to forecast? The longer the horizon, the less reliable the results tend to be for the later timestamps
- frequency = The frequency of the training data (day, month, year, second, etc.)
- prediction_interval = Similar to the confidence interval concept we use in stats. 0.95 asks for upper & lower bounds that should contain the actual value about 95% of the time
- ensemble = Ensembling refers to merging two or more models, trained over some common data, to get to the final result. Ensembling can be done in multiple ways. Values this hyperparameter can take include None, default, simple, etc.
- model_list = The most important hyperparameter. model_list selects the pool of models to train & pick the best from; a smaller pool means faster training. We can see the different options provided using the below code snippet
from autots.models.model_list import model_lists
print(model_lists.keys())
The output
['all', 'default', 'fast', 'superfast', 'parallel', 'fast_parallel', 'probabilistic', 'multivariate', 'univariate', 'no_params', 'recombination_approved', 'no_shared', 'no_shared_fast', 'experimental', 'slow', 'gpu', 'regressor', 'colin']
Now as you can see, it has some broad categories, some of which are self-explanatory, like
Univariate: No other features except the time series historic data for training
Multivariate: Multiple features
Regressor: Regression models
Now, we can further deep dive to know the specific model names under these broad categories
Let’s see the models under the Univariate section
print(model_lists['univariate'])
['ZeroesNaive',
'ETS',
'UnobservedComponents',
'Greykite',
'GLM',
'DatepartRegression',
'NeuralProphet',
'SeasonalNaive',
'LastValueNaive',
'ARDL',
'AverageValueNaive',
'ARIMA',
'GLS',
'UnivariateMotif',
'Theta',
'UnivariateRegression',
'FBProphet']
To be honest, I did try reading about these models, but for most of them there is hardly any resource available.
This is how difficult it is to understand time series! You don’t even have resources to read.
You can pass a value to model_list in 2 ways
The categorical name (like univariate, multivariate). In this case, all models falling in that category will be tried and tested
A list of model names. So if you don’t wish to run a few models in a particular category, or want to try a hybrid of 2 categories, you can go with a custom list of models.
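As a rough sketch of the second option, a custom list can be built with plain Python before handing it to AutoTS. The model names below are copied from the univariate output above; the helper name build_model_list and the choice of which models to exclude are my own assumptions for illustration.

```python
# Model names copied from the 'univariate' output shown above
univariate = [
    'ZeroesNaive', 'ETS', 'UnobservedComponents', 'Greykite', 'GLM',
    'DatepartRegression', 'NeuralProphet', 'SeasonalNaive', 'LastValueNaive',
    'ARDL', 'AverageValueNaive', 'ARIMA', 'GLS', 'UnivariateMotif', 'Theta',
    'UnivariateRegression', 'FBProphet',
]

def build_model_list(*lists, exclude=()):
    """Merge several model-name lists, drop unwanted names, keep first-seen order."""
    merged = [name for lst in lists for name in lst]
    unique = list(dict.fromkeys(merged))  # de-duplicate, order preserved
    return [name for name in unique if name not in set(exclude)]

# Hypothetical choice: skip models that need extra packages installed
custom_list = build_model_list(
    univariate,
    ['ARIMA', 'ETS'],  # a second list; duplicates are dropped
    exclude=['Greykite', 'NeuralProphet', 'FBProphet'],
)
print(custom_list)
# custom_list can then be passed as AutoTS(model_list=custom_list, ...)
```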
Back to AutoTS hyperparameters
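- max_generations: Number of generations the genetic search runs. Each generation tries new model configurations derived from the best performers so far, so the bigger this number, the better the results tend to be, but the runtime is also higher as you would be training more models
- num_validations: Number of cross-validation rounds to perform
- no_negatives: If True, negative forecast values are raised to 0. Use it when you expect no negative values in your forecast
- n_jobs: Number of CPU cores to use for parallel processing ('auto' lets AutoTS decide).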
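To make num_validations more concrete, here is a minimal sketch of backwards cross-validation for a time series: the most recent block of forecast_length points is held out first, then earlier blocks in turn. This is only an illustration of the idea, not AutoTS's internal code, and the function name backward_splits is hypothetical.

```python
import pandas as pd

def backward_splits(series, forecast_length, num_validations):
    """Yield (train, test) pairs, holding out successive blocks from the end."""
    # One initial evaluation split plus num_validations additional ones
    for i in range(num_validations + 1):
        test_end = len(series) - i * forecast_length
        test_start = test_end - forecast_length
        if test_start <= 0:  # not enough history left to train on
            break
        yield series.iloc[:test_start], series.iloc[test_start:test_end]

# Toy series with the same length as the synthetic data
series = pd.Series(range(100), index=pd.date_range('2019-01-01', periods=100))
splits = list(backward_splits(series, forecast_length=15, num_validations=3))
for train, test in splits:
    print(len(train), test.index[0].date(), test.index[-1].date())
```

With 100 points, forecast_length=15 and num_validations=3, this produces four evaluation windows, and the training set is always strictly earlier than the test window so no future data leaks into training.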
Once we are set with all the configs, the next step is to train the model on our dataset & forecast future values. Notice that forecast_length can be changed at prediction time as well, i.e. you can forecast for any number of days you wish to
model.fit(df['value'])
prediction = model.predict(forecast_length=30)
Also, the model object looks something like this after fitting
Initiated AutoTS object with best model:
UnobservedComponents
{'fillna': 'mean', 'transformations': {'0': 'SeasonalDifference', '1': 'RobustScaler'}, 'transformation_params': {'0': {'lag_1': 7, 'method': 'Mean'}, '1': {}}}
{'level': 'deterministic constant', 'maxiter': 50, 'cov_type': 'opg', 'method': 'lbfgs', 'autoregressive': None, 'regression_type': None}
SMAPE: 41.257722928849894, 56.70603313065603, 62.48281078842214, 44.31499811205176
MAE: 0.18885943156919346, 0.2211075282132544, 0.25974351337189994, 0.23844137549079086
SPL: 0.015394577575686953, 0.019096814320936354, 0.016756496389216037, 0.018219146038529945
A few crucial things one can observe are
- The best model was chosen with its hyperparameters
- Transformations required before training the model
- Different metrics on validation
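For readers unfamiliar with the first two metrics, the sketch below computes SMAPE and MAE by hand with NumPy. It uses one common definition of SMAPE (ranging from 0 to 200); I am assuming, not asserting, that AutoTS computes it the same way. The sample numbers are made up purely for illustration.

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(forecast - actual))

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, on a 0-200 scale."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    denom = np.abs(actual) + np.abs(forecast)
    # Where both values are 0 the error is defined as 0 to avoid dividing by zero
    ratio = np.divide(2 * np.abs(forecast - actual), denom,
                      out=np.zeros_like(denom), where=denom != 0)
    return 100 * ratio.mean()

# Made-up values, only to show the calls
actual = [1, 2, 4]
forecast = [1, 3, 2]
print(mae(actual, forecast), smape(actual, forecast))
```

Lower is better for both. MAE is in the units of the series, while SMAPE is a relative error, which is why the two can look so different in scale in the output above.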
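The prediction object has 5 attributes of interest
- Actual forecast (forecast)
- Upper & Lower bounds for the forecast (upper_forecast, lower_forecast)
- Transformation & model parameters (transformation_parameters, model_parameters)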
We have trained our model. Now what? How can we save it for future use?
Now here comes the real catch
You can’t save your model
Actually, the author of the library believes that time series models should be retrained regularly on recent data to avoid problems due to data drift, which to some extent I agree with. So, you can save the configs of the best model, but when you wish to forecast, you need to fit again, this time only on the chosen model with the saved configs (hence max_generations can be set to 0, so the model runs just once with the saved configs). So, how do we save the configs of the best model & reload them?
Once you are done with model training
model.export_template(
    "model.csv",
    models="best",
    max_per_model_class=1,
    include_results=True,
)
- The arguments passed to export_template() are the name of the output file & models="best", which saves just the best of all the trained models
- include_results = True to save validation & training metrics within the CSV
To load the template & reuse it
# declare the AutoTS object "model" first
model = model.import_template(
    "model.csv",
    method="only",
    enforce_model_list=True,
)
model.fit(data)  # data: your latest training data
prediction = model.predict(forecast_length=15)
- import_template() takes the name of the template file
- enforce_model_list = True ensures only models in the template (the CSV file) are considered
- method=’only’ considers just the template config & no other configs while re-fitting
- Observe we are re-fitting before making the final forecast
Before ending, let’s see the forecast done by AutoTS for the dummy data we considered
%matplotlib inline
import matplotlib.pyplot as plt
fig,ax = plt.subplots(figsize=(20,3))
ax.plot(df['value'])
ax.plot(prediction.forecast)
Here the blue line represents the actual data, while the orange line (matplotlib's default second color) represents the forecasted values.