Attention-Based Autoregression for Accurate and Efficient Multivariate Time Series Forecasting

SNU AI · Published in SNU AIIS Blog
5 min read · Mar 26, 2022

By Seyeon An


Ranging from fluctuating stock prices to complex traffic situations, much of the data we see daily is represented as time series. Such time series data have temporal patterns, and these patterns can be either clear or ambiguous. Time series forecasting is thus a common problem that is actively studied in machine learning and data mining.

Most time series data are multivariate: they consist of multiple variables that are correlated with each other. We exploit the relationships between these variables to perform tasks such as predicting the traffic on a road, the electricity consumption of a city, or the price of a stock. In other words, we can greatly improve the accuracy of time series forecasting by using the relationships between different regions, different cities, and different stocks.

Thus, the core goal of the paper is to improve multivariate time series forecasting models, which handle the problem below:

Given a multivariate time series X, with d variables and w recent observations per variable, and a prediction horizon h, predict the observation y after h time steps.
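To make the setup concrete, here is a minimal NumPy sketch of the problem shapes. The values of d, w, and h are arbitrary, and the persistence baseline is only an illustrative stand-in for a real forecasting model, not part of the paper:

```python
import numpy as np

# Illustrative shapes for the forecasting problem above; the names
# d, w, h follow the problem statement, the values are arbitrary.
d, w, h = 3, 24, 6                # variables, recent observations, horizon

rng = np.random.default_rng(0)
X = rng.standard_normal((w, d))   # input window: w time steps of d variables

# A forecasting model maps the window X to one prediction y for the
# observation h steps ahead. A trivial "persistence" baseline just
# repeats each variable's last observed value:
def persistence_forecast(X):
    return X[-1]                  # shape (d,)

y_hat = persistence_forecast(X)
print(y_hat.shape)                # (3,)
```

Any model discussed below takes the same input window X and produces one value per variable.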

Multivariate Time Series Forecasting

There currently exist a number of models for multivariate time series forecasting. However, existing models require too many parameters, since they model the patterns within each variable and the relationships between different variables in a single step. Because time series data typically provide many variables to predict but insufficient training data, overfitting is very common, which lowers prediction accuracy and makes hyperparameter tuning very hard. Moreover, the trend of a time series often changes over time, which causes complex models to fail at future predictions.

There have been several past attempts at building such a model. For example, LSTNet (Lai et al., SIGIR 2018) achieved high accuracy by combining recurrent neural networks (RNNs), convolution kernels, and temporal attention over the time scale, but it was inefficient because it required many parameters. Thus, the goal of this work is to build a forecasting model that improves both the accuracy and the efficiency of existing models.

Attention-Based Autoregression (AttnAR)

This paper proposes AttnAR (Attention-Based Autoregression), a model that achieves strong performance using simple attention operations. The structure of AttnAR is illustrated in the figure below, and its main ideas are as follows:

  • Separable Modules : Suppose we want to predict the prices of BTC, ETH, and XRP using a multivariate time series forecasting model. We do not want the model to lose efficiency and speed due to its heavy size when timeliness matters more than anything else. That is why AttnAR separates the model into modules, each aimed at a specific objective, making it easy to tune each module’s capacity based on the properties of each dataset.
  • The extractor module captures univariate patterns, and transforms them to pattern vectors.
  • The attention module correlates multiple variables.
  • The predictor module simply produces the final prediction given the pattern vectors.
Separable Modules in AttnAR
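The three-module pipeline can be sketched in a few lines of NumPy. Random linear maps stand in for the learned extractor and predictor, and all sizes are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, w, k = 3, 24, 8                 # variables, window length, pattern-vector size
X = rng.standard_normal((w, d))

# Extractor: turns each variable's length-w history into a k-dim pattern vector.
W_ext = rng.standard_normal((w, k)) / np.sqrt(w)
patterns = X.T @ W_ext             # shape (d, k), one row per variable

# Attention: mixes pattern vectors across variables; each row of A is a
# softmax, so every output is a weighted average over the d variables.
scores = patterns @ patterns.T
scores -= scores.max(axis=1, keepdims=True)
A = np.exp(scores)
A /= A.sum(axis=1, keepdims=True)
mixed = A @ patterns               # shape (d, k)

# Predictor: reduces each mixed pattern vector to a single forecast.
w_pred = rng.standard_normal(k) / np.sqrt(k)
y_hat = mixed @ w_pred             # shape (d,), one value per variable
print(y_hat.shape)                 # (3,)
```

Because each module has a single job, its capacity can be tuned independently of the others.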
  • Mixed Convolution Extractor : Shallow dense layers connect distant time steps but provide a low degree of abstraction, while deep convolution layers focus on adjacent time steps and provide a high degree of abstraction. Which of the two should be used for fast and accurate prediction?
  • AttnAR combines both: its MCE (Mixed Convolution Extractor) consists of deep convolutions and shallow fully-connected layers, capturing complex variable-wise patterns.
Mixed Convolution Extractor in AttnAR
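A minimal sketch of the two-path idea, with random weights standing in for learned ones; layer sizes and the way the paths are combined here are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
w = 24
x = rng.standard_normal(w)         # one variable's length-w history

# Shallow dense path: one linear layer over the full window connects
# distant time steps directly (global receptive field, low abstraction).
W_dense = rng.standard_normal((w, 4)) / np.sqrt(w)
dense_feat = x @ W_dense           # shape (4,)

# Deep convolution path: stacked small kernels focus on adjacent time
# steps and build up abstraction layer by layer.
def conv1d(signal, kernel):
    return np.convolve(signal, kernel, mode="valid")

h1 = np.maximum(conv1d(x, rng.standard_normal(3)), 0.0)   # conv + ReLU
h2 = np.maximum(conv1d(h1, rng.standard_normal(3)), 0.0)  # conv + ReLU
conv_feat = h2[-4:]                # keep the most recent conv outputs

# The mixed extractor concatenates both paths into one pattern vector.
pattern = np.concatenate([dense_feat, conv_feat])
print(pattern.shape)               # (8,)
```

The dense path sees the whole window at once, while the convolutional path abstracts local structure; mixing the two covers both needs.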
  • Time-invariant Attention : Attention is a well-known technique that computes a weighted average of values given a query.
  • AttnAR uses TIA (Time-Invariant Attention) as its attention module. TIA uses a variable-wise embedding vector instead of the unstable pattern vector in its attention computation, which minimizes the impact of noise in the time series data and helps perform stable attention.
Time-invariant Attention in AttnAR
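The key point, sketched below with random values, is that the attention weights are computed from fixed per-variable embeddings rather than from the window-dependent pattern vectors; all names and sizes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, e = 4, 8, 5                  # variables, pattern size, embedding size

patterns = rng.standard_normal((d, k))  # from the extractor; vary per window
E = rng.standard_normal((d, e))         # per-variable embeddings; fixed

# Attention weights come from the stable embeddings only, so the map
# does not change with the (possibly noisy) input window.
scores = E @ E.T
scores -= scores.max(axis=1, keepdims=True)
A = np.exp(scores)
A /= A.sum(axis=1, keepdims=True)       # each row sums to 1

# The pattern vectors are only the *values* being averaged.
mixed = A @ patterns                    # shape (d, k)
print(mixed.shape)                      # (4, 8)
```

Since A depends only on E, the same attention map is reused for every input window, which is what makes the attention time-invariant.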

Experiments

The two main experimental results are as follows:

  1. The proposed AttnAR model has far fewer parameters than competing models while showing much better performance. The figure below demonstrates that AttnAR outperforms existing models, such as AR (autoregression), VAR (vector autoregression), TRMF (temporal regularized matrix factorization), LSTM (long short-term memory), and LSTNet, in multivariate time series forecasting. Notably, on some datasets the number of parameters and the relative squared error are positively correlated. This is because, unlike in many other machine learning domains, training data are insufficient and overfitting occurs often in multivariate time series forecasting.
Performance Comparison of AttnAR Model and Other Models

2. The figure below visualizes the attention maps that AttnAR has learned, which show clear characteristics of the variables in each dataset: (a) Traffic and (b) Electricity show low correlation between variables, (c) Solar-Energy shows high correlation, and (d) Exchange-Rate shows almost none. This demonstrates that AttnAR learns the relationships between variables from the characteristics of the data. Moreover, since the attention module forms its attention map from the embedding vector of each variable, the learned map has an additional advantage: it can be reused regardless of the prediction horizon.

Visualization of Attention Map of AttnAR

Conclusion

The proposed model for multivariate forecasting, AttnAR (Attention-Based Autoregression), provides end-to-end learning via separate modules, using MCE for efficient extraction of univariate patterns and TIA for consistent and robust attention maps. Experimental results demonstrate that AttnAR consistently outperforms existing models. This work was presented at SDM (SIAM International Conference on Data Mining) 2021.

Acknowledgements

We thank Jaemin Yoo and the co-author of the paper “Attention-Based Autoregression for Accurate and Efficient Multivariate Time Series Forecasting” for their contributions and discussions in preparing this blog post. The views and opinions expressed in this blog are solely those of the authors.

This post is based on the following paper:

  • Attention-Based Autoregression for Accurate and Efficient Multivariate Time Series Forecasting, Jaemin Yoo, U Kang, SIAM International Conference on Data Mining (SDM) 2021, github

Originally posted on our Notion blog on April 30, 2021.


AIIS is an intercollegiate institution of Seoul National University, committed to integrating and supporting AI-related research at Seoul National University.