Model Less Time Series Forecasting (MLTF): A new non-parametric approach for forecasting

Subhrajit Samanta, Ph.D.
AI FUSION LABS
Oct 27, 2022

Time-series forecasting has been an active area of interest for practitioners across different domains since long before the advent of machine learning and AI, and it will remain so given its ever-relevant applications in every aspect of our lives. From sales budgeting to inventory management and weather forecasting to stock prediction, forecasting techniques are avidly used everywhere.

Plenty of literature is available in the field of time-series forecasting; however, it is difficult to cover it all in a single blog. Instead, we will categorize and summarize the literature in an objective manner, based on its parametric (the model has a functional form whose parameters need to be estimated) vs. non-parametric nature (no functional form is assumed, therefore no parameter estimation is needed). We then talk about the advantages and disadvantages of both. Finally, we will discuss our recent work on Model Less Time-series Forecasting (MLTF) [1] and how it extends the non-parametric forecasting paradigm.

Parametric Approaches for Time-Series Forecasting

As the name suggests, with these approaches we parametrize the time series problem in a functional form with its historical observation, exogenous variables, etc. These include both classical statistical methods as well as the more complex, and recent, neural-network-based deep learning approaches.

Statistical time-series forecasting algorithms: have been used by practitioners for decades. These methods can be further categorized into the following families:

  • Exponential Smoothing Model (ESM) family: The main philosophy here is to form forecasts of future values as weighted averages of historical observations.
  • Auto-Regressive Integrated Moving Average (ARIMA) family: The ARIMA family represents the time series attributes in terms of certain parameters by employing the spectral decomposition technique. A generic ARIMA model is a combination of an autoregressive (AR) term, a moving-average term (MA), and for a time series that requires differencing (i.e., non-stationary time series), an integrated (I) element. ARIMA can handle both seasonal and non-seasonal time series. Besides the general ARIMA, there are other variants such as Vector ARMA and Non-linear ARIMA that serve different purposes such as multi-variate forecasting or non-linearity respectively.
  • Unobserved Component Model (UCM) family: The UCM (also known as Structural Models) decomposes the target univariate series into components (trend, seasonality, etc.) in a convenient additive manner, using Kalman filtering (unlike ARIMA), to provide forecasting.
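To make the ESM philosophy concrete, here is a minimal simple-exponential-smoothing sketch in plain Python. The smoothing factor alpha and the toy series are illustrative choices of ours, not taken from any of the methods discussed here:

```python
def simple_exp_smoothing(series, alpha=0.5):
    """Simple exponential smoothing: the level is updated as a weighted
    average of the newest observation and the previous level, so older
    observations receive exponentially decaying weights."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level  # the final level is the one-step-ahead forecast

demo = [10, 12, 11, 13, 12]
print(round(simple_exp_smoothing(demo, alpha=0.5), 3))  # 12.0
```

Lowering alpha puts more weight on the distant past; alpha close to 1 makes the forecast track the latest observation.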

Deep Learning-based Forecasting Algorithms: With the advances in Deep Learning methods in recent years, Deep neural networks (DNNs) especially recurrent neural networks (RNNs) have become increasingly popular in the time-series domain. Various forms of Long Short-Term Memory (LSTMs), Gated Recurrent Units (GRUs), and Attention networks (i.e., Transformers) are being increasingly used for multi-horizon time series forecasting. DL models in general often fall short when it comes to interpretability because of their black-box nature. Therefore, researchers have also combined the power of data-driven learning from deep networks and statistical techniques such as fuzzy logic (Fuzzy Neural Networks), AR (DeepAR), etc. to provide better handling of uncertainty and explainability in forecasting.

In general, statistical methods are often faster than deep learning models and unlike deep learning methods, they are less data-hungry and easily interpretable. Therefore, in many industries, statistical methods are still the preferred choice when it comes to short-term forecasting. However, statistical methods often fall short in capturing long-term dependencies [2] whereas Deep RNNs have been shown to do much better in that aspect [3]. Therefore, depending on the use case where long-term dependency is required to be captured, DL methods are gaining more prominence.

Next, we shall discuss the drawbacks of the parametric forecasting methods as a whole and why we should consider a non-parametric approach instead.

Shortcomings of Parametric time-series forecasting

Major problems often associated with parametric methods are:

1. Error Accumulation: When producing long multi-horizon forecasts (i.e., where the forecast window is long, as opposed to short-term forecasts where we only look a few steps ahead), parametric methods rely on their near forecasts to predict the ‘far’ ones. For example:

ŷ(t+k) = f(ŷ(t+k−1), c) + ε = f( f(ŷ(t+k−2), c) + ε, c ) + ε = …

where ŷ denotes a prediction and ε the per-step error.

Therefore, to predict ‘k’ steps ahead, the model keeps re-using its earlier predictions. At each step the model makes an error, and these errors accumulate over the forecast horizon, resulting in an increasingly poor forecast, as illustrated in the following figure:
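The accumulation effect can be reproduced with a toy recursive forecaster. Assume the true process is an AR(1) with coefficient 0.9 while the fitted model's coefficient is slightly off at 0.95; both numbers are made up purely for illustration:

```python
def recursive_forecast(y0, coef, k):
    """Recursive k-step forecast for a zero-mean AR(1) model:
    each step re-uses the previous prediction as its input."""
    y = y0
    for _ in range(k):
        y = coef * y
    return y

true_coef, fitted_coef, y0 = 0.9, 0.95, 100.0
errors = [abs(recursive_forecast(y0, fitted_coef, k) -
              recursive_forecast(y0, true_coef, k))
          for k in (1, 5, 20)]
print(errors)  # the absolute error widens as the horizon k grows
```

Even this tiny coefficient bias compounds: the one-step error is small, but by step 20 the recursive forecast has drifted far from the true trajectory.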

Error accumulation in long-horizon forecast

We observe that the forecasting model (ARMA here) is not able to perform well in the long-horizon forecasting task.

2. Pre- and Post-Processing: Parametric models of time-series forecasting require significant pre- and post-processing. For example, an ARMA model needs a non-stationary time series to be differenced first. For most deep learning models, we require the time series to be standardized, missing values to be imputed, etc.

3. Model Training: Deep networks (especially RNNs) are notorious for their huge training times (as they cannot be easily parallelized), which increases the computational burden on the system many-fold. Even with statistical methods, we are required to estimate the parameters in use. Therefore, training and convergence remain pertinent challenges for such methods.

4. Business Impact: Training a deep learning model can be a costly affair because of the extended training time and computational power required. Additionally, DL methods are often black box in nature leaving little scope for interpretability. On the other hand, statistical methods are often not up to the mark when it comes to capturing long-term dependency and therefore perform poorly in the multi-horizon forecast.

Therefore, researchers are also evaluating approaches that do not require parametrization. Going forward we are going to refer to them as non-parametric approaches.

Non-Parametric Approaches for Time-Series Forecasting

With the non-parametric approaches, we look for similarity within a series (self-similarity: if a pattern has occurred historically, it might repeat itself in the future) or between series (cross-similarity: similar patterns recurring in other series). These approaches are purely data-driven and similarity-based, and therefore do not require parametrization.

1. Most of the prior research in this area focuses on self-similarity-based approaches. These algorithms are built on the core idea explained in [4]: “If a pattern x_a in a period preceding the forecast moment is similar to a pattern x_b from the history of this series, then the forecast pattern y_a would be similar to forecast pattern y_b”. Other prominent works in this domain include [5–6]. Tree-based models are also being adopted increasingly for forecasting tasks; they can be broadly categorized under the non-parametric self-similarity group as well.

2. In the ‘Deja vu’ paper [7], researchers proposed a cross-similarity-based forecasting approach and demonstrated its efficacy in terms of both accuracy and runtime. They also showed that similarity across series, i.e., cross-similarity, can be even more beneficial than self-similarity.
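The self-similarity idea quoted from [4] can be sketched in a few lines: find the historical window most similar to the latest one, and reuse the values that followed it as the forecast. This is a bare-bones illustration with an arbitrary window and horizon, not the algorithm from any of the cited papers:

```python
def similarity_forecast(series, window=3, horizon=2):
    """Self-similarity forecast: locate the historical window closest
    (in squared Euclidean distance) to the most recent window, then
    return the values that followed that historical match."""
    recent = series[-window:]
    best_dist, best_end = float("inf"), None
    # only consider windows that still have `horizon` values after them
    for end in range(window, len(series) - horizon + 1):
        cand = series[end - window:end]
        dist = sum((a - b) ** 2 for a, b in zip(cand, recent))
        if dist < best_dist:
            best_dist, best_end = dist, end
    return series[best_end:best_end + horizon]

# The latest window [1, 2, 3] matches an earlier occurrence,
# which was followed by [4, 5] -- so that becomes the forecast.
hist = [1, 2, 3, 4, 5, 1, 2, 3]
print(similarity_forecast(hist))  # [4, 5]
```

A cross-similarity method applies the same matching idea, but searches for the pattern across a repository of other series rather than only within the target's own history.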

Limitations of the Existing Non-parametric Approaches

1. Most non-parametric approaches in the literature primarily focus on self-similarity which can be limiting in scope and applications.

2. Cross-similarity-based approaches such as Deja vu also have their limitations. Deja vu employs an expensive Dynamic Time Warping process that compares the test series (i.e., the series to forecast) against every series in the repository (the pool of univariate series) to find direct similarity, making it computationally expensive and slow.

3. Most of the non-parametric algorithms also require significant pre- and post-processing, such as smoothing (de-trending), de-seasonalization, etc., which often requires high expertise.
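To see why the pairwise comparison in Deja vu is costly, here is the textbook dynamic-time-warping distance (a standard formulation, not Deja vu's exact implementation): it fills an n × m cost table for every pair of series, so matching one test series against a large repository multiplies that quadratic cost by the repository size.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences.
    The full (n+1) x (m+1) cost table is what makes repeated
    all-pairs comparisons against a repository expensive."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible alignments
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# Same shape at different lengths: warping absorbs the repeated 2.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Each call is O(n·m); scanning a repository of R series costs O(R·n·m), which is the bottleneck the Deja vu critique above refers to.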

Recently, we have proposed a novel non-parametric cross-similarity-based forecasting approach called MLTF [1], to address the limitations of the existing methods. The stand-out features of MLTF are,

  1. MLTF requires low computational effort and provides faster forecasts than its peers.
  2. MLTF does not need parameter estimation (or training) and therefore requires less expert intervention.
  3. MLTF requires minimal pre- or post-processing, making it more accessible to end-users.
  4. MLTF does not face the problem of error accumulation and therefore performs well in long multi-horizon forecasts.
  5. MLTF performs forecasts based on cross-similarity therefore we can always refer to the ‘similar’ series for interpretability.

The following illustration summarizes the time series literature,

Time Series Literature Overview

Model Less Time-series Forecasting (MLTF)

In MLTF, we learn and forecast the trajectory of a particular target series from a repository (repo) of univariate series. First, we find the series in the repo that are similar to our target series; we then use the trajectories of the identified series to forecast the target series.

The framework can be divided into the following five components:

1. Repository: A set of univariate time series.

2. Time-series Representation: Focusing on extracting statistical time-series features such as trend, entropy, correlation, etc. which are used to find similar series from the repository.

3. Mapping Similar Series: Identify the series in the repository that are similar to the target series. The k-means clustering algorithm is applied to the time-series representations for this mapping.

4. Re-sampling: Ensuring that the series with different lengths are re-adjusted so that they all have the same lengths as the target series.

5. Trajectory Projection: Using the ‘similar’ re-sampled series to forecast for the target series.
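The five components above can be sketched end-to-end as follows. This is a simplified, hypothetical reading of the pipeline, not the authors' code: the tiny feature set, the plain nearest-neighbour matching standing in for k-means, the linear-interpolation re-sampler, and the trajectory averaging are all illustrative assumptions.

```python
def features(s):
    """Component 2: a tiny statistical representation of a series
    (mean, spread, and a crude end-to-end trend estimate)."""
    n = len(s)
    mean = sum(s) / n
    spread = (sum((x - mean) ** 2 for x in s) / n) ** 0.5
    trend = (s[-1] - s[0]) / (n - 1)
    return (mean, spread, trend)

def resample(s, length):
    """Component 4: linear interpolation so every mapped series
    has the same length as the target plus the forecast horizon."""
    if len(s) == length:
        return list(s)
    step = (len(s) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out

def mltf_forecast(target, repo, horizon=2, k=2):
    """Components 1, 3 and 5: pick the k repo series closest to the
    target in feature space (nearest neighbours here, standing in for
    k-means) and average their trailing trajectories as the forecast."""
    tf = features(target)
    def dist(s):
        return sum((a - b) ** 2 for a, b in zip(features(s), tf))
    similar = sorted(repo, key=dist)[:k]
    trails = [resample(s, len(target) + horizon)[-horizon:] for s in similar]
    return [sum(col) / k for col in zip(*trails)]

# Two rising series resemble the rising target; the flat one does not.
repo = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [9, 9, 9, 9, 9, 9]]
print(mltf_forecast([1, 2, 3, 4], repo, horizon=2, k=2))  # [5.0, 6.0]
```

Note that nothing is fitted at any point: the "model" is just the repository plus a similarity search, which is why there is no training or convergence step to worry about.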

The diagram in figure 1 summarizes how MLTF works.

Major Findings

  1. MLTF does not suffer from the problem of error accumulation and performs consistently across different frequencies (from the low yearly frequency in M1 and M3 to the high hourly frequency in the electricity dataset) and different kinds of series (from stationary to non-stationary, seasonal to non-seasonal, etc.).
  2. MLTF is shown to work well even with shorter historical information. A significant portion of the target series had a small number of samples and yet MLTF was able to perform better compared to the data-hungry deep learning models.
  3. MLTF is faster than its peers owing to its purely data-driven model-less nature and minimal-to-no requirement for pre-processing, tuning, or training.

For a detailed numerical analysis and discussion of the results please refer to the main paper [1].

Final Takeaway

Non-parametric forecasting methods in general require very little pre- or post-processing, their run-time is significantly shorter, and their accuracy is on par with that of the parametric models. These approaches also benefit from better explainability: because the forecast is similarity-based, we can always fall back on the ‘similar series’ to understand the nature of the forecast. We also discussed how MLTF utilizes cross-similarity to provide forecasts and how it is advantageous compared to its parametric counterparts.

However, the main takeaway from this article should be that we should not restrict ourselves within the bounds of parametrization, but rather try to look beyond. Recently practitioners are increasingly focusing on ‘data-centric AI’ by focusing more on data and less on the model so that businesses can ensure improved accuracy, increased efficiency, and reduced costs. We hope that non-parametric methods where data is the key would bring us much closer to the dream of a ‘data-centric AI’ and MLTF would be a small step in that direction!

References

  1. Samanta, Subhrajit, PKS Prakash, and Srinivas Chilukuri. “MLTF: Modelless time-series forecasting.” Information Sciences 593 (2022): 364–384.
  2. Salinas, David, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. “DeepAR: Probabilistic forecasting with autoregressive recurrent networks.” International Journal of Forecasting 36, no. 3 (2020): 1181–1191.
  3. Lim, Bryan, Sercan Ö. Arık, Nicolas Loeff, and Tomas Pfister. “Temporal fusion transformers for interpretable multi-horizon time series forecasting.” International Journal of Forecasting 37, no. 4 (2021): 1748–1764.
  4. Dudek, G., 2010. Similarity-based approaches to short-term load forecasting. Forecasting Models: Methods and Applications, pp.161–178.
  5. Nikolopoulos, K.I., Babai, M.Z. and Bozos, K., 2016. Forecasting supply chain sporadic demand with nearest neighbor approaches. International Journal of Production Economics, 177, pp.139–148.
  6. Li, H., Liu, J., Yang, Z., Liu, R.W., Wu, K. and Wan, Y., 2020. Adaptively constrained dynamic time warping for time series classification and clustering. Information Sciences, 534, pp.97–116.
  7. Kang, Y., Spiliotis, E., Petropoulos, F., Athiniotis, N., Li, F., & Assimakopoulos, V. (2021). Déjà vu: A data-centric forecasting approach through time series cross-similarity. Journal of Business Research, 132, 719–731.

Author Bio: Dr. Subhrajit Samanta is a senior AI Research Scientist with ZS Associates. He received his Ph.D. from Nanyang Technological University, Singapore in 2020. His primary expertise includes time-series forecasting, statistics and classical ML, synthetic data generation, and deep learning (RNN) techniques.
