THE EMERGENCE OF HYBRID MODELS IN TIME SERIES FORECASTING

Jul 22, 2019

We are not living in times where knowledge can be extracted from scratch. Hybrid models are growing in importance day by day, and almost every data scientist now needs to look beyond the impressive results offered by a single statistical or computer science model. Before looking at why hybrid models matter, it is important to understand what they are. Simply stated, a hybrid model is built by combining two or more techniques in order to capture the nature of the data more comprehensively. The concept has received a lot of attention recently because the combined model often predicts or forecasts better than either component alone. Hybrid models can capture both the linear and the non-linear structure of a data set, so they tend to pick up nuances of the data that a single model misses.

Understanding Linear and Non-Linear Points

Time series forecasting is a pivotal application of machine learning and plays a crucial role in today’s business environment. Hybrid models apply naturally to time series forecasting and simulation. So, let us first understand what time series data is. A time series is a sequence of data points measured at regular time intervals over a period; data recorded at irregular intervals does not form a time series in this classical sense. Time series analysis helps us recognize the major components of a series, and time series models forecast future data points from the past behavior of the series. There are several forecasting techniques, such as Simple Exponential Smoothing, Hidden Markov Models and Artificial Neural Networks, and each has its place. Two of the most widely used are the Auto-Regressive Integrated Moving Average (ARIMA) model and Recurrent Neural Networks (RNNs). Both see day-to-day use because ARIMA is well suited to linear structure, while neural networks are well suited to non-linear structure.
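To make the idea of regular intervals concrete, here is a minimal Python sketch that checks whether a list of timestamps is evenly spaced. The function name `is_regularly_spaced` and the sample dates are made up for illustration and are not part of any library.

```python
from datetime import datetime, timedelta


def is_regularly_spaced(timestamps):
    """Return True if every gap between consecutive timestamps is the same."""
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    # Zero or one distinct gap means the observations are evenly spaced.
    return len(gaps) <= 1


# Hypothetical daily observations starting on an arbitrary date.
start = datetime(2019, 7, 1)
daily = [start + timedelta(days=i) for i in range(5)]

# Dropping one day leaves a two-day gap, so the spacing is no longer regular.
with_gap = daily[:2] + daily[3:]
```

A series that fails a check like this is usually resampled or interpolated onto a regular grid before a model such as ARIMA is applied.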

An illustration depicting time-series data

The ARIMA models, pioneered by Box and Jenkins, are among the most popular and effective statistical models for time series forecasting. They are based on the fundamental principle that the future values of a time series are generated from a linear function of past observations and white noise terms. In an ARIMA (p, d, q) model, p is the number of autoregressive terms (lagged observations), q is the number of moving-average terms (lagged noise terms), and d is the degree of ordinary differencing applied to make the series stationary. The appropriate orders are usually determined through the Box-Jenkins model building methodology. Because of this linearity restriction, ARIMA may not be effective for modeling a general real-world time series.
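The differencing step controlled by d can be illustrated with a few lines of Python. This is only a sketch of ordinary differencing, not a full ARIMA fit; the helper name `difference` and the sample series are assumptions made for the example.

```python
def difference(series, d=1):
    """Apply ordinary (first-order) differencing d times."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with the changes between neighbours.
        values = [current - previous for previous, current in zip(values, values[1:])]
    return values


# Hypothetical series with a quadratic trend, so it is not stationary.
series = [1, 4, 9, 16, 25]

once = difference(series, d=1)   # [3, 5, 7, 9]: still trending upward
twice = difference(series, d=2)  # [2, 2, 2]: the trend is gone
```

Each pass shortens the series by one point, which is one reason d is kept as small as possible in practice.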

An illustration depicting ARIMA forecasts and its comfort with linear points

Artificial Neural Networks (ANNs) are a very successful alternative to ARIMA models for time series forecasting and have many distinguishing characteristics. One of them is universal approximation, i.e. an ANN with enough hidden nodes can approximate any nonlinear continuous function to any desired degree of accuracy. A feedforward ANN with a single hidden layer and one output node is the architecture most commonly used in forecasting applications: the inputs are the lagged observations, and the logistic function, g(x) = 1 / (1 + e^(-x)), serves as the hidden layer activation function. So far, ANNs have been very effective for modeling nonlinearly generated time series but have provided mixed results for linear problems.
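As a rough sketch of that architecture, the forward pass of a single hidden layer network with logistic activation can be written in plain Python. The function `ann_forecast`, its parameter names, and the weights below are hypothetical; in practice the weights would be learned from data rather than written by hand.

```python
import math


def logistic(x):
    """Logistic activation g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def ann_forecast(lags, hidden_weights, hidden_biases, output_weights, output_bias):
    """One-step forecast from a single hidden layer feedforward network.

    lags           : the most recent observations used as inputs
    hidden_weights : one list of input weights per hidden node
    hidden_biases  : one bias per hidden node
    output_weights : one weight per hidden node for the single output node
    """
    hidden = [
        logistic(bias + sum(w * y for w, y in zip(weights, lags)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output node is linear: a weighted sum of the hidden activations.
    return output_bias + sum(a * h for a, h in zip(output_weights, hidden))


# Made-up weights: with all hidden weights at zero, each hidden node outputs
# g(0) = 0.5, so the forecast is 1.0 + 0.5 * (2.0 + 4.0) = 4.0.
forecast = ann_forecast(
    lags=[1.0, 2.0],
    hidden_weights=[[0.0, 0.0], [0.0, 0.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[2.0, 4.0],
    output_bias=1.0,
)
```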

An illustration depicting an Artificial Neural Network and its close resemblance to the human brain

So, it is fairly evident that neither ARIMA nor an ANN is enough on its own, because everyday applications involve both linear and non-linear data. There is, therefore, a need for hybrid models that can give a comprehensive analysis and forecast data points according to the nature of the data set.

ARIMA captures the linear structure of the series. Whatever non-linear structure the data contains is therefore left in the ARIMA residuals, so we pass these residuals to a recurrent neural network to forecast the non-linear component of the data set. Finally, adding the forecasts from both models gives the final forecasted values that we are seeking. The combined forecast should leave very small residual errors, which is one of the most important aspects of our analysis and simulation process.
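The three steps (fit a linear model, model its residuals, add the two forecasts) can be sketched in plain Python. To keep the example self-contained, two stand-ins are used, and both are assumptions of this sketch rather than the setup described in this article: a least-squares AR(1) model takes the place of ARIMA, and a nearest-neighbour average takes the place of the recurrent neural network. All function names and the sample series are hypothetical.

```python
import math


def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} (stand-in for ARIMA)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def knn_next(history, k=3):
    """Nonlinear stand-in for the neural network.

    Predicts the next value as the average of the values that followed
    the k past points most similar to the latest one.
    """
    query = history[-1]
    pairs = sorted(zip(history[:-1], history[1:]), key=lambda p: abs(p[0] - query))
    nearest = pairs[:k]
    return sum(following for _, following in nearest) / len(nearest)


def hybrid_forecast(series, k=3):
    """One-step hybrid forecast: linear forecast plus residual forecast."""
    # Step 1: fit the linear model and forecast the next value.
    c, phi = fit_ar1(series)
    linear_next = c + phi * series[-1]
    # Step 2: residuals are what the linear model failed to explain.
    fitted = [c + phi * previous for previous in series[:-1]]
    residuals = [actual - fit for actual, fit in zip(series[1:], fitted)]
    residual_next = knn_next(residuals, k)
    # Step 3: the final forecast is the sum of both parts.
    return linear_next + residual_next, linear_next, residual_next


# Hypothetical series: a linear trend plus a nonlinear oscillation.
series = [0.5 * t + math.sin(t) for t in range(40)]
total, linear_part, residual_part = hybrid_forecast(series)
```

Swapping in a real ARIMA implementation and a trained network would keep the same three-step structure; only the two model-fitting functions would change.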

Hybrid models are the right direction for data science today, as they give an encompassing view of the problem at hand. Achieving reasonably accurate forecasts of a time series is a very important yet challenging task. ARIMA and ANN are two widely popular and effective forecasting models. However, it is almost impossible to establish the exact nature of a series, and a real-world time series most often contains both linear and nonlinear correlation structures. This is where a hybrid model comes into play, combining the two to produce the desired result. Many hybrid combinations are possible, so identifying the nature of the data points is really important. For example, we could have used a Simple Exponential Smoothing model in place of the RNN, since Simple Exponential Smoothing handles very noisy data well.
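For reference, Simple Exponential Smoothing itself is short enough to sketch directly. The function name, the choice to start from the first observation, and the smoothing factor used below are illustrative assumptions; a smaller `alpha` gives less weight to each new observation and so smooths out more of the noise.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * x_t + (1 - alpha) * level_{t-1},
    starting from level_0 = x_0 (one common initialisation).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [series[0]]
    for value in series[1:]:
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels


# Hypothetical observations and smoothing factor.
observations = [10.0, 20.0, 30.0]
levels = simple_exponential_smoothing(observations, alpha=0.5)  # [10.0, 15.0, 22.5]

# The forecast for every future step is simply the last smoothed level.
forecast = levels[-1]
```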

Written by Ayush Yajnik