Cracking the Secret Behind Time Series Forecasting

A strategic approach to define a Time Series Forecasting model

S Porreca
Machine Learning Reply DACH
6 min read · Mar 2, 2022

Photo by Jason Briscoe on Unsplash

The Revolution of the Modern Seers

When we talk about Time Series, we all think about market traders sitting in front of a Bloomberg terminal watching incredibly complicated plots, struggling to forecast how the market will evolve. This is partly true because Time Series Analysis (and Forecasting) has proved incredibly powerful in solving such issues, even before the advent of Machine Learning. However, the potential of these techniques is not limited to the financial area alone.

During the last ten years, especially thanks to Machine Learning, many other domains have benefited from Time Series Analysis and Forecasting, opening new possibilities and redefining strategies for technological progress. Let’s think for a moment about the medical area and, specifically, about the Covid-19 situation. Applying scientific forecasting techniques to long, high-quality temporal data proved crucial in designing and putting in place efficient countermeasures against the spread of the disease.

This is equally true of business development decisions: to understand which strategies are best to pursue and which actions to take, we need to know where our current decisions are taking us.

Explaining the Terminology

You may be wondering now: what exactly are Time Series Analysis and Forecasting?

First of all, a Time Series is a collection of time-ordered values of a specific quantity (e.g., the outside temperature) in which the time interval between consecutive observations is constant.

Time Series Analysis is the process of inspecting how the values of the Time Series change over time. This step makes it possible to spot some of the most common patterns, like Trend, Seasonality, White Noise, Auto-Correlation, and Non-Stationarity.
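Of the patterns listed above, Auto-Correlation is the easiest to express as a formula: it measures how strongly a series resembles a copy of itself shifted by a given lag. The snippet below is a minimal sketch using only the Python standard library; the function name `autocorrelation` and the sine-wave test series are assumptions made for illustration.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given positive lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Covariance between the series and itself shifted by `lag` steps.
    covariance = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


# Hypothetical series: a sine wave that repeats every 12 steps.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]

print(autocorrelation(series, 12))  # strongly positive: one full period apart
print(autocorrelation(series, 6))   # strongly negative: half a period apart
```

A clear peak at a specific lag is one common hint that the series has a seasonal pattern with that period.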

Time Series Forecasting is a set of techniques that aim to predict future values by analyzing the behavior of past data while taking the above-mentioned patterns into account.

Let’s now look at an example of a Time Series, which could represent, for instance, the value of a stock over time. The image below shows a Time Series that exhibits several patterns:

  • Trend (it’s increasing up to t=200)
  • Seasonality (you can clearly see a pattern that is repeating up to t=200)
  • White Noise (the line has small fluctuations)
  • Non-Stationarity (all the previous patterns stop abruptly at t=200 because a big event happens that changes the behavior of the Time Series, perhaps a crisis in the stock market)
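A series with these four ingredients can be generated synthetically, which is handy for experimenting before touching real data. The following is only a sketch: the slope, period, noise level, and post-break level are arbitrary assumptions and are not taken from the figure.

```python
import math
import random


def synthetic_series(n=300, break_at=200, period=25, seed=0):
    """Build a toy series: trend + seasonality + noise, then a regime change."""
    rng = random.Random(seed)
    values = []
    for t in range(n):
        noise = rng.gauss(0, 1.0)  # white noise: small random fluctuations
        if t < break_at:
            trend = 0.5 * t  # steadily increasing level
            seasonal = 10 * math.sin(2 * math.pi * t / period)  # repeating pattern
            values.append(trend + seasonal + noise)
        else:
            # After the break the old trend and seasonality no longer apply:
            # the series drops to a flat level of 30 plus noise.
            values.append(30 + noise)
    return values


series = synthetic_series()
before, after = series[:200], series[200:]
print(f"mean before the break: {sum(before) / len(before):.1f}")
print(f"mean after the break:  {sum(after) / len(after):.1f}")
```

Because the statistical properties before and after t=200 differ, a model fitted only on the first segment would have no reason to anticipate the second one.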

Civil War: Statistics vs. Machine Learning

Given these definitions, it is clear why any business would be fascinated by the possibility of forecasting how its growth will develop under different current actions. The question is: what tools are available to achieve Time Series Forecasting? The answer is Statistics and Machine Learning.

Statistical Forecasting is the oldest and most widely used approach to analyzing and predicting time series. It includes techniques ranging from the most trivial one (Naïve Forecasting) to more complex methods such as ARIMA or TBATS. The Auto-Regressive Integrated Moving Average (ARIMA) model is particularly good at modeling patterns such as trend, seasonality, and non-stationarity, but only when these elements are first handled with preliminary steps like Differencing and Seasonal Differencing. TBATS specializes in modeling time series with complex seasonal patterns, unlike ARIMA, which even in its seasonal variant (SARIMA) can account for only one seasonality.
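To make the simplest of these ideas concrete, here is a small sketch of Naïve Forecasting and of the two differencing steps mentioned above. It is plain Python rather than a full ARIMA implementation, and the helper names and the toy series are assumptions chosen for illustration.

```python
def naive_forecast(values, horizon):
    """Naive forecasting: repeat the last observed value for every future step."""
    return [values[-1]] * horizon


def difference(values, lag=1):
    """Differencing: subtract the value observed `lag` steps earlier."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


# Hypothetical series: a linear trend plus a pattern repeating every 4 steps.
pattern = [0, 3, 1, -2]
series = [2 * t + pattern[t % 4] for t in range(12)]

print(naive_forecast(series, 3))  # [20, 20, 20]
print(difference(series))         # trend removed, the 4-step pattern is still visible
print(difference(series, lag=4))  # seasonal differencing: [8, 8, 8, 8, 8, 8, 8, 8]
```

Seasonal differencing with a lag equal to the period cancels the repeating pattern and leaves only the constant growth per period, which is the kind of simpler series that models such as ARIMA are designed to work on.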

An alternative to Statistical Forecasting is the growing family of Machine Learning techniques, which includes, for example, RNN, MARS, and GLM. A Recurrent Neural Network (RNN) is a particular type of Neural Network that keeps an “internal memory” of the inputs it has already processed. This is a key element when dealing with Time Series because the ability of the model to predict the future is strictly related to its capability to “remember” the past.
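The “internal memory” can be illustrated with a single recurrent unit whose hidden state is fed back into itself at every step. This is a deliberately tiny sketch with fixed, hypothetical weights and no training; a real RNN has many units and learns its weights from data.

```python
import math


def rnn_hidden_states(inputs, w_input=0.5, w_hidden=0.8, bias=0.0):
    """Run a one-unit recurrent cell over a sequence and return every hidden state.

    The weights are fixed, hypothetical values; a real network would learn them.
    """
    hidden = 0.0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state,
        # which is how earlier inputs keep influencing later steps.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states


# Two sequences that differ only in their first element.
states_a = rnn_hidden_states([1.0, 0.0, 0.0])
states_b = rnn_hidden_states([0.0, 0.0, 0.0])

print(states_a[-1])  # non-zero: the first input is still "remembered"
print(states_b[-1])  # 0.0: there was nothing to remember
```

Even though the last two inputs are identical in both sequences, the final states differ, because the first input is carried forward through the hidden state.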

How to Survive in This Jungle of Technologies

After these explanations, you may be wondering: what is the best technique to apply? The answer is: it depends. Generally speaking, it is hard to compare these methods across the board and declare one of them unquestionably the best. Each context in which Time Series Forecasting is applied is different, and those differences can favor one technique over another.

The literature is still divided between those who support statistical approaches (Makridakis et al., 2018) and those who see advantages in Machine Learning based methods (Cerqueira et al., 2019). The debate concerns the effort each approach requires to be put into a real system and the amount of data needed to make it operational.

Therefore, the real question to ask is: what is the best technique to apply in my business use case? The answer is: it depends, again. Do not despair, this is not a never-ending story! This is where Data Scientists come into play and provide their valuable expertise. They identify the business use case to which Time Series Forecasting should be applied, along with its characteristics, and thereby narrow down the possible techniques to a small subset.

Let’s now dive into a practical example in which we will explain how to approach the prediction of stock prices.

First of all, a comprehensive Time Series Analysis is mandatory to highlight the main characteristics of our temporal data and to shrink the list of possible approaches for the prediction phase. Understanding what kind of data we are dealing with is a fundamental step for successful modeling and subsequent prediction. In our case, the stock price could take a shape like this over time:

The time series above represents the stock price evolution of Alphabet Inc Class A in the last six months. Let’s now simulate a brief Time Series Analysis over such data:

  • Trend: there is a positive trend in the stock price over a long period of time, as one might expect in the stock market
  • Seasonality: it is hard to spot any kind of seasonality here, mainly because the stock market, generally speaking, is quite volatile
  • White Noise: as we can expect, the stock price has a lot of fluctuations
  • Non-Stationarity: do you see that hole right after 2020? That’s the Covid-19 pandemic…

Given this analysis, it is time to choose a subset of approaches with which to start modeling our time series. As mentioned above, there are two families of approaches to time series forecasting: statistical forecasting and machine learning techniques. Currently, the most promising option on the statistical side would be ARIMA, while an RNN could be chosen on the machine learning side. It is always advisable not to rely on a single model when you start forecasting, but to choose a subset of models to compare instead.
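Comparing a subset of candidates can be as simple as scoring each one on data it has never seen. The sketch below uses two trivial baselines (naive and drift) as stand-ins; they are not ARIMA or an RNN, and the function names and the toy series are assumptions for illustration. Real models would plug into the same `compare` function as long as they accept a training series and a horizon.

```python
def naive(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon


def drift(train, horizon):
    """Extend the straight line between the first and last observation."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def compare(models, series, horizon):
    """Score every candidate on the last `horizon` points, which it never sees."""
    train, holdout = series[:-horizon], series[-horizon:]
    scores = {
        name: mean_absolute_error(holdout, model(train, horizon))
        for name, model in models.items()
    }
    # Best (lowest error) candidate first.
    return dict(sorted(scores.items(), key=lambda item: item[1]))


# Hypothetical series with a steady upward trend.
series = [100 + 2 * t for t in range(30)]
scores = compare({"naive": naive, "drift": drift}, series, horizon=5)
print(scores)  # {'drift': 0.0, 'naive': 6.0}
```

On this perfectly linear toy series the drift baseline wins by construction; on real stock data the ranking is not known in advance, which is exactly why several candidates are compared.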

The performance of Time Series Forecasting techniques can be evaluated through several metrics (e.g., MSE, RMSE, MAE, or MAPE), so it is important to pick the one best suited to the use case. The evaluation is done by comparing the model's predictions against selected validation data (e.g., we predict the last known year of data and measure how accurately we predicted it). The following example image shows what such an evaluation could look like: the predicted values (orange) compared with the actual values (blue). The orange line approximates the real values quite well, meaning that our model is working well.
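The four metrics named above are short enough to write out directly. This is a minimal sketch in plain Python; the sample `actual` and `predicted` values are made up for illustration and do not come from the stock data discussed here.

```python
import math


def mse(actual, predicted):
    """Mean Squared Error: penalizes large errors more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: MSE expressed in the units of the data."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean Absolute Error: average size of the error, ignoring its sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical validation data and predictions.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 126.0]

print(f"MSE:  {mse(actual, predicted):.2f}")
print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"MAE:  {mae(actual, predicted):.2f}")
print(f"MAPE: {mape(actual, predicted):.2f}%")
```

MSE and RMSE react strongly to a few large misses, MAE treats all errors equally, and MAPE is scale-free but breaks down when actual values are at or near zero, so the choice depends on which kind of error matters most for the use case.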

Keep in mind that, as with all systems based on Machine Learning, it is important to keep the model updated with fresh data, so that new behaviors can also be modeled. Additionally, evaluate the system's performance frequently through ad hoc KPIs, so that the model can be continuously adjusted to perform better and better.

If you need help

Machine Learning Reply can support you in every phase of such adventures, from the data collection to the Time Series Analysis up to the implementation of the best model for forecasting your data. Our expertise can also provide you with a way to keep your model always up to date with the latest information coming from a comprehensive and completely automated pipeline of data preparation and ingestion. The cooperation of all these elements is at the service of the ultimate result: making your predictions more and more precise and enabling real benefit to your business.
