# Statistical Modeling and Machine Learning Applications for Time-Series Problems


Time series is a domain that has relied on some form of predictive analysis since long before the birth of contemporary machine learning. Once upon a time, our ancestors tracked the location and movement of the moon and the stars to decide when to move from place to place, when to hunt, and when to sow seeds in the expectation of rain. In doing so, they had figured out cycles and seasonality in the flow of time, which we now call the cyclical and seasonal components of a time series.

Today, we have both the computational and the algorithmic capacity to do a lot more with time-series data — predict stock prices, energy demand, temperature, and even prepare trading strategies for investors. I’m not just repeating things I’ve read; I actually did each of those things and the results are worth sharing.

First, let’s dive into the fundamentals of time-series data and some relevant use-cases that show how to tackle each of these problems.

# Time-Series Fundamentals

A time series is a sequence of data points recorded in time order, typically at regular intervals.

Types of time series:

• Univariate: Time series containing records of a single variable.
• Multivariate: Time series containing records of multiple variables.

Typically, a time series is affected by four main components, which can be separated from the observed data. These components are:

• Trend: The general tendency of a time series to increase, decrease or stagnate over a long period of time.
• Cyclical: The cyclical variation in a time series describes the medium-term changes in the series, caused by circumstances, which repeat in cycles.
• Seasonal: Seasonal variations in a time series are fluctuations within a year according to the season.
• Irregular: Irregular or random variations in a time series are caused by unpredictable influences, which are not regular and also don’t repeat in a particular pattern.

The following graph was produced with the seasonal_decompose function from the statsmodels package in Python, which splits a series into trend, seasonal, and residual parts.

Considering the effects of these four components, two different types of models are generally used for a time series:

• Multiplicative Model: Y(t) = T(t) × S(t) × C(t) × I(t)
• Additive Model: Y(t) = T(t) + S(t) + C(t) + I(t)

A multiplicative model assumes that the four components of a time series are not necessarily independent, and they can impact one another; whereas the additive model assumes that the four components are independent of each other.
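To make the additive model concrete, here is a minimal sketch of a classical additive decomposition in plain Python. It is not how statsmodels implements seasonal_decompose; it assumes an odd seasonal period, estimates the trend with a centered moving average, and folds any cyclical movement into the trend. The series and the function names are made up for illustration.

```python
def centered_moving_average(values, period):
    """Trend estimate: average over one full (odd-length) seasonal period."""
    half = period // 2
    trend = [None] * len(values)  # undefined at the edges of the series
    for i in range(half, len(values) - half):
        trend[i] = sum(values[i - half : i + half + 1]) / period
    return trend


def additive_decompose(values, period):
    """Split values into trend + seasonal + residual (additive model)."""
    trend = centered_moving_average(values, period)

    # Average the detrended values at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(values, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    seasonal_means = [sum(b) / len(b) for b in buckets]

    # Center the seasonal pattern so it sums to zero over one period.
    offset = sum(seasonal_means) / period
    seasonal_means = [s - offset for s in seasonal_means]

    seasonal = [seasonal_means[i % period] for i in range(len(values))]
    residual = [
        None if t is None else y - t - s
        for y, t, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, residual


# Hypothetical data: a linear trend plus a weekly pattern that sums to zero.
period = 7
pattern = [3.0, 1.0, -2.0, -4.0, -1.0, 1.0, 2.0]
series = [0.5 * t + pattern[t % period] for t in range(56)]

trend, seasonal, residual = additive_decompose(series, period)
```

On this synthetic series the recovered trend follows the 0.5-per-step line, the seasonal part matches the weekly pattern, and the residual is essentially zero. For a multiplicative series, one common trick is to take logs first, which turns the product of components into a sum.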

Stationarity of a Time Series: The statistical properties of a stationary process, such as its mean and variance, do not depend on time. Many classical time-series models assume that the underlying series is stationary. Generally, time series with trends or seasonality are non-stationary. In these cases, differencing can be used to remove the trend, and power transformations (such as a log or square root) can stabilize the variance, bringing the series closer to stationarity.
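As a small illustration of those two tools, the sketch below differences a series with a linear trend and log-transforms a series with exponential growth before differencing it. Both series are synthetic, and the helper name `difference` is just an illustrative choice.

```python
import math


def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t where both exist."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


# Hypothetical series with a linear trend: its mean keeps rising over time.
linear = [2.0 * t + 10.0 for t in range(20)]
linear_diff = difference(linear)

# Hypothetical series with exponential growth: take logs, then difference.
exponential = [math.exp(0.1 * t) for t in range(20)]
log_diff = difference([math.log(v) for v in exponential])

print(linear_diff[:3])
print([round(d, 6) for d in log_diff[:3]])
```

After differencing, the linear series becomes a constant, and the log-differenced exponential series hovers around its growth rate. Passing `lag=12` to the same helper on monthly data would give a seasonal difference.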

Some ways to check for the stationarity of data:

• Augmented Dickey-Fuller Test: The Augmented Dickey-Fuller (ADF) test is a hypothesis test for the presence of a unit root. Its null hypothesis is that the series has a unit root and is therefore non-stationary, so a small p-value is evidence in favor of stationarity. A unit root is a stochastic trend in a time series; the random walk, with or without drift, is the classic example. A series with a unit root does not revert to a fixed mean, and shocks to it persist, which can cause unpredictable results in time-series analysis.
• Auto-correlation Function (ACF) and Partial Auto-correlation Function (PACF) Plots: These map the linear dependence of the variable with itself at different lags. An ACF that decays very slowly is a common sign of non-stationarity.
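To show what an ACF plot is actually plotting, here is a minimal sketch of the sample autocorrelation at a given lag, written in plain Python on a made-up periodic series. In practice you would more likely reach for statsmodels, which provides `adfuller` for the ADF test and `plot_acf` for the plot; the function below is only an illustration.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(values)
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)
    covariance_sum = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return covariance_sum / variance_sum


# Hypothetical series that repeats every 4 steps.
series = [1.0, 0.0, -1.0, 0.0] * 10

for lag in range(5):
    print(lag, round(autocorrelation(series, lag), 3))
```

The autocorrelation is strongly positive at the lag that matches the period, strongly negative at half the period, and zero at lag 1, which is the kind of signature an ACF plot makes visible.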

Let’s look at three really different approaches to modeling time-series data. I used IBM’s Data Science Experience (DSX) Platform to implement these use cases. On DSX I can store my data and run my R and Python notebooks and scripts, all in one place. This convenience made DSX the best choice for me.

# 1. Trading Strategy using ARIMA-GARCH Predictions

ARIMA (short for Auto-Regressive Integrated Moving Average) and GARCH (short for Generalized Auto-Regressive Conditional Heteroskedasticity) are both models with their roots lying deep in statistics fundamentals. They are robust and reliable and have been used by hedge funds for years.

An ARIMA model is fitted to time-series data either to better understand the data or to predict future points in the series (forecasting). GARCH is an approach to estimating volatility in time-series data and is especially useful for modeling volatility in financial markets.

The ARIMA-GARCH Strategy: Using past information, fit an ARIMA model to find the best predictive parameters, then use GARCH to predict the market volatility for the next day. Based on this prediction, advise the investor to go long or short. If the prediction says the direction of the market will change, recommend that the investor flip position; if it says the direction will not change, recommend that the investor stay put.
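The decision rule itself is easy to sketch once the forecasts exist. The snippet below is a simplified illustration, not the implementation behind the results that follow: the `predicted` list stands in for next-day return forecasts that would come from a fitted ARIMA-GARCH model, both lists are made-up numbers, and transaction costs are ignored.

```python
def positions_from_forecasts(predicted_returns):
    """Map each forecast to a position: +1 for long, -1 for short.

    A forecast of exactly zero keeps the previous position (stay put).
    """
    positions = []
    current = 1  # assumption: start out long
    for forecast in predicted_returns:
        if forecast > 0:
            current = 1
        elif forecast < 0:
            current = -1
        positions.append(current)
    return positions


def cumulative_return(daily_returns):
    """Compound a list of simple daily returns into one total return."""
    total = 1.0
    for r in daily_returns:
        total *= 1.0 + r
    return total - 1.0


# Placeholder numbers for illustration only, not real market data.
predicted = [0.010, 0.004, -0.020, -0.010, 0.003]
actual = [0.020, -0.010, -0.030, 0.010, 0.020]

positions = positions_from_forecasts(predicted)
strategy_returns = [p * a for p, a in zip(positions, actual)]

print("positions:", positions)
print("strategy:", cumulative_return(strategy_returns))
print("buy and hold:", cumulative_return(actual))
```

The strategy earns the market return on days it is long and the negated market return on days it is short, which is why getting the predicted direction right matters more than getting the predicted magnitude right.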

The results obtained were quite interesting:

- The ARIMA-GARCH model gives better overall returns as compared to a baseline Buy-and-Hold Strategy.

- The ARIMA-GARCH strategy does best when the market is trending, that is, when there is a sustained decline or rise in market value, as during the 2008–2009 market crash. It also performs fairly well when the market is following a stochastic pattern.

The results above show the cumulative returns for the two strategies: the ARIMA-GARCH strategy versus the Buy-and-Hold Strategy. As you can see, ARIMA-GARCH outperforms the standard Buy-and-Hold.

# 2. Stock Price Prediction using Long Short-Term Memory Neural Network

As we all know, predicting stock prices is a pretty non-trivial task. But data scientists never stop trying, do they?

The approach to this problem is simple: take the time series and convert it into a regression problem where each value from the past, say, 50 days becomes an input feature. This structure can easily be obtained using a rolling window approach. You then treat these pieces of data like features in any other regression problem. But why use an LSTM, you may ask?

In simple terms, an LSTM is a neural network with a memory. It is able to capture long-term dependencies present within the data, which makes it well suited to time-series problems.
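The rolling-window conversion mentioned above can be sketched in a few lines. This is a minimal illustration with a made-up price list and a window of 3 instead of 50; the function name `make_windows` is hypothetical.

```python
def make_windows(series, window):
    """Turn a series into (features, target) pairs for supervised learning.

    Each feature row holds `window` consecutive values, and the target is
    the value that comes immediately after them.
    """
    features, targets = [], []
    for end in range(window, len(series)):
        features.append(series[end - window : end])
        targets.append(series[end])
    return features, targets


# Hypothetical closing prices.
prices = [float(p) for p in range(100, 110)]
X, y = make_windows(prices, window=3)

print(X[0], "->", y[0])
print(len(X), "training rows")
```

For an LSTM, each feature row would additionally be reshaped to (time steps, features per step) before being fed to the network.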

Here’s the structure used for the Stock Price Prediction problem that we’re trying to solve:

The network had four LSTM layers followed by one Dense layer that produces the required output. Each LSTM layer contained 100 units. The network was trained for 75 epochs with a batch size of 32.
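To get a rough sense of how large that network is, the sketch below counts trainable parameters using the standard LSTM formula (four gates, each with input weights, recurrent weights, and a bias). It assumes one input feature per time step and a single output value, neither of which is stated above, so treat the total as an estimate under those assumptions.

```python
def lstm_params(input_dim, units):
    """Trainable parameters in a standard LSTM layer with biases."""
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, outputs):
    """Trainable parameters in a fully connected layer with biases."""
    return input_dim * outputs + outputs


units = 100
n_features = 1  # assumption: one price value per time step

# The first LSTM layer sees the raw features; later layers see 100 units each.
layer_inputs = [n_features, units, units, units]
lstm_total = sum(lstm_params(dim, units) for dim in layer_inputs)
total = lstm_total + dense_params(units, 1)

print(total)
```

Note that the window length (50 days) does not appear in the count: an LSTM reuses the same weights at every time step.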

The results obtained from the LSTM were quite fascinating.

You can see that the LSTM is able to capture the waves in the market to a very good extent.

The same approach and architecture can be used in a variety of other applications; it is general enough to apply to many kinds of time-series data. We even tried it on temperature prediction, and it gave good results for that too.

# 3. Energy Demand Prediction using XGBoost

XGBoost stands for Extreme Gradient Boosting. It’s a very fast implementation of Gradient Boosting Decision Trees and is now in the toolkit of almost all machine learning practitioners.

Another nice use-case for time-series methods is predicting energy demand. This can be modeled using either of the previous two approaches, but another fun way to model this kind of time series is to take as features the year, week, hour, type of day (Sunday, Monday, Christmas, etc.) and any other relevant information, and then predict demand using XGBoost (or any other machine learning algorithm). A quick heads-up here: tuning the XGBoost parameters can significantly improve the quality of the predictions.
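Here is a minimal sketch of that feature-extraction step using only the standard library. The holiday set is a hypothetical one-entry calendar and the feature names are illustrative; the resulting rows are the kind of table you could hand to a regressor such as XGBoost.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real one would cover every relevant date.
HOLIDAYS = {date(2023, 12, 25)}


def calendar_features(timestamp):
    """Derive calendar-based features from a single timestamp."""
    _, iso_week, _ = timestamp.isocalendar()
    day_of_week = timestamp.weekday()  # 0 = Monday, 6 = Sunday
    return {
        "year": timestamp.year,
        "week": iso_week,
        "hour": timestamp.hour,
        "day_of_week": day_of_week,
        "is_weekend": int(day_of_week >= 5),
        "is_holiday": int(timestamp.date() in HOLIDAYS),
    }


row = calendar_features(datetime(2023, 12, 25, 18, 0))
print(row)
```

Tree-based models can work with these integer encodings directly, which is part of why this framing is convenient: no windowing or sequence handling is needed, only one feature row per timestamp.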

This is the output of one of my implementations:

You can see that the XGBoost model was able to capture the rise and fall in the overall demand pretty well.

With these three use-cases, we’ve broadly walked through three very different ways of modeling time-series problems. Next time you find yourself staring at one, I encourage you to give one of these approaches a try.

Here are some resources where you can read more about the use cases or the technologies behind them: