Machine Learning — 4 (Time Series)

Gaurav Madan
6 min read · Dec 17, 2019


Analyzing time series data is important for predicting various business metrics in the future, e.g. sales, demand, number of customer support calls, etc.

Benefits / Uses of Time Series analysis

  • This is useful in planning for raw materials, human resources, storage capacity, and costs
  • Number of people required to support customer queries, by predicting the number of calls expected
  • Optimal number of resources required in a particular team: by forecasting the headcount needed in each team, people can be cross-trained across processes / teams and moved to the teams where they are required

Types of Models

  • Smoothing models
  • Time Series Models

Smoothing

1. Simple Moving Average

A simple moving average of period 3 is the average of just the preceding 3 months.

Problems with the SMA

  • All observations have equal weight. In reality, recent observations have more predictive power than older ones. This problem is solved by exponential smoothing
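As a sketch (using pandas, with made-up numbers), a 3-period SMA forecast is just a rolling mean shifted forward one step, so each forecast uses only the preceding three observations:

```python
import pandas as pd

# Monthly sales (illustrative numbers, not from the article)
sales = pd.Series([100, 120, 110, 130, 125, 140])

# 3-period simple moving average; shift(1) so each forecast
# uses only the three *preceding* observations
sma_forecast = sales.rolling(window=3).mean().shift(1)

print(sma_forecast.tolist())
```

The first three entries come out as NaN: no forecast exists until three observations are available.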

2. Exponential Smoothing

It is an extension of the moving average. In this case, different weights are given to different periods, based on the premise that different periods have different predictive power.

The forecast for the first period is the same as the most recent actual observation.

First Forecast = Prev Actual

Prev Error = Prev Actual - Prev Forecast

Next Forecast = Prev Forecast + α * Prev Error

OR Next Forecast = Prev Forecast + α * (Prev Actual - Prev Forecast)

How to arrive at Alpha?

By calculating the MAPE (mean absolute percentage error) for each candidate alpha and taking the one that gives the minimum.

mape = average(abs(actual - forecasted) / actual)

In Excel, use Solver to minimize it: minimize MAPE by changing alpha (constrained to α ≤ 1).

The resulting alpha can be used.
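As a Python sketch (NumPy, made-up numbers), the smoothing recursion above can be implemented directly, and a simple grid search can stand in for Excel's Solver when picking alpha by MAPE:

```python
import numpy as np

actual = np.array([100., 120., 110., 130., 125., 140.])  # illustrative data

def ses_forecasts(y, alpha):
    """Simple exponential smoothing: first forecast = first actual, then
    Next Forecast = Prev Forecast + alpha * (Prev Actual - Prev Forecast)."""
    f = np.empty_like(y)
    f[0] = y[0]
    for t in range(1, len(y)):
        f[t] = f[t - 1] + alpha * (y[t - 1] - f[t - 1])
    return f

def mape(y, f):
    """Mean absolute percentage error."""
    return np.mean(np.abs(y - f) / y)

# Grid search over alpha (a stand-in for Excel's Solver)
alphas = np.linspace(0.01, 1.0, 100)
best_alpha = min(alphas, key=lambda a: mape(actual, ses_forecasts(actual, a)))
print(best_alpha)
```

With α = 1 the forecast collapses to "next = previous actual"; smaller α gives older observations more lingering influence.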

Working with time series data in python

* use a date parser to process the columns containing date or month information

* make this column the index column of the data frame

dateparse = lambda dates: pd.to_datetime(dates, format='%Y-%m')
data = pd.read_csv('file.txt', parse_dates = ['Month'], index_col = 'Month', date_parser = dateparse)

Volatility and Data Volumes

If you have a lot of data, roll up the data:

  • if the data varies by date for last 10 years, roll up to month level
  • if the data varies by products, roll up to product category level

This will help reduce volatility in data. Forecasting models work well for stable / smooth data, but not for volatile data. Rolling up provides a way to smooth data.
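A sketch with pandas (synthetic daily data, made-up parameters) of rolling ten years of daily values up to month level, and the drop in relative volatility that comes with it:

```python
import numpy as np
import pandas as pd

# Ten years of noisy daily data (synthetic, for illustration)
idx = pd.date_range('2010-01-01', periods=3650, freq='D')
rng = np.random.default_rng(0)
daily = pd.Series(rng.normal(100, 20, size=len(idx)), index=idx)

# Roll up to month level: one value per month
monthly = daily.resample('MS').sum()

# Relative volatility (std / mean) drops sharply after rolling up
print(daily.std() / daily.mean(), monthly.std() / monthly.mean())
```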

Components of Time Series

Key components

  • Trend
  • Cyclicality / Seasonality
  • Random Error

Additive model

Y(t) = T(t) + S(t) + E(t)

Another school of thought — Multiplicative model

Y(t) = T(t) * S(t) * E(t)

This is actually the same as the additive model if we take a log of Y(t):

Log Y(t) = Log T(t) + Log S(t) + Log E(t)

TODO: Excel decomposition of time series

Why are we trying to split the observations into components?

Only if the series is stationary (mean and variance remain the same over time) can we use historical data to predict the future.

  • If trend is present — mean will vary over time
  • If seasonality is present — variance will vary over time

Based on the nature of time series, we will apply the relevant model.

PS: In the case of stocks, seasonality is not consistent in the short term, so we use ARCH and GARCH models. In the long run, trend matters more than seasonality and error, so the moving average becomes the most important indicator.

Time Series Models

  • AR
  • MA
  • ARMA
  • ARIMA
  • SARMA
  • SARIMA

If the time series is stationary, we can fit AR, MA or ARMA.

Auto Regressive (AR) model

A regression based equation to predict the value of Y(t) looks like:

Y(t) = α1 * Y(t-1) + α2 * Y(t-2) + … + αk * Y(t-k) + e(t)

This is the AR model of order = k

We will never be able to predict e(t); the model contains only the rest of the terms. For the sake of completeness of the equation, we include e(t).

The order of the model defines how many time-lagged observations we account for while calculating Y(t).
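As a sketch, an AR model can be fitted by plain least squares on the lagged values. Here an AR(2) process is simulated with made-up coefficients and then recovered:

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulate an AR(2) process: Y(t) = 0.6 * Y(t-1) + 0.3 * Y(t-2) + e(t)
n = 500
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

# Fit AR(2) by ordinary least squares on the lagged observations
X = np.column_stack([y[1:-1], y[:-2]])  # columns: Y(t-1), Y(t-2)
target = y[2:]
alpha, *_ = np.linalg.lstsq(X, target, rcond=None)
print(alpha.round(2))  # estimates of (alpha1, alpha2)
```

The estimated coefficients land close to the true (0.6, 0.3) used in the simulation.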

Moving Average (MA) model

This model uses errors from previous forecasts to predict future values.

Y(t) = β1 * e(t-1) + β2 * e(t-2) + … + βk * e(t-k) + e(t)

e(t) is the error in the forecast made at time t. This is essentially Y(t) - Forecasted Y(t).

This is the MA model of order = k

The order of the model defines how many time-lagged errors we account for while calculating Y(t).

The Augmented Dickey-Fuller test is used to check whether the series is stationary. If the null hypothesis (that the series has a unit root) is rejected, the series is stationary.

If the series is not stationary, there can be 3 scenarios:

  • Only trend: ARIMA
  • Only seasonality: SARMA
  • Both trend and seasonality: SARIMA

Auto Regressive Moving Average (ARMA) model

It combines the perspectives of both AR and MA models.

Y(t) = α1 * Y(t-1) + α2 * Y(t-2) + … + αK * Y(t-K) + β1 * e(t-1) + β2 * e(t-2) + … + βL * e(t-L) + e(t)

This is the ARMA model of order (K, L), where K is the order of the AR part and L is the order of the MA part.

How to find K and L?

Both can vary from 0 to infinity.

We can use a couple of techniques to estimate these values:

  • ACF (auto correlation function) — L is calculated using this model
  • PACF (partial auto correlation function) — K is calculated using this model

ACF gives the autocorrelation between Y(t) and Y(t-1).

In general, ACF(k) = CORREL(Y(t), Y(t-k))

PACF

It is also a correlation between Y(t) and Y(t-k), with k depending on the order. However, it removes the effect of the intermediate observations Y(t-1) through Y(t-k+1) on Y(t).

PACF(k) = CORREL(Y(t), Y(t-k)), without the correlation contributed by the intermediate time-lagged observations.

K is the number of periods we look back for predicting the next value of Y.

K and L are read off these graphs, and both come as ranges.
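As a sketch, the ACF formula above can be computed directly (statsmodels also ships ready-made `plot_acf` / `plot_pacf` for reading K and L off the graphs). The data here is a simulated AR(1) series with a made-up coefficient:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + rng.normal()  # AR(1): ACF decays gradually

def acf(x, k):
    """Autocorrelation at lag k: CORREL(Y(t), Y(t-k))."""
    x = x - x.mean()
    return np.dot(x[k:], x[:-k]) / np.dot(x, x) if k else 1.0

print([round(acf(y, k), 2) for k in range(4)])
```

For an AR(1) series with coefficient 0.8 the ACF decays roughly as 0.8^k, while the PACF would cut off sharply after lag 1.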

If the series is not stationary, we use the concept of differencing.

If the series becomes stationary after second-order differencing, build the ARMA model on the differenced series. Once you get the forecast, roll back the differencing by back-calculating, i.e. adding the differences back to the forecast.
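A minimal pandas sketch of differencing and rolling it back (made-up numbers):

```python
import numpy as np
import pandas as pd

y = pd.Series([10., 13., 17., 22., 28., 35.])  # trending series (illustrative)

d1 = y.diff().dropna()   # first-order differencing
d2 = d1.diff().dropna()  # second-order differencing (constant here: stationary)

# Rollback: cumulatively add the differences back to recover the series
recovered = d1.cumsum() + y.iloc[0]
print(recovered.tolist())
```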

Auto Regressive Integrated Moving Average (ARIMA)

Instead of doing all these above activities manually, ARIMA model does this in a single step.

The model has 3 attributes, ARIMA(P, D, Q), where:

P = order of the AR component

D = order of differencing

Q = order of the MA component

The ARIMA model is used when we only have trend.

Seasonally differenced Integrated ARMA model (SARMA)

If we have data which has only seasonality and no trend, we use the SARMA model, e.g. intraday pricing.

The methodology is the same as ARMA. In the differencing phase, rather than differencing Y(t) - Y(t-1), we use Y(t) - Y(t-k), where k is the seasonality period.

  • SD1(12) = Y(12) - Y(0)
  • SD1(13) = Y(13) - Y(1)

For second-order differencing, we use a similar methodology:

  • SD2(24) = SD1(24) - SD1(12)
  • SD2(25) = SD1(25) - SD1(13)

The model has 4 attributes:

SARMA(P, D, Q, k), where

k = period of seasonality

SARMA model is used when we only have seasonality

Seasonally differenced Auto Regressive Integrated Moving Average (SARIMA)

This model is used when we have both trend and seasonality.

SARIMA(p,d,q)(P,D,Q,k)

p, d, q refer to the parameters of the ARMA (non-seasonal) part

P, D, Q, k refer to the parameters of the SARMA (seasonal) part

Why can’t we use SARIMA to solve all types of time series problems?

The principle of parsimony: trade off between the complexity and the accuracy of the model. Don't kill a fly with a sword.

Measures of Errors for Time Series models

  • MSE
  • MAPE (mean absolute % error)

= mean(abs(actual - predicted) / actual)
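Both measures are a few lines in Python (illustrative numbers):

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error."""
    return np.mean((np.asarray(actual, float) - np.asarray(predicted, float)) ** 2)

def mape(actual, predicted):
    """Mean absolute percentage error: mean(abs(actual - predicted) / actual)."""
    a = np.asarray(actual, float)
    p = np.asarray(predicted, float)
    return np.mean(np.abs(a - p) / a)

actual = [100, 200, 400]
predicted = [110, 180, 400]
print(mse(actual, predicted), mape(actual, predicted))
```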
