Machine Learning — 4 (Time Series)
Analyzing time series data is important for predicting various business metrics in the future, e.g. sales, demand, number of customer support calls, etc.
Benefits / Uses of Time Series analysis
- Useful in planning for raw materials, human resources, storage capacity, and costs
- Predict the number of people required to support customer queries by forecasting the number of calls expected
- Find the optimal number of resources in a particular team by forecasting the headcount needed in each team, then cross-training people across processes / teams and moving them to the teams where they are required
Types of Models
- Smoothing models
- Time Series Models
Smoothing
1. Simple Moving Average
A simple moving average of period 3 = the average of the 3 immediately preceding months.
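A 3-period SMA forecast can be sketched with pandas (the sales numbers are hypothetical, for illustration only):

```python
import pandas as pd

# Hypothetical monthly sales
sales = pd.Series([100, 120, 110, 130, 125, 140])

# 3-period simple moving average; shift(1) so the forecast for month t
# uses only the 3 months strictly before t
sma_forecast = sales.rolling(window=3).mean().shift(1)
print(sma_forecast)
# e.g. forecast for the 4th month (index 3) = (100 + 120 + 110) / 3 = 110.0
```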
Problems with the SMA
- All observations have equal weight. In reality, recent observations have more predictive power than older ones. This problem is solved by exponential smoothing
2. Exponential Smoothing
It is an extension of the moving average. Here, different weights are given to different periods, based on the premise that different periods have different predictive power.
The forecast for the first period is the same as the most recent actual observation:
First Forecast = Prev Actual
Prev Error = Prev Actual - Prev Forecast
Next Forecast = Prev Forecast + α * Prev Error
OR Next Forecast = Prev Forecast + α * (Prev Actual - Prev Forecast)
How to arrive at Alpha?
By calculating the MAPE (mean absolute percentage error) for each candidate alpha and taking the one that minimizes it (the global minimum).
MAPE = average(abs(actual - forecasted) / actual)
In Excel, use Solver to minimize MAPE by changing alpha (constrained to ≤ 1). The resulting alpha can be used.
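The same search can be sketched in Python: implement the smoothing recursion from the formulas above, then grid-search alpha instead of using Excel's Solver (the series values are hypothetical):

```python
import numpy as np

def ses_forecasts(actuals, alpha):
    # First forecast equals the first actual; after that,
    # next forecast = prev forecast + alpha * (prev actual - prev forecast)
    forecasts = [actuals[0]]
    for t in range(1, len(actuals)):
        prev_f = forecasts[-1]
        forecasts.append(prev_f + alpha * (actuals[t - 1] - prev_f))
    return np.array(forecasts, dtype=float)

def mape(actuals, forecasts):
    actuals = np.asarray(actuals, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    return np.mean(np.abs(actuals - forecasts) / actuals)

actuals = [100, 120, 110, 130, 125, 140]

# Grid search over alpha in (0, 1]; a coarse stand-in for Solver
alphas = np.arange(0.05, 1.0001, 0.05)
best_alpha = min(alphas, key=lambda a: mape(actuals, ses_forecasts(actuals, a)))
print(best_alpha, mape(actuals, ses_forecasts(actuals, best_alpha)))
```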
Working with time series data in python
* use a date parser to process the columns that contain date or month information
* make this column the index column of the DataFrame
dateparse = lambda dates: pd.to_datetime(dates, format='%Y-%m')
data = pd.read_csv('file.txt', parse_dates=['Month'], index_col='Month', date_parser=dateparse)
(On pandas 2.0+, date_parser is deprecated; pass date_format='%Y-%m' instead.)
Volatility and Data Volumes
If you have a lot of data, roll it up:
- if the data varies by date over the last 10 years, roll up to the month level
- if the data varies by product, roll up to the product-category level
This helps reduce volatility in the data. Forecasting models work well for stable / smooth data, not for volatile data; rolling up is one way to smooth it.
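The date roll-up above can be sketched with pandas resample (the daily values are synthetic):

```python
import numpy as np
import pandas as pd

# Two years of synthetic daily data
rng = np.random.default_rng(0)
daily = pd.Series(
    rng.normal(100, 10, size=730),
    index=pd.date_range('2020-01-01', periods=730, freq='D'),
)

# Roll up from day level to month level (month-start bins)
monthly = daily.resample('MS').sum()
print(len(monthly))  # 24 monthly totals
```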
Components of Time Series
Key components
- Trend
- Cyclicality / Seasonality
- Random Error
Additive model
Y(t) = T(t) + S(t) + E(t)
Another school of thought — Multiplicative model
Y(t) = T(t) * S(t) * E(t)
This is actually the same as the additive model if we take a log of Y(t):
Log Y(t) = Log T(t) + Log S(t) + Log E(t)
TODO: Excel decomposition of time series
Why are we trying to split the observations into components?
If the series is stationary (mean and variance remains same over time), only then we can use historical data to predict the future.
- If trend is present — mean will vary over time
- If seasonality is present — variance will vary over time
Based on the nature of time series, we will apply the relevant model.
PS: In the case of stocks, seasonality is not consistent in the short term, so we use ARCH and GARCH models. In the long run, trend matters more than seasonality and error, so the moving average becomes the most important indicator.
Time Series Models
- AR
- MA
- ARMA
- ARIMA
- SARMA
- SARIMA
If the time series is stationary, we can fit AR, MA or ARMA.
Auto Regressive (AR) model
A regression based equation to predict the value of Y(t) looks like:
Y(t) = α1 * Y(t-1) + α2 * Y(t-2) + ... + αk * Y(t-k) + e(t)
This is the AR model of order k.
We will never be able to predict e(t); the model contains only the remaining terms. For the sake of completeness of the equation, we include e(t).
The order of the model defines how many time-lagged observations we account for while calculating Y(t).
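A minimal sketch of fitting the AR equation above by ordinary least squares (in practice one would use a library such as statsmodels; the simulated AR(1) data is for illustration):

```python
import numpy as np

def fit_ar(y, k):
    # OLS fit of Y(t) = a1*Y(t-1) + ... + ak*Y(t-k) + c + e(t)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Each row of X holds the k lagged observations, plus an intercept column
    X = np.column_stack(
        [y[k - lag: n - lag] for lag in range(1, k + 1)] + [np.ones(n - k)]
    )
    coefs, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    return coefs  # a1..ak followed by the intercept

# Simulate an AR(1) process with a1 = 0.8 and recover the coefficient
rng = np.random.default_rng(2)
y = [0.0]
for _ in range(499):
    y.append(0.8 * y[-1] + rng.normal())
coefs = fit_ar(y, k=1)
print(coefs)  # first entry should be close to 0.8
```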
Moving Average (MA) model
This model uses errors from previous forecasts to predict future values.
Y(t) = β1 * e(t-1) + β2 * e(t-2) + ... + βk * e(t-k) + e(t)
e(t) is the error in the forecast made at time t, i.e. Y(t) - Forecasted Y(t).
This is the MA model of order k.
The order of the model defines how many time-lagged errors we account for while calculating Y(t).
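The MA(1) recursion can be illustrated with a hand-picked (hypothetical) β; in practice β is estimated by maximum likelihood, e.g. with statsmodels:

```python
# MA(1) one-step-ahead recursion: forecast(t) = beta * e(t-1),
# and e(t) = Y(t) - forecast(t). beta = 0.5 is a hypothetical value.
beta = 0.5
y = [2.0, 1.0, 3.0, 2.5]

forecasts = [0.0]            # no error information before the first period
errors = [y[0] - 0.0]
for t in range(1, len(y)):
    f = beta * errors[-1]    # use the previous forecast error
    forecasts.append(f)
    errors.append(y[t] - f)
print(forecasts)  # [0.0, 1.0, 0.0, 1.5]
```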
Augmented Dickey-Fuller (ADF) test: used to check whether the series is stationary. If the null hypothesis (of a unit root) is rejected, the series is stationary.
If the series is not stationary, there can be 3 scenarios:
- Only trend: ARIMA
- Only seasonality: SARMA
- Both trend and seasonality: SARIMA
Auto Regressive Moving Average (ARMA) model
It combines the perspectives of both AR and MA models.
Y(t) = α1 * Y(t-1) + α2 * Y(t-2) + ... + αK * Y(t-K) + e(t) + β1 * e(t-1) + β2 * e(t-2) + ... + βL * e(t-L)
This is the ARMA model of order (K, L), where K is the order of the AR part and L is the order of the MA part.
How to find K and L?
Both can vary from 0 to infinity.
A couple of techniques can be used to estimate them:
- ACF (autocorrelation function): used to estimate L (the MA order)
- PACF (partial autocorrelation function): used to estimate K (the AR order)
The ACF gives the autocorrelation between Y(t) and its lagged values.
In general, ACF(k) = CORREL(Y(t), Y(t-k))
PACF
It is also a correlation between Y(t) and Y(t-k), with k depending on the order. However, it removes the effect of the intermediate observations Y(t-1), ..., Y(t-k+1) on Y(t).
PACF(k) = CORREL(Y(t), Y(t-k)), with the correlation due to the intermediate time-lagged observations removed.
K is the number of periods we look back for predicting the next value of Y.
K and L are read off these plots, and both come as ranges of candidate values.
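A minimal sketch of the sample ACF (using a plain lagged correlation; library estimators such as statsmodels' acf/plot_acf differ slightly and also draw confidence bands):

```python
import numpy as np

def acf(y, k):
    # Sample correlation between Y(t) and Y(t-k)
    y = np.asarray(y, dtype=float)
    if k == 0:
        return 1.0
    return np.corrcoef(y[k:], y[:-k])[0, 1]

# Simulated AR(1) series with coefficient 0.7
rng = np.random.default_rng(4)
y = [0.0]
for _ in range(999):
    y.append(0.7 * y[-1] + rng.normal())

print([round(acf(y, k), 2) for k in range(4)])
# For an AR(1) process the ACF decays roughly geometrically (~0.7, 0.49, ...)
```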
If the series is not stationary, we use the concept of differencing.
If the series becomes stationary after differencing (say, twice), build the ARMA model on the differenced series. Once you get the forecast, roll back the differencing by adding the differences back to the forecast.
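Differencing and the roll-back step can be sketched with pandas (the series and the forecasted difference are hypothetical):

```python
import pandas as pd

y = pd.Series([10.0, 12.0, 15.0, 19.0, 24.0])

d1 = y.diff().dropna()   # first-order differencing: Y(t) - Y(t-1)
print(d1.tolist())       # [2.0, 3.0, 4.0, 5.0]

# ... model the differenced series, then forecast the next difference ...
next_diff = 6.0          # hypothetical forecast of the next difference

# Roll back the differencing: add the forecasted difference to the last actual
next_value = y.iloc[-1] + next_diff
print(next_value)        # 30.0
```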
Auto Regressive Integrated Moving Average (ARIMA)
Instead of doing all these above activities manually, ARIMA model does this in a single step.
The model has 3 attributes, ARIMA(P, D, Q), where:
P = order of the AR component
D = order of differencing
Q = order of the MA component
ARIMA model is used, when we only have Trend
Seasonally differenced Integrated ARMA model (SARMA)
If the data has only seasonality and no trend, we use the SARMA model, e.g. intraday pricing.
The methodology is the same as ARMA, except that in the differencing phase, rather than differencing Y(t) - Y(t-1), we use Y(t) - Y(t-k), where k is the seasonality period.
- SD1(12) = Y(12) - Y(0)
- SD1(13) = Y(13) - Y(1)
For second-order differencing we use a similar methodology:
- SD2(24) = SD1(24) - SD1(12)
- SD2(25) = SD1(25) - SD1(13)
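With pandas, seasonal differencing is diff(k); on a linear series with slope 1 and k = 12, every first seasonal difference is 12 and every second one is 0:

```python
import pandas as pd

# Three years of monthly data; diff(12) computes Y(t) - Y(t-12)
y = pd.Series(range(36), dtype=float)

sd1 = y.diff(12)      # first seasonal difference
sd2 = sd1.diff(12)    # second seasonal difference: SD1(t) - SD1(t-12)
print(sd1.dropna().tolist())
print(sd2.dropna().tolist())
```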
The model has 4 attributes:
SARMA(P, D, Q, k), where
k = period of seasonality
SARMA model is used when we only have seasonality
Seasonally differenced Auto Regressive Integrated Moving Average (SARIMA)
This model is used when we have both trend and seasonality.
SARIMA(p,d,q)(P,D,Q,k)
p, d, q: the parameters of the non-seasonal (ARIMA) part
P, D, Q, k: the parameters of the seasonal (SARMA) part
Why can't we use SARIMA to solve all types of time series problems?
The principle of parsimony: trade off the complexity of the model against its accuracy. Don't kill a fly with a sword.
Measures of Errors for Time Series models
- MSE
- MAPE (mean absolute % error) = mean(abs(actual - predicted) / actual)
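Both measures as small functions (the actual/predicted values are hypothetical):

```python
import numpy as np

def mse(actual, predicted):
    # Mean squared error
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean((actual - predicted) ** 2)

def mape(actual, predicted):
    # Mean absolute percentage error; undefined if any actual value is zero
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(actual - predicted) / np.abs(actual))

actual = [100, 200, 400]
predicted = [110, 190, 360]
print(mse(actual, predicted))   # (100 + 100 + 1600) / 3 = 600.0
print(mape(actual, predicted))  # (0.1 + 0.05 + 0.1) / 3 ≈ 0.0833
```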