A Quick Introduction to Granger Causality Testing for Time Series Analysis
Augmented Dickey-Fuller (ADF) test, Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, Vector Autoregression (VAR), Durbin–Watson statistic, Cointegration test
The Granger causality test is a statistical hypothesis test for determining whether one time series is a factor in, and offers useful information for, forecasting another time series.
For example, consider the question: could we use today’s Apple stock price to predict tomorrow’s Tesla stock price? If so, we say that Apple’s stock price Granger-causes Tesla’s stock price. If not, we say that Apple’s stock price does not Granger-cause Tesla’s stock price.
The Data
So, let’s go to Yahoo Finance and fetch the adjusted close stock price data for Apple, Walmart and Tesla, from 2010-06-30 to 2020-12-18.
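For reproducibility, here is a minimal sketch of how the prices could be pulled programmatically; the yfinance package, ticker list and renamed columns are my assumptions, since the article only says the data comes from Yahoo Finance.

import yfinance as yf

# Adjusted close prices for the three tickers over the study window (yfinance is an assumed choice)
tickers = ['AAPL', 'WMT', 'TSLA']
prices = yf.download(tickers, start='2010-06-30', end='2020-12-18', auto_adjust=False)['Adj Close']
df = prices.rename(columns={'AAPL': 'apple', 'WMT': 'walmart', 'TSLA': 'tesla'}).dropna()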
Visualize the Time Series
Time series can be represented using either a line chart or an area chart.
The Apple and Walmart time series show fairly similar trend patterns over the years, whereas Tesla, which IPOed just over 10 years ago, has surprised everyone with a rise of over 700% year-to-date in 2020.
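A simple line chart is enough to see these patterns; a minimal sketch with pandas and matplotlib, assuming the df DataFrame defined above:

import matplotlib.pyplot as plt

# Line chart of the three adjusted close price series
df.plot(figsize=(12, 6), title='Adjusted Close Price, 2010-06-30 to 2020-12-18')
plt.ylabel('Price (USD)')
plt.show()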
ADF Test for Stationarity
The ADF test is one of the most popular statistical tests. It can be used to help us understand whether the time series is stationary or not.
Null hypothesis: the time series has a unit root (it is non-stationary). If we fail to reject it, the series is treated as non-stationary.
Alternative hypothesis: the time series is stationary. If the null hypothesis is rejected, the series is treated as stationary.
The p-values are all well above the 0.05 alpha level, so we cannot reject the null hypothesis: the three time series are not stationary.
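A minimal sketch of running the ADF test on each series with statsmodels; the loop over the df columns assumes the DataFrame from the data section.

from statsmodels.tsa.stattools import adfuller

# ADF test: null hypothesis = the series has a unit root (non-stationary)
for name, series in df.items():
    adf_stat, p_value, *_ = adfuller(series.dropna())
    print(f'{name}: ADF statistic = {adf_stat:.3f}, p-value = {p_value:.3f}')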
KPSS Test for Stationarity
The KPSS test checks whether a time series is stationary around a mean or a linear trend, or is non-stationary due to a unit root.
Null hypothesis: The time series is stationary
Alternative hypothesis: The time series is not stationary
The p-values are all less than the 0.05 alpha level; therefore, we can reject the null hypothesis and conclude that the three time series are not stationary.
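The KPSS test is also available in statsmodels; a sketch under the same assumptions as the ADF snippet above:

from statsmodels.tsa.stattools import kpss

# KPSS test: null hypothesis = the series is (level-)stationary
for name, series in df.items():
    kpss_stat, p_value, *_ = kpss(series.dropna(), regression='c', nlags='auto')
    print(f'{name}: KPSS statistic = {kpss_stat:.3f}, p-value = {p_value:.3f}')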
Cross-checking the ADF and KPSS results, we can conclude that the three time series are not stationary. We will transform them into stationary series with the difference method.
Difference Method
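The transformation itself is a one-liner; a minimal sketch, where df_differenced is an assumed name reused in the later snippets:

# First-order differencing removes the trend / unit root found above
df_differenced = df.diff().dropna()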
ADF Test Again
After transforming the data, the p-values are all well below the 0.05 alpha level, so we reject the null hypothesis: the differenced data is stationary.
KPSS Test Again
Some of the KPSS null hypotheses could not be rejected; since the KPSS null is stationarity, failing to reject is consistent with those differenced series being stationary.
VAR Model
The VAR class assumes that the passed time series are stationary. Non-stationary or trending data can often be transformed to be stationary by first-differencing or some other method.
There is no hard-and-fast rule on the choice of lag order; it is basically an empirical issue. However, it is often advised to select the lag order with the smallest AIC. Therefore, we set maxlags=15 and ic='aic' so that the lag order with the lowest AIC (up to 15) is selected.
from statsmodels.tsa.api import VAR
model = VAR(df_differenced)                # fit the VAR on the differenced (stationary) series
results = model.fit(maxlags=15, ic='aic')  # the AIC selects the lag order, up to 15
results.summary()
The largest residual correlation is 0.43 (between Apple and Tesla).
Durbin-Watson Statistic
The Durbin-Watson statistic is a measure of autocorrelation in the residuals from a regression analysis.
A value of 2.0 means that there is no autocorrelation detected in the residuals.
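A sketch of computing the statistic on the VAR residuals with statsmodels, assuming the fitted results object and df_differenced from the snippets above:

from statsmodels.stats.stattools import durbin_watson

# One Durbin-Watson value per equation of the fitted VAR; values near 2 indicate little residual autocorrelation
for name, dw in zip(df_differenced.columns, durbin_watson(results.resid)):
    print(f'{name}: {dw:.2f}')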
Granger Causality Test
The following code was borrowed from stackoverflow:
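The embedded snippet is not reproduced here; below is a minimal sketch of the usual pattern built on statsmodels' grangercausalitytests. The helper name, the maxlag choice, and the _x/_y column labels are assumptions made for illustration.

import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

def grangers_causation_matrix(data, variables, maxlag=15, test='ssr_chi2test'):
    # Matrix of minimum Granger-causality p-values: rows = response (y), columns = predictor (x)
    matrix = pd.DataFrame(index=[v + '_y' for v in variables],
                          columns=[v + '_x' for v in variables], dtype=float)
    for response in variables:
        for predictor in variables:
            if response == predictor:
                matrix.loc[response + '_y', predictor + '_x'] = 1.0  # a series is not tested against itself
                continue
            result = grangercausalitytests(data[[response, predictor]], maxlag=maxlag, verbose=False)
            p_values = [result[lag][0][test][1] for lag in range(1, maxlag + 1)]
            matrix.loc[response + '_y', predictor + '_x'] = min(p_values)
    return matrix

grangers_causation_matrix(df_differenced, variables=df_differenced.columns)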
The rows are the responses (y) and the columns are the predictors (x). If a given p-value is below the significance level (0.05), we can reject the null hypothesis; for example, the 0.0 in (row 1, column 2) means that walmart_x Granger-causes apple_y. Likewise, the 0.0 in (row 2, column 1) means that apple_x Granger-causes walmart_y.
In other words, all of the time series in this data set Granger-cause each other in both directions.
Forecasting
Remember that we transformed the data with the difference method; now we will invert that transformation to bring the forecasts back to the original price scale.
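A minimal sketch of forecasting with the fitted VAR and reversing the first difference by cumulative summation; the variable names come from the earlier snippets, and the 5-step horizon is an arbitrary assumption.

import pandas as pd

# Forecast the next few differenced values from the last k_ar observations (5 steps is an assumed horizon)
lag_order = results.k_ar
forecast_diff = pd.DataFrame(results.forecast(df_differenced.values[-lag_order:], steps=5),
                             columns=df_differenced.columns)

# Invert the first difference: cumulative sum of the forecasts plus the last observed price level
forecast = forecast_diff.cumsum() + df.iloc[-1]
print(forecast)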
The Jupyter notebook can be found on GitHub. Happy Holidays!