Why Is the Augmented Dickey–Fuller Test (ADF Test) So Important in Time Series Analysis?

Mukesh Chaudhary
7 min read · Apr 9, 2020


ADF Test

This post describes the augmented Dickey–Fuller test (ADF test) and why it matters in time series analysis. The ADF test is a statistical test for whether a given time series is stationary, and it is one of the most commonly used tests for that purpose. Stationarity is an important property of a time series. In ARIMA forecasting, the first step is to determine the number of differences required to make the series stationary, because the model assumes that the differenced series is stationary. Let's look at this in a little more depth.

Stationarity

What is stationarity, and how can we check for it? Time series problems differ from more traditional classification and regression problems because the observations are ordered and depend on time. A time series can be described by components such as trend, seasonality, and residual. Put simply, the statistical properties of a stationary time series do not depend on time. A series is stationary if it has no trend or seasonal effects, so summary statistics such as the mean and the variance of the observations stay consistent over time. A stationary time series is usually easier to model.
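To make "summary statistics stay consistent over time" concrete, here is a minimal sketch on synthetic data (not the Brent oil prices used later). It compares the means of consecutive blocks of a pure-noise series and of the same noise with a linear trend added. The helper name `block_means`, the seed, and the slope are assumptions chosen only for illustration.

```python
import numpy as np

def block_means(x, n_blocks=4):
    """Mean of each of n_blocks consecutive chunks of the series."""
    return np.array([chunk.mean() for chunk in np.array_split(x, n_blocks)])

rng = np.random.default_rng(1)
noise = rng.normal(size=400)               # stationary: constant mean and variance
trending = noise + 0.05 * np.arange(400)   # non-stationary: the mean grows with time

for name, x in [("noise", noise), ("trending", trending)]:
    means = block_means(x)
    spread = means.max() - means.min()
    print(name, np.round(means, 2), "spread:", round(float(spread), 2))
```

For the noise series the block means should stay close to each other, while for the trending series they drift apart, which is the kind of time dependence that stationarity rules out.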

In Python, we can inspect these characteristics of a time series with the statsmodels.tsa.seasonal module.

# Time series data structure
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from statsmodels.tsa.seasonal import seasonal_decompose

df = pd.read_csv("BrentOilPrices.csv")
df['Date'] = pd.to_datetime(df['Date'])  # convert the Date column from object to datetime
df.set_index('Date', inplace=True)       # use the dates as the index

# decompose the monthly series into trend, seasonal and residual components
# (older statsmodels versions name the `period` argument `freq`)
decompose = seasonal_decompose(df.resample('M').sum(), period=12)
fig = decompose.plot()
fig.set_size_inches(12, 8)
Characteristics of the time series
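To make the trend component less abstract, here is a minimal sketch, on synthetic data rather than the Brent oil prices, of the centered moving average that classical additive decomposition uses to estimate the trend when the seasonal period is even. The helper name `centered_moving_average` and all the numbers are assumptions made for illustration; this is not the statsmodels implementation.

```python
import numpy as np

def centered_moving_average(x, period=12):
    """Estimate the trend with a centered moving average for an even period."""
    # Weights 0.5, 1, ..., 1, 0.5 (period + 1 terms), divided by the period.
    kernel = np.ones(period + 1)
    kernel[0] = kernel[-1] = 0.5
    kernel /= period
    # 'valid' drops period // 2 points at each end, where the window does not fit.
    return np.convolve(x, kernel, mode="valid")

# Synthetic monthly series: linear trend + yearly seasonality + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
trend = 0.5 * t
seasonal = 10 * np.sin(2 * np.pi * t / 12)
series = trend + seasonal + rng.normal(scale=1.0, size=t.size)

trend_estimate = centered_moving_average(series, period=12)
# Compare with the true trend on the points where the estimate is defined.
max_error = np.abs(trend_estimate - trend[6:-6]).max()
print(f"largest trend error: {max_error:.2f}")
```

Because the window spans exactly one seasonal cycle, the seasonal swings average out and a linear trend passes through unchanged, so what is left in the error is only smoothed noise.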

How to check the stationarity of a time series

We can check stationarity in two ways. One is a manual check of the mean and variance of the series; the other is the ADF test. Because the mean and variance of a stationary series do not change over time, we can split the data into two parts of equal length and compare the mean and variance of each part.

# Time series stationarity check
import seaborn as sns

df = pd.read_csv("BrentOilPrices.csv")
df['Date'] = pd.to_datetime(df['Date'])  # convert the Date column from object to datetime
df.set_index('Date', inplace=True)       # use the dates as the index
# midpoint of the series
split = round(len(df['Price']) / 2)
# split the series into two halves
X1, X2 = df['Price'][:split], df['Price'][split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print("\033[1m" + "\nStationarity check\n" + "\033[0m")
print("\nBefore differencing: mean and variance\n")
print("mean1 = %f, mean2 = %f" % (mean1, mean2))
print("var1 = %f, var2 = %f" % (var1, var2))
fig, ax = plt.subplots(1, 2, figsize=(18, 6))
ax[0].plot(df['Price'])
ax[0].set_xlabel("Time")
ax[0].set_ylabel("Value")
ax[0].set_title("Time series plot")
# histogram with a density estimate (sns.distplot is deprecated in recent seaborn)
sns.histplot(df['Price'], kde=True, ax=ax[1])
ax[1].set_title("Histogram of time series data")
Mean and variance of the two equal halves of the time series

After first differencing

# Time series data after first difference
df['first_diff'] = df['Price'].diff()
df.dropna(inplace=True)  # the first row has no previous value, so its difference is NaN
# midpoint of the differenced series
split = round(len(df['first_diff']) / 2)
# split the differenced series into two halves
X1, X2 = df['first_diff'][:split], df['first_diff'][split:]
mean1, mean2 = X1.mean(), X2.mean()
var1, var2 = X1.var(), X2.var()
print("\033[1m" + "\nStationarity check\n" + "\033[0m")
print("\nAfter first difference\n")
print("mean1 = %f, mean2 = %f" % (mean1, mean2))
print("var1 = %f, var2 = %f" % (var1, var2))
fig, ax = plt.subplots(1, 2, figsize=(18, 6))
ax[0].plot(df['first_diff'])
ax[0].set_xlabel("Time")
ax[0].set_ylabel("Value")
ax[0].set_title("Time series plot after first difference")
# histogram with a density estimate (sns.distplot is deprecated in recent seaborn)
sns.histplot(df['first_diff'], kde=True, ax=ax[1])
ax[1].set_title("Histogram of time series data after first difference")
Mean and variance of the two equal halves of the differenced time series

In the first graph we can see the trend and seasonality of the series, and there is a big difference between the means and variances of the two halves. After first differencing, the means and variances differ only slightly, so the differenced series may be stationary.
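Why does one difference help so much? A minimal sketch on synthetic data, assuming the series behaves like a random walk or a straight-line trend (an illustration, not a claim about the oil prices): differencing a random walk returns exactly the random shocks that built it, and differencing a straight line returns its constant slope. The seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
shocks = rng.normal(size=1000)
random_walk = np.cumsum(shocks)   # each value is the previous value plus a shock

# First difference: y[t] - y[t-1]; one observation is lost at the start.
first_diff = np.diff(random_walk)

# Differencing recovers the shocks that built the walk.
print(np.allclose(first_diff, shocks[1:]))

# A linear trend a*t + b becomes the constant a after one difference.
line = 2.0 * np.arange(10) + 3.0
print(np.diff(line))
```

In pandas the same operation is `Series.diff()`, which keeps the index and puts NaN in the first position instead of shortening the series.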

Now let's turn to the main topic, the augmented Dickey–Fuller test (ADF test). It is also used to check the stationarity of a time series, and we can use it to find the number of differences (d) to use in an ARIMA model for forecasting. Before calling the ADF test function directly, let's try to understand how it works. The ADF test is fundamentally a statistical significance test: it involves a null and an alternative hypothesis, a test statistic is computed, and a p-value is reported. From the test statistic and the p-value, we can infer whether a given time series is stationary or not.

Unit Root Test

A unit root is a characteristic of a time series that makes it non-stationary, and the ADF test belongs to the family of unit root tests. Technically, a unit root is said to exist in a time series if α = 1 in the equation below.

$$Y_t = \alpha Y_{t-1} + \beta X_e + \varepsilon_t$$

where Yt is the value of the time series at time t, Xe is an exogenous variable, and ε is the error term.

The presence of a unit root means the time series is non-stationary.
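A small simulation can show why α = 1 is the problematic case. The sketch below, under simplifying assumptions (no exogenous term, standard normal shocks, every path starting at zero), simulates many paths of the lag-one process for α = 0.5 and α = 1 and compares the variance across paths at an early and a late time step. The function name `simulate_ar1` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_ar1(alpha, n_steps, n_paths, seed=0):
    """Simulate n_paths independent paths of y[t] = alpha * y[t-1] + shock[t]."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(size=(n_paths, n_steps))
    y = np.zeros((n_paths, n_steps))   # every path starts at y[0] = 0
    for t in range(1, n_steps):
        y[:, t] = alpha * y[:, t - 1] + shocks[:, t]
    return y

for alpha in (0.5, 1.0):
    paths = simulate_ar1(alpha, n_steps=201, n_paths=2000)
    var_early = paths[:, 50].var()     # variance across paths at t = 50
    var_late = paths[:, 200].var()     # variance across paths at t = 200
    print(f"alpha={alpha}: var(t=50)={var_early:.2f}, var(t=200)={var_late:.2f}")
```

With α below 1 the effect of old shocks dies out and the variance settles at a fixed level. With α = 1 every shock is kept forever, so the variance keeps growing with time and the series cannot be stationary.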

A Dickey–Fuller test is a unit root test that tests the null hypothesis that α = 1 in the following model equation, where α (alpha) is the coefficient of the first lag of Y.

$$Y_t = c + \beta t + \alpha Y_{t-1} + \phi \, \Delta Y_{t-1} + \varepsilon_t$$

Null hypothesis (H0): α = 1

where

Y(t-1) is lag 1 of the time series and ΔY(t-1) is the first difference of the series at time t-1.

Fundamentally, it has the same null hypothesis as the unit root test: the coefficient of Y(t-1) is 1, implying the presence of a unit root. If the null hypothesis is not rejected, the series is taken to be non-stationary.

The augmented Dickey–Fuller test evolved from the above equation and is one of the most common forms of unit root test.

The ADF test is an 'augmented' version of the Dickey–Fuller test. It expands the Dickey–Fuller test equation to include a higher-order autoregressive process in the model.

$$Y_t = c + \beta t + \alpha Y_{t-1} + \phi_1 \Delta Y_{t-1} + \phi_2 \Delta Y_{t-2} + \dots + \phi_p \Delta Y_{t-p} + \varepsilon_t$$

Notice that we have only added more differencing terms, while the rest of the equation remains the same.

However, the null hypothesis is still the same as in the Dickey–Fuller test.
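The regression behind the test can be sketched by hand. Subtracting Y(t-1) from both sides of the model turns the hypothesis α = 1 into the hypothesis that the coefficient on Y(t-1) is zero, so the sketch below regresses the first difference on a constant, the lagged level, and lagged differences, and returns the t-statistic of the lagged level. This is a simplified illustration under stated assumptions (constant only, no trend term, a fixed number of lags), and `adf_t_stat` is a hypothetical helper, not the statsmodels routine.

```python
import numpy as np

def adf_t_stat(y, n_lags=1):
    """t-statistic of the lagged level in a constant-only ADF-style regression."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                          # dy[i] = y[i+1] - y[i]
    target = dy[n_lags:]                     # differences being explained
    n = target.size
    columns = [np.ones(n), y[n_lags:-1]]     # constant and lagged level Y(t-1)
    for k in range(1, n_lags + 1):
        columns.append(dy[n_lags - k: dy.size - k])   # k-th lagged difference
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)              # OLS coefficient covariance
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
shocks = rng.normal(size=500)
print("white noise:", adf_t_stat(shocks))
print("random walk:", adf_t_stat(np.cumsum(shocks)))
```

The statistic is compared with Dickey–Fuller critical values rather than the usual t-table; for a constant-only regression the large-sample 5% critical value is about -2.86, and a value well below it counts as evidence against a unit root.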

A key point to remember here is that, since the null hypothesis assumes the presence of a unit root (α = 1), the p-value must be less than the significance level (say 0.05 or 0.01) in order to reject the null hypothesis and thereby infer that the series is stationary.
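The decision rule is easy to get backwards, so here it is written as a tiny function. This is only a sketch: `interpret_adf` is a hypothetical helper and the two p-values are made-up numbers for illustration, not results from any data set.

```python
def interpret_adf(p_value, alpha=0.05):
    """Turn an ADF p-value into a decision about the unit-root null hypothesis."""
    if p_value < alpha:
        return "reject H0: treat the series as stationary"
    return "fail to reject H0: treat the series as non-stationary"

# Hypothetical p-values, chosen only to show both outcomes.
for p in (0.31, 0.003):
    print(p, "->", interpret_adf(p))
```

Failing to reject the null hypothesis does not prove that a unit root exists; it only means the data do not provide enough evidence against one at the chosen significance level.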

Let's see this in Python code.

We can use the adfuller function from the statsmodels.tsa.stattools module to run the ADF test and check whether a given time series is stationary.

# ADF test before differencing

from statsmodels.tsa.stattools import adfuller

df_resample = df.resample('M').sum()
adf = adfuller(df_resample['Price'], maxlag=12)  # consider up to 12 lags
print("\nStatistics analysis\n")
print("Test statistic : ", adf[0])
print("p-value : ", adf[1])
print("# n_lags : ", adf[2])
print("No of observations: ", adf[3])
for key, value in adf[4].items():
    print(f" critical value {key} : {value}")
ADF test result before differencing

Here, the test statistic is greater than the critical values and the p-value is greater than the significance level (0.05), so we cannot reject the null hypothesis and we treat the time series as non-stationary.

After first difference

# ADF test after differencing

from statsmodels.tsa.stattools import adfuller

df_resample = df.resample('M').sum()
df_resample['first_diffprice'] = df_resample['Price'].diff()  # first difference
df_resample.dropna(inplace=True)  # drop the NaN created by differencing
adf = adfuller(df_resample['first_diffprice'], maxlag=12)  # consider up to 12 lags
print("\nStatistics analysis\n")
print("Test statistic : ", adf[0])
print("p-value : ", adf[1])
print("# n_lags : ", adf[2])
print("No of observations: ", adf[3])
for key, value in adf[4].items():
    print(f" critical value {key} : {value}")
ADF test result after first difference

After first differencing, the test statistic is well below the critical values and the p-value is well below the significance level (0.05), so we reject the null hypothesis and treat the differenced series as stationary. The series becomes stationary after one difference, so the differencing order (d) of the SARIMA model used for forecasting will be 1.

We can also check this with a plot. Let's see.

# plot of time series data for ADF test
fig, ax = plt.subplots(1, 2, figsize=(16, 5))
# rolling 12-month mean and std before differencing
mean1 = df_resample['Price'].rolling(12).mean()
std1 = df_resample['Price'].rolling(12).std()
ax[0].plot(df_resample['Price'], color='blue', label='original')
ax[0].plot(mean1, color='red', label='mean')
ax[0].plot(std1, color='black', label='std')
ax[0].set_title("Before first diff time series data with mean and std")
ax[0].set_xlabel("Years")   # time is on the x-axis
ax[0].set_ylabel("prices")  # values are on the y-axis
ax[0].legend(loc='best')
# rolling 12-month mean and std after differencing
mean2 = df_resample['first_diffprice'].rolling(12).mean()
std2 = df_resample['first_diffprice'].rolling(12).std()
ax[1].plot(df_resample['first_diffprice'], color='blue', label='original')
ax[1].plot(mean2, color='red', label='mean')
ax[1].plot(std2, color='black', label='std')
ax[1].set_title("After first diff time series data with mean and std")
ax[1].set_xlabel("Years")
ax[1].set_ylabel("prices")
ax[1].legend(loc='best')
plt.show()
Plot of time series data for ADF test

Conclusion

We have seen how to check whether a time series is stationary manually, with plots, and with the ADF test, because stationarity is very important for forecasting. We also used the ADF test to find the number of differences that makes the series stationary, which is the differencing order used in a SARIMA model.
