Time Series Analysis — Stationarity Check using Statistical Test

Souvik Majumder
Published in Analytics Vidhya
4 min read · May 1, 2020

This article walks through a statistical test, the Augmented Dickey-Fuller (ADF) test, for checking whether a given time series is stationary.

Checking whether a time series is stationary is an important pre-processing step before modelling it. The stationarity assumption is easily violated when a series contains a trend, seasonality, or other time-dependent structure.

Trends can result in a varying mean over time, whereas seasonality makes the mean (and sometimes the variance) change periodically over time; either one makes a time series non-stationary.

Therefore, the first step in an analysis should be to check for evidence of a trend or seasonal effects and, if any are present, to remove them.

Augmented Dickey Fuller Test

The Augmented Dickey-Fuller (ADF) test is a statistical test commonly used to determine whether a time series is stationary.

The Augmented Dickey Fuller test checks the null hypothesis that a unit root is present in a time series sample. The alternative hypothesis is usually stationarity or trend-stationarity.

The augmented Dickey–Fuller (ADF) statistic, used in the test, is a negative number. The more negative it is, the stronger the rejection of the hypothesis that there is a unit root at some level of confidence.

If the p-value is greater than 0.05, we fail to reject the null hypothesis, so the time series is treated as non-stationary.

Otherwise, if the p-value is less than or equal to 0.05, the null hypothesis is rejected and the time series is treated as stationary.

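Note that the p-value is a probability reported by the test. It is not the same thing as the lag order, which is also commonly written as p. The lag order is the number of past time steps taken into account, for example the difference between today's value and yesterday's value at lag order 1.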

Our main aim is to arrive at a series for which the null hypothesis can be rejected.

Mathematically, the Augmented Dickey-Fuller test allows for higher-order autoregressive processes by including lagged difference terms, up to lag order p, in the test regression.
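For reference, one common textbook way of writing the ADF test regression is shown below. The symbols are assumptions introduced here rather than notation from this article: alpha is a constant, beta t is an optional linear trend, p is the lag order, and epsilon_t is the error term. Some references index the sum up to p - 1 instead of p.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad \Delta y_t = y_t - y_{t-1}

H_0 : \gamma = 0 \quad (\text{unit root})
\qquad \text{vs.} \qquad
H_1 : \gamma < 0

\text{ADF statistic} = \frac{\hat{\gamma}}{\operatorname{SE}(\hat{\gamma})}
```

Because the alternative is gamma < 0, a more negative statistic means stronger evidence against the unit root.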

Explanation with an Example

Let us look at an example with a small chunk of data.

A lag order of 1 means shifting the Y column by one place, so that each row also holds the value for the previous week (t-1).

Similarly, a lag order of 2 means shifting the column one place further, so that each row also holds the value from two weeks earlier (t-2).

Next, we remove the rows with missing values that the shift creates.
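The shifting and row removal described above might look like this in pandas. The weekly values and the column names `Y_lag1`, `Y_lag2`, and `diff1` are invented for illustration and are not the article's data.

```python
import pandas as pd

# Hypothetical weekly values, invented purely for illustration.
df = pd.DataFrame({"Y": [10, 12, 15, 11, 18, 20]})

df["Y_lag1"] = df["Y"].shift(1)  # value from the previous week (t-1)
df["Y_lag2"] = df["Y"].shift(2)  # value from two weeks earlier (t-2)

# The first rows have no past values, so shift() leaves NaN there.
lagged = df.dropna().reset_index(drop=True)

# First-order difference: current value minus the previous week's value.
lagged["diff1"] = lagged["Y"] - lagged["Y_lag1"]
print(lagged)
```

Two rows are dropped here because the lag-2 column has no value for the first two weeks.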

At every level of differencing, we compute the p-value and check whether it is less than or equal to the cutoff value (0.05).

The shifting or differencing continues until we obtain a p-value less than or equal to 0.05.

Programmatic Explanation

Let us implement the same thing through some lines of code.

From the above plot, we can see that although the variance does not change much, the mean changes a lot throughout the data. The calculated p-value is 0.19, which is greater than 0.05.

Therefore, we need to perform one level of differencing, that is, subtract from the time series a copy of itself shifted by lag order = 1.
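As a small illustration of that step, subtracting the shifted series gives the same result as the built-in `diff` method in pandas. The five values below are hypothetical.

```python
import pandas as pd

# Hypothetical values, invented for illustration.
series = pd.Series([5.0, 7.0, 6.0, 10.0, 13.0])

# One level of differencing written out by hand: y(t) - y(t-1).
manual = (series - series.shift(1)).dropna()

# The same operation using the built-in method.
builtin = series.diff(periods=1).dropna()

print(manual.tolist())
```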

Now, let us execute the check_stationarity function again to check the p-value.

We can see that the mean has now become almost constant throughout the data. The new p-value is 0.0, which is less than 0.05.

This tells us that we have now achieved stationarity at k = 1, that is, after one level of differencing.

What are the advantages of taking a logarithm?

  • It can make the time series stationary with fewer levels of differencing. Basically, it reduces the amplitude of the seasonality.
  • Time series models tend to be more accurate on data that grows linearly than on data that grows exponentially, and taking the logarithm turns exponential growth into linear growth.
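The effect of the logarithm can be sketched on a synthetic series with exponential growth. The growth rate, noise level, and the helper `spread_ratio` are assumptions chosen for the example. The helper compares how much the step-to-step changes vary in the second half of the series relative to the first half.

```python
import numpy as np

# Synthetic exponential growth with multiplicative noise (illustration only).
rng = np.random.default_rng(0)
t = np.arange(200)
y = np.exp(0.05 * t + 0.1 * rng.normal(size=200))


def spread_ratio(values):
    """Std of step-to-step changes in the second half divided by the first half."""
    diffs = np.diff(values)
    half = len(diffs) // 2
    return diffs[half:].std() / diffs[:half].std()


raw_ratio = spread_ratio(y)          # changes grow as the level grows
log_ratio = spread_ratio(np.log(y))  # changes stay at a similar size

print(f"raw series: {raw_ratio:.1f}")
print(f"log series: {log_ratio:.2f}")
```

A ratio close to 1 for the log series indicates that the fluctuations have a similar amplitude throughout, whereas the raw series fluctuates far more at higher levels.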

I hope you now have a clear understanding of how to use the ADF test to check the stationarity of time-series data. In my next article, I explain the use of ARIMA for modelling the data.

Souvik Majumder
Analytics Vidhya

Full Stack Developer | Machine Learning | AI | NLP | AWS | SAP