Published in Analytics Vidhya

A Complete Guide To Time Series Analysis — Story

Disclaimer : If you haven’t read the prologue, it is highly recommended that you read it first and then come back here, so that the terminology used makes sense. If you already understand the terms frequently used in time series analysis, you may continue.

Statistical Time-Series Analysis using Python

Dickey-Fuller Test :- The Dickey-Fuller test checks for the presence of a unit root in the series, and so helps us decide whether the series is stationary. Its extended form, the Augmented Dickey-Fuller (ADF) test, adds lagged differences to the regression to account for serial correlation. The intuition behind these tests is that if the series y is stationary (or trend-stationary), it tends to return to a constant mean (or to its trend). Therefore, large values tend to be followed by smaller values (negative changes), and small values by larger values (positive changes).

The null and alternate hypotheses of this test are:
Null Hypothesis: The series has a unit root (a = 1, where a is the coefficient on the previous value in the model y(t) = a * y(t-1) + e(t)).
Alternate Hypothesis: The series has no unit root.
If we fail to reject the null hypothesis, we cannot rule out a unit root, so the series is treated as non-stationary.

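KPSS Test :- The Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test checks whether a time series is stationary around a mean or a linear trend, or is non-stationary due to a unit root. Unlike the Dickey-Fuller test, its null hypothesis is that the series is stationary, so rejecting the null points to non-stationarity.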

It is highly recommended to use both the KPSS and ADF tests when checking the data for stationarity. Using both tests can lead to four outcomes :-
1. ADF and KPSS both conclude that the series is not stationary.
2. ADF and KPSS both conclude that the series is stationary.
3. ADF indicates that the series is stationary and KPSS indicates that it is non-stationary.
4. ADF indicates that the series is non-stationary and KPSS indicates that it is stationary.

In the first and second cases, the behavior of the time series can be read off directly. Case 3 implies that the data is difference stationary (refer to the prologue), so differencing needs to be applied to make it strictly stationary. Case 4 implies that the data is trend stationary, so de-trending needs to be applied to remove the trend component and make it strictly stationary.
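
The four cases can be encoded as a small decision helper. This is only a sketch: the function name `classify_stationarity`, the return strings and the 0.05 significance level are assumptions made for illustration. It takes the two p-values as inputs, however they were computed.

```python
def classify_stationarity(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into one of the four cases."""
    # ADF null hypothesis: unit root. Rejecting it suggests stationarity.
    adf_says_stationary = adf_p < alpha
    # KPSS null hypothesis: stationary. Failing to reject it suggests stationarity.
    kpss_says_stationary = kpss_p >= alpha

    if not adf_says_stationary and not kpss_says_stationary:
        return "case 1: non-stationary"
    if adf_says_stationary and kpss_says_stationary:
        return "case 2: stationary"
    if adf_says_stationary and not kpss_says_stationary:
        return "case 3: difference stationary, apply differencing"
    return "case 4: trend stationary, remove the trend"


print(classify_stationarity(adf_p=0.01, kpss_p=0.01))
print(classify_stationarity(adf_p=0.40, kpss_p=0.10))
```

With the example inputs, the first call falls into case 3 and the second into case 4.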

ACF Test :- Autocorrelation describes the correlation between the present value of the series and its past values (lags). A time series can be decomposed into trend, seasonal, cyclic and residual components. The Auto-Correlation Function (ACF) takes all of these components into account when computing correlations, which is why its plot is called the complete auto-correlation plot.

PACF Test :- Partial auto-correlation measures the correlation between the series and a given lag after removing the effects already explained by the earlier (shorter) lags. While modeling, creating too many correlated features can lead to multi-collinearity. In a time series, this can be identified and tackled by using the Partial Auto-Correlation Function (PACF), which shows which lags add information beyond the earlier ones.

Turning Point Test :- The Turning Point Test checks whether a set of random variables is independent and identically distributed. It is most commonly used to find out whether a set of time-series data is truly random.

The turning point test is especially good at uncovering cyclicality.
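
The test can be sketched with the standard library alone. A turning point is a value that is strictly larger or strictly smaller than both of its neighbours. For n independent and identically distributed values, the expected number of turning points is 2(n - 2)/3 with variance (16n - 29)/90, and the count is approximately normal for large n. The function name `turning_point_test` and the strict handling of ties are assumptions of this sketch.

```python
import math
import random


def turning_point_test(values):
    """Return (turning point count, z statistic, two-sided p-value)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # Count local peaks and troughs; tied neighbours do not count
    count = 0
    for i in range(1, n - 1):
        prev_v, cur, next_v = values[i - 1], values[i], values[i + 1]
        if (cur > prev_v and cur > next_v) or (cur < prev_v and cur < next_v):
            count += 1

    expected = 2 * (n - 2) / 3
    variance = (16 * n - 29) / 90
    z = (count - expected) / math.sqrt(variance)
    # Two-sided p-value from the normal approximation
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return count, z, p_value


random.seed(0)
iid_sample = [random.random() for _ in range(200)]
trend_sample = list(range(200))  # steadily increasing, clearly not random

print(turning_point_test(iid_sample))
print(turning_point_test(trend_sample))
```

A small p-value suggests the series is not random: too few turning points hints at trend or slow cycles, too many hints at rapid alternation.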

Ljung-Box Test :- The Ljung (pronounced Young) Box test (sometimes called the modified Box-Pierce, or just the Box test) is a way to test for the absence of serial autocorrelation, up to a specified lag k.

The test determines whether the autocorrelations of the errors or residuals are non-zero.

If the autocorrelations of the residuals are very small, the model shows significant promise, since it has left little serial structure unexplained.

Shapiro-Wilk Test :- The Shapiro-Wilk test helps determine whether a sample comes from a normal distribution. The test returns a statistic W between 0 and 1, where small values indicate that the sample is not normally distributed.
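
A short sketch using `scipy.stats.shapiro`. The choice of SciPy and the two synthetic samples (one normal, one exponential and therefore skewed) are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic samples (illustrative assumption)
rng = np.random.default_rng(5)
normal_sample = rng.normal(size=300)
skewed_sample = rng.exponential(size=300)

# shapiro returns the W statistic and a p-value (null hypothesis: normality)
w_normal, p_normal = stats.shapiro(normal_sample)
w_skewed, p_skewed = stats.shapiro(skewed_sample)

print(f"normal sample: W={w_normal:.4f}, p={p_normal:.4f}")
print(f"skewed sample: W={w_skewed:.4f}, p={p_skewed:.4g}")
```

W stays close to 1 for the normal sample and drops for the skewed one; a small p-value rejects normality.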

D’Agostino Test :- This is a normality test that uses the skewness and kurtosis of the sample to judge whether the data departs from a normal distribution.
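
In SciPy this test is available as `scipy.stats.normaltest`, which combines a skewness test and a kurtosis test into one statistic. The sketch below assumes SciPy and uses synthetic samples, one normal and one log-normal (strongly skewed).

```python
import numpy as np
from scipy import stats

# Synthetic samples (illustrative assumption)
rng = np.random.default_rng(6)
normal_sample = rng.normal(size=300)
skewed_sample = rng.lognormal(size=300)

# normaltest returns the combined statistic and a p-value (null hypothesis: normality)
k2_normal, p_normal = stats.normaltest(normal_sample)
k2_skewed, p_skewed = stats.normaltest(skewed_sample)

print(f"normal sample: K2={k2_normal:.3f}, p={p_normal:.4f}")
print(f"skewed sample: K2={k2_skewed:.3f}, p={p_skewed:.4g}")

# The two ingredients of the test, for reference
print(f"skewness of skewed sample: {stats.skew(skewed_sample):.3f}")
print(f"excess kurtosis of skewed sample: {stats.kurtosis(skewed_sample):.3f}")
```

A large statistic with a small p-value indicates that the skewness, the kurtosis, or both differ from what a normal distribution would produce.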

Canova-Hansen Test :- This is a statistical test used to decide whether seasonal differencing is needed in the time series. It tests whether the seasonal pattern is stable over the sample period; if the pattern is not stable, seasonal differencing is called for.

The Epilogue will be a walk-through of a real-world time-series application, with all these concepts formalized, along with newer material on time-series modeling techniques. ARIMA, SARIMA, ARCH, GARCH, Exponential Smoothing models, LSTM and multivariate time series analysis will be discussed in detail.


Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Anant Kumar

Machine Learning & Deep Learning Practitioner | Learning is Continuous | Github : https://github.com/anant-kumar-0308
