A Complete Guide to Time Series Analysis — Prologue

Anant Kumar
Published in Analytics Vidhya
4 min read · Apr 12, 2021

Disclaimer: This article introduces the theoretical concepts behind Time Series Analysis; a subsequent article will provide implementation and applied knowledge.

With a wide range of applications including, but not limited to, Trend Analysis, Demand Forecasting, Inventory Studies, Budgetary Analysis, and Stock Market Predictions, Time Series Analysis is an integral skill for the niche Data Science/Machine Learning industry.

What is Time Series Analysis?

Time Series Data is simply a series of data points ordered in time. In a Time Series, time is often the independent variable, and the goal is usually to forecast values for future periods. Time Series Analysis can be defined as the analysis of this ordered data.

To understand Time Series, one must understand the underlying statistics.

Terminologies

Stationarity :- A Time Series is said to be stationary if its statistical properties do not change over time. Simply put, a stationary Time Series has constant mean and variance, and its covariance is independent of time.
Ideally, Time Series modeling requires strictly stationary data.
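As a quick, informal check, one can look at whether the mean and variance drift over time. Below is a minimal sketch, assuming a pandas Series named series (my own placeholder, not from the article) and an arbitrary window of 12 observations:

```python
import pandas as pd

def rolling_stats(series: pd.Series, window: int = 12) -> pd.DataFrame:
    """Rolling mean and standard deviation for a quick visual stationarity check.

    If either statistic drifts noticeably over time, the series is likely
    non-stationary.
    """
    return pd.DataFrame({
        "rolling_mean": series.rolling(window).mean(),
        "rolling_std": series.rolling(window).std(),
    })
```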

Types of Stationarity

Let us understand the different types of stationarity and how the stationarity tests mentioned below relate to each type.

Strict Stationary: For a strictly stationary series, the mean, variance, and covariance are not functions of time.

Trend Stationary: A series that has no unit root but exhibits a trend is referred to as a trend stationary series. The KPSS test (explained later) classifies a series as stationary based on the absence of a unit root.

Difference Stationary: A time series that can be made strictly stationary by differencing is called difference stationary. The ADF test (explained later) is also known as a difference stationarity test.
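The next article covers these tests in detail; for reference, here is a hedged sketch of how they are commonly run with statsmodels, again assuming a pandas Series named series. The comments reflect the standard null hypotheses of the two tests.

```python
from statsmodels.tsa.stattools import adfuller, kpss

def stationarity_tests(series, alpha=0.05):
    # ADF: null hypothesis = the series has a unit root (difference stationarity test).
    adf_pvalue = adfuller(series.dropna())[1]
    # KPSS: null hypothesis = the series is stationary around a constant.
    kpss_pvalue = kpss(series.dropna(), regression="c", nlags="auto")[1]
    return {
        "adf_rejects_unit_root": adf_pvalue < alpha,
        "kpss_rejects_stationarity": kpss_pvalue < alpha,
    }
```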

Seasonality :- A Time Series is said to be seasonal if it exhibits a seasonal pattern: a recurring pattern influenced by seasonal factors such as the time of year or the day of the week.
Seasonal Time Series are also referred to as Periodic Time Series, as the periodic pattern is always of a fixed and known period.

Cyclicality :- A Time Series is said to be cyclic if it exhibits peaks and troughs that are not of a fixed period.

Even seasoned professionals sometimes fail to identify whether data is seasonal or cyclic. Refer to this post for an understanding of Cyclicality vs. Seasonality, as the underlying concepts are beyond the scope of this article.

Trend :- Trend shows the general tendency of data to increase or decrease over time. A Trend is a smooth, general, long-term, average tendency.

The increase or decrease need not be in the same direction throughout the given period of time.

Normality :- In probability theory, the normal (or Gaussian) distribution is a very common continuous probability distribution. The simplest case is the standard normal distribution, a special case with mean 0 and standard deviation 1. Every normal distribution is a version of the standard normal distribution whose domain has been stretched by the standard deviation and shifted by the mean. Normal distributions are often used to model real-valued random variables whose distributions are not known (e.g., flora and fauna estimation), owing to the Central Limit Theorem.

Central Limit Theorem :- Consider a population with mean μ and standard deviation σ, and take sufficiently large random samples from the population with replacement. The CLT states that the distribution of the sample means will be approximately normal, regardless of the shape of the population's distribution, with mean μ and standard deviation σ/√n.
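A small simulation makes this concrete. The sketch below (my own illustration, not from the article) draws samples from a clearly non-normal exponential population and shows that the sample means cluster around μ with spread close to σ/√n:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # skewed, non-normal population

sample_size = 50
sample_means = np.array([
    rng.choice(population, size=sample_size, replace=True).mean()
    for _ in range(5_000)
])

# The distribution of sample means is approximately normal,
# centred near mu = 2.0 with spread near sigma / sqrt(n) = 2.0 / sqrt(50).
print(sample_means.mean(), sample_means.std())
```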

Autocorrelation :- Autocorrelation measures how similar a Time Series is to a lagged copy of itself. Consider a discrete series of values ordered by time. For lag 1, compare the actual time series with the series shifted by one time step. Repeat this for increasing lags over the length of the time series; the resulting sequence of correlations is the autocorrelation function.
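A minimal sketch of the lag-1 case described above, assuming a pandas Series named series:

```python
import pandas as pd

def autocorrelation(series: pd.Series, lag: int = 1) -> float:
    """Correlation between the series and a copy of itself shifted by `lag` steps."""
    return series.corr(series.shift(lag))  # equivalent to series.autocorr(lag=lag)
```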

Partial Autocorrelation :- The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags.

The autocorrelation between an observation and an observation at a prior time step consists of both the direct correlation and indirect correlations that flow through the observations in between. The Partial Autocorrelation Function removes these indirect correlations.
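In practice the ACF and PACF are rarely computed by hand; a hedged sketch using statsmodels is shown below, assuming a pandas Series named series and an arbitrary choice of 20 lags:

```python
from statsmodels.tsa.stattools import acf, pacf

def correlograms(series, nlags=20):
    # ACF includes both direct and indirect correlations at each lag;
    # PACF keeps only the direct correlation after removing shorter-lag effects.
    return acf(series, nlags=nlags), pacf(series, nlags=nlags)
```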

Difference Order :- Time series data has an inherent temporal structure, and some of that structure might remain even after performing a differencing operation. Differencing can therefore be repeated until all temporal dependence has been removed. The number of times differencing is performed is called the difference order.
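A minimal sketch of repeated differencing with pandas, assuming a Series named series:

```python
import pandas as pd

def difference(series: pd.Series, order: int = 1) -> pd.Series:
    """Difference the series `order` times (the difference order) and drop the NaNs."""
    for _ in range(order):
        series = series.diff()
    return series.dropna()
```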

Auto-Regression :- Auto-Regression is a Time Series Model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step.
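A hedged sketch of a simple autoregressive model using statsmodels' AutoReg, assuming a pandas Series named series; the choice of 3 lags and a 5-step forecast horizon is arbitrary:

```python
from statsmodels.tsa.ar_model import AutoReg

def fit_and_forecast(series, lags=3, steps=5):
    # Regress the current value on its `lags` previous values.
    result = AutoReg(series, lags=lags).fit()
    # Forecast `steps` periods beyond the end of the training data.
    return result.predict(start=len(series), end=len(series) + steps - 1)
```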

Moving Average :- A moving average is a calculation used to analyze data points by creating a series of averages of different subsets of the full data set.

A simple moving average (SMA) is a calculation that takes the arithmetic mean of a given set of values over a specific number of lags.

An exponential moving average (EMA) is a weighted average that gives greater weight to more recent values, making it more responsive to new information.
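Both averages map directly onto pandas; a minimal sketch, assuming a Series named series and an arbitrary window/span of 10:

```python
import pandas as pd

def moving_averages(series: pd.Series, window: int = 10) -> pd.DataFrame:
    return pd.DataFrame({
        # SMA: every value in the window gets equal weight.
        "sma": series.rolling(window=window).mean(),
        # EMA: more recent values get exponentially larger weights.
        "ema": series.ewm(span=window, adjust=False).mean(),
    })
```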

Time-Series Decomposition :- The intuition behind Time Series Decomposition is to treat a Time Series as a combination of Trend, Seasonality, and Random Noise components. It is a good first step toward understanding the underlying structure of the Time Series data.
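A hedged sketch of an additive decomposition with statsmodels, assuming a pandas Series named series with a DatetimeIndex and a known seasonal period (12 here, as for monthly data):

```python
from statsmodels.tsa.seasonal import seasonal_decompose

def decompose(series, period=12):
    # Split the series into trend + seasonality + residual (random noise).
    result = seasonal_decompose(series, model="additive", period=period)
    return result.trend, result.seasonal, result.resid
```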

Portmanteau Tests :- A portmanteau test is a type of statistical hypothesis test in which the null hypothesis is well specified, but the alternative hypothesis is more loosely specified. Instead of testing randomness at each distinct lag, it tests the “overall” randomness based on a number of lags.
The Ljung–Box test is a portmanteau test of whether any of a group of autocorrelations of a time series are different from zero.
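A hedged sketch of running the Ljung–Box test with statsmodels, assuming a pandas Series named series (typically the residuals of a fitted model) and an arbitrary choice of 10 lags:

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

def ljung_box(series, lags=10):
    # Null hypothesis: all autocorrelations up to `lags` are zero (the data are random).
    # Small p-values suggest the series is autocorrelated.
    return acorr_ljungbox(series, lags=[lags])
```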

IID :- It stands for Independent and Identically Distributed. It is a common term in statistics and will appear often in the rest of this series.

Independent :- It means that the sample items are all independent events.

Identically Distributed :- It means that there are no overall trends and the sample is generated from the same probability distribution.

Proceed with the story.

Cheers !

Happy Learning !

