Dec 14, 2018 · 17 min read

To help readers take their first steps in Data Science and Machine Learning, my earlier blogs briefly covered an introduction to Data Science, Python, Statistics, Machine Learning, Regression, Linear Regression, Logistic Regression, Decision Trees and Boosting. In this seventh post of the series, I shall cover Time Series.

Introduction to Time Series:

Time is the most important factor which ensures success in a business. It’s difficult to keep up with the pace of time. But, technology has developed some powerful methods using which we can ‘see things’ ahead of time.

A Time Series (TS) is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

Time Series (TS) are very frequently plotted via line charts. Time Series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and largely in any domain of applied science and engineering which involves temporal measurements.

Time series: random data plus trend, with best-fit line and different applied filters

An autocorrelation plot shows the properties of a type of data known as a time series. A time series refers to observations of a single variable over a specified time horizon. For example, the daily price of Reliance Industries Ltd. stock during the year 2017 is a time series.

Cross-sectional data refers to observations on many variables at a single point in time. For example, the closing prices of the 30 stocks contained in the BSE IT Average on January 31, 2018, would be considered cross-sectional data.

An autocorrelation plot is designed to show whether the elements of a time series are positively correlated, negatively correlated, or independent of each other. (The prefix auto means “self” — autocorrelation specifically refers to correlation among the elements of a time series.)

An autocorrelation plot shows the value of the autocorrelation function (acf) on the vertical axis. It can range from –1 to 1.

The horizontal axis of an autocorrelation plot shows the size of the lag between the elements of the time series. For example, the autocorrelation with lag 2 is the correlation between the time series elements and the corresponding elements that were observed two time periods earlier.
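To make the lag idea concrete, here is a minimal sketch of the autocorrelation at a single lag, written with NumPy. The helper name `autocorrelation` and the toy series are my own illustrative choices, not part of any library.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # centre the series around zero
    denom = np.sum(x * x)                 # lag-0 sum of squares
    if lag == 0:
        return 1.0
    # pair each element with the one observed `lag` periods earlier
    return float(np.sum(x[lag:] * x[:-lag]) / denom)

# Made-up series that repeats every 4 steps
toy = np.array([1.0, 2.0, 3.0, 2.0] * 6)
acf_values = [autocorrelation(toy, k) for k in range(5)]
```

For this toy series the value at lag 2 is strongly negative (elements two steps apart move in opposite directions) and the value at lag 4 is strongly positive (the pattern repeats), which is exactly what an autocorrelation plot would show.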

There are various methods of prediction and forecasting. One such method, which deals with time-based data, is Time Series Modeling. As the name suggests, it involves working on time-based data (years, days, hours, minutes) to derive hidden insights that support informed decision making.

Time Series models are very useful when you have serially correlated data. Most businesses work on time series data to analyze next year's sales numbers, website traffic, competitive position and much more.

Time Series forecasting is the use of a model to predict future values based on previously observed values. Time series methods are widely used for non-stationary data such as economic indicators, weather, stock prices, and retail sales.

Time Series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Regression analysis is often employed to test theories that the current values of one or more independent time series affect the current value of another time series; this type of analysis is not called “Time Series analysis”, which instead focuses on comparing values of a single time series or multiple dependent time series at different points in time. Interrupted Time Series analysis is the analysis of interventions on a single time series.

Time Series (TS) data have a natural temporal ordering. This makes Time Series analysis distinct from cross-sectional studies, in which there is no natural ordering of the observations (e.g. explaining people’s wages by reference to their respective education levels, where the individuals’ data could be entered in any order). Time series analysis is also distinct from spatial data analysis where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather than from future values.

Time Series analysis can be applied to real-valued, continuous data, discrete numeric data, or discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language).

There are two things that make a TS different from, say, a regular regression problem:

1. It is time dependent. So the basic assumption of a linear regression model that the observations are independent doesn’t hold in this case.
2. Along with an increasing or decreasing trend, most TS have some form of seasonality, i.e. variations specific to a particular time frame. For example, if you look at the sales of a woolen jacket over time, you will invariably find higher sales in winter seasons.

Forecast quality metrics:

The most common and widely used metrics for measuring the quality of predictions are:

· R squared, coefficient of determination (in econometrics it can be interpreted as a percentage of variance explained by the model), (-inf, 1]

sklearn.metrics.r2_score

· Mean Absolute Error, it is an interpretable metric because it has the same unit of measurement as the initial series, [0, +inf)

sklearn.metrics.mean_absolute_error

· Median Absolute Error, again an interpretable metric, particularly interesting because it is robust to outliers, [0, +inf)

sklearn.metrics.median_absolute_error

· Mean Squared Error, most commonly used, gives higher penalty to big mistakes and vice versa, [0, +inf)

sklearn.metrics.mean_squared_error

· Mean Squared Logarithmic Error, practically the same as MSE but we initially take logarithm of the series, as a result we give attention to small mistakes as well, usually is used when data has exponential trends, [0, +inf)

sklearn.metrics.mean_squared_log_error

· Mean Absolute Percentage Error, the same as MAE but expressed as a percentage of the true values, which makes it easy to explain to a non-technical audience; it is undefined when a true value is zero, [0, +inf)

# Importing everything from above
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, median_absolute_error
from sklearn.metrics import mean_squared_error, mean_squared_log_error

def mean_absolute_percentage_error(y_true, y_pred):
    # MAPE in percent; assumes y_true contains no zeros
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
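To see what these metrics actually compute, the sketch below works them out by hand with NumPy on three made-up observations and forecasts. It is meant to mirror the formulas behind the sklearn functions listed above, not to replace them; the numbers are invented for illustration.

```python
import numpy as np

y_true = np.array([100.0, 200.0, 400.0])   # made-up observations
y_pred = np.array([110.0, 180.0, 400.0])   # made-up forecasts

err = y_true - y_pred                       # [-10, 20, 0]
mae = np.mean(np.abs(err))                  # mean absolute error
medae = np.median(np.abs(err))              # median absolute error
mse = np.mean(err ** 2)                     # mean squared error
# MSLE: squared error on log(1 + y), so relative errors matter more than absolute ones
msle = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)
mape = np.mean(np.abs(err / y_true)) * 100  # percentage error, needs y_true != 0
# R squared: 1 minus residual sum of squares over total sum of squares
r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
```

Here MAE and the median absolute error are both 10 (same unit as the series), while MAPE is about 6.7%, which is often the easier number to communicate.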

Stationarity:

A stationary series is one in which the properties — mean, variance and covariance, do not vary with time. Consider the three plots shown below:

In the first plot, we can clearly see that the mean varies (increases) with time which results in an upward trend. Thus, this is a non-stationary series. For a series to be classified as stationary, it should not exhibit a trend.

Moving on to the second plot, we certainly do not see a trend in the series, but the variance of the series is a function of time. A stationary series must have a constant variance.

In the third plot, the spread becomes closer as the time increases, which implies that the covariance is a function of time.

The three examples shown above represent non-stationary time series. Now look at a fourth plot:

In this case, the mean, variance and covariance are constant with time. This is what a stationary time series looks like.

Predicting future values from the fourth plot would be easier. Most statistical models require the series to be stationary to make effective and precise predictions.

So the three basic criteria for a series to be classified as stationary are:

1. The mean of the series should not be a function of time; it should be a constant.

2. The variance of the series should not be a function of time. This property is known as homoscedasticity.

3. The covariance of the ith term and the (i + m)th term should not be a function of time.
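A rough way to probe the first two criteria is to compare the mean and variance of the first and second half of a series. The sketch below does this for simulated data; the helper `half_stats` and the two series are my own illustrative assumptions, and this is an informal check, not a statistical test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
noise = rng.normal(0.0, 1.0, n)
stationary = noise                          # constant mean and variance by construction
trending = noise + 0.05 * np.arange(n)      # mean grows with time

def half_stats(series):
    """Return ((mean_1, mean_2), (var_1, var_2)) for the two halves of a series."""
    first, second = np.array_split(np.asarray(series, dtype=float), 2)
    return (first.mean(), second.mean()), (first.var(), second.var())

stat_means, stat_vars = half_stats(stationary)
trend_means, trend_vars = half_stats(trending)
```

For the stationary series the two means are close to each other; for the trending series the second-half mean is clearly larger, which flags a violation of the first criterion.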

Methods to determine whether a given series is stationary or not and deal with it accordingly:

1. Visual test: Consider the plots we used earlier. We were able to identify the series in which mean and variance were changing with time, simply by looking at each plot. Similarly, we can plot the data and determine if the properties of the series are changing with time or not. The visual approach might not always give accurate results. It is better to confirm the observations using some statistical tests.

2. Statistical test: We can use statistical tests such as unit root tests. A unit root indicates that the statistical properties of a given series are not constant with time, whereas constant properties are the condition for a stationary time series.

Suppose we have a time series:

$$ y_t = a\,y_{t-1} + \varepsilon_t $$

where $y_t$ is the value at the time instant $t$ and $\varepsilon_t$ is the error term. In order to calculate $y_t$ we need the value of $y_{t-1}$, which is:

$$ y_{t-1} = a\,y_{t-2} + \varepsilon_{t-1} $$

If we repeat this substitution $n$ steps back, the value of $y_t$ comes out to be:

$$ y_t = a^n\,y_{t-n} + \sum_{i=0}^{n-1} a^i\,\varepsilon_{t-i} $$

If the value of $a$ is 1 (unit) in the above equation, then $y_t$ equals $y_{t-n}$ plus the sum of all errors from $t-n+1$ to $t$, which means that the variance increases with time. This is known as a unit root in a time series. For a stationary time series, the variance must not be a function of time. Unit root tests check for the presence of a unit root in the series by checking whether $a = 1$. The two most commonly used tests in this context, ADF and KPSS, are described in the following sections.
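The effect of a unit root on the variance can be checked with a small simulation. The sketch below generates many independent paths of the recursion above for a = 1 and for a = 0.5 and compares the variance across paths at an early and a late time step. The function name `simulate_ar1`, the path counts and the seed are illustrative assumptions.

```python
import numpy as np

def simulate_ar1(a, n_steps, n_paths, seed=0):
    """Simulate n_paths independent paths of y_t = a*y_{t-1} + eps_t, with y_0 = eps_0."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, 1.0, size=(n_paths, n_steps))
    y = np.zeros((n_paths, n_steps))
    y[:, 0] = eps[:, 0]
    for t in range(1, n_steps):
        y[:, t] = a * y[:, t - 1] + eps[:, t]
    return y

walk = simulate_ar1(a=1.0, n_steps=200, n_paths=2000)     # unit root
stable = simulate_ar1(a=0.5, n_steps=200, n_paths=2000)   # |a| < 1

# Variance across paths at time index 9 and at time index 199
walk_var = walk.var(axis=0)[[9, 199]]
stable_var = stable.var(axis=0)[[9, 199]]
```

With a = 1 the variance keeps growing roughly in proportion to the number of steps, whereas with a = 0.5 it settles near 1 / (1 - 0.25), about 1.33, at both time steps.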

## a) ADF (Augmented Dickey Fuller) Test

The Dickey Fuller test is one of the most popular statistical tests. It can be used to determine the presence of a unit root in the series, and hence helps us understand whether the series is stationary or not. The null and alternate hypotheses of this test are:

Null Hypothesis: The series has a unit root (value of a =1)

Alternate Hypothesis: The series has no unit root.

If we fail to reject the null hypothesis, we treat the series as non-stationary. The alternate hypothesis covers series that are linear or difference stationary.

The ADF test gives the following results: test statistic, p-value and the critical values at the 1%, 5%, and 10% significance levels. If the test statistic is less than the critical value, we can reject the null hypothesis (i.e. the series is stationary). When the test statistic is greater than the critical value, we fail to reject the null hypothesis (which means the series is not stationary).

## b) KPSS (Kwiatkowski-Phillips-Schmidt-Shin) Test

KPSS is another test for checking the stationarity of a time series (slightly less popular than the Dickey Fuller test). The null and alternate hypotheses of the KPSS test are the opposite of those of the ADF test, which often creates confusion.

The KPSS test takes trend stationarity as its null hypothesis and the presence of a unit root as its alternate hypothesis.

Null Hypothesis: The process is trend stationary.

Alternate Hypothesis: The series has a unit root (series is not stationary).

The KPSS test gives the following results: test statistic, p-value and the critical values at the 1%, 2.5%, 5%, and 10% significance levels. If the test statistic is greater than the critical value, we reject the null hypothesis (series is not stationary). If the test statistic is less than the critical value, we fail to reject the null hypothesis (series is stationary).

## Types of Stationarity

The different types of stationarities are:

• Strict Stationary: This series satisfies the mathematical definition of a stationary process. For a strict stationary series, the mean, variance and covariance are not functions of time. The aim is to convert a non-stationary series into a strict stationary series for making predictions.
• Trend Stationary: A series that has no unit root but exhibits a trend is referred to as a trend stationary series. Once the trend is removed, the resulting series will be strict stationary. The KPSS test classifies a series as stationary on the absence of unit root. This means that the series can be strict stationary or trend stationary.
• Difference Stationary: A time series that can be made strict stationary by differencing falls under difference stationary. ADF test is also known as a difference stationarity test.

It’s always better to apply both the tests, so that we are sure that the series is truly stationary. The possible outcomes of applying these stationary tests:

• Case 1: Both tests conclude that the series is not stationary -> series is not stationary
• Case 2: Both tests conclude that the series is stationary -> series is stationary
• Case 3: KPSS = stationary and ADF = not stationary -> trend stationary, remove the trend to make series strict stationary
• Case 4: KPSS = not stationary and ADF = stationary -> difference stationary, use differencing to make series stationary
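The four cases above can be written down as a small decision helper. This is only a sketch of the table: the function name `interpret_tests` and its boolean inputs are hypothetical, and the caller is assumed to have already decided, from the test statistics, whether each test rejected its own null hypothesis.

```python
def interpret_tests(adf_rejects_unit_root, kpss_rejects_stationarity):
    """Combine ADF and KPSS outcomes into one of the four cases."""
    adf_stationary = adf_rejects_unit_root            # ADF null: unit root
    kpss_stationary = not kpss_rejects_stationarity   # KPSS null: (trend) stationary
    if not adf_stationary and not kpss_stationary:
        return "not stationary"                               # Case 1
    if adf_stationary and kpss_stationary:
        return "stationary"                                   # Case 2
    if kpss_stationary and not adf_stationary:
        return "trend stationary: remove the trend"           # Case 3
    return "difference stationary: apply differencing"        # Case 4

outcome = interpret_tests(adf_rejects_unit_root=False, kpss_rejects_stationarity=False)
```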

## Making a Time Series Stationary

Actually, it is almost impossible to make a series perfectly stationary, but we try to take it as close as possible. There are 2 major reasons behind non-stationarity of a TS:

1. Trend: a mean that varies over time. E.g., the number of airline passengers may, on average, grow over time.

2. Seasonality — variations at specific time-frames. E.g. people might have a tendency to buy cars in a particular month because of pay increment or festivals.

In order to use time series forecasting models, it is necessary to convert any non-stationary series to a stationary series first.

## Differencing

In this method, we compute the difference of consecutive terms in the series. Differencing is typically performed to get rid of the varying mean. Mathematically, differencing can be written as:

$$ y'_t = y_t - y_{t-1} $$

where $y_t$ is the value at time $t$.
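A minimal sketch of first-order differencing with NumPy, using a made-up series with a linear trend. It also shows how to undo the operation, which is needed to turn forecasts of the differenced series back into the original scale.

```python
import numpy as np

t = np.arange(10)
series = 3.0 + 2.0 * t            # made-up series with a linear trend
diffed = np.diff(series)          # y'_t = y_t - y_{t-1}; one element shorter

# Invert the differencing: start from the first value and accumulate the differences
restored = np.concatenate(([series[0]], series[0] + np.cumsum(diffed)))
```

The differenced series is constant at 2.0, so the varying mean has been removed.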

## Seasonal Differencing

In seasonal differencing, instead of calculating the difference between consecutive values, we calculate the difference between an observation and a previous observation from the same season. For example, from an observation taken on a Monday we subtract the observation taken on the previous Monday. Mathematically it can be written as:

$$ y'_t = y_t - y_{t-n} $$

where $n$ is the number of time steps in one season.
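As an illustration, the sketch below applies seasonal differencing with a period of 7 to made-up daily data that has a weekly pattern plus a slow upward trend. The helper name `seasonal_difference` and the numbers are my own.

```python
import numpy as np

def seasonal_difference(series, period):
    """Return y_t - y_{t-period}; the result is `period` elements shorter."""
    x = np.asarray(series, dtype=float)
    return x[period:] - x[:-period]

weekly_pattern = np.array([5.0, 3.0, 4.0, 6.0, 8.0, 12.0, 11.0])   # made-up weekday effect
daily = np.tile(weekly_pattern, 4) + 0.5 * np.arange(28)           # four weeks plus a trend
seasonal_diff = seasonal_difference(daily, period=7)
```

The weekly pattern cancels out completely, leaving only the constant week-over-week increase of 3.5 contributed by the trend.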

## Transformation

Transformations are used to stabilize the non-constant variance of a series. Common transformation methods include power transform, square root, and log transform.
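To illustrate how a log transform stabilizes the variance, the sketch below builds a made-up series whose fluctuations grow with its level (multiplicative noise) and compares the spread around a fitted line in an early and a late window, before and after taking logs. The helper `detrended_std` is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(300)
# Multiplicative noise: the size of the fluctuations scales with the level
series = np.exp(0.01 * t) * np.exp(rng.normal(0.0, 0.1, size=t.size))
logged = np.log(series)           # equals 0.01*t plus noise of constant variance

def detrended_std(y, lo, hi):
    """Standard deviation of residuals from a straight-line fit on y[lo:hi]."""
    x = np.arange(lo, hi)
    slope, intercept = np.polyfit(x, y[lo:hi], 1)
    return np.std(y[lo:hi] - (slope * x + intercept))

# Ratio of late-window spread to early-window spread
raw_ratio = detrended_std(series, 200, 300) / detrended_std(series, 0, 100)
log_ratio = detrended_std(logged, 200, 300) / detrended_std(logged, 0, 100)
```

On the raw series the late window is several times more volatile than the early one, while on the logged series the ratio stays close to 1.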

Time Series Forecasting Methods

The different classical time series forecasting methods are:

1) Auto-Regression (AR)

2) Moving Average (MA)

3) Autoregressive Moving Average (ARMA)

4) Auto-Regressive Integrated Moving Average (ARIMA)

5) Seasonal Auto-Regressive Integrated Moving-Average (SARIMA)

6) Seasonal Auto-Regressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)

7) Vector Auto-Regression (VAR)

8) Vector Auto-Regression Moving-Average (VARMA)

9) Vector Auto-Regression Moving-Average with Exogenous Regressors (VARMAX)

10) Simple Exponential Smoothing (SES)

11) Holt-Winters Exponential Smoothing (HWES)

## Auto-Regression (AR)

The Auto-Regression (AR) method models the next step in the sequence as a linear function of the observations at prior time steps.

The notation for the model involves specifying the order of the model p as a parameter to the AR function, e.g. AR(p). For example, AR(1) is a first-order Auto-Regression model.

The method is suitable for univariate time series without trend and seasonal components.
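As a minimal sketch of what an AR(1) model does, the code below simulates a zero-mean AR(1) series with a known coefficient and recovers that coefficient by least squares, regressing each value on the previous one. The coefficient 0.7, the seed and the variable names are illustrative; a library such as statsmodels would normally handle the estimation.

```python
import numpy as np

rng = np.random.default_rng(3)
true_phi, n = 0.7, 2000
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_phi * y[t - 1] + rng.normal()   # y_t = phi * y_{t-1} + eps_t

# Least-squares slope of y_t on y_{t-1} (no intercept, since the series is zero-mean)
phi_hat = np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1])

# One-step-ahead forecast: a linear function of the latest observation
next_step_forecast = phi_hat * y[-1]
```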

## Moving Average (MA)

The moving average (MA) method models the next step in the sequence as a linear function of the residual errors from a mean process at prior time steps.

A moving average model is different from calculating the moving average of the time series.

The notation for the model involves specifying the order of the model q as a parameter to the MA function, e.g. MA(q). For example, MA(1) is a first-order moving average model.

The method is suitable for univariate time series without trend and seasonal components.

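AR and MA models are not applicable to non-stationary series. The primary difference between them lies in how the correlation between observations at different time points behaves: in an AR model the correlation declines gradually as the lag grows, whereas in an MA(q) model it drops to zero for lags greater than q.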
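The different correlation structure of AR and MA processes shows up clearly in a simulation. The sketch below generates an MA(1) and an AR(1) series from the same noise and computes their sample autocorrelations up to lag 3. The helper `sample_acf`, the coefficient 0.8 and the seed are illustrative assumptions.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[k:], x[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(11)
n, coef = 5000, 0.8
eps = rng.normal(size=n + 1)

ma1 = eps[1:] + coef * eps[:-1]       # MA(1): y_t = eps_t + 0.8 * eps_{t-1}

ar1 = np.zeros(n)                     # AR(1): y_t = 0.8 * y_{t-1} + eps_t
for t in range(1, n):
    ar1[t] = coef * ar1[t - 1] + eps[t]

ma_acf = sample_acf(ma1, 3)
ar_acf = sample_acf(ar1, 3)
```

For the MA(1) series the autocorrelation at lag 1 is close to 0.8 / (1 + 0.8²), about 0.49, and close to zero at lags 2 and 3. For the AR(1) series it decays gradually, roughly as 0.8 raised to the power of the lag.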

## Autoregressive Moving Average (ARMA)

The Autoregressive Moving Average (ARMA) method models the next step in the sequence as a linear function of the observations and residual errors at prior time steps.

It combines both Auto-Regression (AR) and Moving Average (MA) models.

The notation for the model involves specifying the order for the AR(p) and MA(q) models as parameters to an ARMA function, e.g. ARMA(p, q). An ARMA model can also be used to develop AR or MA models by setting q or p to zero.

The method is suitable for univariate time series without trend and seasonal components.

## Auto-Regressive Integrated Moving Average (ARIMA)

The Autoregressive Integrated Moving Average (ARIMA) method models the next step in the sequence as a linear function of the differenced observations and residual errors at prior time steps.

It combines both Auto-Regression (AR) and Moving Average (MA) models as well as a differencing pre-processing step of the sequence to make the sequence stationary, called integration (I).

The notation for the model involves specifying the order for the AR(p), I(d), and MA(q) models as parameters to an ARIMA function, e.g. ARIMA(p, d, q). An ARIMA model can also be used to develop AR, MA, and ARMA models.

The method is suitable for univariate time series with trend and without seasonal components.

It is not easy to distinguish an ARMA model from an ARIMA model graphically. The problem is that the stationary ARMA form can be made arbitrarily close to the ARIMA form by setting one of the roots of the auto-regressive characteristic polynomial arbitrarily close to one. Trying to distinguish between these models is equivalent to trying to determine whether there is a unit root in the auto-regressive characteristic polynomial. This usually requires formal modelling and testing, and it is not something that is easy to do on a purely graphical basis.

ARIMA models can be estimated following the Box–Jenkins approach. In time series analysis, the Box–Jenkins method, named after the statisticians George Box and Gwilym Jenkins, applies autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) models to find the best fit of a time-series model to past values of a time series.

## Seasonal Auto-Regressive Integrated Moving-Average (SARIMA)

The Seasonal Auto-Regressive Integrated Moving Average (SARIMA) method models the next step in the sequence as a linear function of the differenced observations, errors, differenced seasonal observations, and seasonal errors at prior time steps.

It combines the ARIMA model with the ability to perform the same auto-Regression, differencing, and moving average modeling at the seasonal level.

The notation for the model involves specifying the order for the AR(p), I(d), and MA(q) models as parameters to an ARIMA function and AR(P), I(D), MA(Q) and m parameters at the seasonal level, e.g. SARIMA(p, d, q)(P, D, Q)m where “m” is the number of time steps in each season (the seasonal period). A SARIMA model can be used to develop AR, MA, ARMA and ARIMA models.

The method is suitable for univariate time series with trend and/or seasonal components.

## Seasonal Auto-Regressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)

The Seasonal Auto-Regressive Integrated Moving-Average with Exogenous Regressors (SARIMAX) is an extension of the SARIMA model that also includes the modeling of exogenous variables.

Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations at the same time steps as the original series. The primary series may be referred to as endogenous data to contrast it from the exogenous sequence(s). The observations for exogenous variables are included in the model directly at each time step and are not modeled in the same way as the primary endogenous sequence (e.g. as an AR, MA, etc. process).

The SARIMAX method can also be used to model the subsumed models with exogenous variables, such as ARX, MAX, ARMAX, and ARIMAX.

The method is suitable for univariate time series with trend and/or seasonal components and exogenous variables.

## Vector Auto-Regression (VAR)

The Vector Auto-Regression (VAR) method models the next step in each time series using an AR model. It is the generalization of AR to multiple parallel time series, i.e. multivariate time series.

The notation for the model involves specifying the order for the AR(p) model as parameters to a VAR function, e.g. VAR(p).

The method is suitable for multivariate time series without trend and seasonal components.
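To show what "AR for several parallel series" means, the sketch below simulates a two-variable VAR(1) process with a known coefficient matrix and recovers that matrix by ordinary least squares. The matrix `A`, the seed and the sample size are made-up values; in practice a library implementation would be used.

```python
import numpy as np

rng = np.random.default_rng(9)
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])          # made-up coefficients: each series depends on both
n = 3000
Y = np.zeros((n, 2))
for t in range(1, n):
    Y[t] = A @ Y[t - 1] + rng.normal(size=2)    # Y_t = A Y_{t-1} + eps_t

# Regress Y_t on Y_{t-1}; lstsq solves Y[:-1] @ B = Y[1:], and B is the transpose of A
coef, *_ = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)
A_hat = coef.T

next_step = A_hat @ Y[-1]           # one-step-ahead forecast for both series
```

The off-diagonal entries of the matrix are what distinguish VAR from fitting two separate AR models: they let the past of one series help predict the other.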

## Vector Auto-Regression Moving-Average (VARMA)

The Vector Auto-Regression Moving-Average (VARMA) method models the next step in each time series using an ARMA model. It is the generalization of ARMA to multiple parallel time series, i.e. multivariate time series.

The notation for the model involves specifying the order for the AR(p) and MA(q) models as parameters to a VARMA function, e.g. VARMA(p, q). A VARMA model can also be used to develop VAR or VMA models.

The method is suitable for multivariate time series without trend and seasonal components.

## Vector Auto-Regression Moving-Average with Exogenous Regressors (VARMAX)

The Vector Auto-Regression Moving-Average with Exogenous Regressors (VARMAX) is an extension of the VARMA model that also includes the modeling of exogenous variables. It is a multivariate version of the ARMAX method.

Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations at the same time steps as the original series. The primary series are referred to as endogenous data to contrast them with the exogenous sequence(s). The observations for exogenous variables are included in the model directly at each time step and are not modeled in the same way as the primary endogenous sequences (e.g. as an AR, MA, etc. process).

The VARMAX method can also be used to model the subsumed models with exogenous variables, such as VARX and VMAX.

The method is suitable for multivariate time series without trend and seasonal components, and with exogenous variables.

## Simple Exponential Smoothing (SES)

The Simple Exponential Smoothing (SES) method models the next time step as an exponentially weighted linear function of observations at prior time steps.

The method is suitable for univariate time series without trend and seasonal components.
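A minimal sketch of simple exponential smoothing follows. Each smoothed value is a weighted average of the newest observation and the previous smoothed value, which is what makes the weights on older observations decay exponentially. The function name, the smoothing factor `alpha=0.5`, the initialisation with the first observation and the data are all illustrative assumptions.

```python
import numpy as np

def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level at every time step for 0 < alpha <= 1."""
    x = np.asarray(series, dtype=float)
    level = np.empty_like(x)
    level[0] = x[0]                   # initialise with the first observation (one common choice)
    for t in range(1, len(x)):
        level[t] = alpha * x[t] + (1 - alpha) * level[t - 1]
    return level

demand = [10.0, 12.0, 11.0, 13.0]     # made-up observations
smoothed = simple_exponential_smoothing(demand, alpha=0.5)
forecast = smoothed[-1]               # SES gives the same flat forecast for every future step
```

With these numbers the smoothed levels are 10, 11, 11 and 12, so the forecast for the next step is 12.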

## Holt-Winters Exponential Smoothing (HWES)

The Holt-Winters Exponential Smoothing (HWES) method, also called Triple Exponential Smoothing, models the next time step as an exponentially weighted linear function of observations at prior time steps, taking trends and seasonality into account.

The method is suitable for univariate time series with trend and/or seasonal components.

## Overview of the Framework

The framework (shown below) specifies the step by step approach on ‘How to do a Time Series Analysis‘:

ARIMA model construction steps

Summary:

A time series is a series of data points indexed (or listed or graphed) in time order. Therefore, the data is organized by relatively deterministic timestamps and may, compared to random sample data, contain additional information that we can extract. TS analysis involves working on time-based data (years, days, hours, minutes) to derive hidden insights that support informed decision making. Most classical time series models assume that the series is stationary. Where the stationarity criteria are violated, the first requisite is to stationarize the time series and then try stochastic models to predict it.

A stationary time series is the one for which the properties (namely mean, variance and covariance) do not depend on time.

The ADF test has an alternate hypothesis of linear or difference stationary, while the KPSS test identifies trend-stationarity in a series.

Time series provide the opportunity to forecast future values. Based on previous values, time series can be used to forecast trends in economics, weather, capacity planning, etc.

So we have seen different methods that can be used to check the stationarity of a time series and a suite of classical time series forecasting methods that we can test and tune on our time series dataset.
