An Introduction to Time-Series Analysis

Asitdubey
Published in Analytics Vidhya
Aug 23, 2020

Through this article, we will get to know about:

the importance of time-series data and forecasting, and their impact on businesses;

the components of time-series data, e.g. trend, seasonality, cyclic and random components;

models such as Auto-Regressive (AR), Moving Average (MA), Auto-Regressive Moving Average (ARMA) and Auto-Regressive Integrated Moving Average (ARIMA).

According to Wikipedia, a time series is a sequence of data points equally spaced and indexed in time order. Put another way, it is a set of measurements of some variable or event taken at equal time intervals, with time acting as the independent variable for estimation. Analysing the series helps us predict future values based on previously observed values. In a time series we have only two variables: time and the variable we want to forecast. Examples include the height of ocean tides, the count of sunspots, the regular change of seasons every year, the blinking of rice lights, and the motion of a pendulum in a vacuum: any event that occurs at equal intervals and contains time as a variable.

There are also many applications of time series, such as yearly GDP (and many other indicators), monthly ticket sales, e-commerce sales figures, weather forecasting, earthquake prediction, stock price prediction, and the performance of a sports team, as well as uses across statistics, econometrics, finance, astronomy and communication engineering.

Note: time-series analysis is not used when the dependent variable is constant or is a known mathematical function of time (trigonometric, logarithmic, polynomial), since future values are then already determined.

A time series can be univariate or multivariate. If the data contains observations of just a single variable (e.g. demand for a product at time t), it is known as univariate time-series data. If it consists of several variables (demand for the product at time t, its price at time t, money spent on advertising at time t, competitors' prices at time t), it is called multivariate time-series data.

Components of Time-series data

There are four components of time-series data (plus white noise, which is worth describing alongside them):

1. Trend: the consistent long-term upward or downward movement of the data. If we fit a line to the dataset and its slope is positive, the trend is upward; if the slope is negative, the trend is downward.

2. Seasonal: when factors such as the time of year or the day of the week affect the dependent variable, repetitive patterns are observed in the series. The seasonal component is the repetitive upward or downward movement away from the trend that occurs within a year, month or week at fixed intervals, such as the seasons recurring every year. The fluctuations may be caused by festivals, holidays, end-of-season sales (EOSS), etc.

Seasonality always has a fixed and known frequency. In the figure we can see that after every three small spikes there is one large spike, and this pattern repeats throughout the series.

3. Cyclic: the cyclical component is a fluctuation around the trend line at irregular intervals (the time between cycles is random), caused by macro-economic changes such as recession or unemployment. Unlike seasonal patterns, cyclic rises and falls have no fixed period; the time between repetitions is usually more than a year (at least two). Seasonal fluctuations, by contrast, occur within a year and are caused by factors that recur every year, so their periodicity is constant, whereas the periodicity of cycles is not. In the figure we can see a recurring downturn, but its frequency is not fixed.

4. Irregular: irregular fluctuations occur due to random or unforeseen events. They are of short duration and non-repeating, e.g. the COVID-19 pandemic or the Ebola outbreak.

5. White Noise: white noise is random fluctuation with no periodic structure. It has mean 0, constant variance and no autocorrelation, so it does not help in prediction. It has both advantages and disadvantages: because of the lack of correlation it cannot be forecast, but it is sometimes added intentionally to mask other noises; for example, sounds such as rain or waves are used to help encourage sleep regardless of environmental noise.
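For intuition, here is a minimal sketch (not from the original article) of generating and plotting white noise in Python:

```python
import numpy as np
import matplotlib.pyplot as plt

# White noise: independent draws with mean 0 and constant variance.
rng = np.random.default_rng(42)  # seeded for reproducibility
noise = rng.normal(loc=0.0, scale=1.0, size=200)

plt.plot(noise)
plt.title('White noise: mean 0, constant variance, no autocorrelation')
plt.show()
```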

Stationary Data (or, Stationarity):

Before applying any statistical model to a time series, the series has to be stationary (time-invariant): over different time periods it should have a constant mean, constant variance and constant covariance. In other words, the data should have the same mean throughout, be scattered consistently, and keep the same covariance structure throughout. If the mean, variance or covariance varies with time, the data is non-stationary and we have to make it stationary before applying any method. This is necessary because statistical models assume that the behaviour learned over one interval will hold over another; if the distribution keeps changing, the accuracy of the model suffers. Mathematical computation on stationary data is also easier than on non-stationary data.

Non-Stationary Series:

Here we have constant variance and covariance but the mean is not constant.

Here the mean is roughly constant, but neither the variance nor the covariance is constant over time.

Here the mean and variance are constant, but the covariance varies with time.

Stationarity check:

There are two methods for checking stationarity:

a) Rolling Statistics — plot the moving average or moving standard deviation and see whether it varies with time. This is a visual technique.

In the above figure we see that the standard deviation is constant but the mean is trending upward. Hence, the series is not stationary.

b) ADF Test — the Augmented Dickey–Fuller test gives us several values that help in identifying stationarity. The null hypothesis says that the time series is non-stationary. The test returns a test statistic and critical values at several confidence levels. If the test statistic is less than the critical values, we can reject the null hypothesis and say that the series is stationary. The ADF test also gives us a p-value; since the null hypothesis is non-stationarity, a lower p-value is stronger evidence that the series is stationary.

How to make a non-stationary time-series stationary?

There are two different methods by which a non-stationary time series can be converted into a stationary one.

Differencing: a p-value greater than 0.05 indicates that we cannot reject the null hypothesis, so the series is non-stationary. Differencing is performed by subtracting the previous observation from the current one, e.g. subtracting the previous day's demand from the current day's demand.

Differencing often achieves stationarity easily. A stationary series no longer depends on time: like white noise, it looks the same whenever we observe it and has no predictable pattern, whereas trend and seasonality affect the series differently at different times.

Decomposition of time series: decomposition removes the trend and seasonal patterns by splitting a non-stationary time series into a trend component, a seasonal component and a random error (with zero mean, possibly correlated over time). We then analyse the random error (the irregular pattern) as the stationary component.

Decomposition models come in two types:

a) Additive decomposition: Y(t) = Trend(t) + Seasonality(t) + Error(t)

b) Multiplicative decomposition: Y(t) = Trend(t) * Seasonality(t) * Error(t)

Okay, let's take an example. We will use the Air Passengers dataset, which gives monthly totals of international airline passengers from 1949 to 1960, to check for stationarity in time-series data. You can get the dataset and code from Kaggle.

Making the month the index, we take it as the X variable and the passenger total as the y variable, as sketched below.
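A minimal loading sketch; the file name AirPassengers.csv and the column names Month and #Passengers are assumptions based on the common Kaggle version of the dataset:

```python
import pandas as pd

df = pd.read_csv('AirPassengers.csv', parse_dates=['Month'], index_col='Month')
ts = df['#Passengers']  # univariate series: monthly passenger totals
print(ts.head())
```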

By plotting the data we can see whether any trend is present. If there is a trend, we'll check for stationarity.
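Something like the following plots the raw series (a sketch):

```python
import matplotlib.pyplot as plt

plt.plot(ts)
plt.xlabel('Month')
plt.ylabel('Passengers')
plt.show()
```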

From the plot we can see that there is an upward trend. Hence, we'll go for the stationarity check.

Determining the rolling statistics and plotting them:
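A sketch of the rolling-statistics plot; the 12-month window is an assumption matching the monthly frequency of the data:

```python
# Rolling mean and standard deviation over a 12-month window.
roll_mean = ts.rolling(window=12).mean()
roll_std = ts.rolling(window=12).std()

plt.plot(ts, color='blue', label='Original')
plt.plot(roll_mean, color='red', label='Rolling mean')
plt.plot(roll_std, color='black', label='Rolling std')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation')
plt.show()
```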

In the above plot we can see an upward trend in the mean, while the standard deviation is constant with time. For the series to be stationary, both the mean and the standard deviation have to be constant with time, i.e. parallel to the x-axis.

Applying the ADF test:
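A sketch using statsmodels' adfuller; the printed fields follow the tuple the function returns:

```python
from statsmodels.tsa.stattools import adfuller

result = adfuller(ts, autolag='AIC')
print('Test statistic :', result[0])
print('p-value        :', result[1])
print('Critical values:', result[4])
```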

From the ADF test results we can see that the p-value is not less than 0.05 and the test statistic is not less than any of the critical values. Hence, we do not reject the null hypothesis, and the series is non-stationary.

Data Transformation to achieve Stationarity

There are a couple of ways to achieve stationarity through data transformation, such as taking log10, loge, square, square root, cube, cube root, exponential decay, or a time shift.

Using the time shift (differencing) method for stationarity:
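A sketch of one-step differencing via a time shift; whether the original code applied it to the raw or the log series is not shown, so the raw series is assumed here:

```python
# Subtract the previous observation from the current one (equivalent to ts.diff()).
ts_diff = ts - ts.shift(1)
ts_diff.dropna(inplace=True)  # the first value has no predecessor
```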

It's always better to define a function for repeated use rather than writing the whole code every time.
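A sketch of such a helper, combining the rolling-statistics plot and the ADF test used above:

```python
def test_stationarity(timeseries):
    """Plot rolling statistics and run the ADF test on a series."""
    # Rolling statistics (12-month window assumed for monthly data).
    roll_mean = timeseries.rolling(window=12).mean()
    roll_std = timeseries.rolling(window=12).std()

    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(roll_mean, color='red', label='Rolling mean')
    plt.plot(roll_std, color='black', label='Rolling std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show()

    # Augmented Dickey-Fuller test.
    result = adfuller(timeseries.dropna(), autolag='AIC')
    print('Test statistic :', result[0])
    print('p-value        :', result[1])
    for level, value in result[4].items():
        print(f'Critical value ({level}): {value}')

test_stationarity(ts_diff)
```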

Here our p-value is 0.07, which is still more than 0.05; the test statistic lies between the 10% and 5% critical values; and the rolling mean and rolling standard deviation are roughly constant with time.

The result is better if we use the exponential decay method.
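A sketch of the exponential-decay transformation; the 12-period half-life is an illustrative choice:

```python
# Exponentially weighted mean, then subtract it to remove the trend.
exp_weighted_avg = ts.ewm(halflife=12).mean()
ts_ewma_diff = ts - exp_weighted_avg
test_stationarity(ts_ewma_diff)
```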

Log Scale transformation:
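A sketch combining the log transform with differencing, which matches the improved test result reported below:

```python
import numpy as np

ts_log = np.log(ts)                      # log-scale transformation
ts_log_diff = ts_log - ts_log.shift(1)   # differencing on the log series
test_stationarity(ts_log_diff.dropna())
```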

Here we get a p-value of 0.02, and the test statistic is less than the 5% critical value.

Decomposing the series:

Taking the residual as the decomposed data, we check test_stationarity on it:
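A sketch using statsmodels' seasonal_decompose; period=12 is assumed for monthly data, and the additive model is the default:

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose the log series into trend, seasonal and residual components.
decomposition = seasonal_decompose(ts_log, period=12)
decomposition.plot()
plt.show()

# Treat the residual (irregular component) as the stationary part and check it.
residual = decomposition.resid.dropna()
test_stationarity(residual)
```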

Plotting ACF and PACF:

Auto-correlation function (ACF):

Auto-correlation refers to the way the observations in a time series are related to each other. The ACF at lag k is the correlation coefficient between the value of the series at the current time and its value k steps earlier, i.e. the correlation between y(t) and y(t-k). The ACF identifies the order of an MA process.

Partial auto-correlation function (PACF):

The PACF is the same as the ACF, except that the effect of the intermediate lags between y(t) and y(t-k) is removed (partialled out), i.e. it is the correlation between y(t) and y(t-k) after the influence of the intervening (k-1) lags has been removed.
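A sketch of the ACF/PACF plots on the differenced log series; lags=20 is an arbitrary display choice:

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

plot_acf(ts_log_diff.dropna(), lags=20)   # read q for the MA part from this
plot_pacf(ts_log_diff.dropna(), lags=20)  # read p for the AR part from this
plt.show()
```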

From the ACF graph, we see that the curve first touches the y = 0 line at x = 2; thus, from theory, q = 2. From the PACF graph, we see that the curve first touches the y = 0 line at x = 2; thus p = 2.

Time-Series Model

Moving Average:

Moving average is the simplest of all time-series forecasting models. It forecasts the future value of a series using the average (or weighted average) of the past observations:

F(t+1) = (Y(t) + Y(t-1) + ... + Y(t-N+1)) / N

where F(t+1) is the forecasted value at time (t+1) and N is the number of past observations used.

In pandas, the rolling() and mean() functions are used to calculate the moving average over a time window (or time period).

Note: the simple moving average gives equal weight to all past observations used in forecasting the future value, which is its major drawback.

We use the Air Passengers dataset to demonstrate moving-average forecasting and exponential-smoothing forecasting.

The dataset contains 144 entries of two variables, Month and Passengers, with no missing values.

Calculating the moving average for the above dataset.

Plotting the actual and predicted values from the MA forecast:
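A sketch of the moving-average forecast and the comparison plot; the 12-month window and the column name are assumptions:

```python
# One-step-ahead forecast: the mean of the previous 12 observations
# predicts the next value (shift(1) avoids using the current value).
df['MA_forecast'] = df['#Passengers'].rolling(window=12).mean().shift(1)

plt.plot(df['#Passengers'], label='Actual')
plt.plot(df['MA_forecast'], label='MA forecast')
plt.legend(loc='best')
plt.show()
```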

Calculating the Mean Absolute Percentage Error (MAPE):

The mean absolute percentage error is the average of the absolute percentage errors; it expresses the average error as a percentage:

MAPE = (100/n) * sum over t of |Y(t) - F(t)| / Y(t)

Calculating MAPE and RMSE:
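A sketch of both error metrics for the moving-average forecast above:

```python
import numpy as np

valid = df.dropna(subset=['MA_forecast'])  # drop rows without a forecast
errors = valid['#Passengers'] - valid['MA_forecast']

mape = np.mean(np.abs(errors / valid['#Passengers'])) * 100
rmse = np.sqrt(np.mean(errors ** 2))
print(f'MAPE: {mape:.2f}%  RMSE: {rmse:.2f}')
```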

Exponential Smoothing:

Exponential smoothing assigns exponentially decreasing weights to past observations:

F(t+1) = alpha * Y(t) + (1 - alpha) * F(t)

where alpha is the smoothing constant, with a value between 0 and 1. The larger alpha is, the less smoothing is applied (recent observations receive more weight). The ewm() function in pandas calculates the exponential moving average, taking alpha as a parameter.

Calculating MAPE and plotting the actual and forecasted values:
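A sketch using pandas' ewm(); alpha=0.2 is an illustrative choice, so the MAPE it produces need not match the article's figure exactly:

```python
# Exponentially smoothed one-step forecast.
df['EWM_forecast'] = df['#Passengers'].ewm(alpha=0.2).mean().shift(1)

valid = df.dropna(subset=['EWM_forecast'])
errors = valid['#Passengers'] - valid['EWM_forecast']
mape = np.mean(np.abs(errors / valid['#Passengers'])) * 100
print(f'MAPE: {mape:.2f}%')

plt.plot(df['#Passengers'], label='Actual')
plt.plot(df['EWM_forecast'], label='Exponential smoothing forecast')
plt.legend(loc='best')
plt.show()
```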

Forecasting with exponential smoothing has a smaller error (MAPE 9.38%) than the simple moving-average model (MAPE 10.88%).

Auto-Regressive Integrated Moving Average Models:

Auto-Regressive (AR) and Moving Average (MA) models are both used frequently for forecasting, and they are combined to create models such as the auto-regressive moving average (ARMA) and the auto-regressive integrated moving average (ARIMA). ARMA models are regression models in which a variable is regressed on its own values measured at different time periods, together with past errors.

From Wikipedia,

The AR part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged (i.e., prior) values. The MA part indicates that the regression error is actually a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The I (for “integrated”) indicates that the data values have been replaced with the difference between their values and the previous values (and this differencing process may have been performed more than once). The purpose of each of these features is to make the model fit the data as well as possible.

Non-seasonal ARIMA models are generally denoted ARIMA(p,d,q) where parameters p, d, and q are non-negative integers, p is the order (number of time lags) of the autoregressive model, d is the degree of differencing (the number of times the data have had past values subtracted), and q is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(p,d,q)(P,D,Q)m, where m refers to the number of periods in each season, and the uppercase P,D,Q refer to the autoregressive, differencing, and moving average terms for the seasonal part of the ARIMA model.

Auto-Regressive (AR) models:

Auto-regression is a regression of a variable on itself measured at different time points. The auto-regressive model with lag 1, AR(1) (i.e. ARIMA(1,0,0)), is given by:

Y(t+1) = alpha + beta * Y(t) + epsilon(t+1)

The equation can be generalised to include p lags on the right-hand side, which is called the AR(p) model:

Y(t+1) = alpha + beta1 * Y(t) + beta2 * Y(t-1) + ... + betap * Y(t-p+1) + epsilon(t+1)

where epsilon(t+1) is a sequence of uncorrelated residuals assumed to follow a normal distribution with zero mean and constant standard deviation.

The order p of an AR(p) model can be identified using the auto-correlation function (ACF) and the partial auto-correlation function (PACF), which we have already calculated for the Air Passengers case.

Continuing from where we transformed our non-stationary series into a stationary one:
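A sketch of an AR fit on the log series using the modern statsmodels ARIMA class (the original article likely used the older, since-removed statsmodels.tsa.arima_model); order (2, 1, 0) takes p = 2 from the PACF reading above:

```python
from statsmodels.tsa.arima.model import ARIMA

model_ar = ARIMA(ts_log, order=(2, 1, 0))  # AR part only: (p, d, q) = (2, 1, 0)
results_ar = model_ar.fit()

plt.plot(ts_log, label='Log series')
plt.plot(results_ar.fittedvalues, color='red', label='AR fitted values')
plt.legend(loc='best')
plt.show()
```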

Moving Average Model:

A moving-average process of lag 1, MA(1), can be written as:

Y(t) = mu + epsilon(t) + theta * epsilon(t-1)

where mu is the mean of the series and epsilon(t) are white-noise error terms.
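A matching MA sketch, with q = 2 taken from the ACF reading:

```python
model_ma = ARIMA(ts_log, order=(0, 1, 2))  # MA part only: (p, d, q) = (0, 1, 2)
results_ma = model_ma.fit()

plt.plot(ts_log, label='Log series')
plt.plot(results_ma.fittedvalues, color='red', label='MA fitted values')
plt.legend(loc='best')
plt.show()
```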

ARIMA Model:

ARIMA (Auto-Regressive Integrated Moving Average) is a combination of two models, AR (Auto-Regressive) and MA (Moving Average). It has three hyperparameters, p (auto-regressive lags), d (order of differencing) and q (moving-average order), which come from the AR, I and MA components respectively. The AR part captures the correlation between previous and current time periods; the MA part smooths out the noise; and the I part is the differencing step that makes the series stationary and binds the AR and MA parts together.

The RSS value of the ARIMA model is better than those of the AR and MA models. We will make predictions with the ARIMA model and convert them back to the original form, because we built the model on the log-transformed dataset.
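A sketch of the combined fit and its RSS; the first fitted value is skipped because it is not meaningful under differencing:

```python
model = ARIMA(ts_log, order=(2, 1, 2))  # combined ARIMA(p, d, q)
results_arima = model.fit()

rss = ((results_arima.fittedvalues.iloc[1:] - ts_log.iloc[1:]) ** 2).sum()
print(f'RSS: {rss:.4f}')
```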

Predictions:
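A sketch of converting the fitted values back to the passenger scale; with the modern statsmodels ARIMA the fitted values are already on the log-level scale, so np.exp() alone undoes the transformation (older tutorials had to reverse the differencing with a cumulative sum first):

```python
predictions = np.exp(results_arima.fittedvalues)  # undo the log transform

plt.plot(ts, label='Actual')
plt.plot(predictions.iloc[1:], color='red', label='ARIMA predictions')
plt.legend(loc='best')
plt.show()
```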

The predicted forecasts are very close to the real values, indicating a good model.

………………………………………………………………….

For more detail and various applications of time-series models, follow these links:

An End-to-End Project on Time Series Analysis and Forecasting with Python.

Time Series Analysis with Python.

Multivariate Time Series| Vector Auto-Regression.

Multivariate Time Series using ARIMAX.

If you want me to add anything, or if anything written here is incorrect, please guide me through it. It will be a great help.

