Time Series Analysis

Tejas Chintala
Published in DataX Journal · Apr 15, 2020

Have you ever noticed how important a role time plays in Data Science and many other fields?

Time series data is a sequence of observations that attaches a time period to each value or metric. Any metric measured over regular time intervals forms a time series, and time series analysis is used to predict future values based on previously observed ones.

Examples: weather data, stock prices, industry forecasts, etc.

Data such as humidity levels, the prices of items, and the number of people in a company all vary with time.

Time series analysis is commercially important because of its industrial relevance, especially for forecasting (demand, sales, supply, etc.).

For such analysis, all the time periods must be of equal length, clearly defined, and placed in chronological order, which gives the series a constant frequency; the patterns observed in the series are then expected to persist into the future.

The underlying assumption is that present-day values depend on past values, and that future values can in turn be predicted from present ones.
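As a minimal illustration (not from the original article), this is how such a regularly spaced series might be set up with pandas; the daily temperature readings below are purely synthetic:

```python
# A minimal sketch of a time series with equal, clearly defined periods
# in chronological order, i.e. a constant frequency.
import numpy as np
import pandas as pd

# Hypothetical daily temperature readings, for illustration only.
dates = pd.date_range(start="2020-01-01", periods=30, freq="D")  # fixed daily frequency
values = 20 + np.random.randn(30).cumsum()                       # synthetic measurements

series = pd.Series(values, index=dates, name="temperature")
print(series.head())
print(series.index.freq)  # <Day> -- the constant frequency the analysis relies on
```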

Components of a Time Series

The components of a time series are:

1. Seasonality: Seasonality refers to variations that repeat over a specific period such as a day, week, month, or season. It is always of a fixed and known period.

Eg: A gift shop will experience an increase in sales during a festival season like Christmas, so this increase will be observed in the last week of December every year.

2. Trend: Variations that move up or down in a reasonably predictable pattern; a movement towards relatively higher or lower values over a long period of time. A trend may persist for some time and then fade away or reverse.

Eg: A growing trend in the stock prices.

3. Cyclic: This pattern exists when the data exhibit rises and falls that are not of a fixed period. These variations correspond with business or economic ‘boom-bust’ cycles.

Many people confuse cyclic behavior with seasonality. If the fluctuations are not of a fixed period they are cyclic, but if the period is unchanging and associated with some aspect of the calendar, then the pattern is seasonal.

4. Random/Irregular variations: These variations, also called residual variation, occur due to sudden causes; they are erratic fluctuations in the data that are difficult to predict and do not fall under any of the above three classifications.

Eg: Decline in the global economy due to the COVID-19 pandemic.
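To make these components concrete, here is a minimal sketch using statsmodels’ seasonal_decompose on a synthetic monthly sales series; the numbers are made up purely for illustration:

```python
# A minimal sketch of splitting a series into trend, seasonal and residual
# components with statsmodels' seasonal_decompose (synthetic data).
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical monthly sales: upward trend + yearly seasonality + noise.
idx = pd.date_range("2015-01-01", periods=60, freq="MS")
trend = np.linspace(100, 200, 60)
seasonal = 10 * np.sin(2 * np.pi * np.arange(60) / 12)
noise = np.random.normal(scale=3, size=60)
sales = pd.Series(trend + seasonal + noise, index=idx)

result = seasonal_decompose(sales, model="additive", period=12)
print(result.trend.dropna().head())   # estimated trend component
print(result.seasonal.head(12))       # repeating 12-month seasonal pattern
print(result.resid.dropna().head())   # irregular / residual variation
```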

Stationarity

Stationarity is one of the most important properties to consider when dealing with a time series. A time series is said to be stationary if its properties do not change over time: consecutive samples of the same size should have identical distributions and the same covariances.

The conditions for the stationarity are:

  1. The mean of the time series should be constant over time, so there is no trend.
  2. The variance should also be constant.
  3. The covariance between observations separated by the same distance (lag) should be the same, regardless of where in the series they occur.

In a plot of such a series, the mean and variance do not vary with time and the statistical properties are constant; there is no distinct pattern to observe, and the graph looks similar from end to end.

Some of the tests to check stationarity are:

  1. Rolling statistics: Plot the moving average or moving variance and see how it varies over time. This is a visual technique.
  2. ADF (Augmented Dickey-Fuller) test: Its null hypothesis is that the time series is non-stationary. The test results comprise a test statistic and critical values; if the test statistic is less than the critical value, the null hypothesis is rejected and the series is considered stationary. A short sketch of both checks follows this list.
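A minimal sketch of both checks, using pandas rolling statistics and statsmodels’ adfuller on a synthetic white-noise series (which should look stationary):

```python
# A minimal sketch of the two stationarity checks on a synthetic series.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

np.random.seed(0)
series = pd.Series(np.random.randn(200))  # white noise: should be stationary

# 1. Rolling statistics: the mean and variance should stay roughly flat.
rolling_mean = series.rolling(window=12).mean()
rolling_std = series.rolling(window=12).std()
print(rolling_mean.tail(3), rolling_std.tail(3), sep="\n")

# 2. Augmented Dickey-Fuller test (null hypothesis: series is non-stationary).
stat, p_value, _, _, critical_values, _ = adfuller(series)
print(f"Test statistic: {stat:.3f}, p-value: {p_value:.3f}")
print("Critical values:", critical_values)
if stat < critical_values["5%"]:
    print("Reject the null hypothesis -> the series looks stationary")
else:
    print("Fail to reject the null -> the series may be non-stationary")
```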

Autocorrelation

Autocorrelation can be defined as the correlation of a time series with a lagged copy of itself. It shows whether previous values of the series have an influence on the present values.

It is commonly used to check whether a time series is stationary. For a stationary series the autocorrelation falls to zero fairly quickly, whereas for a non-stationary series it drops off gradually.

In an autocorrelation plot of such data, we might see that the 10th and 20th lags have a high autocorrelation, and that the 20th and 30th lags are similarly correlated. From this, we can conclude that similar values recur every 10 units of time.
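As an illustration, here is a sketch of how such an autocorrelation plot can be produced with statsmodels’ plot_acf, using a synthetic series that repeats roughly every 10 steps:

```python
# A minimal sketch of inspecting autocorrelation with statsmodels.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

t = np.arange(200)
series = np.sin(2 * np.pi * t / 10) + np.random.normal(scale=0.2, size=200)

plot_acf(series, lags=40)   # spikes near lags 10, 20, 30 reveal the repetition
plt.show()
```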

Models of Time Series

A time series can be modeled in many ways in order to obtain predictions. Some of these models are:

AutoRegressive (AR) Model

Moving Average (MA) Model

AutoRegressive Model

In a time series, we often observe a similarity between past and present values because consecutive observations are correlated. By knowing the prices of items today, we can predict their prices tomorrow.

The AutoRegressive (AR) model relies on values from past periods, and past periods only, to predict the current values.

It is a linear model, in which the current period’s value is a sum of past outcomes multiplied by numeric coefficients, plus a constant and an error term.

It is denoted as:

x(t) = C + ψ*x(t-1) + ε(t), where

x(t-1) is the value of x during the previous period,

ψ is a numeric constant by which we multiply the lagged variable (-1 < ψ < 1), and

ε(t) is the residual, i.e. the difference between our prediction for period ‘t’ and the correct value.
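A minimal sketch, assuming statsmodels is available, of simulating and fitting an AR(1) model of exactly this form; the choices C = 2.0 and ψ = 0.7 are arbitrary and used only for illustration:

```python
# A minimal sketch of an AR(1) model: x(t) = C + psi*x(t-1) + eps(t).
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

np.random.seed(1)
n, psi, c = 500, 0.7, 2.0
x = np.zeros(n)
for t in range(1, n):
    x[t] = c + psi * x[t - 1] + np.random.normal()  # simulate the AR(1) process

model = AutoReg(x, lags=1).fit()
print(model.params)                       # estimated C and psi (near 2.0 and 0.7)
print(model.predict(start=n, end=n + 4))  # forecast the next five values
```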

Moving Average model

The name of the model comes from the moving averages over fixed-period intervals as one moves through the data set.

In simple words, the model expresses the current observation as the mean of the series plus a weighted combination of current and past forecast errors.

These models incorporate past residuals (error terms) to improve the predictions and estimations. This helps the model handle unexpected fluctuations well, which is why it is also known as a smoothing model.

The mathematical notation is:

r(t) = c + θ(1)*ε(t-1) + ε(t), where

r(t) is the value of the series in the current period,

θ(1) is the numeric coefficient for the value associated with the 1st lag,

ε(t) is the residual for the current period, and

ε(t-1) is the residual for the previous period.
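Similarly, a hedged sketch of an MA(1) model of this form, fitted with statsmodels’ ARIMA using order (0, 0, 1); the coefficients are arbitrary and the data synthetic:

```python
# A minimal sketch of an MA(1) model: r(t) = c + theta*eps(t-1) + eps(t).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(2)
n, theta, c = 500, 0.6, 1.0
eps = np.random.normal(size=n)
r = c + theta * np.roll(eps, 1) + eps   # simulate the MA(1) process
r[0] = c + eps[0]                       # first value has no previous residual

model = ARIMA(r, order=(0, 0, 1)).fit()
print(model.params)                     # estimated c and theta(1)
print(model.forecast(steps=5))          # predictions for the next five periods
```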

Conclusion

We have now seen the components and models of time series analysis, and why it is such an important concept in Data Science.

It allows both descriptive and predictive analysis and will continue to play a major role in industry and business for planning future operations.
