Getting Started with Time Series

Mehul Gupta
Data Science in your pocket
4 min read · Jun 14, 2019


Time Series Analysis and Forecasting is one of the most important techniques in predictive analytics. Most data scientists overlook it in their projects, either because they think it isn't important or because they believe it is too complicated to use. However, it is an extremely useful technique and can be fairly easy to understand too. Before starting, let's understand what a Time Series is.

What is a Time Series?

A Time Series is sequential data indexed by timestamps. Common examples include predicting stock prices and forecasting temperature. In these problems, the order of the data matters and is useful in making predictions for the future.

Datasets like Kaggle's Titanic, House Price Prediction, etc. aren't time series because the order of the data (samples) isn't important. For the Titanic data, the results remain the same even if the order is changed, but that is not true for a Time Series. You should arrange your series in order before proceeding!

Common Terminologies to know

1) Rolling/Moving/Running-

It refers to performing any operation (like sum, average, or standard deviation) using a sliding window over the entire data.

Example — If the given data is [1, 2, 3, 4, 5, 6, 7] & the window is 3, the rolling average is [NaN, NaN, 2, 3, 4, 5, 6]. The first 2 positions can't fill a complete window & hence the NaN values.
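
A minimal sketch with pandas (the window size of 3 matches the example above; the same call works on a series with a DatetimeIndex):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7])

# Rolling average with a window of 3; the first two positions can't fill a window
print(s.rolling(window=3).mean().tolist())
# [nan, nan, 2.0, 3.0, 4.0, 5.0, 6.0]
```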

2) Lagged Features -

It is a common feature engineering technique where new features are generated by shifting the given data by 1, 2, 3, etc. time steps, i.e. using the values at t-1, t-2, t-3 as features for time t.
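
A quick sketch with pandas (the column names lag_1 and lag_2 are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

# shift(k) moves the column down by k rows, so row t holds the value from t-k
df["lag_1"] = df["value"].shift(1)
df["lag_2"] = df["value"].shift(2)
print(df)
```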

3) AutoCovariance -

It refers to the Covariance of the given data with lagged versions of itself.

Example — To calculate the AutoCovariance with the 5th lagged version, we shift the data by 5 places, i.e. the value observed on 1st June gets paired with the value observed on 6th June in the unlagged series.
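
A minimal sketch of the idea, using one common convention (demean the series, then average the products of values 5 steps apart; the sample data is made up):

```python
import numpy as np

x = np.array([3.0, 5.0, 2.0, 8.0, 7.0, 4.0, 6.0, 9.0, 5.0, 7.0])
lag = 5

x_demeaned = x - x.mean()
# Pair each value with the value 5 steps earlier and average the products
autocov_lag5 = np.mean(x_demeaned[lag:] * x_demeaned[:-lag])
print(autocov_lag5)
```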

4) Partial AutoCovariance -

It refers to the Covariance of the given data with a lagged version of itself BUT after removing the covariance explained by the smaller, intervening lags.

Example — To calculate the Partial AutoCorrelation between the data & its Kᵗʰ lagged version, we eliminate the covariance effect contributed by the intervening lags 1, 2, …, K-1 on the Kᵗʰ lag.
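
In practice, you usually work with the correlation versions (ACF and PACF), which statsmodels computes directly. A small sketch on a made-up autocorrelated series:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

np.random.seed(0)
x = np.cumsum(np.random.normal(size=200))  # a series with strong autocorrelation

print(acf(x, nlags=5))   # autocorrelation at lags 0..5
print(pacf(x, nlags=5))  # partial autocorrelation at lags 0..5
```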

5) Stationarity of Time Series -

A time series is stationary if its mean, standard deviation & autocovariance remain constant over time. In practice, this means those quantities shouldn't fluctuate much when you compute them over a rolling window.
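
A quick way to sanity-check this is to look at rolling statistics, or to run the Augmented Dickey-Fuller test from statsmodels (a sketch on synthetic data; a small p-value suggests the series is stationary):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

np.random.seed(42)
s = pd.Series(np.random.normal(size=300))  # stationary by construction

# Rolling mean should stay roughly flat for a stationary series
print(s.rolling(window=30).mean().dropna().describe())

# Augmented Dickey-Fuller test
adf_stat, p_value = adfuller(s)[:2]
print("ADF statistic:", adf_stat, "p-value:", p_value)
```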

6) Trends -

A gradual increase/decrease in data values as time passes, starting from any point in time.

Example — A steady rise in the TRP of a channel over time after some celebrity episodes.

7) Seasonality -

When a pattern in the data repeats itself periodically over time.

Example — High sales in the month of May every year can be taken as seasonality.

8) Resampling -

Converting your data to either a higher frequency (upsampling) or a lower frequency (downsampling).

Example — Converting monthly data to day-wise data or day-wise data to monthly data.
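
With pandas this is a one-liner in each direction (a sketch on made-up daily data; forward-fill is just one of several ways to fill the gaps when upsampling):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2019-01-01", periods=90, freq="D")
daily = pd.Series(np.arange(90), index=idx)

monthly = daily.resample("M").sum()            # downsample: daily -> monthly totals
back_to_daily = monthly.resample("D").ffill()  # upsample: monthly -> daily
print(monthly)
```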

9) White Noise -

It refers to data with 0 mean whose values are random/independent in nature and have 0 correlation with the other values in the series. If your time series is White Noise, you can't make predictions.
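
A minimal sketch of what white noise looks like, generated with numpy:

```python
import numpy as np

np.random.seed(0)
noise = np.random.normal(loc=0.0, scale=1.0, size=500)  # zero-mean, independent draws

print(noise.mean())                        # close to 0
print(np.corrcoef(noise[1:], noise[:-1]))  # off-diagonal (lag-1 correlation) close to 0
```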

10) Stochastic process -

A process that can yield different results each time the same equation is applied, i.e. a random process.

11) Random Walk -

A process where each future value is generated from the previous one using the equation

Y(t) = Y(t-1) + e

where e is a white noise term that is random at every step, hence the name random walk.
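
Simulating one is a one-liner, since the cumulative sum of white noise follows exactly this equation (a sketch, starting the walk at 0):

```python
import numpy as np

np.random.seed(1)
e = np.random.normal(size=100)  # white noise terms
y = np.cumsum(e)                # Y(t) = Y(t-1) + e
```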

12) Random walk with Drift -

It is similar to a random walk with a slight change. Here the values are generated using the equation

Y(t) = Y(t-1) + D + e

where D is the drift constant and e is the white noise term.
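
The simulation only changes by adding the constant before the cumulative sum (the drift value of 0.5 is arbitrary):

```python
import numpy as np

np.random.seed(1)
drift = 0.5
e = np.random.normal(size=100)
y = np.cumsum(drift + e)  # Y(t) = Y(t-1) + D + e
```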

13) Additive models -

These are the models that consider that the data has been generated using the following equation:

Data = Seasonality + Trend + Residuals

where residuals are the values left when the trend & seasonality have been removed from the time series.

14) Multiplicative models -

These models consider that data has been generated using the following equation:

Data = Seasonality × Trend × Residuals
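
statsmodels can split a series into these components for you; the model argument switches between the two assumptions (a sketch on a made-up monthly series with a linear trend and yearly seasonality):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

np.random.seed(0)
idx = pd.date_range("2015-01-01", periods=48, freq="M")
trend = np.linspace(10, 20, 48)
seasonal = 2 * np.sin(2 * np.pi * np.arange(48) / 12)
noise = np.random.normal(scale=0.5, size=48)
series = pd.Series(trend + seasonal + noise, index=idx)

additive = seasonal_decompose(series, model="additive")
multiplicative = seasonal_decompose(series, model="multiplicative")
print(additive.trend.dropna().head())
```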

This much might be enough for the day. Do look out for my next article on creating your first Time Series solution using ARIMA.

Below, you can check out different models for univariate time series forecasting.
