We come across many situations in real life where we try to forecast what will happen in the future (tomorrow, next week, next month, next year, or in the coming years). A few common examples are:
1. What will the stock price be after a month?
2. What revenue will the business make next year?
3. What will the air temperature be tomorrow? and many more…
If you analyse these questions, the one common factor in all of them is TIME. Thus, when data is recorded sequentially over time, it is called a time series, and the analysis of this data is known as Time Series Analysis.
As per Wikipedia: A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data.
Sample time series data is available here so that you can understand this better.
Components of Time Series :
The factors that are responsible for bringing about changes in a time series, also called the components of time series, are as follows:
- Irregular Component : These are sudden, unpredictable changes in a time series that are unlikely to be repeated.
- Cyclic Component : These are long term oscillations in a time series that do not have a fixed period.
- Trend Component : Trend is often the main component of a time series. It shows the growth or decline of the series over a long period.
- Seasonal Component : These are short term movements that repeat at a fixed period (for example every week or every year) due to seasonal factors.
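As a rough illustration of how these components combine, the sketch below builds a synthetic series as trend + seasonal + irregular and then estimates the trend with a centered moving average. The additive form, the period of 12, and all the numbers are assumptions chosen for the example, not properties of any real data set. It uses only the Python standard library.

```python
import math
import random

random.seed(0)  # fixed seed so the illustration is reproducible

PERIOD = 12      # assumed seasonal period (e.g. monthly data with a yearly pattern)
N = 10 * PERIOD  # ten full seasonal cycles

# Build a synthetic additive series: trend + seasonal + irregular.
trend = [0.5 * t for t in range(N)]
seasonal = [10.0 * math.sin(2 * math.pi * t / PERIOD) for t in range(N)]
irregular = [random.gauss(0.0, 1.0) for _ in range(N)]
series = [trend[t] + seasonal[t] + irregular[t] for t in range(N)]


def centered_moving_average(values, period):
    """Estimate the trend with a centered moving average (period must be even).

    Positions too close to either end, where a full window does not fit,
    are left as None.
    """
    half = period // 2
    out = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        # The two end points get half weight so the weights add up to `period`.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        out[t] = total / period
    return out


estimated_trend = centered_moving_average(series, PERIOD)

# Subtracting the estimated trend leaves seasonal + irregular.
detrended = [
    series[t] - estimated_trend[t]
    for t in range(N)
    if estimated_trend[t] is not None
]
```

Because the window spans exactly one seasonal period, the seasonal term averages out and what remains is close to the trend plus a small amount of smoothed noise.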
Time Series Analysis :
The diagram below shows different types of time series analysis (I haven't covered every type of analysis, but I have tried to include the common ones).
Domain Based Methods:
Frequency domain based methods : I’ll introduce two common methods in the frequency domain :
- Spectral Analysis
- Wavelet Analysis
Spectral Analysis :
Spectral Analysis is one of the most widely used methods for time series analysis in geophysics, oceanography, atmospheric science, astronomy, engineering etc.
A time series can be re-expressed (decomposed) as a sum of sines and cosines, also called its Fourier representation. (You can learn more about Fourier series here.) Different time series have different coefficients on the sine and cosine terms, so time series can be compared by comparing these coefficients.
This is referred to as Spectral Analysis, or analysis in the frequency domain. The frequency domain approach considers regression on sinusoids; the time domain approach considers regression on past values of the time series.
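To make the idea of comparing sine and cosine coefficients concrete, here is a minimal sketch that computes a discrete Fourier transform of a synthetic series and reads off the dominant frequencies. The series, its two frequencies (5 and 20 cycles) and the helper name `dft` are assumptions made for the example. The transform is written out naively with the standard library so the block runs anywhere; a real analysis would normally use an FFT routine.

```python
import cmath
import math

N = 128  # number of samples; frequency bin k means k cycles over the whole series

# Synthetic series: a strong 5-cycle sine plus a weaker 20-cycle cosine.
series = [
    3.0 * math.sin(2 * math.pi * 5 * t / N)
    + 1.0 * math.cos(2 * math.pi * 20 * t / N)
    for t in range(N)
]


def dft(values):
    """Naive O(n^2) discrete Fourier transform; fine for short illustrative series."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


coeffs = dft(series)

# Amplitude of each sinusoid (valid for k >= 1), using the non-redundant half.
amplitudes = [2 * abs(c) / N for c in coeffs[: N // 2]]

# The two frequency bins with the largest amplitude.
dominant = sorted(range(len(amplitudes)), key=lambda k: amplitudes[k], reverse=True)[:2]
```

Here `dominant` picks out bins 5 and 20, and the corresponding amplitudes recover the 3.0 and 1.0 used to build the series, which is the sense in which the coefficients characterise the series.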
Wavelet Analysis :
The Fourier transform (described above under spectral analysis) decomposes a time series into its frequency components. It only tells us which frequencies are present and gives no information on when they occur.
The wavelet transform contains information on both the time location and the frequency content of a signal. Thus, if both time and frequency are important, wavelet analysis is used.
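The time-localisation property can be seen with a single level of the Haar wavelet transform, which is the simplest wavelet. The sketch below is only an illustration under assumed inputs (a flat signal with one spike at index 21); the function name `haar_step` is made up for this example.

```python
import math


def haar_step(values):
    """One level of the Haar wavelet transform on an even-length sequence.

    Returns (approximation, detail): scaled pairwise sums and differences.
    """
    s = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return approx, detail


# A flat signal with a single short-lived spike at index 21.
signal = [1.0] * 32
signal[21] = 5.0

approx, detail = haar_step(signal)

# The position of the largest detail coefficient shows where the change happened.
peak = max(range(len(detail)), key=lambda i: abs(detail[i]))
changed_samples = (2 * peak, 2 * peak + 1)  # the pair of samples behind that coefficient
```

Only one detail coefficient is non-zero, and its index maps straight back to samples 20 and 21. A Fourier transform of the same signal would spread the spike across all frequencies with no indication of where it occurred.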
Time domain based methods : The two most commonly used time domain methods are :
- Auto correlation
- Cross correlation
Auto Correlation :
One of the basic assumptions of Linear Regression (Ordinary Least Squares procedure) is that the error terms are independent of each other. If this assumption is violated and the error terms are correlated with each other, auto correlation exists. Auto correlation is a very common problem in time series regression.
When auto correlation is present, the error terms follow a pattern, and such patterns tell us that something is wrong. The presence of auto correlation does not mean that the values of one independent variable are correlated over time. Also, it does not mean that independent variables are correlated with each other, as occurs with multicollinearity.
Why is auto correlation bad for the model?
For each observation, the error term represents the distance between the actual value of the dependent variable and the predicted value. Think of the error term as the model’s “mistake.” These mistakes must not follow a pattern. If there is such a pattern, then there must be some way to improve the model so that the regression does a better job of predicting the dependent variable. In other words, a model that exhibits auto correlation can be made to perform better than it currently does.
There are a few methods to check whether auto correlation exists, one of which is the Durbin-Watson statistic.
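The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It lies between 0 and 4: values near 2 suggest no first-order auto correlation, values well below 2 suggest positive auto correlation, and values well above 2 suggest negative auto correlation. The sketch below computes it on two simulated sets of residuals; the simulated data and the 0.8 coefficient are assumptions for the example.

```python
import random


def durbin_watson(residuals):
    """Durbin-Watson statistic of a sequence of regression residuals."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


random.seed(1)  # fixed seed so the illustration is reproducible

# Case 1: independent errors, as ordinary least squares assumes.
independent = [random.gauss(0.0, 1.0) for _ in range(500)]

# Case 2: positively auto correlated errors, e[t] = 0.8 * e[t-1] + noise.
correlated = [0.0]
for _ in range(499):
    correlated.append(0.8 * correlated[-1] + random.gauss(0.0, 1.0))

dw_independent = durbin_watson(independent)  # expected to be close to 2
dw_correlated = durbin_watson(correlated)    # expected to be well below 2
```

In practice the residuals would come from a fitted regression model rather than from a simulation.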
Cross Correlation :
If you have two time series x and y, series y may be related to past lags of series x. Cross correlation helps us identify the lags of x that might be useful in predicting y.
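A minimal sketch of this idea: correlate y with x shifted back by 0, 1, 2, ... steps and see which lag gives the strongest correlation. The two series here are simulated so that y follows x with a delay of 3 steps; that delay, the noise levels and the function names are assumptions for the example.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def cross_correlation(x, y, max_lag):
    """Correlation between y[t] and x[t - lag] for lag = 0..max_lag."""
    return {
        lag: pearson(x[: len(x) - lag], y[lag:])
        for lag in range(max_lag + 1)
    }


random.seed(2)  # fixed seed so the illustration is reproducible
TRUE_LAG = 3
N = 300

x = [random.gauss(0.0, 1.0) for _ in range(N)]
# y follows x with a delay of TRUE_LAG steps, plus a little noise.
y = [0.0] * TRUE_LAG + [
    2.0 * x[t - TRUE_LAG] + random.gauss(0.0, 0.5) for t in range(TRUE_LAG, N)
]

ccf = cross_correlation(x, y, max_lag=8)
best_lag = max(ccf, key=lambda lag: abs(ccf[lag]))
```

The lag with the largest absolute correlation is a candidate lag feature of x for a model of y.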
Parameter Based Methods :
Parametric methods : I’ll be covering two parametric methods in this blog :
- Auto Regression
- Moving Averages
Auto Regression :
A Linear Regression model predicts the dependent variable based on a linear combination of independent variables.
For example : y = ax + c
where y is the dependent variable, x is the independent variable, a is the coefficient and c is the intercept.
Linear Regression can also be used in time series analysis, where values of the series at previous time steps, called lag features, are used as inputs to predict the output variable.
For example, we can predict the value at the next time step (t+1) given the observations at the last two time steps (t and t-1). As a regression model, this would look as follows:
X(t+1) = a*X(t) + b*X(t-1)
Because the regression model uses data from the same input variable at previous time steps, it is referred to as an auto regression (regression of self).
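A two-lag auto regression like this can be fitted with ordinary least squares. The sketch below assumes the two lag features are the two most recent observations and that there is no intercept. It simulates a series with known coefficients (0.6 and 0.3, chosen for the example) and recovers them by solving the 2x2 normal equations by hand. The function name `fit_ar2` is made up for this illustration.

```python
import random


def fit_ar2(series):
    """Least-squares estimate of (a, b) in x[t+1] = a*x[t] + b*x[t-1], no intercept."""
    # Accumulate the sums that make up the 2x2 normal equations.
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(1, len(series) - 1):
        lag1, lag2, target = series[t], series[t - 1], series[t + 1]
        s11 += lag1 * lag1
        s12 += lag1 * lag2
        s22 += lag2 * lag2
        r1 += lag1 * target
        r2 += lag2 * target
    # Solve [[s11, s12], [s12, s22]] @ [a, b] = [r1, r2] with Cramer's rule.
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (r2 * s11 - r1 * s12) / det
    return a, b


random.seed(3)  # fixed seed so the illustration is reproducible
TRUE_A, TRUE_B = 0.6, 0.3  # assumed coefficients used to simulate the data

series = [0.0, 0.0]
for _ in range(2000):
    series.append(TRUE_A * series[-1] + TRUE_B * series[-2] + random.gauss(0.0, 1.0))

a_hat, b_hat = fit_ar2(series)

# One-step-ahead forecast from the two most recent observations.
next_value = a_hat * series[-1] + b_hat * series[-2]
```

The estimates land close to the coefficients used in the simulation, and the same two numbers are all that is needed to forecast the next value.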
Moving Averages :
I hope you know what an average is. If you don't, you can refer to this link. Moving averages are an extension of the average.
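A moving average is calculated by taking the arithmetic mean of the most recent values in a fixed-size window, which slides forward as new data arrives. For example, to calculate a basic 10-day moving average of closing prices, you would add up the closing prices from the past 10 days and then divide the result by 10. One of the primary uses of a moving average is to identify trends. Note that the moving average (MA) model used in ARMA-style forecasting is a different thing: it models the current value in terms of past forecast errors rather than averaging past values.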
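The 10-day calculation described above can be sketched as follows. The closing prices are hypothetical numbers made up for the example, and `moving_average` is an illustrative helper rather than a library function.

```python
def moving_average(values, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Start with the first full window, then slide it one step at a time.
    running_total = sum(values[:window])
    averages = [running_total / window]
    for i in range(window, len(values)):
        # Add the newest value and drop the one that just left the window.
        running_total += values[i] - values[i - window]
        averages.append(running_total / window)
    return averages


# Hypothetical closing prices for 15 consecutive trading days.
closing_prices = [20, 22, 24, 25, 23, 26, 28, 26, 29, 27, 28, 30, 27, 29, 28]

sma_10 = moving_average(closing_prices, 10)
```

With 15 prices and a window of 10 there are 6 averages. The first is the mean of days 1 to 10, and each later one moves the window forward by a day. A steadily rising sequence of averages is what suggests an upward trend.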
Non Parametric methods : Two commonly used non parametric methods are :
- Kernel Regression
The non parametric methods mentioned above are out of scope for this blog.
Keep following my blog for more articles and posts on Data Science and related topics. Hope you like my blogs. Happy Learning!!!