A brief overview of the different aspects to look for when doing time series analysis.
Time series is statistical data that is arranged and presented in chronological order. Time series analysis, as used in this article, means predicting future values based on the past data in the series. For example, if we know the sales of an organization from 2015 to 2020, we can use this data to predict the sales for 2021 and beyond. Bear in mind that this is only a prediction and not an exact number: it says that sales will be somewhere near this figure.
A couple of things to keep in mind while doing time series analysis:
- The readings OR observations are usually taken at equal intervals (daily, weekly, monthly etc.)
- The observations are taken for the same variable over a period of time, for example “sales of a company over a period of time”. Here, ‘sales’ is the variable.
Components of Time Series
The values in a time series vary over time. These variations are broadly categorized as the components of a time series. They are:
- Long term variations OR Simple Trend
- Seasonal variations
- Cyclic variations
- Random OR irregular variations
The long term variation OR simple trend is the easiest to understand. Taking sales as the example again, sales can increase, decrease OR remain stagnant over a long period of time. The figure below depicts the simple trend.
Other examples of a simple trend are population growth, birth rate and death rate.
Seasonal variation is when we see ups and downs within the same year. As an example, the sales of ACs and coolers increase during summer but decrease drastically during other seasons. It's important to note that these variations occur within the same year and repeat from year to year.
Cyclic variation is when you see a cyclical pattern in the data over a period longer than a year. The classic example is the stock market, where recession and recovery follow each other continuously over a period of time. The diagram below depicts the cyclic variation.
Random OR irregular variation happens due to unknown OR unpredictable circumstances. The best example in these times is COVID. A lot of businesses suffered due to COVID and lockdowns. But this is not a phenomenon that will repeat at regular intervals.
Models of Time Series Analysis
A model of time series analysis helps us decide how to predict values based on the four components above: trend, seasonal, cyclic and irregular. There are 2 types of models:
- Additive model
- Multiplicative model
Additive model: If the four factors (trend, seasonality, cyclic and irregular) are completely independent, we use the additive model to predict future values. Independent here means that one factor does not influence OR affect the behavior of another. With this assumption, the magnitude of the time series is the sum of the separate influences of the four factors:
$$Y = T + S + C + I$$
where T, S, C and I stand for trend (T), seasonality (S), cyclic (C) and irregular (I).
Multiplicative model: In the multiplicative model, it is assumed that the four factors are inter-dependent. For example, a war can make growth irregular, which can lead to a depression, and the trend might slow down for a longer period. With this assumption, the magnitude of the time series is the product of the four factors, $Y = T \times S \times C \times I$. So if the factors are inter-dependent, you use the multiplicative model.
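As a rough illustration of the two models, the sketch below combines four made-up component series both ways. The component values and the function names `additive` and `multiplicative` are assumptions for this example only. In the multiplicative case, the seasonal, cyclic and irregular components are written as ratios around 1 rather than as absolute amounts.

```python
# Hypothetical components for four periods (illustrative values only).
trend = [100.0, 110.0, 120.0, 130.0]

# Additive model: every component is in the same units as the series.
seasonal_add = [5.0, -5.0, 5.0, -5.0]
cyclic_add = [2.0, 2.0, -2.0, -2.0]
irregular_add = [1.0, 0.0, -1.0, 0.0]

# Multiplicative model: the non-trend components are ratios around 1.
seasonal_mul = [1.05, 0.95, 1.05, 0.95]
cyclic_mul = [1.02, 1.02, 0.98, 0.98]
irregular_mul = [1.01, 1.00, 0.99, 1.00]


def additive(t, s, c, i):
    """Y = T + S + C + I, period by period."""
    return [tv + sv + cv + iv for tv, sv, cv, iv in zip(t, s, c, i)]


def multiplicative(t, s, c, i):
    """Y = T * S * C * I, period by period."""
    return [tv * sv * cv * iv for tv, sv, cv, iv in zip(t, s, c, i)]


y_add = additive(trend, seasonal_add, cyclic_add, irregular_add)
y_mul = multiplicative(trend, seasonal_mul, cyclic_mul, irregular_mul)
print(y_add)
print(y_mul)
```

In the additive result the seasonal swing stays at the same absolute size whatever the trend level is, while in the multiplicative result the swing grows as the trend grows.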
Measurement of Simple Trend
Now that we understand the basics, let's understand what we can predict. Some questions to probe here:
- Can we predict cold drinks sales in winter given summer data? (SEASONALITY)
- Can we find out what the sales figure will be if there is a natural calamity? (IRREGULAR)
- Can we predict recovery based on recession? (CYCLIC)
Well, the answer to all of the above is NO. So the only component we can predict OR forecast is the long term variation OR simple trend. The commonly used methods for measuring the trend in a time series are:
- Free hand curve
- Moving averages
- Least Squares
Free hand curve is simple, straightforward and easy to understand. In this method, you plot the curve based on the available data points and extend the trend line to forecast OR predict the future. In the picture below, s5 is the forecast of sales at time t5 based on historical data using the free hand curve method. The problem with this approach is that every person can come up with a different trend line, which makes it unreliable. Also, if your data set is small, the curve may fit poorly, which makes the forecast uncertain.
Moving average works by calculating the average (arithmetic mean) of a fixed number of readings OR observations over a period (3 years, 4 years etc.). It moves through the series by dropping the first reading of the previously averaged group and taking in the next one in the series. Each average is treated as the trend value for the unit of time at the center of its period. Note that with an even number of readings, the average falls between 2 intervals, so it is centered in a second step by averaging each pair of adjacent moving averages. Also note that a 3 year moving average leaves out the first and last year, whereas a centered 4 year moving average leaves out the first 2 and last 2 years. The image below depicts the 3 and 4 year moving averages of sales.
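To make the mechanics concrete, here is a minimal sketch of the 3 year and centered 4 year moving averages in plain Python. The sales figures and the function names `moving_average` and `centered_moving_average` are made up for illustration and are not taken from the image.

```python
def moving_average(values, window):
    """Plain moving averages: one mean per run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def centered_moving_average(values, window):
    """Trend values aligned to the original periods.

    Odd window: each average already sits on a period.
    Even window: each average falls between two periods, so adjacent
    averages are averaged again (the second, centering step).
    """
    averages = moving_average(values, window)
    if window % 2 == 1:
        return averages
    return moving_average(averages, 2)


# Hypothetical yearly sales for 7 years, for illustration only.
sales = [10, 14, 12, 16, 20, 18, 22]

ma3 = centered_moving_average(sales, 3)  # trend values for years 2 to 6
ma4 = centered_moving_average(sales, 4)  # trend values for years 3 to 5
print(ma3)
print(ma4)
```

With 7 years of data, the 3 year version returns 5 trend values and the centered 4 year version returns 3, which matches the years left out at each end.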
Least Squares: This method uses regression analysis to find the trend line for the time series data. The regression trend line Y is defined below:
$$Y = a + bx$$
To get the values of a and b, we use the below formula:
$$a = \frac{\sum Y}{n}, \qquad b = \frac{\sum xY}{\sum x^2}$$
where n is the number of years/months OR, in general, periods. These simplified formulas apply when the values of x are chosen so that they sum to zero, as in the example that follows.
Let us understand this with a simple example. We consider sales data for 5 years. The value of x is taken as zero for the mid-point (in the example below, the year 2019). The years before the mid-point get negative values of x and the years after the mid-point get positive values of x.
With the data available from above table, we can calculate the values of a and b as below:
Now, if we want to estimate the sales for the year 2022, we can simply substitute the values into the regression equation as below (NOTE: the value of x is 3 for the year 2022, going by the above table):
So the sales prediction for the year 2022 is 107.6 with the least squares method. Note that with an even number of observations, there are 2 mid-point periods instead of one. In such a case, the values of x for the two mid-points are -1 and +1, and x moves in steps of 2 from there: -3, -5 and so on for the periods before the mid-points, and +3, +5 and so on for the periods after them.
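The least squares steps can be sketched in a few lines of Python. This assumes the usual trend line Y = a + bx with x coded so that its values sum to zero. The sales figures below are hypothetical (they are not the figures behind the 107.6 result), and `coded_x` and `fit_trend` are names made up for this example.

```python
def coded_x(n):
    """Time codes that sum to zero.

    Odd n:  ..., -2, -1, 0, 1, 2, ...  (step 1, the mid-point is 0)
    Even n: ..., -3, -1, 1, 3, ...     (step 2, two mid-points at -1 and +1)
    """
    if n % 2 == 1:
        half = n // 2
        return list(range(-half, half + 1))
    return list(range(-(n - 1), n, 2))


def fit_trend(y):
    """Return (a, b) for Y = a + b*x using the coded x values."""
    x = coded_x(len(y))
    a = sum(y) / len(y)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, b


# Hypothetical sales for 2017 to 2021 (illustrative values only).
sales = [70, 74, 80, 86, 90]
a, b = fit_trend(sales)

# 2019 is x = 0, so 2022 is x = 3.
forecast_2022 = a + b * 3
print(a, b, forecast_2022)
```

With an even number of periods the codes move in steps of 2, so the first period after the last observation is the last code plus 2, not plus 1.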
White Noise and Time Series
A time series is considered white noise if its mean is zero, its variance is constant and its observations are uncorrelated with one another. Such a series has no pattern to model, so doing time series analysis on it does not make sense and we should stop. The best forecast for a white noise series is the average of the series.
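One informal way to check for white noise is to look at the sample autocorrelations of the series. The sketch below is a heuristic, not a formal test: it flags the lags whose autocorrelation falls outside the approximate 95% band of ±1.96/√n that is commonly used for this purpose. The function names `autocorrelation` and `significant_lags` and the simulated data are assumptions for this example.

```python
import math
import random


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag (lag >= 1)."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum(
        (values[t] - mean) * (values[t - lag] - mean) for t in range(lag, n)
    )
    return numer / denom


def significant_lags(values, max_lag=10):
    """Lags whose autocorrelation lies outside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(values))
    return [
        lag
        for lag in range(1, max_lag + 1)
        if abs(autocorrelation(values, lag)) > bound
    ]


random.seed(0)  # fixed seed so the run is repeatable
noise = [random.gauss(0, 1) for _ in range(500)]       # simulated white noise
trending = [0.1 * t + e for t, e in enumerate(noise)]  # same noise plus a trend

print("noise:", significant_lags(noise))
print("trending:", significant_lags(trending))
```

An occasional lag just outside the band can happen by chance even for true white noise, since the band is only an approximate 95% limit. Many lags far outside it, as in the trending series, suggest there is structure worth modeling.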
Stationarity of a Time Series
A time series is said to be stationary if its mean and variance are the same for any given time period. To be precise, the mean and variance are time invariant (and the correlation between two observations depends only on how far apart they are, not on when they occur). Many classical time series algorithms need the series to be stationary. If it's not stationary, your results can be erroneous. In reality, though, data is often not stationary, and there are methods available to make it stationary. One such method is differencing (I will not cover it in detail here since this is becoming too long a read).
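For a quick sense of the idea, one common way to make a trending series stationary is to take the differences between consecutive observations. The sketch below uses a made-up series and a crude check that compares the mean of the first half with the mean of the second half. The helper names `difference` and `half_means` are hypothetical, and a real analysis would use a proper statistical test instead.

```python
def difference(values, lag=1):
    """Return the lag-`lag` differences: values[t] - values[t - lag]."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def half_means(values):
    """Mean of the first and second half, a crude check for a drifting mean."""
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return sum(first) / len(first), sum(second) / len(second)


# Hypothetical series with a steady upward trend plus a small wiggle.
series = [10, 13, 14, 17, 18, 21, 22, 25, 26]

diffed = difference(series)
print(half_means(series))  # the mean of the raw series drifts upward
print(half_means(diffed))  # the mean of the differenced series stays level
```

Note that differencing shortens the series by one observation per round, and forecasts made on the differenced series have to be added back up to return to the original scale.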
Approach to Time Series Analysis
Now that all the jargon is clear, here is how one should approach time series analysis:
- Identify if the series is white noise. If it is, stop doing time series analysis.
- Check if the series is stationary. If it is not, make the data stationary before starting with time series analysis.
- Once it's clear that the series is not white noise and that it is stationary, you can start applying time series algorithms to it.
I will soon come up with one more article with practical implementation of time series using one of the algorithms.