The Sheer Beauty of Time Series Analysis…

Hrithik Rai Saxena
12 min read · Aug 27, 2023


Hey there traveler, welcome to another blog. This time let me introduce you to time series analysis. If you are working in the field of data science, I bet you have come across this domain at least once. Now, to be honest, I had some beef with this topic. Why? Just look at the wiggly-woggly thing below. It was enough to give me the heebie-jeebies. Then I thought, if I can sit through a movie like The Human Centipede, what is stopping me from giving this domain a shot? So, I started from scratch. As I moved forward, I saw different mathematical and statistical components working in unison to deal with this kind of data. This inspired me to write a clear, formula-light blog that can help everyone understand how beautiful time series analysis really is. So just sit back, relax and have a good read.

Revenue generated per day by a product

If I ask you to forecast the above time series using your current knowledge of machine learning, I bet you are going to scratch your head a little. Time series analysis needs to be dealt with differently from what you have encountered before. Forget about using plain regression algorithms here; they will fail miserably. There are also a lot of models out there that people use for TSA (time series analysis), like ARMA, ARIMA, SARIMA, SARIMAX, Prophet… the list goes on and on. But for most people, these models are still a black box: they just throw in the time series, tune parameters by brute force and take the forecast. Well, this works in many cases, but why treat it like a black box when in reality it’s a box full of rainbows :)

So let’s try to understand the intuition behind TSA, so that anytime you face a problem involving a time series, you can solve it with a happy face.

The Intuition

Let’s start with regression. You might wanna treat this problem as a regression analysis owing to the continuous nature of the data points. But but but…REGRESSION IS MEANT FOR PREDICTING WITHIN SOME BOUNDED RANGE. Consider the plot below.

Here, if we consider a case of simple linear regression, for a given temperature within the bounded range [0°F, 100°F], one can predict the number of ice cream cones sold. This is called Interpolation.

But when it comes to tsa, we need to forecast beyond the bounded time range on which we trained our model. This is called Extrapolation.
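If you want to see the difference in code, here is a tiny sketch with made-up ice-cream data; the numbers and the linear model are pure assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
temp = rng.uniform(0, 100, 50).reshape(-1, 1)       # temperatures within [0°F, 100°F]
cones = 2.5 * temp.ravel() + rng.normal(0, 10, 50)  # cones sold, roughly linear in temp

model = LinearRegression().fit(temp, cones)
print(model.predict([[70]]))   # interpolation: inside the trained range, trustworthy
print(model.predict([[150]]))  # extrapolation: outside the range, no guarantees at all
```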

Now we are standing beyond the bounds with the question, what challenges does extrapolation bring?

The answer lies in the intrinsic nature of time series analysis: we need to predict an observation from lagged versions of that observation. This means that the current value depends on its past values (think of the stock market: the current price of a stock lurks somewhere around its past value, and volatility then introduces the fluctuations).

Now, there is always some uncertainty during extrapolation. The further we move from our training range, the less sure we are. Uncertainties pile up as we keep extrapolating, because we carry uncertainty from the data as well as uncertainty from the last predicted observation.

So let’s declare this topic as one big uncertain cluster**** and move on to the next blog…XD

But wait, we have the option to decompose the time series and look at what’s inside.

By breaking down a time series, you find a signal along with some good old noise. The signal is where the information lives, and that information is what we need to train our model so that it can forecast the series further.
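Here is a minimal sketch of that decomposition using statsmodels; the daily revenue series is simulated, so the trend, seasonality and noise are known by construction:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical daily revenue: trend + weekly seasonality + noise
idx = pd.date_range("2022-01-01", periods=365, freq="D")
t = np.arange(365)
sales = pd.Series(
    100 + 0.1 * t                                  # slow upward trend
    + 10 * np.sin(2 * np.pi * t / 7)               # weekly seasonal swing
    + np.random.default_rng(0).normal(0, 3, 365),  # good old noise
    index=idx,
)

seasonal_decompose(sales, model="additive", period=7).plot()
plt.show()  # four panels: observed, trend, seasonal, residual (the noise)
```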

Time for stats…

Now that we have a signal, we need to look for certain characteristics that will help us understand it. Keep an eye on whether it is stationary (not all models need the time series to be stationary, e.g. Holt-Winters or Facebook Prophet). Stationarity simply means that the mean of the time series and the standard deviation around its curve are constant, the autocorrelation depends only on the lag and not on the point in time, there is no seasonality and, yeah, also no trend. LOL… bombarded you with a lot of conditions. We will cover each in detail when the time comes, but for now let’s understand why constant mean and variance.
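One common way to check stationarity in practice is the Augmented Dickey-Fuller test from statsmodels. Here is a quick sketch on a simulated series; it is just one of several possible tests, and the white-noise data is an assumption for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# White noise is stationary by construction, so the test should reject the unit root
series = pd.Series(np.random.default_rng(1).normal(0, 1, 200))

stat, pvalue, *_ = adfuller(series)
print(f"ADF statistic: {stat:.3f}, p-value: {pvalue:.3f}")
# Rule of thumb: p-value < 0.05 -> the series looks stationary
```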

Ok, so this is the part where we need to talk about where we get the power to predict a wiggly-woggly series. It’s regression, of course. In traditional machine learning we work under the assurance that the test set has a similar statistical distribution to the train set, because we fit a function whose weights predict the dependent variable within some satisfactory bounds; it fails if you throw an out-of-sample data point at it. Similarly, in time series the current value becomes the lagged input for its future value. Their statistical distributions, in terms of mean and standard deviation, have to be similar to avoid potential shocks.

If a time series is stationary, it will be mean reverting: if a sudden outlier introduces a shock, the series will recover over time and revert back towards the mean. That is why dealing with outliers is considered holy in time series analysis; they have the power to alter the path of our analysis.

Now I really want to throw in some math here, but I promised to let you sit back and relax.

Moving on, we have to look at whether the series contains a trend (deterministic or stochastic), seasonality (a deterministic cyclic component with a fixed, known frequency) or cycles (recurring patterns, but over long and uncertain spans of time).

Now you can understand why we need to get rid of this stuff: we need our statistical distribution stable for predicting further. These three factors have the power to shift our statistical distribution to a level where our regression machinery will fail to predict.

Now, you must be familiar with the term moving average. In stock market analysis these indicators are still loved by many. So, along with the power of regression, we decided to fuel up our game by adding moving averages. It simply means that our average is moving…

No, I mean, like, think about it. The time series is hovering around an average value. Now I make an adjustment to the mean value and forecast the next value. After comparison with the actual value, we get the error. Now this error will guide us. We take a coefficient of our choice and multiply it with the error, then add or subtract this adjusted error from our last prediction and move on. This repeats, and we keep changing the values of the coefficients until we can match something with the test set. This is basically how ARMA models work: they use the power of auto-regression and moving averages together. We are way ahead of ARMA these days, but everything evolved from around here, and it’s good to get the basics right.

Now I think you are a little comfortable with the notion behind time series. Try looking at the first graph again with the ideas of signal, noise, stable statistical distribution, trend, seasonality and cycle in mind. Don’t try to come up with a solution; just ponder breaking it down into its components. Now it’s time to dive deeper in order to really understand what’s going on.
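Before we do, here is a minimal sketch of fitting an ARMA(1, 1) with statsmodels on a simulated series; the coefficients are arbitrary assumptions, and ARMA is just ARIMA with the differencing order set to zero:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import arma_generate_sample

# Simulate a series from a known ARMA(1, 1) process; statsmodels uses
# lag-polynomial notation, hence the leading 1 and the negated AR coefficient.
np.random.seed(42)
series = arma_generate_sample(ar=[1, -0.6], ma=[1, 0.4], nsample=300)

# order=(1, 0, 1): one AR lag, no differencing, one MA term
model = ARIMA(series, order=(1, 0, 1)).fit()
print(model.params)             # recovered AR and MA coefficients
print(model.forecast(steps=5))  # forecast the next five observations
```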

Before we discuss how to model a time series, we need to discuss the development cycle of TSA. Firstly, we need data, along with some validation checks. The data points need to be continuous in nature; if there is something funny like outliers or missing data points, they can potentially lead to shocks and disturb that continuity. Here we need a data pre-processing step that imputes the missing data and deals with the outliers. A good way to impute data is to use the global mean, a local mean or the mean of similar instances around that timestamp. Secondly, you need to deal with trend and seasonality. Here comes differencing. It simply means that if we assume the trend hovers around a linear function, we can subtract the lagged version of the series from the original; this flattens the trend so that it runs parallel to the x-axis. If we use a lag of one, i.e. subtract the previous time step from the current one, it’s called first-order differencing.
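In pandas this whole preprocessing step is a couple of lines. Here is a sketch on a made-up trending series; the imputation window and the data are assumptions for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
series = pd.Series(0.5 * np.arange(100) + rng.normal(0, 1, 100))  # linear trend + noise
series.iloc[[10, 40]] = np.nan                                    # two readings went missing

# Local-mean imputation: fill gaps with the mean of nearby observations
series = series.fillna(series.rolling(5, min_periods=1, center=True).mean())

# First-order differencing: y_t minus y_{t-1} removes the linear trend
diff_1 = series.diff(1).dropna()
print(diff_1.mean())  # now a constant mean near the slope (0.5), no upward drift
```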

Notice how, after differencing, the trending regions of the first plot have stabilized, and the mean seems to have shifted towards zero. Differencing also helps us get rid of seasonality. First-order differencing is not always the answer for non-linear trends, but you get the intuition. This is the kind of work environment in which an ARMA model wants to operate. Seasons and cycles have to be eliminated since they induce redundancy: the data points flow through a particular pattern after a certain period, which also breaks the continuity.

Transformations:

Thirdly, it’s not about analyzing the time series exactly the way it is presented to us. The target is to transform the signal so that it is smooth enough to extrapolate within some satisfactory bounded range. Imagine you are working with monthly sales data. When you perform aggregation, you sum up the daily sales and try to play around with that. The problem is that months have different numbers of days (plus the leap-year problem), which adds unnecessary noise to the signal. A better way is to average the values for every month: the uneven-days problem is solved and you get a nice smooth curve, as shown below. Similarly, depending on the situation, one can perform log or exponential transformations as well.
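A sketch of that averaging trick with pandas resampling; the daily sales here are simulated, so any difference between the two monthly series comes purely from the uneven month lengths:

```python
import numpy as np
import pandas as pd

# Two years of hypothetical daily sales
idx = pd.date_range("2022-01-01", periods=730, freq="D")
daily_sales = pd.Series(np.random.default_rng(3).normal(100, 10, 730), index=idx)

monthly_sum = daily_sales.resample("MS").sum()    # noisy: months have 28-31 days
monthly_mean = daily_sales.resample("MS").mean()  # smoother: a per-day average

for s in (monthly_sum, monthly_mean):
    print(s.std() / s.mean())  # relative spread: the averaged series is calmer
```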

Coming to the fun part…

Checking Autocorrelation and Partial Autocorrelation…

An important feature of time series is their potential serial correlation. We aim to find the correlation of the time series with its own lags. A lag is introduced by shifting the time-dependent variable by one step at a time, as shown below.
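Here is what that shifting looks like on a toy five-point series (the values are arbitrary); pandas `shift` slides the series down by k steps, giving one lag column per k:

```python
import pandas as pd

y = pd.Series([10, 12, 11, 13, 14], name="y")

# Build three lagged copies of y as separate columns
lags = pd.concat({f"lag_{k}": y.shift(k) for k in range(1, 4)}, axis=1)
print(pd.concat([y, lags], axis=1))
#     y  lag_1  lag_2  lag_3
# 0  10    NaN    NaN    NaN
# 1  12   10.0    NaN    NaN
# 2  11   12.0   10.0    NaN
# 3  13   11.0   12.0   10.0
# 4  14   13.0   11.0   12.0
```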

Now each lagged column is treated as a separate external variable. Let’s look at autocorrelation first. The correlation between two random variables X and Y is Cor(X, Y) = Cov(X, Y) / (σX · σY), a dimensionless measure of the linear association between them; autocorrelation is simply this correlation computed between the series and its own lagged version.

Now let me introduce the correlogram, the standard way of visualizing the ACF. It has become a widely accepted convention to display the estimated autocorrelations as vertical spikes.

These vertical spikes represent the correlation between the series and its own lags, starting from lag 0. The first spike is always one, since it is the correlation of the time series with itself; it’s best to ignore it and start reading from the second spike. The spike at lag 3, for instance, represents corr(y, y shifted by 3 lags); note that it also carries the indirect effects of lags 1 and 2. The estimated autocorrelations will generally not be zero. Hopefully they will be small, but the question is how much they can differ from zero just by chance. Here the blue dashed region comes to our rescue: for a stationary time series, spikes that fall within the confidence band are considered different from 0 only by chance, while those outside the band are considered truly different from 0. The statistics behind this involve plug-in estimation, normal distributions and hypothesis testing, which is not important at this moment.

Estimation of the ACF from an observed time series assumes that the underlying process is stationary; only then can we treat pairs of observations at lag k as probabilistically “equal” and compute sample covariance coefficients. And while stationarity is at the root of ACF estimation, we can of course still apply the ACF to non-stationary series: it helps us check the stationarity of a time series and the possible involvement of trend, seasonality or both. We will cover this later in the practical implementation of time series analysis. Here we only need to grasp how to develop the thinking process for TSA.

Coming to partial autocorrelation, we compute the individual effect of each lag on the original time series.

Here the spike at lag 3 in the PACF plot represents the correlation of the third lag with the original time series while omitting the influences of lags 1 and 2. Under the hood, a linear regression model fits y as a function of its lagged versions, and the coefficient of the third lag, in this case, is the PACF value on the plot. It is good practice to set the y-axis in both plots to the range -1 to +1 for hassle-free analysis.
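Both plots are one statsmodels call each. Here is a sketch on a simulated random walk, chosen precisely because its slowly decaying ACF screams non-stationarity:

```python
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

series = np.random.default_rng(4).normal(0, 1, 200).cumsum()  # a random walk

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series, lags=20, ax=ax1)    # slowly decaying spikes: a non-stationary giveaway
plot_pacf(series, lags=20, ax=ax2)   # individual effect of each lag on the series
ax1.set_ylim(-1.1, 1.1)              # keep both plots on the same -1 to +1 scale
ax2.set_ylim(-1.1, 1.1)
plt.show()
```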

Damn, that was a lot, but if you have made it to this point, my respect is with you.

Now that you know about auto-regression and moving averages, along with the ACF and PACF, and how to deal with trends, seasons and cycles, you can understand what is happening under the hood. All the math and stats used in TSA is basically about finding the best setting that spits out the desired results. Don’t worry, everything will be much clearer when we actually see all this in action, but I promised to keep this blog math- and stat-free. In a later blog we will code a SARIMAX model and understand everything.

In the end, the only thing left is model diagnostics. First, we need to see whether any pattern is left in the residuals; if there is, we have failed to capture some dynamics in the data. A good idea is to plot a separate correlogram for the residuals and check whether any vertical spikes cross the blue region; if they do, we need better predictors. Also, during cross-validation, make sure the test set is not too big. In traditional machine learning a 70-30 split works because of interpolation, but in time series analysis, owing to the nature of extrapolation, the forecast may resemble the first half of that 30% and then go rogue, with exploding residuals. In TSA, a 90-10 split can be a better idea. To compare models, one can use the maximum likelihood (from MLE, maximum likelihood estimation) or the AIC (Akaike Information Criterion). It’s also a good idea to provide a prediction interval with a certain confidence for the forecast; this way you know the upper and lower bounds of your forecasted time series.
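To tie the diagnostics together, here is a sketch of that whole loop on simulated data; the ARIMA(1, 1, 1) order and the series are assumptions for illustration, not a recipe:

```python
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima.model import ARIMA

series = np.random.default_rng(5).normal(0, 1, 300).cumsum()  # simulated data

split = int(len(series) * 0.9)                 # 90-10 split, in time order, no shuffling
train, test = series[:split], series[split:]

res = ARIMA(train, order=(1, 1, 1)).fit()
plot_acf(res.resid, lags=20)  # residual correlogram: spikes outside the band
                              # mean some dynamics were left uncaptured
print(res.aic)                # lower AIC = better fit/complexity trade-off

fc = res.get_forecast(steps=len(test))
print(fc.conf_int(alpha=0.05))  # 95% prediction interval: lower and upper bounds
```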

There are a lot of things I wanted to cover, like additive and multiplicative models, invertibility of a time series, white noise, lag operators, rolling forecasts, setting prediction intervals and the various checks for stationarity and unit roots… but again, we are not here to build a model. I just wanted you to have a bird’s-eye view of time series analysis. You have now built a thought process for TSA, from data validation to preprocessing to how we model a time series for forecasting, alongside evaluation and diagnostics. In the coming blog I will build a SARIMAX model with the kind of real-life raw data you can usually expect from the market. It will cover the Python code along with the underlying math and statistics.

Till then, Happy Learning…


Hrithik Rai Saxena

Hey there, I'm a machine learning engineer based in Germany.