Introduction To Analytics Modeling : Week 4 — Time Series Models

The fourth week's notes from Intro To Analytics Modeling! Check out the course on edX.

Udesh Habaraduwa
Udesh’s Data Science Notes
11 min read · Oct 27, 2019


Introduction to Exponential Smoothing

Exponential smoothing is also our first proper introduction to time series data — the same response measured across time: the price of a stock, daily temperature, the number of Red Bulls consumed prior to finishing an assignment, etc.

Time series data varies over time. Hidden within it could be trends (upwards or downwards) and possible cyclic variations. For example, we may find a downward trend in bodyweight across the year but a cyclic variation around the holidays.

As we collect time series data, our data points are subject to random variations as well. For example, daily blood sugar readings will most likely look like this — small fluctuations throughout the day with spikes around meals.

Fig 1 — Blood glucose levels throughout the day

Taking blood glucose levels as an example, let's define our variables.

Fig 2 — Definitions of variables

Let's imagine we are doctors to a diabetic patient. We'd be interested in knowing whether our patient's estimated baseline blood glucose level has increased — has he gotten worse? However, we've also noticed that there is some random variation throughout the day.

If any given observation of blood glucose level is showing an increase from the baseline, we would like to know if this is an actual change in the baseline or simply another one of those random variations.

If our latest observed blood glucose level (X_t) is indeed a change in the baseline, then we need to update our baseline estimate with this new observation — it looks like Mr. Patient might have been sneaking a few doughnuts in at work that we didn't know about.

Fig 3 — If it’s an actual change in the baseline, we should update our estimate of the baseline

However, if the current observed blood glucose level is not a change in the baseline, then we keep our baseline estimate the same.

Fig 4 — It’s not a change in baseline so we keep our estimate the same.

Randomness — Single Exponential Smoothing

Exponential smoothing allows us to combine these two extremes so that we can take into account randomness, trends and cycles. Single exponential smoothing uses a factor α to control how much influence a new observation has when calculating our baseline estimate.

Fig 5 — Single exponential smoothing allows us to account for random variations in our data
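
The update being described here is the standard single exponential smoothing formula, which, assuming the same notation as above, can be written as:

S_t = α·X_t + (1 - α)·S_t-1, with 0 ≤ α ≤ 1

Here X_t is the observation at time t, S_t is the baseline estimate at time t, and S_t-1 is the previous baseline estimate.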

If α is large, approaching 1, the weight given to the previous baseline blood glucose estimate approaches 0 — the weight given to the current reading goes up.

If α is small, approaching 0, the weight given to the previous baseline blood glucose estimate approaches 1 — the weight given to the current reading goes down.

We can use α to account for the random variations that we saw in our patient’s blood glucose levels throughout the day.

Note that to start off, we can set our baseline estimate to the first observed value and adjust it as we move forward in time.

Fig 6 — Based on the randomness we can expect in the system, we can adjust alpha

If we expect a lot of randomness in our system, we wouldn’t want every new variation in the observed value to have a large effect on our baseline estimation. In our example, blood glucose level seems to vary considerably throughout the day. By setting α closer to 0, we can dampen the effect these random variations will have on our baseline blood glucose level estimate.

If we expected our patient to be the best-behaved person in the history of humanity with perfectly stable blood glucose levels — i.e., a robot — then we could set α closer to 1. If we did, we would take every new observation and its accompanying variation as a strong indicator that the baseline has probably changed and update our estimate accordingly.

In a nutshell, what we are doing is deciding how much weight to give new observations when calculating our baseline estimate. The less randomness in the system, the more we can trust the new observation to be close to the actual baseline value and not a random variation. The more randomness in the system, the less we can trust new observations and so we tilt our baseline estimate calculation towards favoring the previous baseline estimate.
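
To make this concrete, here is a minimal Python sketch of single exponential smoothing (this is not code from the course, and the glucose numbers are made up for illustration):

```python
# Single exponential smoothing: S_t = alpha * X_t + (1 - alpha) * S_(t-1)
def exponential_smoothing(observations, alpha):
    """Return the baseline estimates S_t for a sequence of observations X_t."""
    baseline = [observations[0]]  # start by setting the baseline to the first observed value
    for x in observations[1:]:
        baseline.append(alpha * x + (1 - alpha) * baseline[-1])
    return baseline

# Hypothetical daily blood glucose readings (mg/dL)
glucose = [110, 145, 120, 180, 115, 150, 125]

print(exponential_smoothing(glucose, alpha=0.2))  # noisy system: lean on the previous baseline
print(exponential_smoothing(glucose, alpha=0.8))  # stable system: trust each new reading more
```

With α = 0.2 the spikes barely move the baseline estimate; with α = 0.8 the baseline chases every new reading.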

Randomness and Trends — Double Exponential Smoothing

Now that we’ve seen how we can account for randomness, we can take it up a notch and consider trends in our data. What good would we be if we couldn’t identify increasing or decreasing trends?

In our example, perhaps we expect to see an increasing trend in his baseline blood glucose levels — very bad news.

To make sure that we're taking into account any possible trend, we can add a trend factor T to the single exponential smoothing function.

Fig 7 — We add a trend estimate factor to account for a possible trend in the observations

We update the trend estimate in the same way we updated the baseline estimate.

Fig 8 — Trend estimate at time t
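
In the usual double exponential smoothing (Holt's method) formulation, and assuming the same notation as above, the two update equations are:

S_t = α·X_t + (1 - α)·(S_t-1 + T_t-1)

T_t = β·(S_t - S_t-1) + (1 - β)·T_t-1

where T_t is the trend estimate at time t and 0 ≤ β ≤ 1.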

With β, we are adjusting how fast the model learns the trend. If β is closer to 1, we give more weight to the most recent change in baseline in our trend estimate and less to older trend estimates and vice versa.

If we expect the trend itself to be changing rapidly from one time period to the next, we would set a higher value for β. If we expect it to change more slowly, we can give the previous trend estimate more weight. For instance, if we let our statistical software learn the value of β on its own, the fitted β gives us some insight into how quickly or slowly the trend is changing.
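
As a rough sketch of that idea (assuming the statsmodels library is available; the readings are made up), we can let the software estimate α and β for us:

```python
# A minimal sketch of letting software fit alpha and beta for double exponential
# smoothing (Holt's method). Not the course's code; the data is illustrative.
import numpy as np
from statsmodels.tsa.holtwinters import Holt

glucose = np.array([110.0, 118, 115, 125, 130, 128, 140, 138, 145, 150])  # hypothetical readings

fit = Holt(glucose).fit()  # optimizes the smoothing parameters by default
print(fit.params)          # includes the fitted level (alpha) and trend (beta) parameters
```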

Cyclic Patterns

Sometimes, changes in observed response can be due to cyclic changes.

For example, a patient’s blood glucose may be much higher cyclically during the holidays every year. We need to take this into account when calculating our baseline estimates.

Fig 9 — Multiplicative seasonality

We calculate the seasonality factor as follows:

Fig 10 — Calculating seasonality factor

Here, we use γ to control the weighting between recent observations and previous seasonality factor estimates.
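
In the usual multiplicative (Holt-Winters) formulation, with L the length of one cycle (365 for a yearly cycle of daily readings), the seasonality factor update looks like:

C_t = γ·(X_t / S_t) + (1 - γ)·C_t-L

so the new factor blends how far today's observation sits above or below the baseline with the factor estimated at the same point in the previous cycle.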

Suppose we have C = 1.1 for days 358, 359 and 360 of the year (Christmas Eve, Christmas Day and the day after). The model is indicating that for this time period, the value is higher simply due to seasonality. We deflate the current observation to bring it down towards the baseline.
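
As a quick worked example with made-up numbers: if the observed reading on Christmas Day is X_t = 165 and C = 1.1, the seasonally adjusted value fed into the baseline update is X_t / C = 165 / 1.1 = 150, which is much closer to the underlying baseline.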

Starting Conditions

In our example, had we set the starting value to C = 1, then we would not inflate or deflate any of the glucose readings at that time.

To start, we can set the initial values of C for the first L time periods (one full cycle) to 1 — i.e., no seasonal effect. Later, as we gather more information, we can update the values of C to account for any cyclic patterns.

Exponential Smoothing — What the Name Means

Let's look at the basic exponential smoothing model; adding trend and seasonality works in the same way.

For example, if we set:

Fig 11 — Effect of alpha on observations

In this example, with α = 0.5, we have S_t = 0.5·X_t + 0.5·S_t-1, so an unusually high X_t gets pulled down towards the baseline by the 0.5·S_t-1 term, and an unusually low X_t gets pulled up by it.

In this case a plot of the data may look something like this:

Fig 12 — Single exponential smoothing

So what about the ‘exponential’ part?

Let’s consider again the basic smoothing function:

Fig 13 — Single exponential smoothing allows us to account for random variations in our data

We can rewrite S_t-1 in the same way.

Fig 14 — Calculating baseline for t-1 time periods

We can plug this back into S_t to get:

Fig 15 — We can substitute S_t-1 in terms of X_t, alpha and S_t-2

And we can continue this all the way back to the start of our data. What you can see is that as α gets closer to 1, the weights on previous observations of X shrink faster. It also shows that every calculation of the baseline takes into account all previous data points, weighted according to our value of α.
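
Written out (leaving aside a final term for the starting value), the expansion looks like:

S_t = α·X_t + α(1 - α)·X_t-1 + α(1 - α)^2·X_t-2 + α(1 - α)^3·X_t-3 + …

The factor (1 - α) is raised to a higher and higher power for older observations, which is the "exponential" in exponential smoothing.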

What we do with α is control how quickly we want the weights on past values to shrink — i.e., how quickly older observations stop contributing to S_t.

Forecasting

We can use exponential smoothing for simple forecasting as well. Let’s take a look at the basic exponential smoothing function again.

Fig 16 — Single exponential smoothing allows us to account for random variations in our data

For a forecast, we are looking for S_t+1 — one time period after the current one.

Fig 17 — Forecast for the next time period

Since we don't know X_t+1, we use our best guess for what it might be: that the observation for the next time period will equal our current baseline estimate.

We can write this as our forecast ‘F’ as follows:

Fig 18 — Since we don’t know the observation at time t+1, we can use our best estimate which is S_t — our baseline

As we move forward into the future, the same pattern holds.

Fig 19 — Forecast as we move forward into K time periods
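
In other words, for basic exponential smoothing the forecast is flat: F_t+k = S_t for every future period k.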

However, the further into the future we look, the higher the uncertainty and thus the higher the forecast error.

We can include trend in the forecast in a similar fashion. Recall the function for estimating baseline including additive trend:

Fig 20 — We add a trend estimate factor to account for a possible trend in the observations

Where the trend estimate is as follows. Notice how we use t-1, the previous time period's trend, in estimating the current trend value.

Fig 21 — Trend estimate at time t

Therefore, a forecast for t+1 moves all the time periods up by one to give:

Fig 22 — Updating the trend and baseline estimate equation to forecast. Notice that X_t+1 = S_t

Given that our best estimate for the baseline in the next time period S_t+1 is the current baseline estimate S_t, the trend formula simplifies down to:

Fig 23 — Our best estimate of the trend for the next time period is our current trend estimate

Therefore our forecast with additive trend becomes:

Fig 24 — Forecast at time t+k with additive trend component
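
In the usual notation, this forecast is F_t+k = S_t + k·T_t: we start from the current baseline and extend the current trend estimate k periods ahead.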

We can include multiplicative seasonality in our forecasts as well. Simplifying the equation using X_t+1 = S_t, we arrive at the following.

Fig 25 — Best seasonality factor forecast

Which is to say, our best cyclic factor forecast is the best estimate we have at the same time in the previous cycle.

Therefore, the forecast with trend and seasonality for any future time period t+k can be defined as:

Fig 26 — Forecast with additive trend and multiplicative seasonality
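
In the usual Holt-Winters form this works out to F_t+k = (S_t + k·T_t)·C_t+k-L, i.e., the trend-extended baseline scaled by the seasonal factor from the same point in the previous cycle (reusing the latest cycle's factors when forecasting more than one cycle ahead). As a rough sketch of fitting such a model in practice (assuming statsmodels is available; the series below is synthetic):

```python
# A minimal sketch of Holt-Winters exponential smoothing with additive trend and
# multiplicative seasonality, then forecasting ahead. The data is made up for illustration.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(0)
t = np.arange(48)
# synthetic monthly series: upward trend times a yearly multiplicative cycle, plus noise
series = (100 + 0.5 * t) * (1 + 0.1 * np.sin(2 * np.pi * t / 12)) + rng.normal(0, 1, 48)

model = ExponentialSmoothing(series, trend="add", seasonal="mul", seasonal_periods=12)
fit = model.fit()
print(fit.forecast(6))  # forecasts for the next 6 time periods
```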

ARIMA — Autoregressive Integrated Moving Average

ARIMA consists of 3 parts.

Part 1 — Differences

Let’s recall the basic exponential smoothing function.

Fig 27 — Single exponential smoothing allows us to account for random variations in our data

This function works by attempting to estimate a baseline based on the previous and current values of X.

It works well if the data is stationary. In a stationary process the mean, variance and other measures are all expected to be constant over time. However, even if the data is not stationary, the differences in the data might be.

For example, we could try different orders of differences (first-order, second-order, and so on) up to d^th order differences.

Fig 28 — d^th order differences of the data may be stationary even if the data itself isn't.
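
As a quick illustration with made-up numbers, first- and second-order differences are just repeated applications of "subtract the previous value":

```python
# Differencing a made-up series with an accelerating trend.
import numpy as np

x = np.array([100, 103, 108, 115, 124, 135])

first_diff = np.diff(x)        # [ 3  5  7  9 11] -- still trending upward, not stationary
second_diff = np.diff(x, n=2)  # [2 2 2 2]        -- roughly constant, looks stationary
print(first_diff, second_diff)
```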

Part 2 — Autoregression

Autoregression is predicting the current value based on previous time periods’ values.

Regression is predicting a value based on other factors. For example, we may want to predict toy sales based on the number of new births in an area. "Auto" regression is the process of using only earlier values of the thing we are trying to predict, like toy sales over the past 5 years in our example, and it only works with time series data.

Recall the expanded form of the basic exponential smoothing function.

Fig 29 — We can substitute S_t-1 in terms of X_t, alpha and S_t-2

This is a form of autoregression that uses data all the way back to the start: an order-∞ (infinite order) autoregressive model. As we can see, all the previous values are baked into the function as we move further back in time.

We could also limit how far back we go by stopping at p time periods in the past: an order-p autoregressive model.

ARIMA combines these approaches by running the autoregression on the differences. Using the previous p time periods of observations, it tries to predict the d^th order differences.

Part 3 — Moving Average

Here we use previous errors as predictors.

Fig 30 — Error for predictions at time t

and we can control how far back we want to look by setting the number of time periods q for an order-q moving average.

Combining all of these parts gives us the ARIMA(p, d, q) model:

Fig 31 — ARIMA p,d,q model
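
A common way to write the ARIMA(p, d, q) model (the figure may use slightly different symbols) is to let y_t stand for the d^th order differenced data and then model:

y_t = μ + φ_1·y_t-1 + … + φ_p·y_t-p + ε_t + θ_1·ε_t-1 + … + θ_q·ε_t-q

where the φ terms are the autoregressive part, the θ terms are the moving average part built from previous errors ε, and μ is a constant.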

And so, using statistical software, we can try different values of p, d and q and let it estimate the best coefficients for each. We can also add seasonality if we wish. Specific values of p, d and q correspond to different well-known special cases:

ARIMA(0,0,0) — White noise. No patterns in the data

ARIMA(0,1,0) — Random walk. A stochastic or random process

ARIMA(p,0,0) — AR (autoregressive). Only the autoregressive part is active

ARIMA(0,0,q) — MA (moving average). Only the moving average part is active

ARIMA(0,1,1) — Basic exponential smoothing

ARIMA is better than basic exponential smoothing for short-term forecasting when the data is more stable (i.e., fewer peaks and valleys), and it is best to have at least 40 data points.
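
As a rough sketch of trying this in practice (assuming statsmodels is available; the series and the chosen order are purely illustrative):

```python
# A minimal sketch of fitting an ARIMA(p, d, q) model and forecasting. In practice we
# would compare several (p, d, q) choices, e.g. by AIC, rather than fixing one up front.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(0.2, 1.0, 200))  # made-up non-stationary series with drift

fit = ARIMA(series, order=(1, 1, 1)).fit()
print(fit.aic)          # lower AIC generally means a better fit-vs-complexity trade-off
print(fit.forecast(5))  # forecasts for the next 5 time periods
```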

GARCH — Generalized Autoregressive Conditional Heteroskedasticity

GARCH is a method of estimating or forecasting the variance of time series data — i.e., a forecast of how much the values might fluctuate over time.

Knowing the variance can help us get an idea of the amount of error we might make in our estimation.

Suppose we want to forecast sales at a restaurant so that we can make purchases of our ingredients. Knowing how much we might be off in our estimation of sales will let us plan more effectively. Thus, we can leave some room for error and perhaps buy a little more than we actually need, just in case.

Another example is estimating ROI. The variance of the estimate, i.e., how much higher or lower than the estimate the actual return might be, can be used as a proxy for the investment's volatility. Hence, if we want a lower-risk investment, we can choose one with lower variance.

Fig 32 — GARCH(p, q) model

Note that unlike ARIMA, GARCH works with variances (squared errors) and uses the raw data rather than differences.
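
For reference, the most common special case, GARCH(1, 1), models the variance at time t as

σ^2_t = ω + α·ε^2_t-1 + β·σ^2_t-1

where ε_t-1 is the previous period's error and σ^2_t-1 is the previous period's variance estimate (these α and β are GARCH coefficients, not the smoothing parameters from earlier); higher-order GARCH(p, q) models simply include more lagged squared errors and more lagged variance terms. As a rough sketch of fitting one in practice (assuming the third-party arch package is installed; the return series is made up):

```python
# A minimal sketch of fitting a GARCH(1, 1) model to a made-up return series and
# forecasting its variance over the next few periods.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = rng.normal(0, 1, 500)  # placeholder for something like daily percentage returns

res = arch_model(returns, vol="GARCH", p=1, q=1).fit(disp="off")
print(res.params)                        # estimates of omega, alpha and beta
print(res.forecast(horizon=5).variance)  # variance forecasts for the next 5 periods
```

A fitted α + β close to 1 indicates that volatility is persistent, while smaller values mean shocks to the variance die out quickly.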
