Working with Time Series in Python: Exploring Exponential Smoothing and ARIMA
TL;DR: In this article you’ll learn the basic steps of time-series analysis and concepts like trend, stationarity, moving averages, etc. You’ll also explore exponential smoothing methods, and learn how to fit an ARIMA model to non-stationary data.
Time series are everywhere
Situation 1: You are responsible for a pizza delivery center and you want to know if your sales follow a particular pattern, because you feel that every Saturday evening there is an increase in the number of your orders…
Situation 2: Your company is selling a product and you are in charge of predicting, or forecasting, the supplies needed for this product at a particular moment in the future…
Situation 3: You are monitoring a datacenter and you want to detect any anomaly such as an abnormal CPU usage which might cause a downtime on your servers. You follow the curve of the CPU usage and want to know when an anomaly occurs…
In each of these situations, you are dealing with time series. Analyzing time series is a fascinating job because, despite all our mathematical models (including neural networks), we humans still fail to predict the future and have to deal with uncertainty. Let’s have a closer look at what time series are and which methods can be used to analyze them. In this article, we will rely extensively on the statsmodels library written in Python.
A time series is a data sequence ordered (or indexed) by time. It is discrete, and the interval between each point is constant.
Properties and types of series
Trend : A long-term increase or decrease in the data. This can be seen as a slope (it doesn’t have to be linear) roughly going through the data.
Seasonality : A time series is said to be seasonal when it is affected by seasonal factors (hour of day, week, month, year, etc.). Seasonality can be observed with nice cyclical patterns of fixed frequency.
Cyclicity : A cycle occurs when the data exhibits rises and falls that are not of a fixed frequency. These fluctuations are usually due to economic conditions, and are often related to the “business cycle”. The duration of these fluctuations is usually at least 2 years.
Residuals : Each time series can be decomposed into two parts:
- A forecast, made up of one or several forecasted values
- Residuals. They are the difference between an observation and its predicted value at each time step. Remember that
Value of series at time t = Predicted value at time t + Residual at time t
Decomposition of a time series
Each time series can be thought of as a mix of several parts:
- A trend (upward or downward movement of the curve over the long term)
- A seasonal component
- Residuals
Here is what our time series looks like when broken down into the above components:
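To reproduce this kind of breakdown, statsmodels ships a seasonal_decompose helper. Here is a minimal sketch, assuming your data lives in a pandas Series called series with a monthly DatetimeIndex (the variable name is mine, and in statsmodels versions before 0.11 the period argument was called freq):

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Assumption: `series` is a pandas Series of monthly values
# indexed by a DatetimeIndex.
decomposition = seasonal_decompose(series, model="additive", period=12)

# .plot() draws four panels: observed, trend, seasonal and residual.
decomposition.plot()
plt.show()
```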
Our time series shows a clear upward trend. It is not seasonal: the extracted seasonal component looks like noise rather than a regular pattern. The residuals’ variance seems to increase over time, showing that the series behaves more and more randomly towards the end.
The golden rule: Stationarity
Before going any further into our analysis, our series has to be made stationary.
Stationarity is the property of exhibiting constant statistical properties (mean, variance, autocorrelation, etc.). If the mean of a time-series increases over time, then it’s not stationary.
Transforms used for stationarizing data:
- De-trending : We remove the underlying trend from the series. This can be done in several ways, depending on the nature of the data:
- Indexed data: data measured in currencies are linked to a price index or related to inflation. Dividing the series element-wise by this index (i.e. deflating) is therefore the way to de-trend the data.
- Non-indexed data: it is necessary to estimate whether the trend is constant, linear or exponential. The first two cases are easy; for the last one, it is necessary to estimate a growth rate (inflation or deflation) and apply the same method as for indexed data.
- Differencing : Seasonal or cyclical patterns can be removed by subtracting periodical values. If the data is 12-month seasonal, subtracting from each value the value observed 12 months earlier gives a “flatter” series.
- Logging : when the compound rate in the trend is not due to a price index (i.e. the series is not measured in a currency), logging can help linearize a series with an exponential trend (recall that log(exp(x)) = x). Unlike deflation, it does not remove the trend itself.
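Here is a quick sketch of these three transforms with pandas and numpy. series is our hypothetical pandas Series from before, and cpi stands for a hypothetical price-index series aligned on the same dates:

```python
import numpy as np

# De-trending indexed data: deflate element-wise by a price index
# (`cpi` is a hypothetical index series sharing the same DatetimeIndex).
deflated = series / cpi

# Differencing: subtract the value observed 12 periods earlier
# to remove a 12-month seasonal pattern.
seasonal_diff = series - series.shift(12)  # equivalent to series.diff(12)

# Logging: linearizes an exponential trend (all values must be positive).
logged = np.log(series)
```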
Checking Stationarity
Plotting rolling statistics
Plotting rolling means and variances is a good first way to visually inspect our series. If the rolling statistics exhibit a clear trend (upwards or downwards) or a varying variance (increasing or decreasing amplitude), then you might conclude that the series is very likely not stationary.
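A minimal sketch of such a visual check, under the same series assumption (a 12-observation window suits monthly data; adapt it to your sampling frequency):

```python
import matplotlib.pyplot as plt

rolling_mean = series.rolling(window=12).mean()
rolling_std = series.rolling(window=12).std()

plt.plot(series, label="Original")
plt.plot(rolling_mean, label="Rolling mean")
plt.plot(rolling_std, label="Rolling std")
plt.legend()
plt.show()
```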
Augmented Dickey-Fuller Test
This test is used to assess whether or not a time series is stationary. Without getting into too much detail about hypothesis testing, you should know that this test produces a result called a test statistic, from which you can say, at different levels (or percentages) of confidence, whether the time series is stationary or not.
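With statsmodels, the test is one function call away. A sketch, again assuming our hypothetical series:

```python
from statsmodels.tsa.stattools import adfuller

# Null hypothesis of the ADF test: the series has a unit root
# (i.e. it is non-stationary).
adf_stat, p_value, used_lags, n_obs, critical_values, _ = adfuller(series.dropna())
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")
print("Critical values:", critical_values)
# A statistic below a critical value (a small p-value) rejects the null:
# the series looks stationary at that level.
```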
KPSS
The KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test takes trend stationarity as its null hypothesis. In other words, if the p-value of the test statistic is below the chosen significance level, we can reject this hypothesis and conclude that the series is not trend-stationary at that level. A p-value above the threshold means we fail to reject the hypothesis and treat the series as trend-stationary.
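A sketch of the KPSS test with statsmodels; regression="ct" selects the trend-stationarity null hypothesis, while "c" would test level stationarity instead:

```python
from statsmodels.tsa.stattools import kpss

kpss_stat, p_value, lags, critical_values = kpss(series.dropna(), regression="ct")
print(f"KPSS statistic: {kpss_stat:.3f}, p-value: {p_value:.3f}")
print("Critical values:", critical_values)
# Careful: here a SMALL p-value rejects stationarity, the opposite
# logic of the ADF test above.
```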
Autocorrelation plots (ACF & PACF)
An autocorrelation (ACF) plot represents the autocorrelation of the series with lags of itself.
A partial autocorrelation (PACF) plot represents the amount of correlation between a series and a lag of itself that is not explained by correlations at all lower-order lags.
Ideally, we want no correlation between the series and lags of itself. Graphically speaking, we would like all the spikes to fall in the blue region.
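Both plots are available in statsmodels. A minimal sketch, still on our hypothetical series:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(series, lags=24, ax=axes[0])   # the blue band is the confidence interval
plot_pacf(series, lags=24, ax=axes[1])
plt.show()
```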
As we can see, there are several spikes above the blue region, meaning there are correlations at lags 1, 2, 3 and 4.
Choosing a model
Exponential smoothing methods are appropriate for non-stationary data (i.e. data with a trend and/or seasonality).
ARIMA models should be used on stationary data only. One should therefore remove the trend from the data (via deflating or logging) and then look at the differenced series.
Smoothing methods
Smoothing methods work as weighted averages. Forecasts are weighted averages of past observations. The weights can be uniform (this is a moving average), or follow an exponential decay, which means giving more weight to recent observations and less weight to old observations. More advanced methods include other parts in the forecast, like seasonal components and trend components.
We will use the component form for our mathematical equations: y will denote our time series, p our forecast, l the level, s the seasonal component and b the trend component.
Simple Exponential Smoothing
>When to use?
Few data points, Irregular data, No seasonality or trend.
>Math behind
Just keep in mind that SES only has one component, called the level, with a smoothing parameter denoted alpha. The level is a weighted average of the current observation and the previous level, and the forecast is simply the last computed level:
Level: l(t) = alpha * y(t) + (1 - alpha) * l(t-1)
Forecast: p(t+h) = l(t)
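A sketch with statsmodels; here alpha is fixed by hand, but dropping both arguments lets fit() estimate it by maximum likelihood:

```python
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

ses = SimpleExpSmoothing(series).fit(smoothing_level=0.2, optimized=False)
forecast = ses.forecast(12)  # forecast 12 steps ahead
```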
Holt’s Linear Smoothing
>When to use?
Trend in data, No seasonality.
>Math behind
The forecast is made of a level component and a trend component:
Forecast: p(t+h) = l(t) + h * b(t)
Level: l(t) = alpha * y(t) + (1 - alpha) * (l(t-1) + b(t-1))
Trend: b(t) = beta * (l(t) - l(t-1)) + (1 - beta) * b(t-1)
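A minimal sketch with statsmodels (recent versions call the beta argument smoothing_trend; older ones call it smoothing_slope):

```python
from statsmodels.tsa.holtwinters import Holt

# fit() estimates alpha (level) and beta (trend) by maximum likelihood;
# pass smoothing_level= / smoothing_trend= to set them manually.
holt = Holt(series).fit()
forecast = holt.forecast(12)
```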
Holt’s Damped Trend
The problem with Holt’s linear trend method is that the trend stays constant into the future, increasing or decreasing indefinitely, which can be problematic for long forecast horizons. The damped trend method therefore adds a damping parameter 𝜙 (with 0 < 𝜙 < 1) so that the trend converges to a constant value in the future (it flattens the trend): the parameter 𝑏 is replaced by 𝜙𝑏.
>When to use?
Data has a trend. Use the multiplicative version, unless the data has been logged first; in that case, use the additive version.
Notice how the green curve flattens at the end of the forecast.
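To try the damped variant yourself, here is a sketch; note that statsmodels versions before 0.12 use the keyword damped rather than damped_trend, and exponential=True would give the multiplicative version mentioned above:

```python
from statsmodels.tsa.holtwinters import Holt

# damped_trend=True introduces the phi parameter that flattens
# the trend over long horizons.
damped = Holt(series, damped_trend=True).fit()
forecast = damped.forecast(34)
```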
Holt-Winters’ Seasonal Smoothing
This will be covered in the next article. But just so you know, this method is like Holt’s linear smoothing, except we’ve added a seasonal component!
ARIMA
ARIMA models (which include ARMA, AR and MA models) are a general class of models to forecast stationary time series. ARIMA models are made of three parts:
- A weighted sum of lagged values of the series (Auto-regressive (AR) part)
- A weighted sum of lagged forecast errors of the series (Moving-average (MA) part)
- A difference of the time series (Integrated (I) part)
An ARIMA model is often noted ARIMA(p, d, q) where p represents the order of the AR part, d the order of differencing (“I” part), and q the order of the MA term.
1) Choosing the differencing order
The first step of fitting an ARIMA model is to determine the differencing order to stationarize the series. To do that, we look at the ACF and PACF plots, and keep in mind these two rules:
“Rule 1: If the series has positive autocorrelations out to a high number of lags, then it probably needs a higher order of differencing.
Rule 2: If the lag-1 autocorrelation is zero or negative, or the autocorrelations are all small and patternless, then the series does not need a higher order of differencing. If the lag-1 autocorrelation is -0.5 or more negative, the series may be overdifferenced.”
(Robert Nau, Statistical Forecasting)
We start by logging the data, as the raw series exhibits an exponential trend. The logged series seems flatter, but is it stationary? Let’s run a KPSS test to check:
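A sketch, reusing our hypothetical series (log_series is a name I introduce here and keep for the rest of the examples):

```python
import numpy as np
from statsmodels.tsa.stattools import kpss

log_series = np.log(series)

kpss_stat, p_value, lags, critical_values = kpss(log_series, regression="ct")
print(f"KPSS statistic: {kpss_stat:.3f}, p-value: {p_value:.3f}")
print("Critical values:", critical_values)
```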
The test statistic is above the critical values, so we reject the null hypothesis: our series is not trend-stationary. However, in order to keep this article short, we will continue as if it were.
(To obtain a trend-stationary series, we could, for instance, regress our logged series on time and subtract the fitted trend…)
Let’s now look at the ACF plots for the logged data and its first difference:
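A sketch producing both charts (the second title is my own wording):

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

log_diff = log_series.diff().dropna()  # first difference of the logged series

fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(log_series, lags=24, ax=axes[0], title="ACF — Logged data")
plot_acf(log_diff, lags=24, ax=axes[1], title="ACF — Logged data, 1st difference")
plt.show()
```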
The “ACF — Logged data” chart shows non-stationary data, characterized by the slow, linear decay of the spikes (cf. rule 1 above). Taking a first-order difference leaves a single negative spike at lag 1. According to rule 2, we don’t need to difference the series any further. Let’s check our results by comparing an ARIMA(0, 0, 0) and an ARIMA(0, 1, 0) model:
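A sketch of the comparison; note that in statsmodels older than 0.11, the ARIMA class lives in statsmodels.tsa.arima_model instead:

```python
from statsmodels.tsa.arima.model import ARIMA

for order in [(0, 0, 0), (0, 1, 0)]:
    res = ARIMA(log_series, order=order).fit()
    print(f"ARIMA{order} -> AIC: {res.aic:.2f}, "
          f"residual variance: {res.resid.var():.4f}")
```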
The Akaike Information Criterion (AIC) is lower for the ARIMA(0,1,0), meaning this model performs better than the ARIMA(0,0,0). Let’s have a look at the residuals and check their variance:
Which residuals do you think look better?
2) Choosing the MA order
Now that we know we have to include a first-order difference in our model, we need to choose the moving-average order. This is done by looking at the differenced series (since we just saw that the first difference was stationary). Again, we look at our ACF and PACF plots, with this rule in mind:
“If the lag-1 autocorrelation of the differenced series’ ACF is negative, and/or there is a sharp cutoff, then choose an MA order of 1.”
Question: why choose an MA order of 1 and not higher? Because we proceed step by step: if we observe autocorrelations at higher lags, and the residuals of our ARIMA(0,1,1) model still show these spikes, we can increase the MA order, though it is usually not recommended to go beyond 2!
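Fitting the candidate is the same one-liner as before; a sketch:

```python
from statsmodels.tsa.arima.model import ARIMA

ma_model = ARIMA(log_series, order=(0, 1, 1)).fit()
print(f"AIC: {ma_model.aic:.2f}, residual variance: {ma_model.resid.var():.4f}")
```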
Notice how the AIC has dropped again, and how the residual variance decreased. That’s a sign our ARIMA(0,1,1) is performing better than the (0,1,0) model!
3) Choosing the AR order
Now you might wonder: do we have to add an AR term at all? Not necessarily. You should add an AR term in the following case:
“If the lag-1 autocorrelation of the differenced series’ PACF is negative, and/or there is a sharp cutoff, then choose an AR order of 1.”
In our case, we observe negative lag-1 autocorrelations in both the ACF and the PACF. Notice that the PACF of the differenced series shows two negative spikes at lags 1 and 2, meaning we could in theory push our AR order up to 2.
Here is what we get by fitting an ARIMA(1,1,1):
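A sketch, still on the logged series:

```python
from statsmodels.tsa.arima.model import ARIMA

arima_111 = ARIMA(log_series, order=(1, 1, 1)).fit()
print(arima_111.summary())
print(f"Residual variance: {arima_111.resid.var():.4f}")
```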
Adding an AR term has lowered the AIC, and the residual variance went from 0.155 to 0.145; we’re doing well! Should we add another AR term and pick an ARIMA(2,1,1)? Let’s have a look at the ACF & PACF of the ARIMA(1,1,1) residuals:
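A sketch of these residual diagnostics:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

residuals = arima_111.resid
fig, axes = plt.subplots(2, 1, figsize=(10, 6))
plot_acf(residuals, lags=24, ax=axes[0])
plot_pacf(residuals, lags=24, ax=axes[1])
plt.show()
```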
There are no significant autocorrelations left in the plots, so our model doesn’t seem to need any additional AR terms. You might point out the spike at lag 12: this could be a sign of seasonality. We will look at it in the next part with SARIMA and SARIMAX.
Plotting predictions
Luckily, statsmodels has a nice API that lets us plot predictions from ARIMA processes. I highly recommend putting your data in a DataFrame with a DatetimeIndex, because plot_predict() really likes dates…
In this case, we will choose 12 forecasting steps, and will set the dynamic keyword to True: this will force the algorithm to use its own forecasted values as the lagged values for future forecasts:
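A sketch; note that in recent statsmodels, plot_predict is a standalone function in statsmodels.graphics.tsaplots that takes the fitted results object, while older versions exposed it as a method on the results:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_predict

# Start the prediction 24 steps before the end of the sample and forecast
# 12 steps beyond it; dynamic=True makes the model feed its own forecasts
# back in as lagged values from `start` onward.
fig, ax = plt.subplots()
plot_predict(arima_111, start=len(log_series) - 24,
             end=len(log_series) + 11, dynamic=True, ax=ax)
plt.show()
```

For the larger window below, simply push end further out.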
Let’s have a look at a larger prediction window (34 forecasting steps):
That’s all, folks! In the next part, I’ll talk about seasonality with seasonal Holt-Winters and seasonal ARIMA.