Introduction to Time Series Forecasting: Smoothing Methods

Essential Terms of Time Series, Smoothing Methods for Forecasting, and an Applied Example of Triple Exponential Smoothing with Python, Including Hyperparameter Optimization

Baysan
CodeX
9 min read · Feb 28, 2022


Hey, it has been a while since I last wrote. I missed this platform, but I have an excuse for why I couldn’t publish anything: I came very close to finishing my university education and also made some changes to my career in those weeks. Thankfully, I am now free to write about what I want ^^.

In this story, we will talk about what time series forecasting is, what its essential terms are, and how we can apply it in Python. I hope you enjoy it and pick up some useful tidbits from this story.

Photo by Kevin Ku on Unsplash

Essential Terms of Time Series (Forecasting)

We can say that time-series-based work makes up an enormous part of the industry. Data created from observations ordered in time is called a “time series”. We are going to explain 4 essential terms for understanding time series.

  • Stationary
  • Trend
  • Seasonality
  • Cycle

Stationary

  • Stationarity means that the statistical properties of a time series do not change over time.
  • We can say a time series is stationary if its mean, variance and covariance are stable over time.

There is an example picture of stationary time series below.

Image by VBO
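As a rough, hand-rolled illustration of this idea (not a formal test; in practice you would use a statistical test such as the augmented Dickey-Fuller test from statsmodels), we can compare the mean and variance of the two halves of a series. The series and the helper name below are made up for illustration:

```python
from statistics import mean, pvariance

def halves_stats(series):
    """Compare the mean and variance of the first and second half of a series.

    An informal stationarity hint: for a stationary series, the two
    halves should have similar statistics.
    """
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))

stationary_like = [5, 6, 4, 5, 6, 4, 5, 6, 4, 5, 6, 4]
trending = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

print(halves_stats(stationary_like))  # halves look alike
print(halves_stats(trending))         # means differ a lot -> not stationary
```

For the trending series, the mean of the first half (3.5) is far from the mean of the second half (9.5), which is exactly the kind of shift that stationarity rules out.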

Trend

  • The Trend represents the overall increase or decrease of a time series.
  • More precisely, it is the long-term increasing or decreasing structure of the series.

We see examples of both an increasing and a decreasing time series below.

Example of an increasing time series | Image by VBO
Example of a decreasing time series | Image by VBO

Seasonality

  • This term basically asks: does the series follow a repeating pattern or not?
  • We say a time series has seasonality if it contains a pattern that repeats over fixed periods.

We see an example time series that has seasonality below.

Image by VBO

Cycle

  • The Cycle is somewhat similar to Seasonality; however, its patterns are weaker and less regular than seasonal ones.
  • It also contains repeating patterns.
  • Seasonal patterns are short term and relatively easy to handle.
  • Cyclic patterns are long term, which makes them hard to handle in the short term.

We see a picture to imagine the difference between the Cycle and Seasonal time series below.

Image by VBO

Level

The Level is the average of a time series.

Understanding the Nature of Time Series

General rule: a time series is affected most by the value that comes one step before it. For instance, June 2nd’s sales influence June 3rd’s sales more than June 1st’s sales do. We are going to learn 2 key concepts to understand this rule well: the Moving Average and the Weighted Average.

Moving Average

In short, this approach says that the future value of a time series can be forecast as the average of its k previous values.

Image by VBO

We see an example of moving average with 4 days below.

Image by VBO
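To make the idea concrete, here is a minimal plain-Python sketch of a k-period moving-average forecast. The daily sales numbers below are made up for illustration:

```python
def moving_average_forecast(values, k):
    """Forecast the next value as the mean of the last k observations."""
    if len(values) < k:
        raise ValueError("need at least k observations")
    return sum(values[-k:]) / k

# Hypothetical daily sales
sales = [10, 12, 11, 13, 12, 14, 13]
print(moving_average_forecast(sales, 4))  # (13 + 12 + 14 + 13) / 4 = 13.0
```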

Weighted Average

It is similar to the Moving Average. Its main idea is to give more weight (importance) to the latest observations.

Image by VBO

We see an example of a weighted average calculation below.

Image by VBO
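The same idea can be sketched in plain Python: each past value contributes in proportion to its weight, and the newest value gets the largest weight. The sales numbers and weights below are made up for illustration:

```python
def weighted_average_forecast(values, weights):
    """Forecast the next value as a weighted mean of past observations.

    `weights` is ordered oldest-to-newest and should sum to 1, so that
    recent observations get more importance.
    """
    assert len(values) == len(weights)
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical last four daily sales; the newest gets the largest weight
sales = [10, 12, 14, 13]
weights = [0.1, 0.2, 0.3, 0.4]
print(weighted_average_forecast(sales, weights))  # 1.0 + 2.4 + 4.2 + 5.2 = 12.8
```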

Now we know the essential terms for understanding time series forecasting. There are 3 main families of time series forecasting methods:

  • Smoothing Methods
  • Statistical Methods
  • Machine Learning

In this story, we will dive into the smoothing methods.

Smoothing Methods

There are 3 different smoothing methods, each suited to a different case:

  • Single Exponential Smoothing (SES)
  • Double Exponential Smoothing (DES)
  • Triple Exponential Smoothing (TES)

We will explain each of the methods mentioned above and then work through an applied example of TES.

Single Exponential Smoothing (SES)

This method works only with stationary time series. To use it, the series should have neither trend nor seasonality.

This method:

  • Can handle the level
  • Forecasts by exponential correction
  • Weights previous values on the assumption that the future is more related to the recent past
  • Forecasts from previous actual values and previous forecasted values, weighted exponentially

The formula of this method:

Image by VBO
  • y_hat_t: the value to be forecasted for time t
  • y_t-1: the previous actual value (the method learns from this value)
  • y_hat_t-1: the previous forecasted value (the method remembers through this value)
  • a (alpha): the smoothing factor, between 0 and 1

So, the method uses the previous actual value and the previous forecasted value to forecast the next one.

Example of SES | Image by VBO
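To make the recursion concrete, here is a minimal plain-Python sketch of SES. The series and alpha below are made up for illustration; in practice you would use SimpleExpSmoothing from statsmodels:

```python
def ses_forecast(values, alpha):
    """Single Exponential Smoothing: y_hat_t = a*y_{t-1} + (1-a)*y_hat_{t-1}."""
    y_hat = values[0]  # a common choice: initialize with the first actual value
    for y in values[1:]:
        # blend the actual value with the previous forecast
        y_hat = alpha * y + (1 - alpha) * y_hat
    return y_hat       # forecast for the next (unseen) period

series = [10, 12, 11, 13]
print(ses_forecast(series, alpha=0.5))  # 12.0
```

Note how each observation's influence decays exponentially: the oldest value (10) is still present in the result, but with a tiny weight.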

Double Exponential Smoothing (DES)

This method works with stationary series and series that have a trend. To use it, the time series should not have seasonality.

This method:

  • Can handle level and trend
  • Corrects exponentially while taking the trend effect into account
  • In addition to what SES does, also accounts for the trend
  • Should be used on univariate time series (a single feature) with no seasonality

The formula:

Image by VBO

The last equality tells us that the next period = level + trend; in other words, the next period = the average of the previous period (the level) plus the trend.

Example of DES | Image by VBO

Triple Exponential Smoothing (TES, a.k.a Holt-Winters)

This method works with time series that show level, trend and seasonality.

This method:

  • Can handle level, trend and seasonality
  • In addition to what SES does, also takes trend and seasonality into account
  • Is the most advanced of the smoothing methods
  • Dynamically considers level, trend and seasonality while forecasting
  • Should be used on univariate time series (a single feature) that have trend and/or seasonality

The formula of this method:

Image by VBO
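For reference, in case the image doesn't render: the standard additive Holt-Winters equations (which may differ in notation from the image), with level ℓ, trend b, seasonal component s, season length m, smoothing factors α, β, γ, and forecast horizon h ≤ m, are:

```latex
\begin{aligned}
\ell_t &= \alpha\,(y_t - s_{t-m}) + (1-\alpha)\,(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1} \\
s_t &= \gamma\,(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)\,s_{t-m} \\
\hat{y}_{t+h} &= \ell_t + h\,b_t + s_{t+h-m}
\end{aligned}
```

Each component is updated exactly like the SES recursion, which is why α, β and γ are all "smoothing factors" between 0 and 1.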

Summary Table of Smoothing Methods

We see a summary comparison table below of the smoothing methods we talked about above.

Image by VBO

Applied Example of Triple Exponential Smoothing with Python

In this section, we will work through an applied example of the TES method in Python. I have prepared a Kaggle notebook for this section. We will use a dataset from statsmodels named co2. You can follow along in the notebook via the link below.

Setup Packages

Firstly, we need to install and import the needed packages.

import itertools
import warnings
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import mean_absolute_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing

I don’t need to install the packages because I am working on Kaggle. However, if you are working locally, you may need to install them. In that case, you can use the commands below.

!pip install matplotlib
!pip install numpy
!pip install statsmodels
!pip install scikit-learn

Loading & Preparing The Data

In this applied example, we will use the CO2 dataset from the statsmodels.api package, as mentioned above. The dataset contains CO2 records from March 1958 to December 2001.

data = sm.datasets.co2.load_pandas()
y = data.data
y.head()
Image by Author

This dataset’s frequency is weekly. We will resample it from weekly to monthly.

y = y['co2'].resample('MS').mean()

Then, I check whether there are any null values.

y.isnull().sum()
# output: 5

I am going to fill in these 5 null values. To do that, I use the bfill (backfill) method: if an observation is null, it is automatically filled with the next observation’s value.

y = y.fillna(y.bfill())
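To see what backfilling does without any library, here is a tiny plain-Python sketch of the same idea (the helper name and values are made up for illustration; pandas' bfill is the real thing):

```python
def backfill(values):
    """Fill None gaps with the next non-missing value (like pandas bfill)."""
    out = list(values)
    nxt = None
    # walk backwards so each gap sees the value that comes after it
    for i in range(len(out) - 1, -1, -1):
        if out[i] is None:
            out[i] = nxt
        else:
            nxt = out[i]
    return out

print(backfill([1, None, None, 4, None, 6]))  # [1, 4, 4, 4, 6, 6]
```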

And get the first overview of the dataset.

y.plot(figsize=(15, 6))
plt.show()
Image by Author

Creating A Function To Compare Train, Test and Forecasted Values

We can visualize the comparison of the real and forecasted values by using the function below.

def plot_co2(train, test, y_pred, title):
    """
    Visualize the actual and predicted values of y
    """
    mae = mean_absolute_error(test, y_pred)
    train["1985":].plot(legend=True, label="TRAIN", title=f"{title}, MAE: {round(mae, 2)}")
    test.plot(legend=True, label="TEST", figsize=(6, 4))
    y_pred.plot(legend=True, label="PREDICTION")
    plt.show()

We use `train["1985":]` to narrow the scope of the visualization.

Creating TES Model

I am going to divide my dataset into train and test parts.

# from 1958 to the end of 1997
train = y[:'1997-12-01']
len(train)  # 478 months
# from 1998 to the first month of 2001
test = y['1998-01-01':]
len(test)  # 48 months

Now, I will create (train) a TES instance (model) on the train dataset I split off above. To create a TES instance, we use ExponentialSmoothing from the statsmodels.tsa.holtwinters package.

tes_model = ExponentialSmoothing(train,
                                 trend="add",
                                 seasonal="add",
                                 seasonal_periods=12).fit(smoothing_level=0.5,
                                                          smoothing_trend=0.5,
                                                          smoothing_seasonal=0.5)

Now I can forecast N steps ahead using the TES instance I created. We forecast 48 steps ahead and store the forecasted values in the y_pred variable.

y_pred = tes_model.forecast(48)

We can use the plot_co2 function we created above to visualize the real and predicted values.

plot_co2(train, test, y_pred, "Triple Exponential Smoothing")
Image by Author

Actually, we’ve completed the “Applied Example of Triple Exponential Smoothing with Python”. However, I don’t want to stop here, because there is another very important topic: hyperparameter optimization for TES. We will learn how to optimize the hyperparameters for TES in the coming section.

Hyperparameter Optimization for Triple Exponential Smoothing

Before showing how to optimize hyperparameters for TES, I want to talk briefly about what a hyperparameter is. Hyperparameters are the parameters we choose while creating or building models. For instance, we saw above that there are some parameters in TES’s formula. We can create different TES models by playing with these parameters, trying different values to get the best result from the model. This is hyperparameter tuning/optimization.

I have already written a function to optimize TES models. I don’t want to explain the code line by line in this story; if you want more detail, you can go through the Kaggle notebook, where the code is explained in comments.

def tes_optimizer(train, test, abg, trend_mode='add', seasonal_mode='add', seasonal_period=12, step=48):
    """
    Optimize the hyperparameters of a TES model
    """
    best_alpha, best_beta, best_gamma, best_mae = None, None, None, float("inf")
    for comb in abg:
        tes_model = ExponentialSmoothing(train, trend=trend_mode, seasonal=seasonal_mode,
                                         seasonal_periods=seasonal_period).\
            fit(smoothing_level=comb[0], smoothing_trend=comb[1], smoothing_seasonal=comb[2])
        y_pred = tes_model.forecast(step)
        mae = mean_absolute_error(test, y_pred)
        if mae < best_mae:
            best_alpha, best_beta, best_gamma, best_mae = comb[0], comb[1], comb[2], mae
        print([round(comb[0], 2), round(comb[1], 2), round(comb[2], 2), round(mae, 2)])
    print("best_alpha:", round(best_alpha, 2), "best_beta:", round(best_beta, 2),
          "best_gamma:", round(best_gamma, 2), "best_mae:", round(best_mae, 4))
    return best_alpha, best_beta, best_gamma, best_mae

We see the optimizer function above. To use it, I am going to create hyperparameter combinations; the function will try each one and return the parameters that give the best result.

alphas = betas = gammas = np.arange(0.10, 1, 0.20)
abg = list(itertools.product(alphas, betas, gammas))
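To get a feel for the size of this grid: with five candidate values for each of the three smoothing factors, itertools.product yields every combination. The plain-float version below is just an illustration of the same construction:

```python
import itertools

# Same grid as above, written with plain floats for illustration
alphas = betas = gammas = [0.1, 0.3, 0.5, 0.7, 0.9]
abg = list(itertools.product(alphas, betas, gammas))
print(len(abg))  # 125 combinations (5 * 5 * 5)
print(abg[0])    # (0.1, 0.1, 0.1)
```

So the optimizer fits 125 models; a finer grid (smaller step in np.arange) quickly multiplies that cost.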

Then I pass these combinations to the optimizer function to get the best parameters.

best_alpha, best_beta, best_gamma, best_mae = tes_optimizer(train, test, abg)

The Final (Best) TES Model

We now have the best parameters for building a new TES model. Let’s create one using them.

final_tes_model = ExponentialSmoothing(train, trend="add", seasonal="add", seasonal_periods=12).\
    fit(smoothing_level=best_alpha, smoothing_trend=best_beta, smoothing_seasonal=best_gamma)

And forecast 48 steps ahead with the final model.

y_pred = final_tes_model.forecast(48)

Now, using the visualization function we created above, we can see the difference in accuracy between the optimized model and the unoptimized one.

Image by Author

If you go back to the previous TES result chart, you will see why hyperparameter optimization is important.

Finally

Hopefully, you enjoyed this story; I enjoyed coding and explaining it. I preferred to give an applied example of TES instead of DES and SES because, as mentioned before, TES is the most advanced of the smoothing methods. I hope you will use it to build your own projects.

Kind regards.
