Classical time-series forecasting methods in Python and R for beginners

Mochamad Kautzar Ichramsyah
Published in CodeX
11 min read · May 25, 2024

A. What is time-series forecasting?

Time series forecasting is a method used to predict future values based on previously observed values in a time-ordered sequence. This technique is widely used in various fields, such as finance, economics, weather prediction, inventory management, and many more.

A1. Definition and components

Time series data is a sequence of data points collected or recorded at specific time intervals (e.g., daily stock prices, monthly sales figures, yearly temperature readings).

Time-series data is commonly broken down into four components:

  1. Trend: The long-term movement or direction in the data (upward, downward, or stable).
  2. Seasonality: Regular, repeating patterns or cycles in the data (e.g., increased sales during holidays).
  3. Cyclical Patterns: Long-term fluctuations due to economic or business cycles, which are not of fixed frequency.
  4. Irregular/Noise: Random variations or anomalies that do not follow any pattern.
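To make the components concrete, here is a minimal sketch of a classical additive decomposition (observed = trend + seasonal + residual) using pandas and NumPy. It assumes a synthetic monthly series with a made-up linear trend, a sine-shaped yearly pattern, and random noise. Cyclical patterns are not separated out, because classical decomposition folds them into the trend (often called the trend-cycle). statsmodels' seasonal_decompose performs a similar calculation if you prefer a library function.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (assumption: made-up numbers, not a real dataset)
rng = np.random.default_rng(42)
n = 72  # six years of monthly observations
index = pd.date_range("2000-01-01", periods=n, freq="MS")
true_trend = 100 + 2.0 * np.arange(n)                                         # trend
true_season = np.tile(10 * np.sin(2 * np.pi * np.arange(12) / 12), n // 12)  # seasonality
noise = rng.normal(0, 1, n)                                                   # irregular/noise
y = pd.Series(true_trend + true_season + noise, index=index)

# Trend(-cycle): centred 2x12 moving average, NaN for the first and last 6 months
trend = y.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonality: average detrended value per calendar month, centred to sum to zero
detrended = y - trend
month_effect = detrended.groupby(detrended.index.month).mean()
month_effect = month_effect - month_effect.mean()
seasonal = pd.Series(month_effect.loc[y.index.month].to_numpy(), index=index)

# Irregular/noise: whatever the trend and seasonality do not explain
resid = y - trend - seasonal

print(seasonal.iloc[:12].round(2))
print("Average monthly trend increase:", round(trend.diff().mean(), 3))
```

The estimated seasonal effects should roughly recover the sine pattern, and the average trend increase should sit close to the 2.0 per month that was built into the data.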

A2. Classical time-series forecasting methods

  1. Naive Forecast: Uses the last observed value as the forecast for all future values.
  2. Moving Average: Averages a specified number of the most recent data points.
  3. Simple Exponential Smoothing (SES): Applies exponentially decreasing weights to past observations.
  4. Holt’s Linear Trend Model: Extends SES to capture linear trends.
  5. Holt-Winters Seasonal Model: Extends Holt’s method to capture seasonality.
  6. ARIMA (AutoRegressive Integrated Moving Average): Combines autoregression, differencing, and moving average.
  7. SARIMA (Seasonal ARIMA): Extends ARIMA to handle seasonal data.

A3. Evaluation metrics

  1. Mean Absolute Error (MAE): The average absolute difference between predicted and actual values.
  2. Mean Squared Error (MSE): The average of squared differences between predicted and actual values.
  3. Root Mean Squared Error (RMSE): The square root of the MSE, giving a measure in the same units as the original data.
  4. Mean Absolute Percentage Error (MAPE): The average of absolute percentage errors, useful for comparing forecast accuracy across different scales.
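The four metrics can each be written in a line or two of NumPy. This is a minimal sketch: the function names are illustrative, and the actual/predicted numbers are made up so the arithmetic is easy to follow. Note that MAPE divides by the actual values, so it is undefined when any actual value is zero.

```python
import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error."""
    return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted)))

def mse(actual, predicted):
    """Mean Squared Error."""
    return np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)

def rmse(actual, predicted):
    """Root Mean Squared Error, in the same units as the data."""
    return np.sqrt(mse(actual, predicted))

def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (actual must not contain zeros)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

# Made-up example values
actual = [100, 200, 300]
predicted = [110, 190, 330]
print(mae(actual, predicted))   # (10 + 10 + 30) / 3
print(mse(actual, predicted))   # (100 + 100 + 900) / 3
print(rmse(actual, predicted))  # square root of the MSE
print(mape(actual, predicted))  # (10% + 5% + 10%) / 3
```

Because RMSE squares the errors before averaging, the single 30-passenger miss pulls it above the MAE, which is why RMSE is the stricter of the two when large errors are costly.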

A4. Applications

  1. Finance: Forecasting stock prices, interest rates, or market trends.
  2. Economics: Predicting GDP growth, unemployment rates, or inflation.
  3. Supply Chain Management: Estimating demand, inventory levels, and restocking schedules.
  4. Weather and Climate: Predicting temperatures, precipitation, and other meteorological variables.

B. Naive Forecast

  • Who: Data analysts and forecasters
  • What: Naive method forecasts the next value as the last observed value
  • When: Useful when you have little or no information about the data trend
  • Where: Can be applied to any time series dataset
  • Why: Simple to implement and serves as a benchmark for other methods
  • How: By taking the last observed value and using it for future forecasts

B1. Naive Forecast: Python

import pandas as pd
import matplotlib.pyplot as plt

# Load the data
data = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv',
                   index_col = 'Month',
                   parse_dates = True)
data.index.freq = 'MS'

# Naive forecast: the last observed value
naive_forecast = data['Passengers'].iloc[-1]

# Create a forecast series for the 12 months after the last observation
forecast = pd.Series([naive_forecast] * 12,
                     index = pd.date_range(start = data.index[-1] + pd.DateOffset(months = 1),
                                           periods = 12,
                                           freq = 'MS'))

# Plot the data and forecast
data['Passengers'].plot(label = 'Observed')
forecast.plot(label = 'Naive Forecast',
              color = 'red')
plt.legend()
plt.title("Naive Forecast")
plt.xlabel("Year")
plt.ylabel("Passengers")
plt.show()

B2. Naive Forecast: R

library(forecast)
library(ggplot2)

# Load the data
data <- AirPassengers

# naive
fit <- naive(data, h = 12)
autoplot(fit) + ggtitle("Naive Forecast") + xlab("Year") + ylab("Passengers")

B3. Naive Forecast: Fact Check

In my 10 years of experience in data analytics, I have never used this method because of its limitations. That’s why, in my opinion, it is only useful for learning what forecasting is.
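Even if you never ship it, the “Why” bullet above points to the naive forecast’s most common role: a baseline that other models should beat. The sketch below assumes a made-up monthly series with a steady upward trend, holds out the last 12 months, and compares the naive forecast with a simple “average of the history” forecast using MAE.

```python
import numpy as np
import pandas as pd

# Made-up monthly series with a steady upward trend (assumption, not real data)
index = pd.date_range("2020-01-01", periods=36, freq="MS")
y = pd.Series(100 + 3.0 * np.arange(36), index=index)

# Hold out the last 12 months for evaluation
train, test = y.iloc[:-12], y.iloc[-12:]

# Naive forecast: repeat the last training value for every test month
naive = pd.Series(train.iloc[-1], index=test.index)

# A second simple forecast for comparison: the mean of all training values
mean_forecast = pd.Series(train.mean(), index=test.index)

naive_mae = (test - naive).abs().mean()
mean_mae = (test - mean_forecast).abs().mean()
print(f"Naive MAE: {naive_mae:.1f}, mean-forecast MAE: {mean_mae:.1f}")
```

A more sophisticated model is only worth its extra complexity if its MAE on the same holdout is clearly lower than the naive one.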

C. Moving Average

  • Who: Data analysts and forecasters
  • What: Moving average smooths out short-term fluctuations to highlight longer-term trends
  • When: Best used when you want to remove noise from a time series
  • Where: Applicable to any time series data with noise
  • Why: Simplifies the data to reveal underlying trends
  • How: By averaging a specified number of the most recent data points

C1. Moving Average: Python

# Moving average
data['Moving_Average'] = data['Passengers'].rolling(window = 12).mean()

# Plot the data and moving average
data[['Passengers', 'Moving_Average']].plot()
plt.title("Moving Average Forecast")
plt.xlabel("Year")
plt.ylabel("Passengers")
plt.show()

C2. Moving Average: R

library(forecast)
library(ggplot2)

# Load the data
data <- AirPassengers

# moving average
fit <- ma(data, order = 12)
data_combined <- cbind(data, fit)
autoplot(data_combined) + ggtitle("Moving Average Forecast") + xlab("Year") + ylab("Passengers")

C3. Moving Average: Fact Check

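This is the most basic forecasting method I have used in my real job; its main purpose is to smooth out the trend in our time-series data. As you can see above, we got a moving average line with a window of 12, which in the Python version means each point is the average of the last 12 data points (the current one and the previous 11). Note that R’s ma() with order = 12 computes a centred moving average by default, so its line is shifted by about six months relative to the pandas one.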
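The difference between a trailing and a centred moving average, and how a moving average becomes an actual forecast, is easiest to see on a tiny made-up series (the numbers 1 to 24). In this sketch the one-step-ahead forecast is simply the mean of the last 12 observations, and the centred 2x12 average mimics what R’s ma(x, order = 12) computes by default.

```python
import numpy as np
import pandas as pd

# Made-up monthly series: 1, 2, ..., 24
y = pd.Series(np.arange(1, 25, dtype=float),
              index=pd.date_range("2022-01-01", periods=24, freq="MS"))

# Trailing window, as in rolling(window = 12): mean of the current and previous 11 points
trailing = y.rolling(window=12).mean()

# Centred 2x12 moving average: average two adjacent 12-month means, then shift back 6 months
centred = y.rolling(12).mean().rolling(2).mean().shift(-6)

# Moving-average forecast for the month after the last observation
ma_forecast = y.iloc[-12:].mean()

# On trending data the trailing average lags behind; the centred one tracks the series
print(y.iloc[11], trailing.iloc[11], centred.iloc[11])
print("Next-month forecast:", ma_forecast)
```

The centred version is better for describing the past trend, but it needs six future points, so only the trailing mean can be used to forecast.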

D. Simple Exponential Smoothing (SES)

  • Who: Data analysts and forecasters
  • What: SES uses a weighted average of past observations, with weights decaying exponentially
  • When: Useful for data without trend or seasonal patterns
  • Where: Can be applied to any stationary time series data
  • Why: Simple yet effective for short-term forecasting
  • How: By applying exponentially decreasing weights to past observations

D1. Simple Exponential Smoothing: Python

from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Simple Exponential Smoothing
model = SimpleExpSmoothing(data['Passengers'])
fit = model.fit()
forecast = fit.forecast(12)

# Plot the data and forecast
data['Passengers'].plot(label = 'Observed')
forecast.plot(label = 'SES Forecast', color = 'red')
plt.legend()
plt.title("Simple Exponential Smoothing Forecast")
plt.xlabel("Year")
plt.ylabel("Passengers")
plt.show()

D2. Simple Exponential Smoothing: R

library(forecast)
library(ggplot2)

# Load the data
data <- AirPassengers

# Simple Exponential Smoothing
fit <- ses(data, h = 12)
autoplot(fit) + ggtitle("Simple Exponential Smoothing Forecast") + xlab("Year") + ylab("Passengers")

D3. Simple Exponential Smoothing: Fact Check

SES is only suitable for data without trend or seasonal patterns, as mentioned in “When”. Because we use the same trending, seasonal dataset for every method here, the SES forecast above is just a flat line; if you can find a stationary time series, try applying this method to it yourself.
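The “How” bullet above (exponentially decreasing weights) boils down to one update rule: new level = α × latest value + (1 - α) × previous level, and every future forecast equals the final level. Below is a minimal pure-Python sketch; initialising the level with the first observation and fixing α are simplifying assumptions, whereas statsmodels and R’s ses() estimate these from the data.

```python
def ses_forecast(values, alpha, horizon):
    """Simple exponential smoothing with a fixed alpha (illustrative sketch)."""
    level = values[0]  # assumption: start the level at the first observation
    for value in values[1:]:
        # The newest value gets weight alpha; older values decay by (1 - alpha) each step
        level = alpha * value + (1 - alpha) * level
    # SES has no trend or seasonal term, so the forecast is flat
    return [level] * horizon

print(ses_forecast([10, 20, 30], alpha=0.5, horizon=3))  # [22.5, 22.5, 22.5]
```

On this steadily rising series the forecast (22.5) lags well behind the last value (30), which is exactly why SES is recommended only for data without a trend.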

E. Holt’s Linear Trend Model

  • Who: Data analysts and forecasters
  • What: Holt’s method captures linear trend patterns in data
  • When: Ideal for data with a linear trend but no seasonality
  • Where: Suitable for any time series data with a clear trend
  • Why: Extends SES by incorporating trend information
  • How: By modeling both the level and the trend components

E1. Holt’s Linear Trend Model: Python

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Holt's Linear Trend Model
model = ExponentialSmoothing(data['Passengers'], trend = 'add')
fit = model.fit()
forecast = fit.forecast(12)

# Plot the data and forecast
data['Passengers'].plot(label = 'Observed')
forecast.plot(label = 'Holt Forecast', color = 'red')
plt.legend()
plt.title("Holt's Linear Trend Forecast")
plt.xlabel("Year")
plt.ylabel("Passengers")
plt.show()

E2. Holt’s Linear Trend Model: R

library(forecast)
library(ggplot2)

# Load the data
data <- AirPassengers

# Holt's Linear Trend Model
fit <- holt(data, h=12)
autoplot(fit) + ggtitle("Holt's Linear Trend Forecast") + xlab("Year") + ylab("Passengers")

E3. Holt’s Linear Trend Model: Fact Check

This model extends the previous method, SES, by adding the trend information in the dataset. That’s why its forecast is quite different from the SES one: it increases slightly, because the dataset contains a positive trend.
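Holt’s method keeps a second smoothed quantity, the trend (slope), next to the level, and the forecast h steps ahead becomes level + h × trend. The pure-Python sketch below uses fixed smoothing parameters and initialises the trend from the first two observations; both are simplifying assumptions, since statsmodels’ ExponentialSmoothing and R’s holt() estimate them.

```python
def holt_forecast(values, alpha, beta, horizon):
    """Holt's linear trend method with fixed alpha and beta (illustrative sketch)."""
    level = values[0]
    trend = values[1] - values[0]  # assumption: initial slope from the first two points
    for value in values[1:]:
        previous_level = level
        # Level: blend the new observation with the previous level projected one step ahead
        level = alpha * value + (1 - alpha) * (level + trend)
        # Trend: blend the latest change in level with the previous trend
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Forecast by extending the final trend from the final level
    return [level + h * trend for h in range(1, horizon + 1)]

print(holt_forecast([10, 20, 30], alpha=0.5, beta=0.3, horizon=3))
```

Unlike SES, whose forecast stays flat, this returns approximately 40, 50, and 60: the slope of the perfectly linear input is carried forward.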

F. Holt-Winters Seasonal Model

  • Who: Data analysts and forecasters
  • What: Holt-Winters method captures seasonality, trend, and level in the data
  • When: Best for data with both trend and seasonal components
  • Where: Suitable for time series data with periodic fluctuations
  • Why: Provides accurate forecasts for seasonal data
  • How: By modeling the level, trend, and seasonal components separately

F1. Holt-Winters Seasonal Model: Python

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Holt-Winters Seasonal Model: additive trend plus multiplicative seasonality
model = ExponentialSmoothing(data['Passengers'], trend = 'add', seasonal = 'mul', seasonal_periods = 12)
fit = model.fit()
forecast = fit.forecast(12)

# Plot the data and forecast
data['Passengers'].plot(label = 'Observed')
forecast.plot(label = 'Holt-Winters Forecast', color = 'red')
plt.legend()
plt.title("Holt-Winters Seasonal Model Forecast")
plt.xlabel("Year")
plt.ylabel("Passengers")
plt.show()

F2. Holt-Winters Seasonal Model: R

library(forecast)
library(ggplot2)

# Load the data
data <- AirPassengers

# Holt-Winters Seasonal Model
fit <- hw(data, seasonal = "multiplicative", h = 12)
autoplot(fit) + ggtitle("Holt-Winters Seasonal Model Forecast") + xlab("Year") + ylab("Passengers")

F3. Holt-Winters Seasonal Model: Fact Check

This model extends Holt’s method by adding seasonality information. As we can see, the forecast follows both the trend and the seasonality of the dataset.
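With multiplicative seasonality, as in the code above, the forecast h steps ahead is the trend line (level + h × trend) multiplied by the seasonal index of the matching season. The sketch below applies only that final forecasting step. The level, trend, and seasonal indices are hypothetical numbers standing in for the states a fitted model would produce, and seasonal_indices is assumed to start with the season right after the last observation.

```python
def hw_multiplicative_forecast(level, trend, seasonal_indices, horizon):
    """Holt-Winters forecast step: additive trend, multiplicative seasonality (sketch)."""
    m = len(seasonal_indices)  # season length, e.g. 12 for monthly data
    forecasts = []
    for h in range(1, horizon + 1):
        # Seasonal index for the season h steps ahead (indices repeat every m steps)
        season_factor = seasonal_indices[(h - 1) % m]
        forecasts.append((level + h * trend) * season_factor)
    return forecasts

# Hypothetical final states with a season length of 2, to keep the arithmetic small
print(hw_multiplicative_forecast(level=100, trend=2, seasonal_indices=[0.9, 1.1], horizon=3))
```

Because the seasonal factor multiplies the trend line, the seasonal swings grow as the level grows, which matches the widening peaks in the airline passenger data.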

G. Auto-Regressive Integrated Moving Average (ARIMA)

  • Who: Data analysts and forecasters
  • What: ARIMA models auto-regression, differencing, and moving averages to make forecasts
  • When: Useful for non-seasonal data with trends and patterns
  • Where: Applicable to a wide range of time series datasets
  • Why: Provides a flexible framework for modeling various types of time series data
  • How: By combining autoregressive, differencing, and moving average components

G1. ARIMA: Python

from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA model
model = ARIMA(data['Passengers'], order = (5, 1, 0))
fit = model.fit()

# Forecast
forecast = fit.forecast(steps = 12)

# Plot the data and forecast
data['Passengers'].plot(label = 'Observed')
forecast.plot(label = 'ARIMA Forecast', color = 'red')
plt.legend()
plt.title("ARIMA Forecast")
plt.xlabel("Year")
plt.ylabel("Passengers")
plt.show()

G2. ARIMA: R

library(forecast)
library(ggplot2)

# Load the data
data <- AirPassengers

# ARIMA model
fit <- auto.arima(data, seasonal = FALSE)
forecasted <- forecast(fit, h = 12)
autoplot(forecasted) + ggtitle("ARIMA Forecast") + xlab("Year") + ylab("Passengers")

G3. ARIMA: Fact Check

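This is one of the most common forecasting methods, and it suits non-seasonal data with trends and patterns. In R, auto.arima() searches over seasonal models by default (seasonal = TRUE), so you need to set seasonal = FALSE to get a plain ARIMA model; the seasonal version is explained in the next section. Also note that the Python example fixes the order at (5, 1, 0), whereas auto.arima() selects the order automatically, so the two forecasts will not be identical.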
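To see what the I (differencing) and AR (autoregression) parts do, here is a hand-rolled ARIMA(1, 1, 0) sketch: difference the series once, fit an AR(1) coefficient on the differences by ordinary least squares without a constant, forecast the differences, and add them back to the last value. This is a simplification for illustration only; statsmodels’ ARIMA and R’s auto.arima() estimate parameters by (approximate) maximum likelihood and can include moving average terms. The toy series is made up so that the coefficient comes out exactly 0.5.

```python
import numpy as np

def arima_110_forecast(values, horizon):
    """ARIMA(1, 1, 0) without a constant, fitted by least squares (illustrative sketch)."""
    y = np.asarray(values, dtype=float)
    diffs = np.diff(y)  # "I": first difference, d = 1
    lagged, current = diffs[:-1], diffs[1:]
    # "AR": least-squares estimate of phi in diff_t = phi * diff_(t-1)
    phi = np.dot(lagged, current) / np.dot(lagged, lagged)
    forecasts = []
    last_value, last_diff = y[-1], diffs[-1]
    for _ in range(horizon):
        last_diff = phi * last_diff          # forecast the next difference
        last_value = last_value + last_diff  # undo the differencing ("integrate")
        forecasts.append(last_value)
    return phi, forecasts

phi, forecasts = arima_110_forecast([0, 8, 12, 14, 15], horizon=2)
print(phi, forecasts)
```

Here the differences 8, 4, 2, 1 halve each step, so the forecast differences continue as 0.5 and 0.25 and the series levels off just above 15.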

H. Seasonal Auto-Regressive Integrated Moving Average (SARIMA)

  • Who: Data analysts and forecasters
  • What: SARIMA extends ARIMA to include seasonal components
  • When: Ideal for seasonal data with trends and patterns
  • Where: Suitable for any time series data with seasonality
  • Why: Captures both seasonal and non-seasonal patterns for accurate forecasting
  • How: By incorporating seasonal differencing and seasonal autoregressive and moving average terms

H1. SARIMA: Python

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Fit SARIMA model
model = SARIMAX(data['Passengers'], order = (1, 1, 1), seasonal_order=(1, 1, 1, 12))
fit = model.fit()

# Forecast
forecast = fit.get_forecast(steps = 12).predicted_mean

# Plot the data and forecast
data['Passengers'].plot(label = 'Observed')
forecast.plot(label = 'SARIMA Forecast', color = 'red')
plt.legend()
plt.title("SARIMA Forecast")
plt.xlabel("Year")
plt.ylabel("Passengers")
plt.show()

H2. SARIMA: R

library(forecast)
library(ggplot2)

# Load the data
data <- AirPassengers

# Fit SARIMA model
fit <- auto.arima(data)
forecasted <- forecast(fit, h = 12)
autoplot(forecasted) + ggtitle("SARIMA Forecast") + xlab("Year") + ylab("Passengers")

H3. SARIMA: Fact Check

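This method extends ARIMA with seasonal terms, which is why it is called Seasonal ARIMA (SARIMA). Its forecast looks quite similar to the Holt-Winters one, but the calculation behind it is different: Holt-Winters smooths the level, trend, and seasonal components directly, while SARIMA applies regular and seasonal differencing and then models the result with autoregressive and moving average terms.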
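In the Python example, seasonal_order = (1, 1, 1, 12) adds one seasonal difference (D = 1, s = 12) on top of the ordinary difference from order = (1, 1, 1). The NumPy sketch below shows only that differencing step, on a made-up series built from a straight line plus a fixed 12-month pattern: the seasonal difference cancels the repeating pattern, and the ordinary difference then removes what is left of the trend. The seasonal AR and MA terms, which model any remaining structure, are not shown.

```python
import numpy as np

# Made-up monthly series: linear trend (1.5 per month) plus a fixed yearly pattern
pattern = np.array([5, 3, 0, -2, -4, -5, -3, 0, 2, 4, 6, -6], dtype=float)
t = np.arange(48)
y = 50 + 1.5 * t + np.tile(pattern, 4)

# Seasonal difference (lag 12): y_t - y_(t-12) cancels the repeating pattern
seasonal_diff = y[12:] - y[:-12]

# Ordinary first difference of the result removes the constant left by the trend
both_diff = np.diff(seasonal_diff)

print(seasonal_diff[:3])  # every value equals 12 * 1.5
print(both_diff[:3])      # all zeros
```

Real data never differences down to exact zeros; the leftover noise is what the (1, 1, 1) and seasonal (1, 1, 1) AR and MA terms are fitted to.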

I. Example datasets to learn

I1. Airline Passengers Dataset

  • Description: Monthly totals of international airline passengers (in thousands) from 1949 to 1960.
  • Use Case: Useful for learning various time series forecasting methods, including Naive, Moving Average, Exponential Smoothing, Holt-Winters, ARIMA, and SARIMA.
  • Where to Find: Available in the R datasets package as AirPassengers and in various online repositories.

I2. Monthly Milk Production Dataset

  • Description: Monthly milk production (in pounds per cow) from 1962 to 1975.
  • Use Case: Suitable for exploring seasonal patterns and trends using Holt-Winters and SARIMA models.
  • Where to Find: Available in the R fma package as milk and in various online repositories.

I3. CO2 Concentration Dataset

  • Description: Monthly atmospheric CO2 concentrations at the Mauna Loa Observatory from 1958 to 2020.
  • Use Case: Ideal for studying long-term trends and seasonal patterns using methods like Holt-Winters and SARIMA.
  • Where to Find: Available in the R datasets package as co2 (which covers only 1959 to 1997) and, for the longer record, in various online repositories.

I4. Daily Minimum Temperatures Dataset

  • Description: Daily minimum temperatures in Melbourne, Australia, from 1981 to 1990.
  • Use Case: Suitable for exploring daily patterns and trends using ARIMA and SARIMA models.
  • Where to Find: Available in various online repositories.

I5. Sunspot Numbers Dataset

  • Description: Monthly mean relative sunspot numbers from 1749 to 1983.
  • Use Case: Useful for long-term trend analysis and forecasting using ARIMA and Holt-Winters models.
  • Where to Find: Available in the R datasets package as sunspots (1749 to 1983); sunspot.month is a longer, periodically updated version of the series. Also available in various online repositories.

I6. Electricity Consumption Dataset

  • Description: Hourly electricity consumption data for a household.
  • Use Case: Useful for short-term load forecasting using Moving Average and ARIMA models.
  • Where to Find: Available in various online repositories.

I7. Retail Sales Dataset

  • Description: Monthly retail sales data for various sectors.
  • Use Case: Suitable for exploring seasonal patterns and trends using Holt-Winters and SARIMA models.
  • Where to Find: Available in various online repositories.

I8. International Tourism Dataset

  • Description: Quarterly visitor nights spent by international tourists to Australia.
  • Use Case: Ideal for studying seasonal effects and trends using Holt-Winters and SARIMA models.
  • Where to Find: Available in the R fpp2 package as austourists.

I9. Financial Time Series Dataset

  • Description: Daily closing prices of a stock or index.
  • Use Case: Useful for financial forecasting using ARIMA and Moving Average models.
  • Where to Find: Available in various online repositories, including Yahoo Finance.

I10. Sales of Shampoo Over Three Years Dataset

  • Description: Monthly sales of shampoo from January 2001 to December 2003.
  • Use Case: Suitable for learning trend analysis and forecasting using Simple Exponential Smoothing and Holt-Winters models.
  • Where to Find: Available in various online repositories.

J. Conclusion

Classical time series forecasting methods provide a solid foundation for beginners in data analytics. By exploring these methods in both R and Python, you can develop a deeper understanding of time series analysis and forecasting. Experiment with different datasets and parameters to enhance your skills and proficiency in time series forecasting.

The datasets listed above offer a wide range of examples for learning classical time series forecasting methods. By applying different techniques to them, you can build a comprehensive understanding of time series analysis and improve your forecasting skills.

Thank you for reading this guide! I hope you found it helpful and informative. Your feedback is incredibly valuable to me. Please feel free to share your thoughts, questions, or any constructive criticism in the comments below. Your input helps improve future content and ensures it meets your needs and interests. Happy forecasting!

Data analytics professional with 10 years of experience at tech companies in Indonesia.