A complete walkthrough of moving average forecasting using Python or R
1. Introduction
A moving average (MA) is a widely used statistical technique in time series analysis to smooth out short-term fluctuations and highlight longer-term trends. This tutorial will guide you through performing a moving average forecast using R and Python.
2. What is the moving average?
A moving average is computed by taking the mean of successive, overlapping subsets (windows) of a data set. As the window slides forward one observation at a time, the resulting series of averages smooths out short-term fluctuations and highlights longer-term trends or cycles.
3. Types of moving average
- Simple Moving Average (SMA): The unweighted mean of the previous n data points.
- Weighted Moving Average (WMA): Similar to SMA, but each data point in the period is given a different weight, with more recent data points usually getting more weight.
- Exponential Moving Average (EMA): Applies weights that decay exponentially with the age of each observation, so recent observations count the most and the average responds faster to new information.
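For reference, the three definitions above can be written compactly. The notation is introduced here for illustration and is not taken from any particular library: y_t is the observation at time t, n is the window length, w_i are the weights, and alpha is the EMA smoothing factor.

```latex
\begin{aligned}
\mathrm{SMA}_t &= \frac{1}{n}\sum_{i=0}^{n-1} y_{t-i} \\
\mathrm{WMA}_t &= \frac{\sum_{i=0}^{n-1} w_i\, y_{t-i}}{\sum_{i=0}^{n-1} w_i},
  \qquad \text{e.g. } w_i = n - i \text{ (linear weights)} \\
\mathrm{EMA}_t &= \alpha\, y_t + (1-\alpha)\,\mathrm{EMA}_{t-1},
  \qquad 0 < \alpha \le 1
\end{aligned}
```

When a moving average is used as a simple forecast, the value computed at time t serves as the prediction for time t + 1.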
4. Steps to perform moving average forecast
4.1 Data preparation
Select a dataset that you want to analyze. For simplicity, we will use the AirPassengers dataset: it is built into R, and for Python we will load it from a publicly hosted CSV file.
Example Dataset: AirPassengers (monthly totals of international airline passengers from 1949 to 1960)
Load dataset in R:
# Load the dataset
data("AirPassengers")
# AirPassengers is already a ts object; this call makes the start and frequency explicit
ts_data <- ts(AirPassengers, start = c(1949, 1), frequency = 12)
Load dataset in Python:
import pandas as pd
# Load the AirPassengers dataset
data = pd.read_csv(
    'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv',
    parse_dates=['Month'],
    index_col='Month',
)
# Display the first few rows of the dataset
print(data.head())
# Extract the time series
ts_data = data['Passengers']
4.2 Implementing moving average in R
Simple Moving Average:
library(zoo)
# Calculate 12-month simple moving average
sma <- rollmean(ts_data, k = 12, fill = NA, align = "right")
# Plotting
plot(ts_data, main = "Simple Moving Average (SMA)", col = "blue")
lines(sma, col = "red")
legend("topright", legend = c("Original", "SMA"), col = c("blue", "red"), lty = 1)
Weighted Moving Average:
library(TTR)
# Calculate 12-month weighted moving average (default weights are 1:n, heaviest on the most recent point)
wma <- WMA(ts_data, n = 12)
# Plotting
plot(ts_data, main = "Weighted Moving Average (WMA)", col = "blue")
lines(wma, col = "red")
legend("topright", legend = c("Original", "WMA"), col = c("blue", "red"), lty = 1)
Exponential Moving Average:
library(TTR)
# Calculate 12-month exponential moving average
ema <- EMA(ts_data, n = 12)
# Plotting
plot(ts_data, main = "Exponential Moving Average (EMA)", col = "blue")
lines(ema, col = "red")
legend("topright", legend = c("Original", "EMA"), col = c("blue", "red"), lty = 1)
4.3 Implementing moving average in Python
Simple Moving Average:
import matplotlib.pyplot as plt
# Calculate 12-month simple moving average
sma = ts_data.rolling(window = 12).mean()
# Plotting
plt.figure(figsize = (10, 6))
plt.plot(ts_data, label = 'Original')
plt.plot(sma, label = 'SMA', color = 'red')
plt.title('Simple Moving Average (SMA)')
plt.legend()
plt.show()
Weighted Moving Average:
To implement a weighted moving average in Python, you need to define the weights and apply them manually, since the pandas library doesn't have a built-in WMA function.
import numpy as np
import matplotlib.pyplot as plt
# Define linearly increasing weights: 1 for the oldest month, 12 for the most recent
weights = np.arange(1, 13)
# Calculate 12-month weighted moving average
wma = ts_data.rolling(window=12).apply(lambda window: np.dot(window, weights) / weights.sum(), raw=True)
# Plotting
plt.figure(figsize = (10, 6))
plt.plot(ts_data, label = 'Original')
plt.plot(wma, label = 'WMA', color = 'red')
plt.title('Weighted Moving Average (WMA)')
plt.legend()
plt.show()
Exponential Moving Average:
import matplotlib.pyplot as plt
# Calculate 12-month exponential moving average
ema = ts_data.ewm(span = 12, adjust = False).mean()
# Plotting
plt.figure(figsize = (10, 6))
plt.plot(ts_data, label = 'Original')
plt.plot(ema, label = 'EMA', color = 'red')
plt.title('Exponential Moving Average (EMA)')
plt.legend()
plt.show()
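The snippets above compute and plot smoothed series. To turn a moving average into a forecast, a common naive approach is to use the average of the last n observations as the prediction for the next period. The following is a minimal sketch of that idea in plain Python, not a definitive implementation: the function names sma_forecast and backtest_sma and the values in demo_series are made up for illustration.

```python
def sma_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window


def backtest_sma(series, window):
    """One-step-ahead backtest: forecast each point using only the points before it."""
    forecasts = []
    errors = []
    for t in range(window, len(series)):
        forecast = sma_forecast(series[:t], window)
        forecasts.append(forecast)
        errors.append(abs(series[t] - forecast))
    mae = sum(errors) / len(errors)  # mean absolute error
    return forecasts, mae


# Made-up values, used only to illustrate the mechanics
demo_series = [10, 12, 11, 13, 15, 14, 16, 18]

next_value = sma_forecast(demo_series, window=3)
forecasts, mae = backtest_sma(demo_series, window=3)
print(f"Forecast for the next period: {next_value:.2f}")
print(f"Backtest MAE: {mae:.2f}")
```

With the pandas series used in this tutorial, the equivalent next-month forecast would be the last value of the rolling mean, for example ts_data.rolling(window=12).mean().iloc[-1]. Comparing the backtest error for several window sizes is one simple way to choose the window.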
5. Comparing R and Python for time-series analysis
R:
- Strong in statistical modeling and time series analysis.
- Comprehensive libraries like zoo, TTR, and forecast.
- Syntax and functions can be more intuitive for statistical operations.
Python:
- Versatile and integrates well with web applications and other data science tools.
- Libraries like pandas, statsmodels, and scikit-learn provide robust tools for time series analysis.
- Extensive community support and resources for machine learning and deep learning.
6. When to use the simple, weighted, or exponential moving average forecast
Choosing the appropriate type of moving average depends on the specific characteristics of your data and the goals of your analysis.
6.1 Simple Moving Average (SMA)
When to use:
- Stable data: Use SMA when your data does not have significant trends or seasonal variations. SMA is best suited for datasets with minimal fluctuations.
- Simplicity: SMA is straightforward to calculate and easy to understand. It is useful for getting a quick, smoothed representation of the data.
- Short-term forecasting: Use SMA for short-term forecasts when recent changes are not drastically different from historical changes.
Advantages:
- Easy to calculate and interpret.
- Suitable for datasets with low volatility.
Disadvantages:
- Can lag behind trends due to equal weighting of all data points.
- Not responsive to sudden changes in the data.
Example: Use SMA for daily temperature data over a month to get a general trend.
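The lag mentioned under the disadvantages can be seen on a synthetic series that rises by exactly 1 per step. In this sketch, trailing_sma is a hypothetical helper written for illustration; on a straight-line trend, a trailing SMA with window n equals the value from (n - 1) / 2 steps earlier.

```python
def trailing_sma(series, window):
    """Trailing SMA; the first window - 1 positions have no value (None)."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[t + 1 - window : t + 1]) / window)
    return out


# Synthetic series rising by exactly 1 per step
trend = list(range(20))
window = 5
sma = trailing_sma(trend, window)

# On a linear trend the SMA trails the data by (window - 1) / 2 steps
lag = (window - 1) / 2
print(sma[10], trend[10] - lag)
```

A larger window smooths more but increases this lag, which is why an SMA forecast tends to undershoot a rising series and overshoot a falling one.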
6.2 Weighted Moving Average (WMA)
When to use:
- Recent data is more relevant: Use WMA when recent observations are more important than older observations. This is useful in scenarios where newer data points should have a greater impact on the forecast.
- Highlighting trends: WMA is useful in highlighting trends where certain periods are deemed more significant.
- Short-term trends: Use WMA to identify short-term trends by placing more weight on recent data points.
Advantages:
- More responsive to recent changes compared to SMA.
- Allows customization of weights to emphasize certain periods.
Disadvantages:
- Requires determination of appropriate weights, which can be subjective.
- More complex to calculate than SMA.
Example: Use WMA for stock prices where recent prices are more indicative of future performance.
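To show how the choice of weights changes the result, here is a small sketch that applies three weighting schemes to the same window. The helper weighted_average, the sample values, and the weight vectors are all made up for illustration.

```python
def weighted_average(window_values, weights):
    """Weighted mean of one window; weights[-1] applies to the most recent value."""
    if len(window_values) != len(weights):
        raise ValueError("window and weights must have the same length")
    total = sum(w * v for w, v in zip(weights, window_values))
    return total / sum(weights)


# Made-up window, ordered from oldest to newest
recent = [10, 20, 30, 40]

equal_weights = [1, 1, 1, 1]    # reduces to the simple moving average
linear_weights = [1, 2, 3, 4]   # linearly increasing emphasis on recent points
heavy_latest = [1, 1, 1, 7]     # hypothetical scheme dominated by the latest point

print(weighted_average(recent, equal_weights))   # 25.0
print(weighted_average(recent, linear_weights))  # 30.0
print(weighted_average(recent, heavy_latest))    # 34.0
```

The more weight the latest point receives, the closer the average moves toward it, which is the subjectivity noted in the disadvantages above.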
6.3 Exponential Moving Average (EMA)
When to use:
- Rapidly changing data: Use EMA when your data is volatile and you need a forecast that responds more quickly to changes. EMA gives more weight to recent observations, making it sensitive to recent changes.
- Trend identification: EMA is effective in identifying trends and reversals due to its weighting scheme.
- Financial markets: Commonly used in financial markets to detect short-term trends and signals for trading decisions.
Advantages:
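- Reacts more quickly to the newest observations than an SMA of the same span, since the latest point always carries the largest weight.
- Smooths the data while giving more importance to recent observations.
- Can reduce the lag seen in SMA; how it compares with WMA depends on the weights and smoothing factor chosen.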
Disadvantages:
- Can be too sensitive to recent data, leading to overreactions in some cases.
- Requires selection of a smoothing factor, which can be somewhat arbitrary.
Example: Use EMA for high-frequency trading signals in the stock market, where the latest price changes are critical.
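The effect of the smoothing factor can be sketched in plain Python. This example assumes the recursive form seeded with the first observation, which is the convention pandas uses for ewm(adjust = False), and the span-to-alpha conversion alpha = 2 / (span + 1). The function names and the step-change data are made up for illustration.

```python
def alpha_from_span(span):
    """Convert a span (as in pandas ewm(span=...)) to a smoothing factor."""
    return 2 / (span + 1)


def ema(series, alpha):
    """Recursive EMA seeded with the first observation."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Synthetic step change: the level jumps from 0 to 100
step = [0] * 3 + [100] * 5

fast = ema(step, alpha_from_span(3))   # alpha = 0.5
slow = ema(step, alpha_from_span(12))  # alpha = 2/13, about 0.154

for actual, f, s in zip(step, fast, slow):
    print(f"{actual:>5} {f:>8.2f} {s:>8.2f}")
```

A larger alpha (shorter span) follows the jump quickly but also passes more noise through; a smaller alpha smooths more and reacts more slowly.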
6.4 Summary
- Simple Moving Average (SMA): Best for stable data with minimal fluctuations and short-term forecasting where simplicity is needed.
- Weighted Moving Average (WMA): Ideal when recent data should have more influence, useful for highlighting trends with specific emphasis.
- Exponential Moving Average (EMA): Suitable for volatile data requiring responsiveness to recent changes, commonly used in financial markets for trend detection.
7. Conclusion
Choosing the right moving average depends on the nature of your data and the specific requirements of your analysis. Understanding the strengths and limitations of each method will help you make informed decisions and achieve more accurate forecasts.
This guide has shown you how to implement moving average forecasts in R and Python, providing the necessary code snippets and explanations. Whether you are a beginner or an expert, you can now apply these techniques to your own data and explore further with more complex time series models.
Thank you for reading!
I am still learning to write, so mistakes are unavoidable even when I try my best. If you need any help or spot any mistakes, please let me know!