Forecasting with Fourier series

Josh Tankard
9 min readOct 31, 2023

--

The main objective of this post is to uncover how Fourier series can be fitted to create timeseries forecasts for highly seasonal data just as the highly popular forecaster Prophet.

Prophet, developed by engineers at Meta, is an an “analyst-in-the-loop” forecaster for timeseries data. It was developed with Facebook’s business use-case in mind where their data tends to exhibit highly seasonal properties as opposed to auto-regressive data — think how web-traffic varies by hour in day, day in week or week in year as opposed to share prices where there is very little seasonality and future prices can be highly correlated to the price at the previous unit of time.

The “analyst-in-the-loop” part is also highly important. The creators expect that users have strong domain knowledge (as opposed to forecasting expertise) in order to tune the forecaster’s parameters. An interesting study, acknowledged by Meta engineers in a blog post, has shown that other forecasters (such as AutoArima) can be significantly better when compared to Prophet’s default parameters (potentially) debunking the idea that Prophet is the best “out-of-box” tool for timeseries.

With this in mind, it seemed reasonable that we could re-create a (non-identical) version of Prophet and learn about timeseries modelling along the way.

Fourier series

The main idea Prophet employs is Fourier series: that any periodic signal can be approximated pretty well by summing different sine waves. The classic example being that even a square wave with its straight edges and right-angles can still be represented pretty well.

When specifying the types of seasonality (daily, weekly, yearly etc.) in the Prophet forecaster, the user can also determine the Fourier order for each seasonality type, this refers to the number of Fourier series that are summed for each seasonality component.

Example from the Wikipedia page. As the number of terms increases, the fit improves.

Looking at Prophet’s code, we can see that for each Fourier order, a sine wave and a cosine wave are created. For the first term, the frequency of these waves are equal to one over the specified periods for that seasonality (e.g. 1 / 7 for weekly). And as the Fourier order increases, the frequency increases proportionally (e.g. 1/7, 2/7, 3/7).

@staticmethod
def fourier_series(
dates: pd.Series,
period: Union[int, float],
series_order: int,
) -> NDArray[np.float_]:
"""Provides Fourier series components with the specified frequency
and order.

Parameters
----------
dates: pd.Series containing timestamps.
period: Number of days of the period.
series_order: Number of components.

Returns
-------
Matrix with seasonality features.
"""
if not (series_order >= 1):
raise ValueError("series_order must be >= 1")

# convert to days since epoch
t = dates.to_numpy(dtype=np.int64) // NANOSECONDS_TO_SECONDS / (3600 * 24.)

x_T = t * np.pi * 2
fourier_components = np.empty((dates.shape[0], 2 * series_order))
for i in range(series_order):
c = x_T * (i + 1) / period
fourier_components[:, 2 * i] = np.sin(c)
fourier_components[:, (2 * i) + 1] = np.cos(c)
return fourier_components

Fitting sine waves

Prophet uses Stan, a probabilistic programming framework for statistical inference. However, for this implementation I wanted to give gradient descent a try.

For any gradient descent algorithm, a cost function is needed to provide feedback for the updates at each iteration. The mean squared error is one of the most commonly used cost functions. The first order derivative of this is calculated by multiplying by the exponent term (2) and reducing the exponent by 1.

error = y_pred — y
mean_squared_error = (1 / n_samples) * (error) ** 2
mse_derivative = (2 / n_samples) * (error)

This provides the cost gradient at any given point and therefore which way to move for the next iteration and by how much. Here is an example fitting linear regression weights — the bias term can be considered as part of the regressors, X:


X = ...
y = ...
learning_rate = .001
n_iterations = 10_000

n_samples, n_features = X.shape
params = np.zeros(n_features, dtype=np.float64)

for _ in range(n_iterations):
# create new set of preds
y_pred = X @ params
# calculate error term
error = y_pred - y
# calculate cost for each weight
dw = X.T @ ((2 / n_samples) * (error))
# update weights
params -= learning_rate * dw

Unlike linear regression, where for each feature a singular parameter is being calculated, for sine waves there are three parameters to fit (amplitude, phase and frequency). But in any case, the principles remain the same — for every iteration, each parameter is dialled up or down a tiny amount first in isolation (to learn the direction and magnitude of change) and then they are all updated together for the next iteration.

A sine wave can be expressed in terms as:

amplitude * sin(phase + frequency * x)

For each of amplitude, phase and frequency, we need to know the derivative of the sine wave with respect to a change in that parameter. You can either do this by recalling trigonometry or, as in my case, hitting up Wolfram Alpha:

Amplitude -> d/da(a sin(p + f x)) = sin(f x + p)
Phase -> d/dp(a sin(p + f x)) = a cos(f x + p)
Frequency -> d/df(A sin(p + f x)) = A x cos(f x + p)

Prior to fitting, it is necessary to initiate the wave parameters according to the number of seasonalities present and the number of Fourier orders specified. As with Prophet, for each Fourier order a wave and an offset wave are initiated — remember a cosine wave is a sine wave offset by a quarter cycle.

seasonality_terms = {
7.: weekly_seasonality_terms,
30.43: monthly_seasonality_terms,
91.31: quarterly_seasonality_terms,
365.25: yearly_seasonality_terms
}

# no. wave -> total no. fourier terms * 2 (sine + cosine)
n_waves = 2 * sum(seasonality_terms.values())
amplitudes = np.ones(n_waves, dtype=np.float64)
phases = np.zeros(n_waves, dtype=np.float64)
frequencies = np.zeros(n_waves, dtype=np.float64)
count = 0
for periods, terms in seasonality_terms.items():
for j in range(terms):
f = (j + 1) / periods
frequencies[count] = f
frequencies[count + 1] = f
phases[count] = 0 # sine
phases[count + 1] = np.pi / 2 # cosine
count += 2

These parameters can then be fitted with gradient descent:

y = ...
X = ...
t = np.arrange(y.size) * 2 * np.pi
n_samples = y.size

learning_rate = .001
n_iterations = 10_000

for _ in range(n_iterations):

# create predictions
y_pred = np.zeros_like(y, dtype=np.float64)
for w in range(n_waves):
y_pred += amplitudes[w] * np.sin(phases[w] + frequencies[w] * t)

# calculate error term
error = y_pred - y

# calculate cost for each weight
cost_derivative = (2 / n_samples) * (error)

# update amp and phase for each wave
for w in range(n_waves):
da = 1 * np.sin(phases[w] + frequencies[w] * t)
dp = amplitudes[w] * np.cos(phases[w] + frequencies[w] * t)

amplitudes[w] -= da.sum() * learning_rate
phases[w] -= dp.sum() * learning_rate

You may notice that in this example, only the phases and amplitudes are updated whilst the frequencies are inferred directly from the given number of periods. This is a trap I fell into, it turns out that gradient descent will only converge for convex functions and concave functions may not result in stable outcomes. The derivative of a sine wave with respect to frequency is a concave function and as such it is challenging to fit with this process. This is likely why many algorithms, such as Prophet, predetermine the periods for given seasonalities (or the user can specify their own).

The process for fitting Fourier series can also be combined with fitting regressors in the earlier example as well as trend and bias.

Real world example

In this example, total train and bus ridership numbers from the Chigago Transport Authority will be used. The dataset is daily and goes from January 2001 to October 2018. This dataset was chosen because it seems reasonable to expect weekly and yearly seasonality.

The seasonality type looks “multiplicative” (i.e. seasonality etc. multiply the trend rather than add to it). To handle this, the natural logarithm of the data shall be taken prior to training as well as the exponential of the predictions to revert the data back to the original scale.

Given that holidays can affect ridership, the Python holidays is also implemented to find US public holidays.

The full code for FourierForecaster can be found in this GitHub repo. This enables fitting trend & bias as well as any regressors specified (such as holidays). Numba is also implemented to boost performance as well.

Plot components

After fitting, the .plot_components() can be called to visualize the bias, trend, seasonality, regressors (holidays) and noise:

Evaluation

The model is trained on two years of data and forecasts are made for the third year, then in the next iteration the training and forecasting are moved forward one year. This is repeated so that forecasts are made for each year from 2003 to 2018. For example:

Iteration 1: Training 2001 & 2002. Forecasts: 2003.

Iteration 2: Training 2002 & 2003. Forecasts: 2004.

Iteration 3: Training 2003 & 2004. Forecasts: 2005.

Iteration 16: Training 2016 & 2017. Forecasts: 2018.

The scoring metric will be MAPE (not necessarily the best but easy to interpret).

from fourier_forecast.fourier_forecast import FourierForecast
from datetime import date
import pandas as pd
import numpy as np


df = ...
hols = ...

train_years = 2
first_year, last_year = 2003, 2018
ff_results = np.zeros(last_year - first_year + 1, dtype=np.float64)
for i, test_year in enumerate(range(first_year, last_year + 1)):

train_dates = (date(test_year - train_years, 1, 1), date(test_year - 1, 12, 31))
test_dates = (date(test_year, 1, 1), date(test_year, 12, 31))

train_index = df[(df['ds'] >= train_dates[0]) & (df['ds'] <= train_dates[1])].index
test_index = df[(df['ds'] >= test_dates[0]) & (df['ds'] <= test_dates[1])].index

train_df = df[df.index.isin(train_index)]
test_df = df[df.index.isin(test_index)]
train_hols = hols[train_index]
test_hols = hols[test_index]

ff = FourierForecast()
y = np.log(train_df['y'].values)
ff.fit(train_df['ds'].values, y, regressors=train_hols)

preds = ff.predict(ds=test_df['ds'].values, regressors=test_hols)
preds = np.exp(preds)

mape = np.abs(preds / test_df['y'].values - 1).mean()
ff_results[i] = mape

The default Fourier order for weekly and yearly seasonality is 3 and 10 respectively for no reason other than these are also the default values used by Prophet making the comparison more like-for-like.

Comparison to Prophet

Prophet also includes many parameters such as changepoints, flexible trends, multiplicative seasonality, prior scales and so on — many of these can be inferred automatically or tuned by an analyst.

As Prophet can handle multiplicative seasonality internally, no log transformation of the data is necessary. Holidays can be added just as easily. Other than these two, no other parameters will be tuned.

from prophet import Prophet
from datetime import date
import pandas as pd


df = ...

train_years = 2
first_year, last_year = 2003, 2018
prophet_results = np.zeros(last_year - first_year + 1, dtype=np.float64)
for i, test_year in enumerate(range(first_year, last_year + 1)):

train_dates = (date(test_year - train_years, 1, 1), date(test_year - 1, 12, 31))
test_dates = (date(test_year, 1, 1), date(test_year, 12, 31))

train_index = df[(df['ds'] >= train_dates[0]) & (df['ds'] <= train_dates[1])].index
test_index = df[(df['ds'] >= test_dates[0]) & (df['ds'] <= test_dates[1])].index

train_df = df[df.index.isin(train_index)]
test_df = df[df.index.isin(test_index)]

m = Prophet(seasonality_mode='multiplicative')
m.add_country_holidays(country_name='US')
m.fit(train_df)

future_df = m.make_future_dataframe(periods=test_index.size).tail(test_index.size)
pred_df = m.predict(future_df)

mape = np.abs(preds / test_df['y'].values - 1).mean()
prophet_results[i] = mape

Comparison to AutoArima

Given the reference at the start of the article, for completeness we can take a look at StatsForecast implementation of AutoArima. There seems to be only the option to set one seasonal length so this is set to 7 given the strong weekly cycles. The other parameters are untouched given the claims of being better “out-of-box” forecaster.

from statsforecast.models import AutoARIMA
from datetime import date
import pandas as pd
import numpy as np

df = ...
hols = ...

train_years = 2
first_year, last_year = 2003, 2018
aa_results = np.zeros(last_year - first_year + 1, dtype=np.float64)
for i, test_year in enumerate(range(first_year, last_year + 1)):

train_dates = (date(test_year - train_years, 1, 1), date(test_year - 1, 12, 31))
test_dates = (date(test_year, 1, 1), date(test_year, 12, 31))

train_index = df[(df['ds'] >= train_dates[0]) & (df['ds'] <= train_dates[1])].index
test_index = df[(df['ds'] >= test_dates[0]) & (df['ds'] <= test_dates[1])].index

train_df = df[df.index.isin(train_index)]
test_df = df[df.index.isin(test_index)]
train_hols = hols[train_index]
test_hols = hols[test_index]

aa = AutoARIMA(season_length=7)
aa.fit(train_df['y'].values, train_hols.astype(np.float64))
preds = aa.predict((test_dates[1] - test_dates[0]).days + 1, test_hols.astype(np.float64))['mean']

mape = np.abs(preds / test_df['y'].values - 1).mean()
aa_results[i] = mape

Results

No single dataset can really give the entire picture of forecaster performance and capability but for demonstrative purposes, the results are quite interesting:

FourierForecast scores: [0.06757175 0.06847599 0.07174342 0.06946523 0.07235714 0.08831864
0.09115687 0.06550169 0.08012167 0.06707774 0.08542078 0.07360266
0.07506197 0.07625026 0.07252445 0.06323971]
Prophet scores: [0.06361311 0.08916864 0.07080911 0.0656165 0.0788751 0.11362799
0.06367285 0.06227564 0.13398795 0.07868203 0.06452716 0.07208917
0.08931415 0.08308844 0.07541953 0.05568152]
AutoArima: [0.07567813 0.07967367 0.08599838 0.0809436 0.08907974 0.1112463
0.08570553 0.09364843 0.13209986 0.08488383 0.08312174 0.08522938
0.14317758 0.1041176 0.08812333 0.07335536]

FourierForecast average MAPE: 7.4%
Prophet average MAPE: 7.9%
AutoArima average MAPE: 9.4%

Over this particular dataset and evaluation metric, the FourierForecaster performed better than both Prophet and AutoArima. That being said, perhaps Prophet has more scope for improvement if the forecaster parameters are fine tuned further. And quite likely, in scenarios where there is a stronger auto-regressive component, AutoArima may outshine both models in other situations.

Resources:

--

--

Josh Tankard

Based on Barcelona, working remotely. Into data analytics, engineering and experimentation. Personal page: jcatankard.github.io