# The Math of Prophet

## Breaking down the Equation behind Facebook’s open-source Time Series Forecasting procedure

In this Story we examine the ins and outs of the mathematics utilized by Prophet, a time series forecasting tool by Facebook.

# 1) Quick review: What is Prophet?

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.

It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

# 2) How does it work? Prophet Equation

The procedure makes use of a decomposable time series model with three main model components: **trend**, **seasonality**, and **holidays**.

Similar to a generalized additive model (GAM), with time as a regressor, Prophet fits several linear and non-linear functions of time as components. In its simplest form;

$$y(t) = g(t) + s(t) + h(t) + e(t)$$

where:

- *g(t)* is the **trend**, which models *non-periodic* changes (i.e. growth over time)
- *s(t)* is the **seasonality**, which represents *periodic* changes (e.g. weekly, monthly, yearly)
- *h(t)* ties in effects of **holidays** (on potentially irregular schedules, lasting one or more days)
- *e(t)* covers idiosyncratic changes not accommodated by the model

In other words, the procedure’s equation can be written;

$$y(t) = \text{trend}(t) + \text{seasonality}(t) + \text{holidays}(t) + \text{error}(t)$$

Modeling seasonality as an additive component is the same approach taken by exponential smoothing… GAM formulation has the advantage that it decomposes easily and accommodates new components as necessary, for instance when a new source of seasonality is identified.

Prophet is essentially “framing the forecasting problem as a curve-fitting exercise” rather than looking explicitly at the time-based dependence of each observation.
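To make the additive structure concrete, here is a minimal sketch in plain Python. The three component functions and all of their numbers are hypothetical toy choices made for illustration; they are not Prophet's fitted components, only stand-ins that show how the pieces sum.

```python
import math

def trend(t):
    """Toy g(t): starts at 10 and grows by 0.5 units per day (made-up numbers)."""
    return 10.0 + 0.5 * t

def seasonality(t, period=7.0):
    """Toy s(t): a single sine wave that repeats every 7 days."""
    return 2.0 * math.sin(2.0 * math.pi * t / period)

def holidays(t, holiday_days=frozenset({25})):
    """Toy h(t): a fixed bump of 5 units on the listed days."""
    return 5.0 if t in holiday_days else 0.0

def additive_model(t):
    """Noise-free part of y(t) = g(t) + s(t) + h(t) + e(t)."""
    return trend(t) + seasonality(t) + holidays(t)

# Each component can be inspected on its own, which is the point of the decomposition.
components_on_day_25 = {
    "trend": trend(25),
    "seasonality": seasonality(25),
    "holidays": holidays(25),
}
```

Because the components are simply added, a new source of seasonality would be one more function in the sum, which is the extensibility the GAM framing refers to.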

# 3) Trend

The procedure provides two possible trend models for *g(t)*, “a saturating growth model, and a piecewise linear model.”

## 3.1) Saturating Growth Model

If the data suggests promise of saturation (i.e. one is wrestling with constraints like cubic footage, processing power, or the number of people with Internet access), setting `growth='logistic'` is the move.

Typical modeling of these **nonlinear, saturating trends** is basically accomplished;

$$g(t) = \frac{C}{1 + \exp(-k(t - m))}$$

where:

- *C* is the carrying capacity
- *k* is the growth rate
- *m* is an offset parameter
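As a quick sanity check on the roles of *C*, *k*, and *m*, the basic logistic curve g(t) = C / (1 + exp(-k(t - m))) can be sketched in a few lines of Python. The function name and the example numbers are illustrative assumptions, not part of Prophet's API.

```python
import math

def logistic_growth(t, capacity, rate, offset):
    """Basic saturating trend: capacity / (1 + exp(-rate * (t - offset)))."""
    return capacity / (1.0 + math.exp(-rate * (t - offset)))

# Hypothetical numbers: capacity C = 100, growth rate k = 0.5, offset m = 5.
halfway = logistic_growth(5.0, capacity=100.0, rate=0.5, offset=5.0)
near_cap = logistic_growth(60.0, capacity=100.0, rate=0.5, offset=5.0)
```

At `t` equal to the offset the curve sits at exactly half the capacity, and for large `t` it flattens out just below the capacity. Note that `math.exp` can overflow for very large negative `t`; this sketch does not guard against that.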

There are two primary aspects of growth at Facebook (**fluctuating carrying capacity** and *volatile rate of change*) that are not captured in this simplified equation, though.

Carrying Capacity v. Time

First, as with many scalable business models, **carrying capacity is not constant**: as “the number of people in the world who have access to the Internet increases, so does the growth ceiling.”

Accounting for this is done by replacing the fixed capacity *C* with a time-varying capacity *C(t)*.

Rate of Change v. Time

Second, the market does not allow for stagnant technology. Advances like those seen over the past decade in handheld devices, app development, and global connectivity virtually ensure that the **growth rate is not constant**.

Because this rate can quickly compound due to new products, the model must be able to incorporate a varying rate in order to fit historical data.

We incorporate trend changes in the growth model by explicitly defining changepoints where the growth rate is allowed to change.

Suppose there are **S** changepoints at times $s_j$, $j = 1, \dots, S$. Prophet defines a vector of rate adjustments;

$$\boldsymbol{\delta} \in \mathbb{R}^S$$

where:

- $\delta_j$ is the change in rate that occurs at time $s_j$

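The rate at any time **t** is then the base rate *k*, plus adjustments up to that time;

$$k + \sum_{j:\, t \geq s_j} \delta_j$$

This is represented more cleanly by defining a vector;

$$\mathbf{a}(t) \in \{0, 1\}^S$$

such that;

$$a_j(t) = \begin{cases} 1, & \text{if } t \geq s_j, \\ 0, & \text{otherwise.} \end{cases}$$

The rate at time *t* is then $k + \mathbf{a}(t)^\top \boldsymbol{\delta}$. When the rate *k* is adjusted, the offset parameter *m* must also be adjusted to connect the endpoints of the segments. The correct adjustment at changepoint *j* is easily computed as;

$$\gamma_j = \left(s_j - m - \sum_{l < j} \gamma_l\right)\left(1 - \frac{k + \sum_{l < j} \delta_l}{k + \sum_{l \leq j} \delta_l}\right)$$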

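At last, the piecewise `growth='logistic'` model is reached;

$$g(t) = \frac{C(t)}{1 + \exp\left(-\left(k + \mathbf{a}(t)^\top \boldsymbol{\delta}\right)\left(t - \left(m + \mathbf{a}(t)^\top \boldsymbol{\gamma}\right)\right)\right)}$$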

An important set of parameters in our model is *C(t)*, or the expected capacities of the system at any point in time. Analysts often have insight into market sizes and can set these accordingly. There may also be external data sources that can provide carrying capacities, such as population forecasts from the World Bank.

In application, the logistic growth model presented here is a special case of generalized logistic growth curves — which is only a single type of sigmoid curve — allowing the relatively straightforward extension(s) of this trend model to other families of curves.
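The piecewise logistic trend described in this section can be sketched in plain Python. This is an illustrative reading of the formulas, not Prophet's own implementation: the function names, the capacity function, and every number below are assumptions, and the sketch assumes the changepoints are sorted and that no adjusted rate is exactly zero.

```python
import math

def changepoint_indicator(t, changepoints):
    """a(t): entry j is 1 once t has reached changepoint s_j, else 0."""
    return [1.0 if t >= s_j else 0.0 for s_j in changepoints]

def logistic_offset_adjustments(k, m, deltas, changepoints):
    """gamma_j values that keep the curve continuous at each changepoint."""
    gammas = []
    rate_before, offset_before = k, m
    for s_j, delta_j in zip(changepoints, deltas):
        rate_after = rate_before + delta_j
        gamma_j = (s_j - offset_before) * (1.0 - rate_before / rate_after)
        gammas.append(gamma_j)
        rate_before, offset_before = rate_after, offset_before + gamma_j
    return gammas

def piecewise_logistic(t, capacity, k, m, deltas, changepoints):
    """C(t) / (1 + exp(-(k + a.delta) * (t - (m + a.gamma))))."""
    a = changepoint_indicator(t, changepoints)
    gammas = logistic_offset_adjustments(k, m, deltas, changepoints)
    rate = k + sum(a_j * d_j for a_j, d_j in zip(a, deltas))
    offset = m + sum(a_j * g_j for a_j, g_j in zip(a, gammas))
    return capacity(t) / (1.0 + math.exp(-rate * (t - offset)))

def capacity(t):
    """Hypothetical time-varying capacity C(t) that grows slowly."""
    return 100.0 + 0.5 * t

# Hypothetical setup: growth slows at t = 10 and again at t = 20.
changepoints = [10.0, 20.0]
deltas = [-0.2, -0.1]
just_before = piecewise_logistic(10.0 - 1e-9, capacity, 0.5, 5.0, deltas, changepoints)
at_changepoint = piecewise_logistic(10.0, capacity, 0.5, 5.0, deltas, changepoints)
```

The two values on either side of the first changepoint agree to within rounding, which is exactly what the offset adjustments are for: the rate changes, but the curve does not jump.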

## 3.2) Linear Trend with Changepoints

The second trend model, much simpler and the default, is a *Piecewise Linear Model* with a piecewise-constant rate of growth.

It is best suited for problems without a market cap or other max in sight, and is set via `growth='linear'`.

For forecasting problems that do not exhibit saturating growth, a piece-wise constant rate of growth provides a parsimonious and often useful model.

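Modeling the linear trend is easily realized with Prophet. In fact, not adjusting anything usually does the trick;

$$g(t) = \left(k + \mathbf{a}(t)^\top \boldsymbol{\delta}\right) t + \left(m + \mathbf{a}(t)^\top \boldsymbol{\gamma}\right)$$

where:

- *k* is the growth rate
- **δ** has the rate adjustments
- *m* is the offset parameter

and, to make the function continuous, $\gamma_j$ is set to:

$$\gamma_j = -s_j \delta_j$$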
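A small Python sketch shows why the choice γ_j = -s_j δ_j keeps the piecewise linear trend continuous. The function name and the example values are illustrative assumptions; the formula is the one given in this section, g(t) = (k + a(t)ᵀδ) t + (m + a(t)ᵀγ).

```python
def piecewise_linear(t, k, m, deltas, changepoints):
    """Linear trend whose slope changes by delta_j at each changepoint s_j."""
    a = [1.0 if t >= s_j else 0.0 for s_j in changepoints]
    # Offset adjustments that cancel the jump a slope change would otherwise cause.
    gammas = [-s_j * d_j for s_j, d_j in zip(changepoints, deltas)]
    rate = k + sum(a_j * d_j for a_j, d_j in zip(a, deltas))
    offset = m + sum(a_j * g_j for a_j, g_j in zip(a, gammas))
    return rate * t + offset

# Hypothetical example: slope 1 until t = 5, then slope 2.
values = [piecewise_linear(t, k=1.0, m=0.0, deltas=[1.0], changepoints=[5.0])
          for t in (4.0, 5.0, 6.0)]
```

With these numbers the trend passes through 4, 5, and 7: the slope doubles at the changepoint, but the line itself stays connected.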

## 3.3) Automatic Changepoint Selection

If known, the changepoints $s_j$ can be specified by the user as dates of product launches and other growth-altering events, or, by default, changepoints may be automatically selected given a set of candidates.

Automatic selection can be done quite naturally with the formulation in either model by putting a sparse prior on **δ**.

Often, it is advisable to specify a large number of changepoints (e.g. one per month for a several year history) and use the prior:

$$\delta_j \sim \text{Laplace}(0, \tau)$$

where:

- *τ* directly controls the flexibility of the model in altering its rate
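One way to see how *τ* controls flexibility is to look at the negative log-density of a Laplace(0, τ) prior, which is log(2τ) + |δ|/τ. The sketch below compares the cost of the same rate change under a tight and a loose prior; the helper names and the chosen values of τ are assumptions made for illustration only.

```python
import math

def laplace_neg_log_density(delta, tau):
    """-log p(delta) for delta ~ Laplace(0, tau): log(2 * tau) + |delta| / tau."""
    return math.log(2.0 * tau) + abs(delta) / tau

def rate_change_penalty(delta, tau):
    """Extra prior cost of a rate change of size delta compared with no change."""
    return laplace_neg_log_density(delta, tau) - laplace_neg_log_density(0.0, tau)

tight = rate_change_penalty(0.1, tau=0.01)  # small tau: changes are expensive
loose = rate_change_penalty(0.1, tau=0.5)   # large tau: changes are cheap
```

The penalty grows linearly in |δ| with slope 1/τ, so a small *τ* pushes most adjustments toward zero while a large *τ* lets the rate move freely.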

Critical note: a sparse prior on the adjustments **δ** has no impact on the primary growth rate *k*, so as *τ* progresses to 0 the fit reduces to standard (not-piecewise) logistic or linear growth.

## 3.4) Trend Forecast Uncertainty

When the model is extrapolated past the history to make a forecast, the trend **g(t)** will have a constant rate; the uncertainty in the forecast trend is estimated by extending the generative model forward.

The generative model for the trend is that there are;

- *S* changepoints
- over a history of *T* points
- each of which has a rate change $\delta_j \sim \text{Laplace}(0, \tau)$

Simulation of future rate changes (that emulate those of the past) is achieved by replacing **τ** with a variance inferred from data.

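In a fully Bayesian framework this could be done with a hierarchical prior on *τ* to obtain its posterior, otherwise we can use the maximum likelihood estimate of the rate scale parameter:

$$\lambda = \frac{1}{S} \sum_{j=1}^{S} |\delta_j|$$

Future changepoints are randomly sampled in such a way that the average frequency of changepoints matches that in the history:

$$\forall j > T, \quad \begin{cases} \delta_j = 0 & \text{w.p. } \frac{T - S}{T}, \\ \delta_j \sim \text{Laplace}(0, \lambda) & \text{w.p. } \frac{S}{T}. \end{cases}$$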

Thus, uncertainty in the forecast trend is measured by assuming the future will see the **same average frequency and magnitude** of rate changes that were seen in the history. Once *λ* has been inferred from the data, this generative model is deployed to “simulate possible future trends and use the simulated trends to compute uncertainty intervals.”

Prophet’s assumption that the trend will continue to change with the same frequency and magnitude as it has in the history is fairly strong, so don’t bank on the uncertainty intervals having exact coverage.
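The simulation idea can be sketched with the standard library alone. This is a simplified illustration of the generative model described above, not Prophet's actual sampling code: the rate scale is estimated as the mean absolute historical rate change, and each future step gets a Laplace-distributed rate change with probability S/T and no change otherwise. The function names and the historical deltas are hypothetical, and the sketch assumes at least one historical delta is non-zero.

```python
import random

def estimate_rate_scale(deltas):
    """Maximum likelihood scale: lambda = (1 / S) * sum of |delta_j|."""
    return sum(abs(d) for d in deltas) / len(deltas)

def sample_laplace(scale, rng):
    """Laplace(0, scale) draw, built as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def simulate_future_rate_changes(deltas, history_length, horizon, rng):
    """One simulated path of future rate changes, one entry per future step."""
    scale = estimate_rate_scale(deltas)
    p_change = len(deltas) / history_length  # S / T
    return [sample_laplace(scale, rng) if rng.random() < p_change else 0.0
            for _ in range(horizon)]

rng = random.Random(0)
historical_deltas = [0.2, -0.1, 0.0, 0.3]  # hypothetical fitted rate changes
future_changes = simulate_future_rate_changes(
    historical_deltas, history_length=100, horizon=1000, rng=rng)
```

Repeating this many times, pushing each simulated path through the trend function, and taking quantiles of the results is the general route from simulated trends to uncertainty intervals.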

As **τ** is increased the model has more flexibility in fitting the history and so training error will drop. Even so, when projected forward this flexibility is prone to produce wide intervals.

**The uncertainty intervals are, however, a** *useful indication of the level of uncertainty*, and *especially an indicator of overfitting*.

# 4) Seasonality

The seasonal component **s(t)** provides adaptability to the model by allowing periodic changes based on sub-daily, daily, weekly and yearly seasonality.

Business time series often have multi-period seasonality as a result of the human behaviors they represent. For instance, a 5-day work week can produce effects on a time series that repeat each week, while vacation schedules and school breaks can produce effects that repeat each year. To fit and forecast these effects we must specify seasonality models that are periodic functions of [time] *t*.

Prophet relies on Fourier series to provide a malleable model of periodic effects. **P** is the regular period the time series will have (e.g. P = 365.25 for yearly data or P = 7 for weekly data, when time is scaled in days).

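Approximating arbitrary smooth seasonal effects is therefore done with a standard Fourier series;

$$s(t) = \sum_{n=1}^{N} \left(a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right)\right)$$

Fitting seasonality requires estimating the 2N parameters $\boldsymbol{\beta} = [a_1, b_1, \dots, a_N, b_N]^\top$. This is done by constructing a matrix of seasonality vectors for each value of *t* in our historical and future data, for example with yearly seasonality and N = 10:

$$X(t) = \left[\cos\left(\frac{2\pi (1) t}{365.25}\right), \dots, \sin\left(\frac{2\pi (10) t}{365.25}\right)\right]$$

Meaning the seasonal component is;

$$s(t) = X(t)\boldsymbol{\beta}$$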

In the generative model, Prophet takes *β ∼ Normal(0, σ²)* to impose a smoothing prior on the seasonality.

Truncating the series at *N* applies a low-pass filter to the seasonality, so, albeit with increased risk of overfitting, increasing *N* allows for fitting seasonal patterns that change more quickly.

For yearly and weekly seasonality we have found N = 10 and N = 3 respectively to work well for most problems. The choice of these parameters could be automated using a model selection procedure such as AIC.
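The Fourier construction is easy to sketch in plain Python. The feature ordering below, cosine then sine for each harmonic, is chosen to line up with β = [a1, b1, …, aN, bN]; the function names and the example coefficients are assumptions for illustration, not fitted values.

```python
import math

def fourier_features(t, period, order):
    """X(t): [cos(2*pi*1*t/P), sin(2*pi*1*t/P), ..., cos(2*pi*N*t/P), sin(2*pi*N*t/P)]."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.extend([math.cos(angle), math.sin(angle)])
    return features

def seasonal_component(t, period, beta):
    """s(t) = X(t) . beta, with the order N inferred from the length of beta."""
    order = len(beta) // 2
    return sum(x * b for x, b in zip(fourier_features(t, period, order), beta))

# Hypothetical weekly coefficients [a1, b1, a2, b2, a3, b3], i.e. N = 3 and P = 7.
weekly_beta = [1.0, 0.5, 0.0, -0.3, 0.2, 0.1]
day_zero = seasonal_component(0.0, 7.0, weekly_beta)
one_week_later = seasonal_component(7.0, 7.0, weekly_beta)

yearly_row = fourier_features(100.0, 365.25, 10)  # one row of the N = 10 yearly matrix
```

The component repeats exactly one period later, and a yearly row with N = 10 has 2N = 20 entries, which is the number of parameters the fit has to estimate for that seasonality.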

# 5) Holidays and Events

The impact of a particular holiday on the time series is often similar year after year, making it important to incorporate into the forecast. The component **h(t)** speaks for predictable events of the year including those on irregular schedules (e.g. Black Friday or the Super Bowl).

To utilize this feature, the user needs to provide a custom list of events. Fusing this list of holidays into the model is made straightforward by assuming that the effects of holidays are independent.

For each holiday **i**, let $D_i$ be the set of past and future dates for that holiday. Then add an indicator function representing whether time **t** is during holiday **i**, and assign each holiday a parameter $\kappa_i$ which is the corresponding change in the forecast.

This is done in a similar way as seasonality by generating a matrix of regressors;

$$Z(t) = [\mathbf{1}(t \in D_1), \dots, \mathbf{1}(t \in D_L)]$$

and taking,

$$h(t) = Z(t)\boldsymbol{\kappa}$$

As with seasonality, Prophet uses a prior `κ∼Normal(0,ν²)`.

It is often important to include effects for a window of days around a particular holiday, such as the weekend of Thanksgiving. To account for that we include additional parameters for the days surrounding the holiday, essentially treating each of the days in the window around the holiday as a holiday itself.
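The holiday regressors, including the window of surrounding days, can be sketched as follows. This is an illustrative reading of the description above rather than Prophet's implementation: the helper names, the column labels such as `thanksgiving+1`, and the effect sizes in `kappa` are all hypothetical.

```python
from datetime import date, timedelta

def with_window(holidays, days_before=0, days_after=0):
    """Give every day in the window around a holiday its own indicator column."""
    expanded = {}
    for name, dates in holidays.items():
        for shift in range(-days_before, days_after + 1):
            label = name if shift == 0 else f"{name}{shift:+d}"
            expanded[label] = {d + timedelta(days=shift) for d in dates}
    return expanded

def holiday_features(t, holidays):
    """Z(t): one 0/1 indicator per holiday column, 1 when t is in that date set."""
    return [1.0 if t in dates else 0.0 for dates in holidays.values()]

def holiday_component(t, holidays, kappa):
    """h(t) = Z(t) . kappa, with kappa keyed by column label."""
    return sum(z * kappa[name]
               for z, name in zip(holiday_features(t, holidays), holidays))

# Past and future dates for one holiday, plus the day after it.
holidays = with_window(
    {"thanksgiving": {date(2015, 11, 26), date(2016, 11, 24)}},
    days_after=1,
)
kappa = {"thanksgiving": -3.0, "thanksgiving+1": 4.0}  # made-up effect sizes
effect_on_holiday = holiday_component(date(2016, 11, 24), holidays, kappa)
effect_day_after = holiday_component(date(2016, 11, 25), holidays, kappa)
```

Because each day in the window has its own column and its own parameter, the day after the holiday can have an effect of a different size and sign than the holiday itself.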

# Conclusion

Ultimately, Prophet was engineered to help analysts with a variety of backgrounds produce more forecasts with less time invested towards doing so. This was achieved by sticking to a relatively plain model.

After all, “*Introduction to Time Series and Forecasting (Springer Texts in Statistics) 3rd ed. 2016 Edition”* is 425 pages in length, the “*Forecasting at Scale” *Prophet paper is 25 pages, and you’ve read this Story in about 10 minutes.

We use a simple, modular regression model that often works well with default parameters, and that allows analysts to select the components that are relevant to their forecasting problem and easily make adjustments as needed.

Thanks for reading; if you’re eager to know more about why Facebook built Prophet, check out the presentation by one of the team leads, Sean Taylor, listed in the References.

# References

- Taylor SJ, Letham B. 2017. Forecasting at scale. PeerJ Preprints 5:e3190v2 https://doi.org/10.7287/peerj.preprints.3190v2
- Forecasting at Scale: How and Why We Developed Prophet for Forecasting at Facebook (Lander Analytics ; YouTube)
- Robson, Winston A. “The Prophet on Walmart — Comprehensive Intro to FbProphet.” *Medium*, Future Vision, 9 July 2019, https://medium.com/future-vision/intro-to-prophet-9d5b1cbd674e