Hacking Time-Series Forecasting Like a Pro with FBProphet

Nikolaus Herjuno Sapto Dwi Atmojo
Published in Tokopedia Data
7 min read · Jul 24, 2019

Time-series forecasting is widely known to be difficult because of its inherent uncertainty. It is often hard to tell whether a series is stochastic, deterministic chaotic, or a combination of both. As Stephen Fry once said,

“How can we understand the present or glimpse of our future if we cannot understand our past?”

That applies to time-series forecasting as well: if we cannot determine which method to use, we cannot foresee the future.

Time-Series Forecasting

Before going into time-series forecasting, we should know what time-series data is. Time-series data is data whose points are indexed or sequenced in time order. Examples include the height of ocean tides, the number of sunspots, and the daily closing price of a stock.

At Tokopedia, we sometimes use time-series forecasting to predict the number of customer service complaint tickets. Doing this allows us to establish monthly, quarterly, or annual capacity planning so that our Customer Service agents can meet the demand, in this case the number of complaint tickets.

Time-series forecasting is a way of building a model that predicts future values from current and historical time-series data. A time series has four components:

  1. Trend: a long-term increase or decrease in the data; it can be linear or nonlinear (for example, logistic growth)
  2. Seasonality: regular and predictable movement that repeats after a fixed period of time
  3. Cyclic: rises and falls in the data that recur without a fixed period
  4. Irregularity: the residual of the time series after the trend-cycle and seasonal components are removed
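To make the components concrete, here is a minimal sketch that builds a synthetic daily series from a trend, a weekly seasonality, and random noise, then recovers rough estimates of each with a moving average. The numbers are made up for illustration, and the cyclic component is left out to keep the example short.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-01", periods=364, freq="D")  # 52 full weeks
t = np.arange(len(dates))

trend = 100 + 0.1 * t                         # slow linear increase
seasonality = 5 * np.sin(2 * np.pi * t / 7)   # pattern that repeats every 7 days
irregular = rng.normal(0, 1, len(t))          # random residual
series = pd.Series(trend + seasonality + irregular, index=dates)

# A centered 7-day moving average cancels the weekly pattern,
# which leaves an estimate of the trend
trend_est = series.rolling(window=7, center=True).mean()

# Averaging the detrended values per day of the week estimates the seasonality
detrended = series - trend_est
seasonal_est = detrended.groupby(detrended.index.dayofweek).transform("mean")

# Whatever remains is the irregular component
residual_est = series - trend_est - seasonal_est
```

The first and last three days have no trend estimate because the centered window does not fit there.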

What is FBProphet and When Will It Shine?

Did you know that someone with little background in time-series forecasting can still produce a forecast? That is why Facebook open-sourced a package called FBProphet. What is FBProphet? It is a tool intended to help you do time-series forecasting at scale with ease.

FBProphet uses a decomposable time-series model with three main components (trend, seasonality, and holiday or event effects) plus an error term, combined in this equation:

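y(t) = g(t) + s(t) + h(t) + e(t)

where g(t) is the trend, s(t) is the seasonality, h(t) is the effect of holidays or events, and e(t) is the error term.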

FBProphet uses time as a regressor and tries to fit several linear and nonlinear functions of time as components. By default, FBProphet fits the trend with a linear model, but this can be changed to a nonlinear model (logistic growth) through its arguments.
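As a rough illustration of the two trend shapes, the sketch below evaluates a linear and a logistic function of time. It is a simplification: Prophet's actual trend is piecewise, with the rate allowed to change at changepoints, and the function names and parameter values here are made up.

```python
import numpy as np

def linear_trend(t, k, m):
    """Linear growth with rate k and offset m."""
    return k * t + m

def logistic_trend(t, cap, k, m):
    """Logistic growth that saturates at the carrying capacity `cap`."""
    return cap / (1.0 + np.exp(-k * (t - m)))

t = np.linspace(0, 100, 101)
linear = linear_trend(t, k=0.05, m=1.0)                 # keeps growing without bound
logistic = logistic_trend(t, cap=5.05, k=0.1, m=50.0)   # flattens out below the cap
```

The linear curve grows without limit, while the logistic curve reaches half of the cap at t = m and never exceeds the cap.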

Before deciding to use FBProphet, check that your data has these characteristics:

  1. Hourly, daily, or weekly observations covering at least a few months (1 month at the very minimum); a year or more of history is much preferred
  2. Strong seasonalities, such as day of the week and time of the year
  3. Important events or holidays, with their dates noted
  4. A reasonable number of missing values or outliers
  5. Historical trend changes
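A quick way to check the first and fourth points is to measure how much history you have and how much of it is missing. The helper below is a hypothetical sketch, not part of FBProphet; it assumes a data frame with a date column named ds and a value column named y.

```python
import numpy as np
import pandas as pd

def summarize_history(df, date_col="ds", value_col="y"):
    """Report the length of the history and the share of missing values."""
    dates = pd.to_datetime(df[date_col])
    span_days = (dates.max() - dates.min()).days
    return {
        "span_days": span_days,
        "at_least_one_year": span_days >= 365,
        "missing_share": float(df[value_col].isna().mean()),
    }

# Made-up example: 400 daily observations, 4 of them missing
example = pd.DataFrame({
    "ds": pd.date_range("2016-01-01", periods=400, freq="D"),
    "y": np.arange(400, dtype=float),
})
example.loc[[10, 20, 30, 40], "y"] = np.nan
summary = summarize_history(example)
```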

If you meet those conditions, let’s get your hands dirty and experience the FBProphet magic!

Preparing Data Sources

In this tutorial, we use the Airbnb Seattle Open Data set from Kaggle, with a Kaggle notebook as our IDE. From this data set, we take the Airbnb Seattle room bookings per listing_id per day, then compute the daily sum and average of the price.

Exploration

The calendar.csv file in the Airbnb Seattle Open Data consists of the following columns:

Column Information of calendar.csv
Airbnb calendar CSV aggregated per date index

As seen above, the table contains some missing values. Although FBProphet can handle missing values, we aggregate the table by date into the average and total price, and then set the date as the index.
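The aggregation can be sketched with pandas as follows. The rows are made up, and the assumption that the raw price is a string such as "$85.00" is ours; adjust the cleaning step to whatever your copy of calendar.csv contains.

```python
import pandas as pd

# Made-up rows standing in for calendar.csv; None marks a missing price
calendar = pd.DataFrame({
    "listing_id": [1, 2, 3, 1, 2, 3],
    "date": ["2016-01-04"] * 3 + ["2016-01-05"] * 3,
    "price": ["$85.00", None, "$1,000.00", "$90.00", "$110.00", None],
})

calendar["date"] = pd.to_datetime(calendar["date"])
# Strip the currency symbol and thousands separator, then convert to numbers
calendar["price"] = (
    calendar["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# One row per date; missing prices are skipped by both sum and mean
daily = calendar.groupby("date")["price"].agg(
    total_price="sum", average_price="mean"
)

# Prophet expects a data frame with a ds (date) column and a y (value) column
prophet_df = (
    daily["average_price"]
    .reset_index()
    .rename(columns={"date": "ds", "average_price": "y"})
)
```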

Total Price per date
Average price per date

Both graphs show that the price was relatively low at the beginning of the year and then rose from the end of Q1 to the end of Q2. One plausible explanation is that people had started preparing for their summer holidays and going on vacation. You can double-check this against a holiday calendar. In this case, we used dates from the United States holiday calendar as a reference.

Average price per month

As shown in the graph, the average price for a room reached $151 per night in July 2016. We can conclude that prices started to rise around mid-year.
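A monthly view like this one can be derived from the daily averages by grouping on the calendar month. The sketch below uses made-up prices rather than the Airbnb figures.

```python
import numpy as np
import pandas as pd

# Made-up daily average prices: 10 in January, 20 in February, 30 in March
dates = pd.date_range("2016-01-01", "2016-03-31", freq="D")
daily_average = pd.Series(
    np.asarray(dates.month, dtype=float) * 10, index=dates, name="average_price"
)

# Group the daily values by calendar month and take the mean of each group
monthly_average = daily_average.groupby(daily_average.index.to_period("M")).mean()
```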

Saturating Forecast

By default, FBProphet fits a linear trend. In practice, a growing forecast often cannot grow forever: at some point it reaches its maximum achievable value. This maximum is called the carrying capacity, and the forecast growth should saturate there.

Besides the linear model, FBProphet can use a logistic growth trend model instead, selected through its arguments:

  1. Define the cap, or maximum achievable point (optional): In this case, we try a cap and store it as a column in our data frame. We set it to 5.05 because that is the peak value in the data.
  2. Choose between the linear model and the logistic model (optional): For this case, we do not use logistic growth and let the model keep its default linear setting.
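For reference, this is roughly how a logistic growth model would be set up if you did want one. The history below is made up, and the helper name fit_logistic is hypothetical. Logistic growth needs the cap column in the history and also in the future data frame. The import is guarded because the package was renamed from fbprophet to prophet in version 1.0.

```python
import numpy as np
import pandas as pd

try:
    from prophet import Prophet          # package name from version 1.0 onward
except ImportError:
    try:
        from fbprophet import Prophet    # older package name
    except ImportError:
        Prophet = None                   # not installed; only the data prep runs

# Made-up history whose values stay below the cap of 5.05
df = pd.DataFrame({
    "ds": pd.date_range("2016-01-04", periods=90, freq="D"),
    "y": 4.5 + 0.5 * np.linspace(0, 1, 90),
})
df["cap"] = 5.05  # carrying capacity, stored as a column

def fit_logistic(history, periods=60):
    """Fit a logistic growth model and forecast `periods` days ahead."""
    model = Prophet(growth="logistic")
    model.fit(history)
    future = model.make_future_dataframe(periods=periods)
    future["cap"] = history["cap"].iloc[0]  # the future frame needs the cap too
    return model.predict(future)
```

With Prophet installed, fit_logistic(df) returns the forecast data frame. Leaving out the growth argument keeps the default linear trend, which is what this article uses.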

Trend Changepoints

What are trend changepoints? A changepoint is a location, in this case a date or time index, at which the trend of the data changes direction or rate, whether upward or downward. To define trend changepoints in FBProphet, use one of these two methods:

  1. Specify the flexibility of the trend changepoints
  2. Specify the locations of the changepoints, which means providing a list or series of the dates on which the trend starts to change

If your data has little noise, you can add the changepoints manually; if it has a lot of noise, adding them by hand will take considerable time.

Tip: you can filter the noise using the mean and standard deviation, or the IQR (interquartile range).
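Here is a minimal sketch of the IQR variant of that tip. The helper name iqr_mask, the multiplier of 1.5, and the sample values are illustrative choices, not something taken from the notebook.

```python
import pandas as pd

def iqr_mask(values, k=1.5):
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values.between(q1 - k * iqr, q3 + k * iqr)

y = pd.Series([10, 11, 9, 10, 12, 11, 95, 10, 9, 11], dtype=float)
keep = iqr_mask(y)

# Replace outliers with NaN instead of dropping the rows, so the dates stay intact
cleaned = y.where(keep)
```

Setting outliers to NaN rather than deleting the rows works well with Prophet, which treats NaN values in y as missing observations.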

changepoint date function

In this case, the trend changepoints are specified by defining the changepoint flexibility.

Prophet object model

We set changepoint_prior_scale, the flexibility of the trend changepoints, to 0.095. Its default value is 0.05. Increasing it makes the trend more flexible.
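The two ways of defining changepoints look roughly like this in code. The changepoint dates are hypothetical placeholders, and the import is guarded because the package was renamed from fbprophet to prophet in version 1.0.

```python
try:
    from prophet import Prophet          # package name from version 1.0 onward
except ImportError:
    try:
        from fbprophet import Prophet    # older package name
    except ImportError:
        Prophet = None                   # not installed; nothing is built below

# Hypothetical dates on which the trend is believed to change
manual_changepoints = ["2016-03-31", "2016-06-30"]

def build_models():
    """Build one model per method of defining changepoints."""
    # Method 1: let Prophet place the changepoints, but make the trend more flexible
    flexible = Prophet(changepoint_prior_scale=0.095)
    # Method 2: tell Prophet exactly where the changepoints are
    manual = Prophet(changepoints=manual_changepoints)
    return flexible, manual

if Prophet is not None:
    flexible_model, manual_model = build_models()
```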

FB Prophet Model

In general, the Prophet model constructor takes arguments such as:

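from fbprophet import Prophet  # from version 1.0 the package is named prophet

# The data frame is not a constructor argument; it is passed to fit() instead
model = Prophet(
    interval_width=0.95,
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=True,
    holidays=None,
    changepoint_prior_scale=0.05,
)
model.fit(data)  # data: data frame with a ds (date) column and a y (value) column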
  • interval_width sets the width of the uncertainty interval produced around the predicted value; the default is 0.80, and here it is set to 0.95
  • yearly_seasonality / weekly_seasonality / daily_seasonality default to 'auto', which lets Prophet decide from the data; set each one to True or False explicitly to control which seasonalities are used
  • holidays is the argument for including holiday or event dates
  • changepoint_prior_scale controls how flexibly the trend reacts at changepoints
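The holidays argument expects a data frame with a holiday column (the name) and a ds column (the date). The sketch below lists three United States holidays in 2016 as an example. The lower_window and upper_window columns are optional and extend the effect to the days around each date; the values chosen here are illustrative.

```python
import pandas as pd

holidays = pd.DataFrame({
    "holiday": ["independence_day", "thanksgiving", "christmas"],
    "ds": pd.to_datetime(["2016-07-04", "2016-11-24", "2016-12-25"]),
    "lower_window": 0,   # no extra days before the holiday
    "upper_window": 1,   # include the day after the holiday
})
```

This data frame would then be passed as Prophet(holidays=holidays). Prophet also provides an add_country_holidays method for built-in country calendars, which may be simpler when you do not need custom events.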

The final code for the model and its arguments is as follows:

With FBProphet, you can make predictions at yearly, monthly, weekly, daily, or even hourly frequency. For now, we only create a prediction for 60 data points at the daily level. The picture below shows the result of the prediction.
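A 60-day daily forecast can be sketched as below. The history is synthetic, the helper name forecast_60_days is hypothetical, and the model arguments simply reuse the values mentioned in this article. The import is guarded because the package was renamed from fbprophet to prophet in version 1.0.

```python
import numpy as np
import pandas as pd

try:
    from prophet import Prophet          # package name from version 1.0 onward
except ImportError:
    try:
        from fbprophet import Prophet    # older package name
    except ImportError:
        Prophet = None                   # not installed; only the date logic runs

# Synthetic daily history with a weekly pattern (not the Airbnb data)
history = pd.DataFrame({
    "ds": pd.date_range("2016-01-04", periods=180, freq="D"),
    "y": 100 + 10 * np.sin(2 * np.pi * np.arange(180) / 7),
})

def forecast_60_days(df):
    """Fit a model and return the forecast for the history plus 60 new days."""
    model = Prophet(interval_width=0.95, changepoint_prior_scale=0.095)
    model.fit(df)
    # Historical dates followed by 60 additional daily dates
    future = model.make_future_dataframe(periods=60, freq="D")
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]

# The 60 dates appended by make_future_dataframe, reproduced with plain pandas
new_dates = pd.date_range(
    history["ds"].max() + pd.Timedelta(days=1), periods=60, freq="D"
)
```

With Prophet installed, forecast_60_days(history) returns one row per date, where yhat is the prediction and yhat_lower and yhat_upper bound the uncertainty interval.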

FB Prophet Forecasting result

Forecast Components

Forecast components can be viewed at several levels, such as the overall trend, the holiday effects, and the weekly or yearly seasonality.

Forecast Components of FB Prophet

Conclusion

FBProphet is interesting, sophisticated, and quite easy to implement. Even with the default arguments, the model lets you generate a good forecast with little effort or domain knowledge in time-series analysis. While the Prophet library is very powerful, evaluating the model takes some extra work. We can do this with scikit-learn's metrics, but sometimes we need something more robust to evaluate and validate the model, such as cross-validation over sliding windows of the data set.
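The idea of cross-validation over sliding windows can be sketched without any forecasting library: move a cutoff forward through the history, train on everything before it, and score the forecast for the next horizon. In the sketch below the helper names are hypothetical and a naive "repeat the last value" forecaster stands in for the real model. Prophet also ships a diagnostics module with cross_validation and performance_metrics functions that follow the same idea.

```python
import numpy as np
import pandas as pd

def sliding_window_splits(n_rows, initial, horizon, step):
    """Yield (train_idx, test_idx) pairs as the cutoff slides forward."""
    cutoff = initial
    while cutoff + horizon <= n_rows:
        yield np.arange(0, cutoff), np.arange(cutoff, cutoff + horizon)
        cutoff += step

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Made-up series that increases by 1 every day
y = pd.Series(100 + np.arange(120, dtype=float))

scores = []
for train_idx, test_idx in sliding_window_splits(len(y), initial=60, horizon=20, step=20):
    # Stand-in forecaster: repeat the last training value over the horizon
    naive_forecast = np.repeat(y.iloc[train_idx].iloc[-1], len(test_idx))
    scores.append(mape(y.iloc[test_idx], naive_forecast))
```

Each entry in scores is the error of one fold. Replacing the naive forecaster with a fitted model gives a comparable score per cutoff.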

You can access the Jupyter notebook from this Kaggle link in case you want to give it a try. There are many things you can add or try, such as splitting the data into training and testing sets, tuning the seasonality, adding your own trend changepoints, and handling outliers using FBProphet. Not every time-series problem can be solved using FBProphet, so you can compare it against ARIMA, XGBoost, and others. Happy exploring!
