Forecasting Using Facebook’s Prophet Library

Christopher Lewis
Analytics Vidhya
Apr 2, 2021 · 6 min read


What is Prophet?

In 2017, Facebook open-sourced Prophet, a forecasting library with easy-to-use tools available in both Python and R. While it is often considered an alternative to ARIMA models, Prophet really shines when applied to time-series data that have strong seasonal effects and several seasons of historical data to learn from.

Prophet is, by default, an additive regression model. It is also specifically designed to forecast business data. According to Taylor and Letham, there are four main components in the Prophet model:

  1. A piecewise linear or logistic growth curve trend. Prophet automatically detects changes in trends by selecting changepoints from the data.
  2. A yearly seasonal component modeled using the Fourier series.
  3. A weekly seasonal component using dummy variables.
  4. A user-provided list of important holidays.
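The fourth component, for example, is supplied as a plain pandas DataFrame with `holiday` and `ds` columns; the optional `lower_window` and `upper_window` columns extend each holiday's effect to surrounding days. A minimal sketch (the dates below are illustrative):

```python
import pandas as pd

# A hypothetical holidays table in the shape Prophet expects:
# one row per holiday occurrence, with optional window columns
# that widen the holiday's effect (here: the day itself plus one day after).
holidays = pd.DataFrame({
    'holiday': 'thanksgiving',
    'ds': pd.to_datetime(['2019-11-28', '2020-11-26']),
    'lower_window': 0,
    'upper_window': 1,
})
print(holidays.shape)  # (2, 4)
```

This frame would then be passed to the model as `Prophet(holidays=holidays)`.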
Prophet’s Forecasting Process

The process of using Prophet’s forecasting works like this: A data analyst will obtain a time series dataset and set it up to be compatible with the Prophet library. Then, the analyst will begin modeling and manually input parameters they think would work best to produce the optimal forecasting results. If there is an issue or problem that requires human intervention, Prophet will flag these issues and let the analyst know so they can reinspect the forecast and retune the model based on the feedback.

Prophet Pros

Facebook Prophet is worth getting under your belt: not only is it easy to implement, but it is also optimized to work with time-series datasets that have any of the following characteristics:

  1. Outliers and/or missing values
  2. Strong multiple “human-scale” seasonalities
  3. Known important holidays
  4. Historical trend changes
  5. Hourly, daily, or weekly observations with at least a few months (preferably a year) of history

Working with Prophet on Google Colab

In this section, we will be using Google Colab as our environment. If you have never used Google Colab before but would like to follow along, check out my quick-and-easy Google Colab walkthrough to get your Colab environment set up!

Mounting Our Drive

First, let's run the cell below in a new Google Colab notebook to mount our Google Drive:

from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Google Colab has fbprophet already installed, so we don’t need to worry about pip installing anything. Instead, we can immediately import the Prophet model along with other essential libraries into the next cell:

from fbprophet import Prophet
import pandas as pd

Now that we are set up, we can load in a time series dataset! If you would like to use the same dataset as this blog, please download the time-series dataset here.

Reading the CSV file

First, we will read the CSV file into a dataframe:

df = pd.read_csv('../datasets/air_passenger.csv')

Next, we inspect the dataframe by viewing the first 5 rows.

df.head()
Viewing the first 5 rows

Setting Column Names

One quirk of Prophet is that the dataset's columns must be named 'ds' (the date column) and 'y' (the value to forecast). To make our dataframe compatible with Prophet, we rename the columns by calling:

df.columns = ['ds', 'y']

Another requirement for Prophet is that the 'ds' column must contain DateTime objects. We can convert it by calling:

df['ds'] = pd.to_datetime(df['ds'])
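To confirm the conversion worked, we can check the column's dtype. A self-contained sketch with a hypothetical two-row frame standing in for the real data:

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe: string dates in 'ds'
df = pd.DataFrame({'ds': ['1949-01', '1949-02'], 'y': [112, 118]})
df['ds'] = pd.to_datetime(df['ds'])

# datetime64[ns] confirms the column now holds DateTime objects
print(df['ds'].dtype)  # datetime64[ns]
```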

Creating Train and Test Sets

Now that we have renamed our columns and converted the 'ds' column to DateTime objects, we can create our train and test sets! For this blog, we will have the Prophet model predict 12 months beyond the train set and then compare those predictions to the test set. So let's define our train and test sets:

# defining the number of observations we want to predict
nobs = 12
train = df[:-nobs]
test = df[-nobs:]

If we view the length of the train and test sets, we should see that the test set contains 12 observations and the train set contains everything but those 12 observations.

print(f"Length of dataframe: {len(df)}\n"
f"Length of train set: {len(train)}\n"
f"Length of test set: {len(test)}")
Viewing the length of each set

Defining Our Prophet Model

Once we have confirmed that the lengths of the train and test sets are correct, we can create an instance of the Prophet model and fit it to the train set.

# Creating an instance of the Prophet model
prophet = Prophet()
# fitting Prophet model to the train set
prophet.fit(train)

For our next step, we need a place to store the Prophet model's predictions. Fortunately, fbprophet provides an extremely useful method for creating exactly such a dataframe. We will create a variable named 'future' that references a new dataframe containing the dates of all the values in the train set, plus the number of future observations we want to predict:

future = prophet.make_future_dataframe(periods=nobs, freq='MS')
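Here, `freq='MS'` is pandas' "month start" frequency string: `make_future_dataframe` extends the training dates by `periods` steps at that frequency. A pandas-only sketch of the new dates it appends, assuming (as in the classic air-passengers data) that the train set ends at 1959-12-01:

```python
import pandas as pd

# The last date in the (assumed) train set
last_train_date = pd.Timestamp('1959-12-01')

# Generate 13 month-start dates from that point, then drop the first
# (the train date itself), leaving the 12 new future month-starts
new_dates = pd.date_range(start=last_train_date, periods=13, freq='MS')[1:]
print(new_dates[0].date(), new_dates[-1].date())  # 1960-01-01 1960-12-01
```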

To generate the model's predictions for this new dataframe, we will create another variable called 'forecast' that holds them:

forecast = prophet.predict(future)

To easily view the forecast information of our model, we can simply say:

fig1 = prophet.plot(forecast)

We can also view the changepoints, the points in time where the trend's trajectory may have changed, by importing and using the function add_changepoints_to_plot:

from fbprophet.plot import add_changepoints_to_plot
fig1 = prophet.plot(forecast)
# viewing the points in time where the trend's trajectory changed
a = add_changepoints_to_plot(fig1.gca(), prophet, forecast)
Light blue is the 95% Confidence Interval

Comparing Predictions to Actual Values

Now that we've obtained the predictions in our forecast dataframe, we can plot the predicted values against the true values of the test set. All we need to do is create an 'ax' object that plots our forecast dataframe, then draw our test dataframe on the same axes. We can also limit the x-axis to view only the predicted and actual values within our targeted 12-month range.

Predictions compared to True values

Great, but how do we know whether our model actually performed well? We can import a root-mean-square error (RMSE) function from the statsmodels library to compare our predictions to the true values:

from statsmodels.tools.eval_measures import rmse

To make sure we feed the correct variable into the prediction parameter, we will create a new variable called 'y_pred'. Remember that the forecast dataframe contains more than just the final 12 rows of predictions, so we slice off the last 12 rows and select the 'yhat' column so that 'y_pred' references an array containing only the 12-month predictions. We can also define 'y_true' to make the RMSE call read more clearly.

# Remember nobs = 12
y_pred = forecast.iloc[-nobs:]['yhat']
y_true = test['y']
rmse(y_pred, y_true)

Remember that an RMSE value is hard to interpret in isolation; it is most useful for checking whether one model performs better than another. For the sake of an example, let's compare the RMSE of an additive Prophet model against a multiplicative Prophet model to see which type of seasonality performs better on our dataset:

Comparing additive to multiplicative seasonality

It seems that using an additive seasonality when training on our time series dataset gives our model slightly better results than a multiplicative seasonality. If you would like to explore the notebook I used with this blog, you can find it here. Thanks for reading!

Citations

  1. Taylor, Sean, and Ben Letham. “Prophet: Forecasting at Scale.” Facebook Research, 23 Feb. 2017, research.fb.com/blog/2017/02/prophet-forecasting-at-scale.
