Predicting the Future: Exploring Time-Series Forecasting using Prophet

Published in D ONE · 10 min read · Jul 3, 2024

This story is a collaboration between Ilias Panagiotaras and Ileanna Sotiropoulou with support from D ONE.

Microsoft Designer prompt: hands using a time crystal to predict the future showing math equations inside the crystal ball

In this article, we delve into the cutting-edge time series forecasting model Prophet, by Meta. We’ll explore its methodology, strengths, and weaknesses, providing insights into its applications and performance. Whether you’re a seasoned data scientist or a curious enthusiast, this analysis will offer valuable insights into the realm of predictive analytics.

Forecasting Tomorrow: The Imperative of Predicting the Future

In today’s data-driven world, the ability to predict future trends and patterns has become paramount for businesses, researchers, and decision-makers alike. Time series forecasting, a field of study within data science and statistics, empowers us to anticipate future values based on historical data patterns. At its core, time series forecasting involves the analysis of sequential data points collected over time to predict future values. But why should you care?

This isn’t just about predicting numbers; it’s about gaining a competitive edge in an increasingly dynamic world.

From optimizing resource allocation to guiding strategic decisions, time series forecasting is the key to unlocking opportunities and mitigating risks in a multitude of industries.

From finance to healthcare, from meteorology to e-commerce, time series forecasting finds applications in diverse domains. In finance, it aids in predicting stock prices, currency exchange rates, and financial market trends, guiding investors and financial institutions in making informed decisions. In healthcare, it facilitates forecasting patient admissions, disease outbreaks, and resource demand, enabling healthcare providers to optimize resource allocation and patient care.

While several traditional methods are fit for these tasks, we will take a deep dive into the potential of Prophet.

Prophet by Meta

Prophet, developed by Meta’s (formerly Facebook) Core Data Science team in 2017, is an open-source forecasting tool, specifically designed to simplify time series analysis. Prophet employs a comprehensive approach to time series forecasting:

Trend Estimation

Initially, Prophet identifies the overall trend by fitting a piecewise linear or logistic growth model to the observed data, capturing the data’s long-term direction.
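To make the idea of a piecewise linear trend concrete, here is a minimal NumPy sketch. It is not Prophet’s internal code: the function name, the single changepoint, and the slope values are assumptions chosen for illustration. The slope changes at each changepoint, and an offset correction keeps the line continuous there.

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate a continuous piecewise linear trend at times t.

    k: base growth rate, m: base offset,
    changepoints: times where the slope changes,
    deltas: slope adjustment applied from each changepoint onward.
    (Hypothetical helper for illustration, not part of Prophet's API.)
    """
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    # a[i, j] is 1 when t[i] is at or past changepoint j, else 0
    a = (t[:, None] >= s[None, :]).astype(float)
    slope = k + a @ d
    # Offset correction keeps the line continuous at each changepoint
    offset = m + a @ (-s * d)
    return slope * t + offset

t = np.arange(10)
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[5.0], deltas=[1.0])
print(trend)  # slope 1 up to t=5, slope 2 afterwards
```

In this toy example the trend grows by 1 per step until t=5 and by 2 per step afterwards, without a jump at the changepoint.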

Seasonality Modeling

Next, Prophet employs Fourier series to discern and model any seasonal patterns present in the data, such as daily, weekly, or yearly cycles, ensuring a comprehensive representation of periodic fluctuations.
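As a rough illustration of how a Fourier series can represent a seasonal cycle, the sketch below builds sine and cosine features and fits them to a toy series with ordinary least squares. Prophet estimates its seasonal coefficients inside its own model rather than this way, and the helper name, the toy series, and the chosen order are assumptions made for the example.

```python
import numpy as np

def fourier_features(t, period, order):
    """Return an array of shape (len(t), 2 * order) with sin/cos terms."""
    t = np.asarray(t, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

# Toy series: 4 "years" of monthly data with a yearly cycle (period = 12 steps)
t = np.arange(48)
y = 10.0 + 3.0 * np.sin(2.0 * np.pi * t / 12.0)

# Intercept column plus Fourier terms up to order 2
X = np.column_stack([np.ones(len(t)), fourier_features(t, period=12, order=2)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
print(np.allclose(fitted, y))
```

A higher order lets the seasonal curve change more quickly within a cycle, at the cost of a greater risk of overfitting.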

Prophet simplifies forecasting by automating the detection of relevant seasonal patterns in data, factoring in the impact of holidays and special events.

Holiday Effects Integration

Alongside trend and seasonality, Prophet accounts for the impact of holidays or special events. Users provide a list of holidays and their associated dates, allowing Prophet to adjust forecasts to accommodate these irregularities in the data.
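For reference, a user-supplied holiday list is typically a pandas DataFrame with a `holiday` column and a `ds` column, optionally with `lower_window` and `upper_window` to extend the effect to neighbouring days. The sketch below only builds such a frame; the event names and dates are hypothetical and chosen purely for illustration.

```python
import pandas as pd

# Hypothetical event dates chosen for illustration only
holidays = pd.DataFrame({
    'holiday': ['new_year', 'new_year', 'independence_day', 'independence_day'],
    'ds': pd.to_datetime(['1949-01-01', '1950-01-01', '1949-07-04', '1950-07-04']),
    'lower_window': 0,   # days before the date that are also affected (0 or negative)
    'upper_window': 1,   # days after the date that are also affected
})

print(holidays)
# The frame would then be passed at construction time: Prophet(holidays=holidays)
```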

Uncertainty Estimation

Prophet goes beyond point estimates, providing uncertainty intervals around forecasts. Leveraging a Bayesian framework, it quantifies the inherent uncertainty in the data, empowering decision-makers to assess prediction reliability.
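Prophet reports these intervals as `yhat_lower` and `yhat_upper` columns next to the point forecast `yhat`. One simple sanity check, sketched below on made-up numbers, is to measure how often the actual values fall inside the interval. The helper name and the toy frame are assumptions for illustration, not part of Prophet.

```python
import numpy as np
import pandas as pd

def interval_coverage(actual, lower, upper):
    """Fraction of actual values that lie inside [lower, upper]."""
    actual, lower, upper = map(np.asarray, (actual, lower, upper))
    return float(np.mean((actual >= lower) & (actual <= upper)))

# Toy stand-in for a forecast frame; the numbers are made up for illustration
toy = pd.DataFrame({
    'y':          [100, 110, 125, 140],
    'yhat_lower': [95, 100, 110, 145],
    'yhat_upper': [105, 115, 130, 160],
})
coverage = interval_coverage(toy['y'], toy['yhat_lower'], toy['yhat_upper'])
print(coverage)  # 0.75: three of the four actual values fall inside their interval
```

If the measured coverage on held-out data is far below the nominal interval width, the intervals are probably too optimistic.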

Prophet’s user-friendly code interface in Python and R and automated trend detection streamline the forecasting process, making it accessible and efficient for users across different expertise levels.

Forecasting uncertainty

The data

Let’s load a public dataset — for the purpose of this article, we’ll use the ‘AirPassengers’ dataset (you can find it on Kaggle here).

This dataset contains monthly totals of international airline passengers, in thousands, from 1949 to 1960.

Let’s start by exploring our dataset:

import pandas as pd
#'AirPassengers' dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'
data = pd.read_csv(url)

print(data.head())
data.info()  # info() prints its summary directly and returns None

import plotly.express as px

# Rename columns for Prophet
data.columns = ['ds', 'y']

# Convert 'ds' column to datetime format
data['ds'] = pd.to_datetime(data['ds'])

# Plot the time series using Plotly for interactive visualization
fig = px.line(data, x='ds', y='y', title='Monthly Number of Air Passengers (1949-1960)',labels={'ds': 'Date', 'y': 'Number of Passengers'})
fig.show()

Monthly number of air passengers from 1949 to 1960.

One important aspect of our dataset is that it represents a non-stationary time series. A stationary time series has a relatively stable mean and standard deviation over time; in a non-stationary series such as this one, those statistics change as time passes.

While stationary data simplifies analysis, non-stationary time series are more common in the real world, posing significant challenges for time series forecasting.
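A quick, informal way to spot non-stationarity is to look at rolling statistics: if the rolling mean or standard deviation drifts over time, the series is unlikely to be stationary. The sketch below uses a synthetic monthly series (not the AirPassengers data) with an upward trend and growing seasonal swings; the 12-month window is an assumption that suits monthly data with a yearly cycle.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with an upward trend and growing seasonal swings
# (illustrative values, not the AirPassengers data)
dates = pd.date_range('1949-01-01', periods=144, freq='MS')
t = np.arange(144)
values = (100 + 2.0 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))
series = pd.Series(values, index=dates)

# 12-month rolling statistics; the first 11 entries are NaN
rolling_mean = series.rolling(window=12).mean().dropna()
rolling_std = series.rolling(window=12).std().dropna()

print(rolling_mean.iloc[[0, -1]])
print(rolling_std.iloc[[0, -1]])
```

Both statistics end up noticeably higher at the end of the series than at the start, which is the signature of a non-stationary series. A formal test such as the augmented Dickey-Fuller test can complement this visual check.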

Seasonality

Seasonality refers to the recurring patterns or cycles in time series data that occur at regular intervals, such as daily, monthly, or yearly. Performing seasonality analysis is crucial because it helps us identify and understand these recurring patterns, which can significantly improve the accuracy of forecasting models by accounting for predictable fluctuations.

To analyze seasonality in our dataset, we start by visualizing the data to observe any apparent trends or patterns. A few things stand out at first sight:

There is a clear upward trend in the number of passengers over the years.
There appears to be some seasonality in the data, with peaks occurring around the same time each year.

Next, we use time series seasonal decomposition to break down the data into its trend, seasonal, and residual components. The seasonal component reveals a clear, repeating pattern every year, highlighting the seasonality in the data.

from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose the time series
decomposition = seasonal_decompose(data['y'], model='multiplicative', period=12)

decomposed_data = pd.DataFrame({
    'ds': data['ds'],
    'trend': decomposition.trend,
    'seasonal': decomposition.seasonal
})
# Plot the decomposed components
fig_seasonal = px.line(decomposed_data, x='ds', y='seasonal', title='Seasonal Component', labels={'ds': 'Date', 'seasonal': 'Seasonal Effect'})

fig_trend = px.line(decomposed_data, x='ds', y='trend', title='Trend Component', labels={'ds': 'Date', 'trend': 'Trend'})

fig_trend.show()
fig_seasonal.show()

The seasonal component chart depicts recurring patterns in the data over each season. The y-axis represents the seasonal effect, indicating how specific time periods affect the time series, while the x-axis denotes the time period. Peaks and dips highlight predictable seasonal variations (summertime, Christmas, and Easter holidays).

Forecasting with Linear model

As discussed earlier, one of Prophet’s key strengths is its ability to handle seasonality effectively. For our initial experiment, let’s assess the model’s performance without any seasonality adjustments by deactivating all seasonality parameters.

from prophet import Prophet
from prophet.plot import plot_plotly, plot_components_plotly

# Split the data into training and test sets
train = data.iloc[:-24]
test = data.iloc[-24:]

# Initialize Prophet model without seasonality parameters
model = Prophet(yearly_seasonality=False, weekly_seasonality=False, daily_seasonality=False)

# Fit the model to the data
model.fit(train)

# Create a dataframe for future dates
# 'MS' (month start) matches the dataset's date stamps, e.g. 1959-01-01
future = model.make_future_dataframe(periods=24, freq='MS')  # Forecasting 2 years into the future

# Make predictions
forecast = model.predict(future)

# Plot the forecast using Plotly
fig_forecast = plot_plotly(model, forecast)
fig_forecast.update_layout(title='Simple Forecast with Prophet', xaxis_title='Date', yaxis_title='Number of Passengers')
fig_forecast.show()

# Evaluate the performance of the simple forecast using Plotly
fig_components = plot_components_plotly(model, forecast)
fig_components.show()

In the first plot, we use Prophet’s built-in plotting functions to visualize the fit of the model. With all seasonality disabled, the model reduces to a piecewise linear trend: it follows the long-term direction of the data and makes no attempt to capture the yearly peaks and dips. We can zoom into the two-year forecast to see in detail how it actually performs.

Prophet model fit. Black bullets indicate actual values, blue line is the prediction by Prophet (it fits the training data and predicts the next 2 years)

Actual vs Predicted Number of Passengers using Linear Model

Next, we will discuss metrics, that is, how we can formally assess how well the model is doing.

Performance Metrics: Understanding MAE and MAPE

When we create a forecast model, we want to measure how accurate it is. Two common metrics for this are Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE).

Mean Absolute Error (MAE) gives us the average distance between our predictions and the actual values, in terms of the number of passengers. It essentially measures how much, on average, our predictions differ from the real numbers, regardless of whether they are higher or lower.

For example, if the MAE is 10, our predictions are off by 10 units each month on average (the dataset records passenger counts in thousands).

Mean Absolute Percentage Error (MAPE) expresses the error as a percentage of the actual values, making it easier to understand in relative terms. This metric shows us the average percentage by which our predictions differ from the actual values.

For instance, if the MAPE is 0.05, it indicates that our predictions are, on average, 5% off from the actual number of passengers each month.
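Both metrics are simple enough to compute by hand, which helps when checking what a library function returns. The sketch below uses three made-up actual and predicted values; the variable names are chosen for this example only.

```python
import numpy as np

# Made-up values for illustration
actual = np.array([100.0, 200.0, 400.0])
predicted = np.array([110.0, 190.0, 400.0])

abs_err = np.abs(actual - predicted)                 # [10, 10, 0]
mae_manual = abs_err.mean()                          # average absolute error
mape_manual = (abs_err / np.abs(actual)).mean()      # average of 10%, 5% and 0%

print(f"MAE: {mae_manual:.2f}, MAPE: {mape_manual:.2%}")
```

Note that MAPE divides by the actual values, so it becomes unstable when those are at or near zero.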

from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Calculate performance metrics for the simple forecast
y_true = test['y'].values
y_pred = forecast.iloc[-24:]['yhat'].values

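mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)

# Report both metrics (MAPE is returned as a fraction, not a percentage)
print(f"MAE: {mae:.2f}")
print(f"MAPE: {mape:.2%}")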

Forecasting with Prophet

The linear model gave us a basic grasp of the overall trend, but it ignored the seasonal patterns in the data. To improve accuracy, we’ll refine our forecast by taking advantage of Prophet’s seasonality features. This means adding more features and fine-tuning the model parameters, so that the predictions better reflect the data’s behavior.

Thankfully, we can very easily integrate temporal features such as holidays or seasonality into Prophet, and really make it shine with just a few added parameters.

Let’s add seasonality and US holidays to the model to see how they may influence the forecast.

# Adding seasonality to the model
model = Prophet(yearly_seasonality=True)
model.add_seasonality(name='monthly', period=30.5, fourier_order=5)

# Adding holiday effects (we assume US holidays impact air travel)
model.add_country_holidays(country_name='US')
# Fit the model
model.fit(train)

# Create a dataframe for future dates
# 'MS' (month start) matches the dataset's date stamps
future = model.make_future_dataframe(periods=24, freq='MS')

# Make predictions
forecast = model.predict(future)

Actual vs Predicted Number of Passengers using Prophet model

The Prophet forecast demonstrates better capture of the data’s seasonality and trends.

Incorporating holidays and monthly seasonality enhanced the accuracy of our predictions as expected.
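One way to check a claim like this is to score each forecast against the held-out months. The sketch below joins predictions to actual values on the `ds` column before computing MAE and MAPE, so a mismatch in date stamps shows up as missing rows instead of silently skewing the metrics. The helper name and the toy numbers are assumptions for illustration; the frames stand in for Prophet’s output and the test split.

```python
import pandas as pd

def evaluate_forecast(forecast, test):
    """Join predictions to held-out actuals on 'ds' and return (MAE, MAPE)."""
    merged = test.merge(forecast[['ds', 'yhat']], on='ds', how='inner')
    if merged.empty:
        raise ValueError("No overlapping dates between forecast and test set")
    abs_err = (merged['y'] - merged['yhat']).abs()
    return abs_err.mean(), (abs_err / merged['y'].abs()).mean()

# Toy frames with made-up numbers, standing in for Prophet output and the test split
toy_test = pd.DataFrame({
    'ds': pd.to_datetime(['1959-01-01', '1959-02-01']),
    'y': [360.0, 342.0],
})
toy_forecast = pd.DataFrame({
    'ds': pd.to_datetime(['1958-12-01', '1959-01-01', '1959-02-01']),
    'yhat': [330.0, 350.0, 351.0],
})

toy_mae, toy_mape = evaluate_forecast(toy_forecast, toy_test)
print(f"MAE: {toy_mae:.2f}, MAPE: {toy_mape:.2%}")
```

Calling the same helper on each model’s forecast gives directly comparable numbers.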

So far, so good

Our initial basic forecast provided a preliminary understanding of future trends over a span of 2 years. By integrating seasonality and holidays, we notably improved forecast accuracy. However, the metrics suggest that our model does in fact need a bit more help.

Can we take it one step further? Of course! Regressors to the rescue!

Regressors, the secret ingredient

What are Regressors?

Most of the time, time series data is directly affected by external factors such as promotions, events, or economic indicators. These factors can themselves be represented as time series that move together with the one we are actually trying to predict.

Prophet allows us to include such external regressors to improve the forecasting accuracy. Think of it as peeking into the future but from another perspective.

Adding an external regressor

Let’s assume that an economic indicator can impact air travel.
We’ll create a synthetic regressor to stand in for this indicator. It is random noise, completely independent of our data.

import numpy as np

# Creating a synthetic regressor (e.g., economic_indicator)
np.random.seed(42)
data['economic_indicator'] = np.random.normal(loc=0.0, scale=1.0, size=len(data))

# Split the data into training and test sets with the regressor data
train = data.iloc[:-24]
test = data.iloc[-24:]

# Adding the regressor to the Prophet model
model_with_regressor = Prophet()
model_with_regressor.add_regressor('economic_indicator')

# Fit the model with the external regressor
model_with_regressor.fit(train)

# Create future dataframe and add the regressor to future dates
future_with_regressor = model_with_regressor.make_future_dataframe(periods=24, freq='MS')
# Reuse the values generated above so the historical rows match what the model was trained on
# (the future frame covers the same 144 months as `data`)
future_with_regressor['economic_indicator'] = data['economic_indicator'].values

# Make predictions with the model that includes the regressor
forecast_with_regressor = model_with_regressor.predict(future_with_regressor)

Actual vs Predicted Number of Passengers with Prophet (+regressor)

Including external regressors can significantly improve forecasting accuracy if the external factors are relevant. In practice, it is essential to identify key external factors that might impact your time series data and include them as regressors. These could be weather data, holidays, or any other time series, derived or not, that may have a causal relation to the target.

Prophet’s flexibility allows us to incorporate external regressors easily.
Properly tuned models with relevant regressors can provide more accurate and actionable forecasts.
Continue exploring and experimenting with different regressors to find the best fit for your data.

Always evaluate the model’s performance to ensure that the added regressor positively impacts forecast accuracy. A fair reminder here that correlation does not necessarily imply causation!
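The evaluation advice above can be sketched as a holdout comparison: fit the same model with and without the candidate regressor and compare the error on the held-out period. To keep the example self-contained, ordinary least squares stands in for Prophet here, and the data, the `driver` series, and the `noise_reg` series are all synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_test = 144, 24
t = np.arange(n, dtype=float)
driver = rng.normal(size=n)        # hypothetical external factor that drives y
noise_reg = rng.normal(size=n)     # pure noise, unrelated to y
y = 100 + 2.0 * t + 10.0 * driver + rng.normal(scale=1.0, size=n)

def holdout_mae(extra_features):
    """Fit a linear trend plus optional regressors on the training part,
    then return the MAE on the last n_test points."""
    X = np.column_stack([np.ones(n), t] + extra_features)
    coef, *_ = np.linalg.lstsq(X[:-n_test], y[:-n_test], rcond=None)
    return float(np.mean(np.abs(y[-n_test:] - X[-n_test:] @ coef)))

mae_base = holdout_mae([])
mae_noise = holdout_mae([noise_reg])
mae_driver = holdout_mae([driver])
print(f"baseline: {mae_base:.2f}, noise regressor: {mae_noise:.2f}, relevant regressor: {mae_driver:.2f}")
```

A regressor that carries real signal should cut the holdout error clearly, while a noise regressor should leave it roughly unchanged or make it slightly worse.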

Performance Comparison for all models

Performance summary of the models used. From left to right: Linear model, Prophet model, Prophet with regressor

Closing remarks

While Prophet offers significant advantages, other models like ARIMA, SARIMA, and LSTM neural networks are also notable in the field of time series forecasting. ARIMA is well-known for capturing linear relationships in data but requires manual tuning and domain expertise. SARIMA extends ARIMA with support for seasonality, useful for complex seasonal data but still needs significant parameter tuning. LSTM networks excel at capturing complex, non-linear dependencies in large datasets, though they demand extensive computational resources and deep learning expertise. Prophet, in contrast, balances ease of use with robust performance, making it an appealing choice for many forecasting tasks.

Before committing to a time series forecasting model, consider key factors like data characteristics, desired accuracy, and resource availability. Prophet shines in handling strong seasonal patterns and holiday irregularities automatically, reducing the need for manual intervention. Its open-source nature and documentation make it ideal for organizations seeking scalable and interpretable solutions.

Forecasting may not be a crystal ball, but understanding the past can certainly illuminate the road ahead.

In today’s dynamic forecasting landscape, Prophet’s automated approach and emphasis on interpretability provide actionable insights amidst data complexity. By leveraging tools like Prophet, organizations can make informed decisions, staying agile in an ever-evolving environment.

Beyond the Data: Life, The Universe and Everything

Thanks for following along! Keep experimenting with different features and datasets to master time series forecasting with Prophet. Study your specific case carefully and consider factors that could impact the results.

Time series forecasting, like many other data science tasks, is both an art and a science.
Understanding the domain and the data is key to creating effective models.

Keep exploring, learning, and questioning. The universe of data science is vast and ever-expanding. Happy forecasting!