Time Series Forecasting using fbProphet with Worked Examples
Swathi Sharma, Saketh Ram Gangam, Nik Brown
Introduction to Time Series Analysis
- Time-series data is a sequence of data points or observations recorded over time. In most cases the points are taken at equally spaced intervals, although irregular spacing also occurs. The recording frequency may be hourly, daily, weekly, monthly, quarterly or annual.
- Time-series forecasting is the process of using a statistical model to predict future values of a time series from its past values.
- Time series are used in statistics, finance and business. A very common example is the daily closing value of a stock index such as the NASDAQ or Dow Jones. Other common applications are sales and demand forecasting, weather forecasting, econometrics, signal processing, pattern recognition and earthquake prediction.
Components of Time Series
Trend — The trend shows the general direction of the time series over a long period of time. A trend can be increasing (upward), decreasing (downward), or horizontal (stationary).
Seasonality — The seasonality component is a pattern that repeats with respect to timing, direction, and magnitude. An example is the increase in water consumption every summer due to hot weather.
Cyclical Component — These are fluctuations with no fixed period of repetition. A cycle refers to the period of ups and downs, booms and slumps of a time series, mostly observed in business cycles. These cycles do not exhibit seasonal variation but generally occur over a time period of 3 to 12 years, depending on the nature of the time series.
Irregular Variation — These are the fluctuations in the time series data that become evident when the trend and cyclical variations are removed. These variations are unpredictable and erratic, and may or may not be random.
ETS Decomposition — ETS Decomposition is used to separate different components of a time series. The term ETS stands for Error, Trend and Seasonality
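To make these components concrete, here is a minimal sketch of a classical additive decomposition in plain Python. It is an illustration only, not the ETS procedure of any particular library: the function name `decompose_additive` and the toy series are hypothetical, and the sketch assumes an odd seasonal period so that the moving-average window is symmetric. The trend is estimated with a centered moving average over one full season, the seasonality as the average detrended value at each position in the season, and the error as whatever is left over.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal and error parts (additive model).

    Hypothetical helper for illustration; assumes an odd `period`.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    n = len(series)
    half = period // 2

    # Trend: centered moving average; undefined (None) at the edges.
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        trend[i] = sum(window) / period

    # Seasonal: mean detrended value for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal effects so they sum to zero over one cycle.
    offset = sum(means) / period
    seasonal = [means[i % period] - offset for i in range(n)]

    # Error: whatever trend and seasonality do not explain.
    error = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, error


# Toy data: a linear upward trend plus a repeating 3-step seasonal pattern.
pattern = [2.0, -1.0, -1.0]
series = [10.0 + 0.5 * t + pattern[t % 3] for t in range(12)]
trend, seasonal, error = decompose_additive(series, period=3)
print(trend[1:4])
print(seasonal[:3])
```

Because the toy series is built from an exact trend and an exact seasonal pattern, the recovered error term is essentially zero; on real data it would hold the irregular variation.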
Prophet
The official Prophet homepage states:
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
Prophet is open source software released by Facebook’s Core Data Science team. It is available for download on CRAN and PyPI.
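The additive model mentioned in the quote is usually written as below. This is the formulation given in the Prophet paper by Taylor and Letham, not part of the quoted homepage text, and the symbol names follow that paper.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here g(t) is the trend, s(t) is the periodic seasonality (yearly, weekly, daily), h(t) is the effect of holidays, and \varepsilon_t is the error term that the model does not explain.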
Installation of prophet
pip install prophet
Worked Examples
Example 1 — Data directly from yfinance API
Importing libraries
# fbProphet
!pip install pystan
!pip install prophet
from prophet import Prophet
from prophet.plot import plot_plotly
# Yahoo Finance
!pip install yfinance
import yfinance as yf
# Others
import pandas as pd
from plotly import graph_objs as go
from datetime import datetime
Importing Data
We will import two years of daily Google (GOOG) stock data using the yfinance API.
ySymbol = "GOOG"
data = yf.download(
# tickers list or string as well
tickers = ySymbol,
# use "period" instead of start/end
# valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
# (optional, default is '1mo')
period = "2y",
# fetch data by interval (including intraday if period < 60 days)
# valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
# (optional, default is '1d')
interval = "1d",
# group by ticker (to access via data['SPY'])
# (optional, default is 'column')
group_by = 'ticker',
# adjust all OHLC (open-high-low-close) prices automatically
# (optional, default is False)
auto_adjust = True,
# download pre/post regular market hours data
# (optional, default is False)
prepost = True
)
data
This is how the data looks:
Resetting the index
data.reset_index(inplace=True)
EDA and Pre-processing
import plotly.offline as py
py.iplot([go.Scatter(
x=data['Date'],
y=data['Close']
)])
Plotting the raw data
Formatting the dataset for Prophet
df_train = data[['Date', 'Close']]
df_train = df_train.rename(columns={"Date": "ds", "Close": "y"})
df_train.head()
Modeling
m = Prophet(daily_seasonality=True)
m.fit(df_train)
Making Future Predictions
The next step is to build the dataframe of dates for which the model will make predictions. This is done with the Prophet.make_future_dataframe method, passing the number of days we’d like to predict through the periods argument. The resulting dataframe also includes the historical dates, so the predictions for those dates can be compared with the actual values.
- periods: int, the number of periods to forecast forward
future = m.make_future_dataframe(periods=2*365)
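As a rough picture of what this call produces, the sketch below extends a list of dates by a number of consecutive calendar days, which mirrors the default daily frequency and the inclusion of history described above. It is plain Python rather than Prophet's own code; `sketch_future_dates` and the toy dates are hypothetical.

```python
from datetime import date, timedelta


def sketch_future_dates(history, periods, include_history=True):
    """Extend a list of dates by `periods` consecutive calendar days.

    Hypothetical stand-in for the idea behind make_future_dataframe
    with a daily frequency; not Prophet's implementation.
    """
    last = max(history)
    future = [last + timedelta(days=k) for k in range(1, periods + 1)]
    return (list(history) + future) if include_history else future


# Toy history: three trading days ending on a Friday.
history = [date(2023, 1, 4), date(2023, 1, 5), date(2023, 1, 6)]
future = sketch_future_dates(history, periods=3)
print(future[-3:])
```

Note that the added dates are calendar days, so for stock data they include weekends on which no trading takes place. Forecast values for those dates should be read with that in mind.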
Obtaining the Forecasts
forecast = m.predict(future)
Plotting the Forecasts
Prophet has a built-in feature for plotting the forecasts we just generated: call m.plot() and pass in the forecast dataframe as the argument. The blue line in the graph represents the predicted values, while the black dots represent the data in our dataset.
m.plot(forecast)
py.iplot([
go.Scatter(x=df_train['ds'], y=df_train['y'], name='Actual'),
go.Scatter(x=forecast['ds'], y=forecast['yhat'], name='Predicted')
])
Plotting the Forecast Components
The plot_components method plots the trend and the seasonality components (such as yearly and weekly) of the time series data.
# Visualize each component (trend, seasonality)
m.plot_components(forecast)
Cross Validation
Next, let’s measure the forecast error using the historical data by comparing the predicted values with the actual values. To do this, we select cutoff points in the history of the data and, for each one, fit the model using only the data up to that cutoff. We then compare the actual values after the cutoff with the predicted values. The cross_validation method does this in Prophet. It takes the following parameters:
- horizon: the forecast horizon.
- initial: the size of the initial training period.
- period: the spacing between cutoff dates.
The output of the cross_validation method is a dataframe containing y, the true values, and yhat, the predicted values. We’ll use this dataframe to compute the prediction errors.
from prophet.diagnostics import cross_validation
df_cv = cross_validation(m, initial='365 days', period='180 days', horizon = '300 days')
df_cv.head()
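To see how the three parameters interact, the sketch below places cutoff dates under a simplified rule: the latest cutoff sits one horizon before the end of the data, and earlier cutoffs are spaced one period apart as long as at least the initial training span precedes them. This is a simplified reading of the procedure, not Prophet's exact code; `sketch_cutoffs` and the dates are hypothetical.

```python
from datetime import date, timedelta


def sketch_cutoffs(first_day, last_day, initial_days, period_days, horizon_days):
    """Return cutoff dates, oldest first, for a simplified rolling-origin scheme."""
    initial = timedelta(days=initial_days)
    period = timedelta(days=period_days)
    horizon = timedelta(days=horizon_days)

    cutoffs = []
    # The latest cutoff leaves exactly one horizon of data after it.
    cutoff = last_day - horizon
    # Step backwards while at least `initial` of history precedes the cutoff.
    while cutoff >= first_day + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


# Hypothetical two-year span with the settings used above.
cutoffs = sketch_cutoffs(date(2021, 1, 1), date(2022, 12, 31),
                         initial_days=365, period_days=180, horizon_days=300)
print(cutoffs)

# The same settings on a hypothetical three-year span.
longer = sketch_cutoffs(date(2020, 1, 1), date(2022, 12, 31),
                        initial_days=365, period_days=180, horizon_days=300)
print(longer)
```

Under this rule, a two-year history with a 365-day initial window and a 300-day horizon leaves room for only one cutoff, so the error estimates rest on a single train/test split. A longer history or a shorter horizon yields more cutoffs.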
Obtaining the Performance Metrics
We use the performance_metrics utility to compute the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and the coverage of the yhat_lower and yhat_upper estimates.
from prophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
df_p
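For reference, the metrics named above can be written out by hand. The sketch below computes them once over all rows in plain Python, whereas performance_metrics reports them as a function of the forecast horizon, so the numbers will not match its output row for row. The helper `sketch_metrics` and the two sample rows are hypothetical; the key names mirror the y, yhat, yhat_lower and yhat_upper columns.

```python
import math


def sketch_metrics(rows):
    """Compute error metrics over rows with keys y, yhat, yhat_lower, yhat_upper."""
    n = len(rows)
    errors = [r["y"] - r["yhat"] for r in rows]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the true value, so it is undefined when y is zero.
    mape = sum(abs(e / r["y"]) for e, r in zip(errors, rows)) / n
    # Coverage: share of true values inside the predicted interval.
    coverage = sum(r["yhat_lower"] <= r["y"] <= r["yhat_upper"] for r in rows) / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "mape": mape,
        "coverage": coverage,
    }


# Two made-up rows: the first interval contains y, the second does not.
rows = [
    {"y": 100.0, "yhat": 110.0, "yhat_lower": 95.0, "yhat_upper": 125.0},
    {"y": 200.0, "yhat": 190.0, "yhat_lower": 180.0, "yhat_upper": 199.0},
]
metrics = sketch_metrics(rows)
print(metrics)
```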
Visualizing Performance Metrics
The performance metrics can be visualized using the plot_cross_validation_metric utility. Let’s visualize the RMSE below.
from prophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='rmse')
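The plot shows the error as a function of how far ahead the forecast was made. A minimal sketch of that idea, in plain Python, groups the cross-validation rows by horizon (the forecast date minus the cutoff date) and computes one RMSE per group. Prophet smooths this with a rolling window over horizons, which the sketch omits; `rmse_by_horizon` and the sample rows are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date


def rmse_by_horizon(rows):
    """rows: (ds, cutoff, y, yhat) tuples; returns {horizon_days: rmse}."""
    squared = defaultdict(list)
    for ds, cutoff, y, yhat in rows:
        horizon_days = (ds - cutoff).days
        squared[horizon_days].append((y - yhat) ** 2)
    return {h: math.sqrt(sum(v) / len(v)) for h, v in sorted(squared.items())}


# Made-up rows from two cutoffs, each forecasting one and two days ahead.
rows = [
    (date(2022, 3, 7), date(2022, 3, 6), 100.0, 103.0),
    (date(2022, 3, 8), date(2022, 3, 6), 101.0, 107.0),
    (date(2022, 9, 3), date(2022, 9, 2), 120.0, 116.0),
    (date(2022, 9, 4), date(2022, 9, 2), 121.0, 113.0),
]
by_horizon = rmse_by_horizon(rows)
print(by_horizon)
```

In this toy data the two-day-ahead errors are larger than the one-day-ahead errors, which is the kind of growth with horizon that the plot is meant to reveal.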
Example 2 — A Kaggle Dataset
For Example 2, we are going to repeat the same steps as described for Example 1, with a Kaggle dataset.
Importing the Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
data=pd.read_csv('/content/drive/MyDrive/yahoo_stock.csv')
EDA and Pre-processing
Plotting the raw data
Formatting the data for Prophet
Plotting the Forecasts