Website Visitor Forecast with Facebook Prophet: A Complete Tutorial
With installation instructions, data, parameter tuning, and both simple and sophisticated forecasting (Part 1)
As a data analyst at Microsoft, I investigate time-series data every day. Besides looking at key performance metrics such as SRPVs, CPCs and conversion rate in a historical context, we sometimes also need to look into the future, that is, to forecast. Insight into both the current data and the future helps us and our clients adjust in advance.
I have tried many approaches. One of my personal favourites is LSTM, but my current company uses FB Prophet. When in Rome, do as the Romans do. Therefore, in this project, I will stick to Prophet to forecast the number of unique visitors to a website in Europe. The website domain name, which you will see later, is fabricated. The data set contains only the date, location, and number of visitors.
Facebook Prophet has its advantages, though it is not particularly robust. For forecasting, it is simple, easy to use, largely automated and fast. You don’t have to deal with data dimensions, and there is no need to reverse any standardisation. If you are not too familiar with Python and coding, and you do not have much time for modelling, then it is definitely one of the best options, in my opinion.
In this article, I will cover:
- Installation of Prophet
- Simple ETL, and Data Visualisation
- Simple forecasting (forecast with default settings)
- Forecasting with model assessment
- Forecasting with tuned parameters
However, the original article was too long, so it is now divided into two parts: the first 3 sections are in part 1, while sections 4 and 5 are in part 2. If you are not interested in installation and simple forecasting, please click here to go to part 2.
1. Installation
A few years ago, when Python was still at version 3.8, new Prophet users could simply pip install the library and start forecasting. But while most Python users have since updated to 3.9 or later, Prophet has sadly not been updated as quickly and, at the time of writing, only supports Python 3.8 or lower. Most of us therefore need a workaround for this incompatibility.
A virtual environment is one solution. To find out what a virtual environment is, please click here.
Please feel free to skip this part if you are only interested in FB Prophet itself.
1. To create a virtual environment in Windows 10 (this tutorial is written on Windows 10 rather than on the Mac I used to work on), we first have to install Anaconda. Please click here to go to the official site.
Then finish the installation by following the installation guide. The installation is easy: we only have to click ‘yes, yes…’ and we will get what we need.
2. After the installation, we will use a command prompt. This time, however, we are not using the Windows command prompt but the Anaconda command prompt.
If you can’t find the Anaconda command prompt, go to the Windows search bar and search for ‘Anaconda Prompt’.
3. A little black window will pop up. In this Anaconda command prompt, enter:
conda create -n python_3_8 python=3.8
This creates a new working environment named python_3_8 with Python 3.8 installed in it.
4. Then we enter:
conda activate python_3_8
After entering this command, the prompt switches to the virtual environment python_3_8. From now on, python_3_8 temporarily replaces your original Python environment. You can also select it as the working kernel in Visual Studio Code.
To leave this virtual environment after your project, simply enter:
conda deactivate
5. Now that we have the working environment, we need to install the libraries. The following are the library and dependencies we will need. Install them one line at a time.
conda install libpython m2w64-toolchain -c msys2
conda install numpy cython -c conda-forge
conda install matplotlib scipy pandas -c conda-forge
conda install pystan -c conda-forge
conda install -c anaconda ephem
conda install -c anaconda scikit-learn
conda install -c conda-forge seaborn
conda install -c plotly plotly
conda install -c conda-forge optuna
conda install -c conda-forge Prophet
After installing all of these, we can move on to our data.
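Before moving on, it may be worth confirming that the active environment really contains what we just installed. The snippet below is a minimal sketch and not part of the original notebook: missing_modules is a hypothetical helper, and the list of import names is my assumption of how the conda packages above are imported in Python.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed to correspond to the conda packages installed above.
required = ["numpy", "pandas", "matplotlib", "scipy", "sklearn",
            "seaborn", "plotly", "optuna", "prophet"]

print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
print("Missing modules:", missing_modules(required) or "none")
```

If the script is run inside python_3_8 and reports a Python version other than 3.8, or lists missing modules, the environment was probably not activated before installing.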
2. Simple ETL, and Data Visualisation
We will load our data from the csv file and visualise it here. As the data was extracted from other databases, I won’t cover the extraction steps. So we will go directly to the csv file and explore. The data can be downloaded here.
The Jupyter notebook file can also be downloaded at this link: download notebook.
First, we import the data.
import pandas as pd

df = pd.read_csv('data.csv')
df2 = df.copy()  # work on a copy so the raw data stays untouched
df2.head()
We explore our data.
print(df2['Country'].value_counts(), "\n")
print(df2['Country'].nunique(), "unique values.")
After getting a rough idea of our dataset, we parse the date column and keep only the data points we need. I am only interested in the German data, so I will use loc for filtering.
df2['date'] = pd.to_datetime(df2['datepart'], dayfirst=True).dt.date
df2 = df2.loc[(df2['Country'] == 'Germany')]  # we are only interested in the visitors from Germany in this tutorial.
df_de = df2.copy()
Now we have a data frame with only the website’s performance in Germany.
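To see why dayfirst=True matters, here is a small illustration on made-up rows. It is only a sketch: the sample values and the column name visitors are assumptions, since the real column names are not shown in this article.

```python
import pandas as pd

# Hypothetical rows; 'visitors' is an assumed column name, dates are day/month/year.
sample = pd.DataFrame({
    "datepart": ["03/04/2021", "03/04/2021", "25/12/2021"],
    "Country": ["Germany", "France", "Germany"],
    "visitors": [120, 80, 95],
})

# dayfirst=True reads 03/04/2021 as 3 April 2021.
sample["date"] = pd.to_datetime(sample["datepart"], dayfirst=True).dt.date
germany = sample.loc[sample["Country"] == "Germany"].copy()

# Without dayfirst, pandas reads the same string month-first, as 4 March 2021.
month_first = pd.to_datetime("03/04/2021").date()

print(germany[["date", "visitors"]])
print("Parsed month-first:", month_first)
```

If your own export uses month/day/year or ISO dates, drop dayfirst=True; otherwise every ambiguous date silently lands in the wrong month.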
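Next, we make sure there are no null values in the dataset by computing the share of missing values per column.
df_de.isna().sum() / len(df_de)  # fraction of missing values in each column; all zeros means no nulls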
Prophet has strict requirements for the data frame fed into it. It must contain the columns ‘ds’ (the date) and ‘y’ (the value to forecast). Other columns, such as ‘cap’ and ‘floor’, are only needed in some cases. We won’t need them in this tutorial because we will set the model’s growth to ‘linear’ rather than ‘logistic’. I will explain the reason in part 2 of this tutorial.
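df_de2 = df_de.groupby('date').sum(numeric_only=True)  # total visitors per date; text columns are ignored
df_de2 = df_de2.reset_index()
df_de2.columns = ['ds', 'y']  # the column names Prophet expects
df_de2 = df_de2[['y', 'ds']]
df_de2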
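The reshaping above can also be wrapped in a small helper, which makes the ‘ds’ and ‘y’ requirement explicit. This is a sketch under assumptions: to_prophet_frame is a hypothetical function, and the column names in the demo (date, visitors) are invented for illustration.

```python
import pandas as pd


def to_prophet_frame(df, date_col, value_col):
    """Sum value_col per date and return a frame with the columns 'ds' and 'y'."""
    out = (
        df.groupby(date_col, as_index=False)[value_col]
        .sum()
        .rename(columns={date_col: "ds", value_col: "y"})
    )
    out["ds"] = pd.to_datetime(out["ds"])  # store the dates as proper timestamps
    return out[["ds", "y"]]


# Hypothetical input with two rows on the same day.
demo = pd.DataFrame({
    "date": ["2021-01-02", "2021-01-01", "2021-01-02"],
    "visitors": [5, 3, 7],
})
result = to_prophet_frame(demo, "date", "visitors")
print(result)
```

Selecting the value column explicitly avoids relying on the frame having exactly one numeric column, which the positional renaming above quietly assumes.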
Then I will visualise the distribution of our unique visitor column with Plotly. Plotly is my favourite visualisation library because of its ease of use and interactivity.
import plotly.express as px
import plotly.io as pio

pio.renderers.default = "notebook"
fig_line = px.line(df_de2, x="ds", y="y", title='The number of unique visitors of www.lorentzyeung.com in the previous 3 years')
fig_line.show()
Next, we look for possible outliers using the mean and the standard deviation.
df_stat = df_de2.describe()
mean = df_stat.loc['mean']['y']
std = df_stat.loc['std']['y']
upper = mean + std * 3
lower = mean - std * 3
print(' Mean: ', mean, '\n', 'Standard Deviation: ', std, '\n', 'Upper Limit: ', upper, '\n', 'Lower Limit:', lower)
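The limits printed above only tell us where the boundaries are. To list the rows that actually fall outside them, something like the following could be used. It is a minimal sketch: three_sigma_outliers is a hypothetical helper and the demo data is made up.

```python
import pandas as pd


def three_sigma_outliers(df, col="y", k=3.0):
    """Return the rows whose value lies more than k standard deviations from the mean."""
    mean = df[col].mean()
    std = df[col].std()  # sample standard deviation, the same one describe() reports
    mask = (df[col] - mean).abs() > k * std
    return df.loc[mask]


# Hypothetical series: 29 ordinary days and one extreme spike.
demo = pd.DataFrame({"y": [100] * 29 + [1000]})
print(three_sigma_outliers(demo))
```

On the real data this would be called as three_sigma_outliers(df_de2); an empty result means no value lies beyond three standard deviations.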
Then we visualise the possible outliers with a Plotly box plot.
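A box plot conventionally marks points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The same check can be done numerically, as sketched below. tukey_fences is a hypothetical helper, the demo values are invented, and plotting libraries may compute quartiles slightly differently from pandas, so the numbers need not match a chart exactly.

```python
import pandas as pd


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the first and third quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical data with one obvious outlier.
demo = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
lower, upper = tukey_fences(demo)
flagged = demo[(demo < lower) | (demo > upper)].tolist()
print(lower, upper, flagged)
```

For our data, tukey_fences(df_de2['y']) would give the corresponding limits.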
Our data set shows an almost perfectly symmetrical bell shape (vertically), which suggests that it is approximately normally distributed. No outliers are present in our dataset. This is ideal because the normal distribution is the most commonly assumed distribution in technical market analysis and in other types of statistical analysis.
3. Simple forecasting (forecast with default settings)
In this section, I will start with a simple Prophet forecast. It is simple because we do not need to tune any parameter values; I will rely almost entirely on the default settings. Then, we will visualise our results.
from prophet import Prophet

m = Prophet(interval_width=0.95, weekly_seasonality=False, daily_seasonality=False)
m.add_country_holidays(country_name='DE')  # include German public holidays
m.fit(df_de2)
future_df = m.make_future_dataframe(periods=52, freq='W')  # extend the history by 52 weekly steps
forecast_default = m.predict(future_df)
plot_forecast = m.plot(forecast_default)
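To get a feel for what periods=52 and freq='W' produce, the sketch below builds a comparable set of future dates with pandas alone. It reflects my understanding of the behaviour rather than Prophet’s own code: future_dates is a hypothetical helper and the daily history is made up.

```python
import pandas as pd


def future_dates(history_dates, periods, freq):
    """Approximate the dates that get appended after the last observed date."""
    last = pd.to_datetime(history_dates).max()
    # Generate one extra stamp, then keep only dates strictly after the last observation.
    candidates = pd.date_range(start=last, periods=periods + 1, freq=freq)
    return candidates[candidates > last][:periods]


# Hypothetical history covering every day of 2021.
history = pd.Series(pd.date_range("2021-01-01", "2021-12-31", freq="D"))
future = future_dates(history, periods=52, freq="W")
print(future[0].date(), future[-1].date(), len(future))
```

Note that the weekly frequency 'W' in pandas is anchored on Sundays, so the future dates fall on Sundays regardless of the weekday on which the history ends.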
The prediction and graph make sense to the human brain, or mine at least. Although the trend is plateauing, the 95% uncertainty interval shows that it may trend down in the coming year.
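The numbers behind the chart live in the forecast data frame: yhat is the point forecast, and yhat_lower and yhat_upper bound the uncertainty interval. The sketch below pulls out only the future rows. future_summary is a hypothetical helper, and the demo frame is fabricated so that the snippet runs without a fitted model.

```python
import pandas as pd


def future_summary(forecast, last_observed):
    """Keep the forecast rows after last_observed, with the point forecast and its interval."""
    cols = ["ds", "yhat", "yhat_lower", "yhat_upper"]
    future = forecast.loc[forecast["ds"] > pd.Timestamp(last_observed), cols].copy()
    future["interval_width"] = future["yhat_upper"] - future["yhat_lower"]
    return future.reset_index(drop=True)


# Fabricated stand-in for a Prophet forecast frame.
demo = pd.DataFrame({
    "ds": pd.to_datetime(["2021-12-26", "2022-01-02", "2022-01-09"]),
    "yhat": [100.0, 102.0, 103.0],
    "yhat_lower": [90.0, 88.0, 85.0],
    "yhat_upper": [110.0, 116.0, 121.0],
})
summary = future_summary(demo, "2021-12-31")
print(summary)
```

With the objects from this tutorial, the call would be future_summary(forecast_default, df_de2['ds'].max()). A widening interval_width over time is what the growing shaded band in the plot represents.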
plt_components = m.plot_components(forecast_default)
From the component plot above, we can see that the number of visitors is trending up, though holidays affect the performance. Christmas seems to be the biggest hurdle for our website, whereas the bank holidays in January and May give it the strongest boost. The graphs make sense, as our website sells products that have little to do with Christmas.
Now you have learned how to install Prophet and how to forecast with the out-of-the-box model. In the next section, which is in part 2, I will show how to evaluate the model and how to improve and optimise it.
Thank you for reading. If you like this tutorial, please share it with your data science friends and follow me. Your follows are my motivation to continue contributing to the community.