How we tackled FB Prophet’s Monthly Seasonality Issue

Games24x7 Blogs
Jun 18, 2024

Introduction

The Data Science team at Games24x7, India’s most scientific, innovative and user-centric company, has built Machine Learning models to predict business metrics (e.g., liquidity and gameplay metrics) for the platform at different time granularities. While working on one such forecasting problem that uses the FB Prophet model, we ran into a challenge with the way the model handles monthly seasonality. Prophet models monthly seasonality with a fixed period (usually 30.5 days). However, a calendar month can have 28, 29, 30 or 31 days. As a result, the monthly seasonality component of the predictions drifts out of alignment with the actual calendar month. The same issue can arise in other Time Series approaches that model monthly seasonality in a similar way. This blog explains how we identified the problem, gives an overview of the issues we encountered and describes the resolution in detail.

Problem identification

Most of the metrics on our platform exhibit varying degrees of monthly seasonality. In this blog, we focus on one metric that exhibits a strong monthly seasonality: its value is high at the beginning of the month and gradually tapers off as the month progresses. When we first built a forecasting model for this metric, we expected Prophet to capture this seasonality easily. Indeed, after deployment, the model captured the monthly seasonality well for the first few months. Over time, however, we started to observe deviations, which became prominent from March onwards.

We noticed an unusual day-on-day pattern in the monthly predictions for our metric, particularly for March and the subsequent months. In these periods, the predicted value was unusually low for the first 2–3 days, then jumped suddenly to a peak around the 4th or 5th of the month, and only then showed the regular monthly decline.

Predicted vs Actual metric value for March

This led to various problems with our forecasts. To list a few:

  1. The predicted monthly pattern of the metric did not meet expectations, which reduced confidence in the forecast.
  2. We had to manually adjust the monthly pattern to get sensible forecasts.
  3. When we reran the model in the middle of the month, it assumed that its seasonality structure and other components were correct and attributed the deviation of the initial 4–5 days to an increase in the trend. It then projected higher values for the rest of the month in the re-run, which further increased the error.
  4. The model underpredicted in the initial days. As a result, if we tried to learn the impact of any new event from the model components (Prophet provides a component-level breakdown of the forecasts) for those days, the model attributed the underprediction error to that event, giving the event an unwarranted extra positive impact.

For all these reasons, the interpretation of the model predictions was going off the mark, which led to panic among the stakeholders as well as in the Data Science team.

Root Cause Identification

We initially suspected that the incorrect monthly pattern was caused by external events, such as the pandemic in previous years, which might have disturbed the yearly seasonality of the model. However, after looking into the regressor component and the yearly seasonality component, we realized that no external event was responsible.

We invested significant time and effort in identifying the issue. Despite extensive online searches, we found few solutions and little literature on the topic (to be honest, we did not even know what we were looking for). After a deep-dive into the model components, we realized that the month-start peak in the monthly seasonality component was shifting gradually, month after month. At the beginning, the start of the model’s monthly seasonality coincided with the actual month-start. Over time, however, it moved steadily away. Let us elaborate on this point.

Here is a graph of the monthly seasonality component learnt by the model for the month of March across multiple years.

As we can clearly see, in the initial phase of the data, the peak for the monthly component was on the 1st March, but gradually it started shifting. In year_4 it is on the 3rd March, in year_5 it is on 4th March, and in year_6 it is on the 5th March. Ideally, the monthly seasonality should have peaked on 1st March every year. Simultaneously, the month-end bump in the metric is getting pushed to the start of the next month, since monthly window size is fixed.

The reason behind this issue is that February is either a 28-day or a 29-day month, whereas the monthly seasonality window is 30.5 days. The seasonality window that covers February therefore extends into the start of March. Since the seasonal curve is still on its downward slope at that point, predictions for the first two days of March are much lower than they should be. The new 30.5-day cycle kicks in around March 3rd, causing the predicted value of the metric to peak on March 3rd or 4th.

The root cause is that the monthly seasonality is modeled as a fixed time period of 30.5 days, whereas the number of days in a month varies (28, 29, 30, or 31). The issue isn’t specific to February and March and is present in other months as well. However, due to alternating periods of 30 and 31 days, the shift due to the 30.5 days period balances out. It becomes prominent in March because February is a shorter month of 28/29 days. Once this shift occurs in March, it gets carried forward in all other months of the year and then takes another jump in next year’s March.
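The drift described above can be reproduced with simple calendar arithmetic. The sketch below is illustrative only: it assumes the 30.5-day cycle is perfectly aligned with 1 January 2017 (the `start_dt` used later in this post), and the helper name `cycle_lag` is hypothetical. Twelve cycles of 30.5 days span 366 days, one more than a non-leap year, so the cycle start slips by about one day per non-leap year on top of the jump caused by February.

```python
from datetime import date

PERIOD = 30.5  # fixed monthly period in days, as used in this post


def cycle_lag(day, origin, period=PERIOD):
    """Days by which the nearest fixed-period cycle start lags `day`.

    Positive: the cycle restarts after `day`; negative: it restarted before.
    `origin` is an assumed date on which the cycle and the month start coincide.
    """
    elapsed = (day - origin).days
    # Signed distance from `day` back to the nearest cycle start.
    offset = (elapsed + period / 2) % period - period / 2
    return -offset


origin = date(2017, 1, 1)  # assumption: cycle aligned with this month start
march_lags = {y: cycle_lag(date(y, 3, 1), origin) for y in range(2017, 2020)}
for year, lag in march_lags.items():
    print(f"1 March {year}: cycle start lags the month start by {lag} days")
```

Under these assumptions the lag on 1 March grows from 2 days in 2017 to 3 days in 2018 and 4 days in 2019, which matches the gradual shift of the peak seen in the seasonality plots.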

Solution

To rectify the monthly seasonality, we first tried a few quick fixes:

1. Tweaking the period of the monthly seasonality

Prophet lets us configure the period of the monthly seasonality, so we tried different values such as 30, 30.5 and 31 days.

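# The seasonal period is set with the `period` argument of add_seasonality.
from prophet import Prophet  # the package was named `fbprophet` before v1.0

m = Prophet(yearly_seasonality=True,
            weekly_seasonality=True,
            seasonality_mode='multiplicative')

# period=30 shown here; 30.5 and 31 were tried in the same way
m.add_seasonality(name='monthly', period=30, fourier_order=10)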

We tested the results across different months, but this did not give us the desired improvement: changing the period does not fix the alignment of the seasonality with the calendar month.

2. Increasing the Fourier order of the monthly seasonality

We increased the Fourier order in the hope that a higher order would be able to capture the monthly pattern.

# The Fourier order is set with the `fourier_order` argument of add_seasonality.
from prophet import Prophet

m = Prophet(yearly_seasonality=True,
            weekly_seasonality=True,
            seasonality_mode='multiplicative')

# seasonality_prior_scale is defined elsewhere; its value is not shown in this post
m.add_seasonality(name='monthly', period=30.5, fourier_order=30,
                  prior_scale=seasonality_prior_scale)

However, this did not address the misalignment either, so the monthly seasonality was still not captured.

3. Removing monthly seasonality

We considered dropping the monthly seasonality component and letting the yearly seasonality learn the day-to-day variation for the entire year. This did not work either: without the monthly component, the model leaned heavily on the yearly seasonality, which could not capture the within-month pattern.

Since these simple tweaks to the monthly seasonality did not solve our problem, we had to try a few more creative solutions that we felt would be more robust.

1. Reducing the Data Size

We trained the model on only the most recent three years of data, expecting that a shorter history would leave less room for the month-start to drift in the monthly seasonality. While this helped a bit with the monthly seasonality, it reduced model accuracy because there was less training data.

2. Prophet’s inbuilt Conditional Seasonality

We used this method to create four different monthly seasonalities, one for each possible number of days in a month.

import pandas as pd
from prophet import Prophet

# Create the flags 'is_28_day', 'is_29_day', 'is_30_day' and 'is_31_day'
# from the number of days in each row's calendar month.
df['ds'] = pd.to_datetime(df['ds'])
days_in_month = df['ds'].dt.days_in_month
for n in (28, 29, 30, 31):
    df[f'is_{n}_day'] = days_in_month == n

m = Prophet(yearly_seasonality=True,
            weekly_seasonality=True,
            seasonality_mode='multiplicative')

# One conditional seasonality per month length; each one is active only on
# rows where its flag is True. The same flag columns must also be present in
# the future dataframe used for prediction.
for n in (28, 29, 30, 31):
    m.add_seasonality(name=f'monthly_{n}_day', period=n, fourier_order=10,
                      condition_name=f'is_{n}_day')

Since this method targets the root cause of the problem and is an inbuilt provision in Prophet, we expected it to work. Surprisingly, it did not give us the desired results.

As we can see from the plot below, the monthly pattern even worsened.

Tuning the Fourier order for each conditional seasonality did not yield better results either. Even on the training data, the seasonality components were not learned appropriately. Following are the model components learned for the 30-day and 31-day seasonalities.
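A possible explanation, stated here as an assumption rather than something verified in this post, is that the Fourier features of a seasonality are computed from absolute time (days since 1970-01-01) and the condition column only switches them off on rows where it is false. If so, a 31-day cycle does not restart on the 1st of each 31-day month. The sketch below checks where the 1st of each 31-day month of 2023 would fall inside such a cycle; the helper name `phase` and the choice of year are illustrative.

```python
from datetime import date

EPOCH = date(1970, 1, 1)  # assumed time origin of the Fourier features


def phase(day, period):
    """Position of `day` inside a cycle of `period` days counted from EPOCH."""
    return (day - EPOCH).days % period


# First day of every 31-day month in 2023 and its position in a 31-day cycle.
months_31 = [1, 3, 5, 7, 8, 10, 12]
phases = {mth: phase(date(2023, mth, 1), 31) for mth in months_31}
for mth, ph in phases.items():
    print(f"2023-{mth:02d}-01 falls on day {ph} of the 31-day cycle")
```

If the assumption holds, the month start lands on a different point of the 31-day curve in different months, so a single learned curve cannot peak on the 1st of every 31-day month.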

3. Dummy Variables for each Day

We created one dummy variable for each day of the month. Each variable takes the value 1 on its own day of the month and 0 on all other days.

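import numpy as np
import pandas as pd

df['ds'] = pd.to_datetime(df['ds'])
df['day'] = df['ds'].dt.day

# One 0/1 indicator column per day of the month: is_1_day ... is_31_day
for i in range(1, 32):
    df['is_' + str(i) + '_day'] = np.where(df['day'] == i, 1, 0)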
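The post does not show how these flags were handed to the model. One plausible route, assumed here, is to register each column as an extra regressor through Prophet's `add_regressor` method before fitting. The helper name `add_day_of_month_regressors` is hypothetical, and `model` is expected to be a `Prophet` instance.

```python
def add_day_of_month_regressors(model, days=range(1, 32)):
    """Register one extra regressor per day-of-month flag column.

    `model` is assumed to expose Prophet's `add_regressor(name)` method; the
    column names follow the `is_<d>_day` pattern used for the dummy variables.
    """
    names = [f"is_{d}_day" for d in days]
    for name in names:
        model.add_regressor(name)
    return names
```

With a real model this would be called as `add_day_of_month_regressors(m)` before `m.fit(df)`. The same 31 columns would also have to be present in the future dataframe passed to `m.predict`.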

In theory, modeling the monthly seasonality this way is a valid alternative to the Fourier series approximation. Please refer to the Appendix for more details.

This approach worked well for us in terms of error and the day-wise structure. However, dealing with 31 variables is operationally challenging, and the added variables make the monthly component harder to interpret.

On a separate note, this methodology does not scale to longer seasonalities such as quarterly and yearly, since it requires one variable per day of the season; the variable space explodes and learning is hampered.

4. Holiday Flags

We created four different holiday flags, each of which is active only in months of a particular length.

i) 31-day flag for 31-day months, e.g., January, March, May

ii) 30-day flag for 30-day months, e.g., April, June, September

iii) 29-day flag for 29-day months, e.g., Feb 2020, Feb 2024

iv) 28-day flag for 28-day months, e.g., Feb 2021, Feb 2022, Feb 2023

In a way, this is similar to the approach that uses Prophet’s inbuilt conditional seasonality, but the implementation differs. The inbuilt seasonality uses a Fourier series approximation, whereas with the holiday flags the model first establishes a baseline from components such as trend, yearly seasonality and weekly seasonality. The monthly pattern of the metric (starting high and gradually decreasing as the month progresses) is then attributed to the holiday flags, along with other regressors.

While trying this approach, we switched off Prophet’s monthly seasonality.

This method does not rely on a fixed 30.5-day period for every month and learns the month-start and month-end much better. While we cannot share the exact results in this blog, this approach worked best for us in terms of both error and consistency of results over time, especially for metrics that have a monotonic monthly pattern.

import pandas as pd
from prophet import Prophet

start_dt = '2017-01-01'
end_dt = '2023-12-31'  # should extend to the end of the forecast horizon

# One "holiday" row per calendar month, anchored on the 1st of the month.
month_starts = pd.date_range(start_dt, end_dt, freq='MS')

monthly_flag = pd.DataFrame({
    # Months of equal length share one flag, e.g. 'monthly_31_days'.
    'holiday': ['monthly_%d_days' % n for n in month_starts.days_in_month],
    'ds': month_starts,
    'lower_window': 0,
    # upper_window counts the days after 'ds', so a month of n days needs
    # n - 1 (the count starts from 0).
    'upper_window': month_starts.days_in_month - 1,
})

m = Prophet(yearly_seasonality=True,
            weekly_seasonality=True,
            seasonality_mode='multiplicative',
            holidays=monthly_flag)
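One way to read the holiday flags, assuming that Prophet expands every offset inside a holiday's window into its own indicator feature (an assumption about its holiday handling, not something shown in this post), is as a separate set of day-of-month dummies for each month length. The sketch below counts how many such indicators the four flags would produce for 2017 to 2023. The helper name `window_offsets` and the flag labels are illustrative.

```python
import calendar


def window_offsets(start_year, end_year):
    """Map each month-length flag to the set of window offsets it covers."""
    offsets = {}
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            n_days = calendar.monthrange(year, month)[1]
            flag = f"monthly_{n_days}_days"  # illustrative label
            # lower_window = 0 and upper_window = n_days - 1 give the
            # offsets 0, 1, ..., n_days - 1 from the 1st of the month.
            offsets.setdefault(flag, set()).update(range(n_days))
    return offsets


offsets = window_offsets(2017, 2023)
n_indicators = sum(len(days) for days in offsets.values())
print(sorted(offsets), n_indicators)
```

Under this reading, offset 0 of every flag always coincides with the 1st of the month, so the learned pattern cannot drift away from the calendar. The price is a larger number of parameters than a single Fourier-based seasonality.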

As we see below, this approach is capturing the monthly seasonality in the desired way.

Conclusion

  • We were able to tackle the monthly seasonality better with the Holiday Flag approach as compared to the standard 30.5 days period Prophet Monthly seasonality.
  • This method isn’t specific to just the Prophet model, but can also be applied to other models that apply the monthly seasonality in a similar way (just a disclaimer, we have not validated this hypothesis for other models).
  • This method can help us capture any distinct behavioral patterns of 30-day and 31-day months. However, on the flip side, it can also lead to overfitting.
  • The overfitting issue can be more prominent in February, as the 28-day flag occurs at most once a year and the 29-day flag occurs only once every 4 years.

Acknowledgements

Grateful to Tridib Mukherjee, Chief DS and AI Officer, for constant inspiration and guidance, and Sachin Kumar, Associate Director Data Science, for mentoring us and entrusting us with the responsibility to build Business Forecasting models. This project has been an immense learning experience for all of us.

About the authors

Deepanshu Singh and Abhi Jain work in the Data Science & Artificial Intelligence team at Games24x7.

Deepanshu Singh is a Senior Data Scientist at Games24x7 and has worked on various challenges related to Business Projections, Acquisition Marketing and Risk. His expertise lies in Time Series, Deep Learning and Optimization, and he is currently working in the Generative AI domain. He completed his master’s degree at ISI Bangalore.

LinkedIn Profile — https://www.linkedin.com/in/deepanshu-singh-247393184/

Abhi Jain is working as a Lead Data Scientist and leads Projections, Acquisition Marketing and Hyperpersonalization related Data Science projects. He has done B.E. (Hons) from BITS Pilani, Goa and PGDBA from IIM Calcutta, IIT Kharagpur, ISI Kolkata.

LinkedIn Profile — https://www.linkedin.com/in/abhi-jain-a4541076/

References

  1. https://otexts.com/fpp2/useful-predictors.html
  2. https://facebook.github.io/prophet/docs/quick_start.html#python-api
  3. https://peerj.com/preprints/3190/

Appendix

Following Hyndman’s book, seasonality can be modeled with either Seasonal Dummy Variables (one for each day of the season) or a Fourier Series approximation.

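If m is the seasonal period, then the first few Fourier terms are given by

$$
x_{1,t} = \sin\left(\frac{2\pi t}{m}\right), \quad
x_{2,t} = \cos\left(\frac{2\pi t}{m}\right), \quad
x_{3,t} = \sin\left(\frac{4\pi t}{m}\right), \quad
x_{4,t} = \cos\left(\frac{4\pi t}{m}\right), \quad
x_{5,t} = \sin\left(\frac{6\pi t}{m}\right), \quad
x_{6,t} = \cos\left(\frac{6\pi t}{m}\right),
$$

and so on.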

With Fourier terms, we often need fewer variables. Fourier terms are particularly useful for modeling longer seasonal periods, such as monthly, quarterly and yearly patterns, which would otherwise require 31, 92 and 366 dummy variables, respectively, in daily data. However, for shorter periods, like weekly seasonality, we can use either Fourier terms or Dummy variables.
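To make the comparison concrete, the sketch below builds the Fourier terms sin(2πkt/m) and cos(2πkt/m) for k = 1, ..., K at a single time index t. The function name `fourier_terms` is illustrative. The values m = 30.5 and K = 10 mirror the monthly seasonality settings used earlier in this post.

```python
import math


def fourier_terms(t, m, K):
    """Return the 2*K Fourier features for time index t and seasonal period m.

    The features are ordered sin, cos for k = 1, then sin, cos for k = 2, etc.
    """
    terms = []
    for k in range(1, K + 1):
        angle = 2 * math.pi * k * t / m
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


features = fourier_terms(t=0, m=30.5, K=10)
print(len(features))  # number of Fourier columns per row
```

With K = 10 the monthly pattern is described by 20 columns, compared with 31 day-of-month dummy variables. The gap widens quickly for quarterly or yearly seasonality.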
