Holidays Auto-Modelling for Efficient Time-Series Forecasting

Volodymyr Holomb
Published in Analytics Vidhya · 3 min read · Jun 20, 2022

Forecasting in retail often requires adjusting future sales for special events such as promo campaigns or holidays. The excess sales we observe right before or after official, unofficial or religious holidays reflect natural consumption patterns: customers may want to get prepared for the celebration or may need to replenish their supplies afterwards. Sometimes they also enjoy significant discounts, or simply have enough free time (thanks to official days off) to shop during the holiday period.

One way or another, here at RBC Group we add a calendar of local holidays to our time-series forecasting models as a rule. Some of the algorithms we use for predicting sales, such as Prophet, already offer a convenient out-of-the-box option for modelling holidays and recurring events. Moreover, you can easily extend a holiday’s effect to a few days within a window around the date. For instance, you may reasonably want to include Christmas Eve in addition to Christmas, or Black Friday in addition to Thanksgiving.
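To make the window idea concrete, here is a minimal sketch of a holiday frame in the layout Prophet accepts (the columns ‘holiday’, ‘ds’, ‘lower_window’ and ‘upper_window’). The two events and their window sizes are picked purely for illustration, and the small expansion loop is our own helper for showing which dates each window covers, not part of Prophet.

```python
import pandas as pd

# 'lower_window' extends the effect to days before the date (negative values),
# 'upper_window' to days after it (positive values).
holidays = pd.DataFrame(
    {
        "holiday": ["Christmas", "Thanksgiving"],
        "ds": pd.to_datetime(["2015-12-25", "2015-11-26"]),
        "lower_window": [-1, 0],  # -1 also covers Christmas Eve
        "upper_window": [0, 1],   # +1 also covers Black Friday
    }
)

# Illustrative helper: expand each row into the concrete dates its window covers.
covered = [
    (row.holiday, row.ds + pd.Timedelta(days=offset))
    for row in holidays.itertuples()
    for offset in range(row.lower_window, row.upper_window + 1)
]

for name, day in covered:
    print(name, day.date())
```

A frame like this is what gets passed to the model via `Prophet(holidays=holidays)`.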

The real problem is deciding which window size to use in each case, since our clients run different types of business. The holiday window can be set either by relying solely on the expert knowledge of the business owner (which is not good practice in general) or analytically (the way we prefer). In the latter case, the basic idea is to gather holiday-related sales statistics over the historical period and calculate, for each event, the average window that covers all above-ordinary sales around the particular date. Below, we demonstrate this approach on the dataset from the famous M5 Forecasting - Accuracy competition on Kaggle.

After downloading the data, we restrict our wrangling to a single state, ‘CA’, for the sake of simplicity.

As we are not going to work at the ‘store-item’ level, we need to combine all three datasets (daily unit sales, the calendar and the weekly sell prices) and define our target value, ‘daily sales’, as the product of the ‘qnt’ and ‘sell_price’ columns.
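The filtering and merging steps can be sketched as follows. To keep the snippet runnable without the Kaggle files, we use tiny made-up frames that only mimic the M5 column layout as we understand it (wide `d_1`, `d_2`, … day columns, a calendar keyed by `d`, and prices keyed by store, item and `wm_yr_wk`). The values are invented, and this is one possible way to do the wrangling rather than the exact original code.

```python
import pandas as pd

# Toy stand-ins for the three M5 files (assumed schema, made-up values).
sales = pd.DataFrame({
    "item_id": ["FOODS_1_001", "FOODS_1_001"],
    "store_id": ["CA_1", "TX_1"],
    "state_id": ["CA", "TX"],
    "d_1": [3, 1],
    "d_2": [0, 2],
})
calendar = pd.DataFrame({
    "d": ["d_1", "d_2"],
    "date": pd.to_datetime(["2011-01-29", "2011-01-30"]),
    "wm_yr_wk": [11101, 11101],
})
prices = pd.DataFrame({
    "store_id": ["CA_1", "TX_1"],
    "item_id": ["FOODS_1_001", "FOODS_1_001"],
    "wm_yr_wk": [11101, 11101],
    "sell_price": [2.0, 2.5],
})

# Keep a single state, then reshape the wide day columns into long format.
ca = sales[sales["state_id"] == "CA"]
long = ca.melt(
    id_vars=["item_id", "store_id", "state_id"],
    var_name="d",
    value_name="qnt",
)

# Attach calendar dates and weekly prices, then compute revenue per row.
long = long.merge(calendar, on="d", how="left")
long = long.merge(prices, on=["store_id", "item_id", "wm_yr_wk"], how="left")
long["daily_sales"] = long["qnt"] * long["sell_price"]

print(long[["date", "store_id", "qnt", "sell_price", "daily_sales"]])
```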

Next, we aggregate the sales of all stores in the state ‘CA’ to get the company’s daily sales. Now we can set a rule for labelling holiday-related excess sales: for this demo, we treat as excess any daily sales above the 7-day moving average.
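A minimal sketch of this labelling rule, on made-up numbers, might look like the snippet below. Note one assumption on our side: we use a trailing 7-day window, while a centered window would be an equally reasonable reading of the rule.

```python
import pandas as pd

# Made-up store-level sales for two stores over ten days.
dates = pd.date_range("2015-01-01", periods=10, freq="D")
store_sales = pd.DataFrame({
    "date": dates.repeat(2),
    "store_id": ["CA_1", "CA_2"] * 10,
    "daily_sales": [50.0] * 14 + [90.0, 90.0, 45.0, 45.0, 50.0, 50.0],
})

# Aggregate all stores into company-level daily sales.
daily = store_sales.groupby("date")["daily_sales"].sum()

# Assumption: trailing 7-day moving average (a centered one is also plausible).
mva = daily.rolling(window=7, min_periods=7).mean()

# 1 = above-ordinary sales, 0 = ordinary sales.
# Days without a full 7-day history compare against NaN and therefore stay 0.
excess = (daily > mva).astype(int)

print(pd.DataFrame({"daily_sales": daily, "mva": mva, "excess": excess}))
```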

For an excess sale to be considered ‘holiday-related’, it needs to be observed right before or after the holiday date, i.e. to directly adjoin the event. On the plot, such sales appear above the black curve (the 7-day moving average, MVA) in the vicinity of the red vertical lines (holidays and special events).

Below is a tabular representation of the plot above (1 marks above-ordinary sales, 0 ordinary sales). With such a table, one can easily calculate the average (yearly) number of ‘ones’ adjacent to each holiday.
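One way to count the adjacent ‘ones’ is sketched below. The helper names `adjacent_run` and `average_window` are ours, the flags are made up, and rounding the yearly average to a whole number of days is an assumption (Python’s built-in `round` rounds halves to the nearest even number).

```python
import pandas as pd

def adjacent_run(flags: pd.Series, holiday: pd.Timestamp, step: int) -> int:
    """Count consecutive 1-flags moving away from the holiday in direction `step`."""
    count = 0
    day = holiday + pd.Timedelta(days=step)
    while day in flags.index and flags.loc[day] == 1:
        count += 1
        day += pd.Timedelta(days=step)
    return count

def average_window(flags: pd.Series, dates) -> tuple:
    """Average the before/after run lengths over all yearly occurrences."""
    before = [adjacent_run(flags, d, -1) for d in dates]
    after = [adjacent_run(flags, d, +1) for d in dates]
    return round(sum(before) / len(dates)), round(sum(after) / len(dates))

# Made-up flags: 1 = above-ordinary sales, 0 = ordinary sales.
idx = pd.date_range("2014-04-01", "2015-04-30", freq="D")
flags = pd.Series(0, index=idx)
flags.loc[pd.to_datetime(["2014-04-17", "2014-04-18", "2014-04-19"])] = 1
flags.loc[pd.to_datetime(["2015-04-04"])] = 1

# Easter fell on 2014-04-20 and 2015-04-05.
easter = pd.to_datetime(["2014-04-20", "2015-04-05"])
window = average_window(flags, easter)
print(window)  # (days before, days after)
```

Here three flagged days precede Easter in the first year and one in the second, so the averaged window is two days before and none after.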

Once the calculation is complete, we can fully automatically generate our holiday calendar in Prophet’s standard format.
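Assembling the calendar from the computed windows could look roughly like this. The `event_dates` and `windows` dictionaries are hypothetical inputs with illustrative values; only the resulting column layout (‘holiday’, ‘ds’, ‘lower_window’, ‘upper_window’) is what Prophet expects.

```python
import pandas as pd

# Hypothetical inputs: event dates per holiday and the (days before, days after)
# windows estimated from history. The window values are illustrative only.
event_dates = {
    "SuperBowl": ["2015-02-01", "2016-02-07"],
    "Easter": ["2015-04-05", "2016-03-27"],
}
windows = {"SuperBowl": (2, 0), "Easter": (1, 1)}

holidays = pd.concat(
    [
        pd.DataFrame({
            "holiday": name,
            "ds": pd.to_datetime(dates),
            "lower_window": -windows[name][0],  # days before must be non-positive
            "upper_window": windows[name][1],   # days after must be non-negative
        })
        for name, dates in event_dates.items()
    ],
    ignore_index=True,
)

print(holidays)
```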

Now it’s time to evaluate the performance of the Prophet model with and without holidays as a predictor. We test on the 143 observations from 2016, which, as good practice dictates, were excluded from all the prior calculations.
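The comparison can be organised along the lines of the sketch below. The helper names are ours, and mean absolute error is an assumed metric, since any standard forecast error measure would serve. Prophet is imported inside the forecasting helper so that the rest of the snippet runs even where the library is not installed.

```python
import pandas as pd

def forecast_with_prophet(train, horizon, holidays=None):
    """Fit Prophet on a frame with 'ds' and 'y' columns and forecast `horizon` days."""
    from prophet import Prophet  # lazy import: the other helpers work without it

    model = Prophet(holidays=holidays)
    model.fit(train)
    future = model.make_future_dataframe(periods=horizon)
    return model.predict(future)["yhat"].tail(horizon).to_numpy()

def split_holdout(df, cutoff="2016-01-01"):
    """Everything before the cutoff is training data; the rest is the test set."""
    cutoff = pd.Timestamp(cutoff)
    return df[df["ds"] < cutoff], df[df["ds"] >= cutoff]

def mae(actual, predicted):
    """Mean absolute error (an assumed comparison metric)."""
    return float(sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual))

# Tiny made-up frame, only to show how the split behaves.
example = pd.DataFrame({
    "ds": pd.date_range("2015-12-30", periods=4, freq="D"),
    "y": [1.0, 2.0, 3.0, 4.0],
})
train, test = split_holdout(example)
print(len(train), len(test))

# With real data and Prophet installed, the comparison would read:
# base = forecast_with_prophet(train, len(test))
# with_holidays = forecast_with_prophet(train, len(test), holidays)
# print(mae(test["y"], base), mae(test["y"], with_holidays))
```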

Note how markedly the model has lifted its predictions for the days right before the Super Bowl (2016-02-07), Easter (2016-03-27) and Cinco de Mayo (2016-05-05). All those adjustments were made with a few lines of code, based solely on historical sales patterns. Each event got its own window for further modelling, reflecting the specifics of the business, while the only major input was a common calendar of holiday dates.

Volodymyr Holomb
Analytics Vidhya

As an ML Engineer at RBC Group, I transform raw data with passion and creativity to unlock valuable insights and empower businesses to make informed decisions