I often see people use a single dummy variable to model whether a particular day is a holiday or not. Why doesn’t this work?
Because not all holidays are the same, and if you use only one dummy variable to flag whether a day is a holiday, your model cannot distinguish between different holidays.
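To make the difference concrete, here is a minimal Python sketch contrasting a single holiday flag with one dummy per holiday. The holiday calendar (`HOLIDAYS`) and the function names are hypothetical and cover only a few 2023 dates; in practice you would build the calendar for your own country and years.

```python
from datetime import date

# Hypothetical holiday calendar for illustration (a few Slovenian public holidays in 2023).
HOLIDAYS = {
    date(2023, 1, 1): "new_year",
    date(2023, 4, 10): "easter_monday",
    date(2023, 12, 25): "christmas",
}
HOLIDAY_NAMES = sorted(set(HOLIDAYS.values()))


def single_dummy(day: date) -> dict[str, int]:
    """One flag for every holiday: the model sees all holidays as identical."""
    return {"is_holiday": int(day in HOLIDAYS)}


def per_holiday_dummies(day: date) -> dict[str, int]:
    """One flag per named holiday, so each holiday can get its own coefficient."""
    name = HOLIDAYS.get(day)
    return {f"hol_{h}": int(name == h) for h in HOLIDAY_NAMES}


for d in (date(2023, 1, 1), date(2023, 12, 25)):
    # The single dummy is identical for both days; the per-holiday dummies differ.
    print(d, single_dummy(d), per_holiday_dummies(d))
```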
Let’s dive in…
There are two types of holidays:
- The ones that happen on the same day of the week every year, but on a different date (also called fix-day holidays; e.g. Easter Monday).
- The ones that happen on the same date every year, but on a different day of the week (also called fix-date holidays; e.g. Christmas).
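The difference between the two types can be checked with a short Python sketch. Easter Sunday is computed here with the anonymous Gregorian (Meeus/Jones/Butcher) algorithm, which is my choice for illustration; a holiday library would do the same job. Easter Monday always falls on a Monday but its date moves, while Christmas keeps its date but its weekday changes.

```python
from datetime import date, timedelta


def easter_sunday(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous Gregorian (Meeus/Jones/Butcher) algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    wd = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * wd) // 451
    month, day = divmod(h + wd - 7 * m + 114, 31)
    return date(year, month, day + 1)


def easter_monday(year: int) -> date:
    """Fix-day holiday: always a Monday, but the date changes every year."""
    return easter_sunday(year) + timedelta(days=1)


for year in range(2021, 2025):
    em = easter_monday(year)
    xmas = date(year, 12, 25)  # fix-date holiday: same date, varying weekday
    print(year, em, em.strftime("%A"), xmas, xmas.strftime("%A"))
```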
Why are holidays hard to predict?
Because they are rare events, so we do not have enough observations. Load forecasting models usually need at least two to three years of data to make reliable forecasts. While each day of the week occurs at least 52 times a year, a particular holiday occurs only once a year. For fix-day holidays, at least the day of the week is always the same, whereas fix-date holidays can fall on any day of the week, which makes them even harder to predict.
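A quick count makes the imbalance explicit. This sketch counts Mondays versus Christmases in three years of data (the upper end of the range above); the years 2021 to 2023 are chosen arbitrarily for illustration.

```python
from datetime import date, timedelta

start, end = date(2021, 1, 1), date(2023, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Ordinary weekday: roughly 52 observations per year.
mondays = sum(d.weekday() == 0 for d in days)
# Fix-date holiday: one observation per year.
christmases = sum((d.month, d.day) == (12, 25) for d in days)
# Fix-date holiday on a specific weekday: even fewer observations.
christmas_on_monday = sum((d.month, d.day) == (12, 25) and d.weekday() == 0 for d in days)

print(mondays, christmases, christmas_on_monday)
```

Over 2021 to 2023 this gives 156 Mondays but only three Christmases, and only one of those (2023) falls on a Monday.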
How Can We Model Holidays?
If a fix-date holiday falls on a working day…
… model it as a Saturday or Sunday.
People's behavior during holidays is usually quite similar to their behavior on Saturdays or Sundays. We can leverage this insight and model fix-date holidays that fall on a weekday as a Saturday or Sunday. The graph below shows such a case: the load profile looks very similar to that of a Saturday or Sunday.
If a fix-date holiday happens during the weekend…
…leave as is.
If a fix-date holiday falls on a Saturday or Sunday, the behavior stays almost the same as if there were no holiday (see the graph below). In Slovenia, people do not work on holidays, so if a holiday falls on a Saturday, the people who would otherwise work on Saturday stay at home. The behavior is therefore not exactly the same as on a regular Saturday, but it is still very similar.
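The two fix-date rules above can be sketched as a single day-type feature. This is a minimal illustration, not the author's implementation: the holiday set `FIX_DATE_HOLIDAYS` and the function `day_type` are hypothetical, and whether weekday holidays are mapped to Saturday or to Sunday is left as a parameter to compare with cross-validation.

```python
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical set of fix-date holidays as (month, day); adjust for your country.
FIX_DATE_HOLIDAYS = {(5, 1), (6, 25), (12, 25), (12, 26)}


def day_type(day: date, holiday_as: str = "Sunday") -> str:
    """Calendar day type, with weekday fix-date holidays relabelled as a weekend day."""
    is_holiday = (day.month, day.day) in FIX_DATE_HOLIDAYS
    if is_holiday and day.weekday() < 5:
        # Holiday on a working day: model it as a Saturday or Sunday.
        return holiday_as
    # Holiday on a weekend, or an ordinary day: leave the day type as is.
    return WEEKDAY_NAMES[day.weekday()]


print(day_type(date(2023, 12, 25)))                        # Christmas on a Monday
print(day_type(date(2021, 12, 25)))                        # Christmas on a Saturday
print(day_type(date(2023, 12, 26), holiday_as="Saturday"))
```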
On Easter Saturday, Sunday or Monday…
…try adding a separate dummy for each day (or for every hour of each day).
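A minimal sketch of Easter dummies, assuming hourly timestamps and Gregorian (Western) Easter computed with the anonymous Gregorian algorithm. The function names are hypothetical; `hourly=True` produces one dummy per hour of each Easter day (72 columns) instead of one per day (3 columns).

```python
from datetime import date, datetime, timedelta


def easter_sunday(year: int) -> date:
    """Gregorian Easter Sunday (anonymous Gregorian / Meeus-Jones-Butcher algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    wd = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * wd) // 451
    month, day = divmod(h + wd - 7 * m + 114, 31)
    return date(year, month, day + 1)


def easter_dummies(ts: datetime, hourly: bool = False) -> dict[str, int]:
    """Separate dummies for Easter Saturday, Sunday and Monday (optionally per hour)."""
    easter = easter_sunday(ts.year)
    offsets = {"easter_sat": -1, "easter_sun": 0, "easter_mon": 1}
    feats: dict[str, int] = {}
    for name, offset in offsets.items():
        hit = ts.date() == easter + timedelta(days=offset)
        if hourly:
            for hour in range(24):
                feats[f"{name}_h{hour:02d}"] = int(hit and ts.hour == hour)
        else:
            feats[name] = int(hit)
    return feats


print(easter_dummies(datetime(2023, 4, 10, 8)))
```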
On January 1st…
… try adding dummies.
In Slovenia, demand is at its lowest on January 1st. Modeling it as a Saturday or Sunday will probably not work, because demand is so low. Try modeling it separately (add a separate dummy for the whole day or for every hour of the day).
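January 1st can be handled the same way. In this hypothetical helper, `hourly=False` gives one whole-day dummy and `hourly=True` gives 24 hourly dummies. Note the trade-off: with N years of hourly data, each hourly dummy is active on only N observations, so its coefficient is estimated from very little data.

```python
from datetime import datetime


def new_year_dummies(ts: datetime, hourly: bool = False) -> dict[str, int]:
    """Dummy for January 1st, either for the whole day or for each hour separately."""
    is_jan1 = (ts.month, ts.day) == (1, 1)
    if not hourly:
        return {"jan1": int(is_jan1)}
    # One column per hour; only the column matching ts.hour is set on January 1st.
    return {f"jan1_h{hour:02d}": int(is_jan1 and ts.hour == hour) for hour in range(24)}


print(new_year_dummies(datetime(2024, 1, 1, 3)))
```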
Last but not least…
…always use cross-validation to evaluate your models. There is no universal rule in machine learning (no free lunch), so always try different things.
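Because load data are ordered in time, a common choice (an assumption here; the post does not specify a scheme) is an expanding-window split in which every fold trains only on data that precede its test period, similar in spirit to scikit-learn's `TimeSeriesSplit`. The helper below is a hypothetical, stdlib-only sketch that yields index ranges.

```python
def expanding_window_splits(n_samples: int, n_folds: int, test_size: int):
    """Yield (train_indices, test_indices); training data always precede the test block."""
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        start = first_test_start + fold * test_size
        yield range(0, start), range(start, start + test_size)


for train_idx, test_idx in expanding_window_splits(n_samples=10, n_folds=3, test_size=2):
    print(list(train_idx), list(test_idx))
```

Scoring each fold separately on holiday days is one way to see whether a given holiday treatment actually helps, since a handful of holidays barely moves an error averaged over all days.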
Holidays differ from country to country, and the explanations above are based on my own observations. Nevertheless, we all face similar problems, and I think these approaches can generalize well.
As the famous statistician George Box said, all models are wrong, but some are useful. Models will always be wrong, especially on holidays, but it is up to you to improve them and find the best modeling approach!
If you find this useful, please share this blog on LinkedIn and connect with me or join my group AI in Smart Grids on LinkedIn and connect with others working in this field.