Managing Calendars of Holidays in Time Series Prediction Projects

George Vyshnya
Published in SBC Group Blog, Aug 8, 2017

Every Data Scientist would like to spend more time on the cool, fun things: exploring new datasets, engineering innovative features, inventing and validating new algorithms and strategies, etc. Reality is often different, though (especially if you work in a business environment). Additional technical overhead takes a significant fraction of every working day: raw data pre-processing, integration with various data sources and enterprise systems, re-deploying applications, and turning data products into business-ready applications with an intuitive UI, solid performance, continual event flow, etc.

So there is a certain collision between what Data Scientists want to do and the practical implementation tasks on their scorecards. On a bigger scale, this is being addressed by Analytical DevOps convergence (see my earlier article on Analytical DevOps for more details).

At the same time, there is something you can do at the micro level: keep automating routine, tedious tasks in the form of easy-to-use, easy-to-reuse components or modules. The more ‘bricks’ of this sort you have in your arsenal, the easier it is to get started on every new project or contest.

Today I would like to focus on one such tiny piece of automation that can bring real value if you are frequently involved in Time Series forecasting projects.

Why Are Calendars of Holidays Important?

In many Time Series forecasting projects where human and social activities are predicted, it really matters to take holiday effects into account. We all know how strongly holidays affect the shopping and e-commerce behavior of buyers, website traffic patterns, visits to entertainment venues, etc.

So informing your Time Series prediction algorithm about the holiday calendar of the modelled activity is often wise (unless you are dealing with natural-science forecasting or physical signal processing tasks).
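As a minimal sketch of what "informing" the model can mean in practice, assuming a daily series and US federal holidays only, holidays can be turned into binary features with pandas. The column names `is_holiday`, `is_pre_holiday` and `is_post_holiday` are hypothetical choices, not a fixed convention.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# A daily date range standing in for the index of a traffic series
# (hypothetical; any daily DatetimeIndex works the same way).
days = pd.date_range("2017-01-01", "2017-12-31", freq="D")

# Rule-based US federal holidays (observed dates) that fall inside that range.
holidays = USFederalHolidayCalendar().holidays(start=days.min(), end=days.max())

features = pd.DataFrame(index=days)
# 1 on holidays, 0 otherwise: a simple exogenous regressor for the model.
features["is_holiday"] = days.isin(holidays).astype(int)
# Days right before and after a holiday often behave differently too.
features["is_pre_holiday"] = (days + pd.Timedelta(days=1)).isin(holidays).astype(int)
features["is_post_holiday"] = (days - pd.Timedelta(days=1)).isin(holidays).astype(int)
```

These columns can then be joined to the training data like any other feature; richer encodings (distance to the next holiday, holiday names as categories) follow the same pattern.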

How We Automate Management of Calendars of Holidays

Let’s imagine we need to forecast web traffic to a global multilingual site visited by users across the globe. In a case like that, we need Calendars of Holidays to be

- Different for different user locales/geographical locations

- Consistent from year to year

- Easily reused across multiple projects

There is good news for Python-based data scientists and developers: they can benefit from the following infrastructure.

- US federal holidays are well handled by pandas and its time series capabilities (see the `pandas.tseries.holiday.USFederalHolidayCalendar` class)

- The holidays of most G20 countries, as well as many other countries in the world, are solidly managed by Workalendar (https://github.com/novafloss/workalendar), a slick third-party package

- You can implement any custom, reusable Calendar of Holidays (as well as other types of calendars, such as Calendars of Scheduled Maintenance, Trading Calendars, etc.) as a class inherited from the pandas `AbstractHolidayCalendar` superclass (see a good example explaining it at https://stackoverflow.com/questions/33094297/create-trading-holiday-calendar-with-pandas)
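A custom calendar along those lines might look like the sketch below (the pandas base class is named `AbstractHolidayCalendar`). `GermanyHolidaysCalendar` is a hypothetical name, and its rules are a simplified set of German nationwide holidays that ignores regional ones; `Holiday`, `GoodFriday`, `EasterMonday`, `Easter`, `Day` and `CustomBusinessDay` are existing pandas names.

```python
import pandas as pd
from pandas.tseries.holiday import (
    AbstractHolidayCalendar,
    EasterMonday,
    GoodFriday,
    Holiday,
)
from pandas.tseries.offsets import CustomBusinessDay, Day, Easter


class GermanyHolidaysCalendar(AbstractHolidayCalendar):
    """Hypothetical calendar: simplified German nationwide public holidays."""

    rules = [
        Holiday("New Year's Day", month=1, day=1),
        GoodFriday,    # predefined in pandas: Easter Sunday - 2 days
        EasterMonday,  # predefined in pandas: Easter Sunday + 1 day
        Holiday("Labour Day", month=5, day=1),
        # Moveable feasts: anchor on Jan 1, roll forward to Easter Sunday, then shift.
        Holiday("Ascension Day", month=1, day=1, offset=[Easter(), Day(39)]),
        Holiday("Whit Monday", month=1, day=1, offset=[Easter(), Day(50)]),
        Holiday("German Unity Day", month=10, day=3),
        Holiday("Christmas Day", month=12, day=25),
        Holiday("Second Day of Christmas", month=12, day=26),
    ]


calendar = GermanyHolidaysCalendar()

# Dates are generated from the rules for any range, so the calendar
# stays consistent from year to year without manual upkeep.
holidays_2017 = calendar.holidays(start="2017-01-01", end="2017-12-31")

# The same calendar can drive business-day arithmetic.
german_bday = CustomBusinessDay(calendar=calendar)
next_working_day = pd.Timestamp("2017-12-22") + german_bday  # skips weekend and Dec 25-26
```

Because the rules live in a class, the calendar can be imported from a shared module in every project instead of being rebuilt from hard-coded date lists.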

So once you install pandas and Workalendar, you are ready to build something you can reuse across all your future time series forecasting projects.

Notes:

- At the time of writing, Workalendar had to be installed directly from its git repo using the command below (it is now also published on PyPI as `workalendar`):
`pip install git+https://github.com/novafloss/workalendar.git`

- Although Workalendar is robust and broad in coverage, at the time of writing it still lacked holiday calendars for several countries (in particular, China and the Eastern European countries Ukraine, Belarus, and Russia)
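Workalendar and pandas return holidays in different shapes: Workalendar calendar objects expose a `holidays(year)` method that returns a list of `(date, name)` pairs, while pandas works with a `DatetimeIndex`. A small adapter keeps both sources interchangeable downstream. `holidays_to_series` and the `HolidaySource` protocol are hypothetical helpers, not part of either library.

```python
import datetime as dt
from typing import Iterable, List, Protocol, Tuple

import pandas as pd


class HolidaySource(Protocol):
    """Anything exposing a Workalendar-style holidays(year) method."""

    def holidays(self, year: int) -> List[Tuple[dt.date, str]]:
        ...


def holidays_to_series(calendar: HolidaySource, years: Iterable[int]) -> pd.Series:
    """Collect (date, name) pairs for the given years into a date-indexed Series."""
    pairs = [pair for year in years for pair in calendar.holidays(year)]
    index = pd.DatetimeIndex([pd.Timestamp(day) for day, _ in pairs], name="date")
    names = [name for _, name in pairs]
    # Sorting by date makes the result easy to align with pandas calendars.
    return pd.Series(names, index=index, name="holiday").sort_index()
```

With Workalendar installed, a call such as `holidays_to_series(Germany(), range(2015, 2018))`, with `Germany` imported from `workalendar.europe`, should yield holiday names indexed by date, ready to be combined with the pandas-based calendars.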

What We Get

As a result, we can obtain reusable code modules to manage multi-national Holiday Calendars across multiple Time Series forecasting projects. I implemented this approach while tackling the Kaggle Web Traffic Time Series Forecasting competition (https://www.kaggle.com/c/web-traffic-time-series-forecasting), and it is what I am going to use in all future projects where a Calendar of Holidays adds value.
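A minimal sketch of such a module (not the author's Kaggle code) could map each locale to a calendar object and emit one holiday flag per locale. It assumes traffic is keyed by a locale code; in the Kaggle data, page names include the Wikipedia language subdomain, e.g. `zh.wikipedia.org`. `HOLIDAY_CALENDARS`, `FranceFixedHolidaysCalendar` and `holiday_features` are hypothetical names, mapping `en` to US federal holidays is itself a simplifying assumption, and the French calendar deliberately holds only a few fixed-date holidays.

```python
import pandas as pd
from pandas.tseries.holiday import (
    AbstractHolidayCalendar,
    Holiday,
    USFederalHolidayCalendar,
)


class FranceFixedHolidaysCalendar(AbstractHolidayCalendar):
    """Hypothetical, deliberately incomplete: fixed-date French holidays only."""

    rules = [
        Holiday("New Year's Day", month=1, day=1),
        Holiday("Labour Day", month=5, day=1),
        Holiday("Bastille Day", month=7, day=14),
        Holiday("Christmas Day", month=12, day=25),
    ]


# Hypothetical registry: one calendar per locale, shared by every project.
HOLIDAY_CALENDARS = {
    "en": USFederalHolidayCalendar(),
    "fr": FranceFixedHolidaysCalendar(),
}


def holiday_features(dates, locales) -> pd.DataFrame:
    """Build one 0/1 is_holiday_<locale> column per requested locale."""
    dates = pd.DatetimeIndex(dates)
    features = pd.DataFrame(index=dates)
    for locale in locales:
        calendar = HOLIDAY_CALENDARS.get(locale)
        if calendar is None:
            # No calendar available for this locale: fall back to "no holidays".
            features[f"is_holiday_{locale}"] = 0
            continue
        holidays = calendar.holidays(start=dates.min(), end=dates.max())
        features[f"is_holiday_{locale}"] = dates.isin(holidays).astype(int)
    return features
```

For example, `holiday_features(pd.date_range("2017-07-01", "2017-07-31"), ["en", "fr", "ru"])` flags July 4 in the `is_holiday_en` column and July 14 in the `is_holiday_fr` column, while `is_holiday_ru` stays at zero. The zero fallback keeps the feature matrix shape stable for locales without a calendar, such as those Workalendar did not cover at the time.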

Obviously, this is just one small area where reusable automation can give your day-to-day productivity in Data Science projects an edge. My mission is to encourage you to build more useful automation, and then to share your wisdom and contributions with the community.

George Vyshnya

Seasoned Data Scientist / Software Developer with blended experience in software development, IT, DevOps, PM and C-level roles. CTO at http://sbc-group.pl