Published in Snowflake

Generating all the Holidays in SQL — with a Python UDTF

“I want a table with all the Holidays” is a typical request in data analytics — but figuring this out is not as easy as it sounds. For which country, for which sub-region, for which timeframe — are only the initial questions. Then come some technical challenges, that we’ll figure out in this post with Snowflake and Python — including how to update the Anaconda pre-installed packages in Snowpark.

The Python library “holidays” can help you keep track of all the Holidays, as explained by Eryk Lewinson in this TDS post. But then data analysts usually need this data in tables, to join them with the rest of their data in a SQL world. In this post we are going to make this Python package usable within Snowflake, to generate tables with all the Holidays — adjusted to your requirements.

For example, this is a list of Holidays in the US for 2024, generated with a SQL query in Snowflake:

US Holidays in 2024, source ‘python-holidays’

However, there are complications. For example, the same list for 2022 has a similar set of Holidays plus some “observed” dates, and you will need to decide whether to use these observed dates in your data analysis or not:

US Holidays in 2022, source ‘python-holidays’. Note the “Observed” dates for Christmas and Juneteenth.
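The rule behind those “observed” dates is mechanical, so it helps to see it spelled out. This is a minimal plain-Python sketch of the US federal convention (the helper name observed_date is mine, for illustration):

```python
from datetime import date, timedelta

def observed_date(d):
    # US federal convention: a holiday falling on Saturday is observed
    # the preceding Friday; one falling on Sunday, the following Monday.
    if d.weekday() == 5:   # Saturday
        return d - timedelta(days=1)
    if d.weekday() == 6:   # Sunday
        return d + timedelta(days=1)
    return d

# Christmas 2022 fell on a Sunday, hence the observed Monday in the list above:
print(observed_date(date(2022, 12, 25)))  # 2022-12-26
```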

And to further complicate the situation, each US region has its own list of Holidays. For example, Holidays in California include “César Chávez Day” and “Susan B. Anthony Day”:

US (California) Holidays in 2022, source ‘python-holidays’. Note some California specific dates.

Sometimes the same Holiday can also have a different name in different regions. For example, on Martin Luther King Jr. Day, Alabama also commemorates Robert E. Lee’s birthday:

US (Alabama) Holidays in 2022, source ‘python-holidays’.

You can see from these few examples that getting a usable list of Holidays for analytics also requires defining what counts as a Holiday (including whether to move them to “observed” days) and in which regions they apply.

The good news is that each analyst can use this table UDF to generate Holidays according to their needs. We will need some tricks to make it work, but then generating a table of Holidays becomes a simple function call.

To start with, this was the SQL used to generate the tables above:

select *
from table(holidays('US', [2024]));
select *
from table(holidays('US', [2022]));
select *
from table(holidays('US-CA', [2022]));
select *
from table(holidays('US-AL', [2022]));

Installing python-holidays in Snowflake

First attempt: Outdated library

It should be straightforward to get the python-holidays package installed with Anaconda in Snowflake, but it’s not.

create or replace function debug_holidays()
returns string
language python
runtime_version = '3.8'
packages = ('holidays')
handler='x'
as $$
import holidays
def x():
    return holidays.__version__
$$;
select debug_holidays();
-- 0.11.3.1

With the above code you can see that the provided version of python-holidays is 0.11.3.1 — however the package is officially on its 0.14.2 release.
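Before loading your own copy, you can check which package versions Anaconda currently provides to your account; Snowflake exposes this in the information_schema.packages view (a quick sketch, assuming a session with Anaconda packages enabled):

```sql
select package_name, version
from information_schema.packages
where language = 'python'
  and package_name = 'holidays'
order by version;
```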

The latest version includes the most updated list of Holidays — so we don’t want to mess with an older one. We could wait until Anaconda brings the latest to Snowflake — but let’s use this opportunity to figure out how to load alternative versions of existing packages, or missing ones.

Second attempt: Install your own package

If Anaconda doesn’t provide the desired package in Snowflake, you can bring your own. In this case we’re going to clone the latest python-holidays from GitHub, compress it into a .zip file, and then bring it into a Snowflake stage with snowsql:

$ git clone https://github.com/dr-prodigy/python-holidays.git
$ cd python-holidays
$ zip -r holidays holidays

$ snowsql
> put 'file://holidays.zip' @~/ auto_compress=false overwrite=true
+--------------+--------------+-------------+-------------+
| source       | target       | source_size | target_size |
|--------------+--------------+-------------+-------------+
| holidays.zip | holidays.zip |      174097 |      174112 |
+--------------+--------------+-------------+-------------+

Now we should be ready to run python-holidays, but more surprises await:

create or replace function debug_holidays()
returns string
language python
runtime_version = '3.8'
imports=('@~/holidays.zip')
handler='x'
as $$
import holidays
def x():
    return holidays.__version__
$$;

select debug_holidays();
-- ModuleNotFoundError: No module named 'dateutil' in function DEBUG_HOLIDAYS with handler x

Third attempt: Python dependencies

The previous message says that dateutil is missing, and that’s easy to fix with the modules provided by Anaconda:

create or replace function debug_holidays()
returns string
language python
runtime_version = '3.8'
packages = ('python-dateutil')
imports=('@~/holidays.zip')
handler='x'
as $$
import holidays
def x():
    return holidays.__version__
$$;

select debug_holidays();
-- ModuleNotFoundError: No module named 'hijri_converter' in function DEBUG_HOLIDAYS with handler

That was easy, but now hijri_converter is missing. We can keep adding Anaconda-provided modules until we get this to work:

create or replace function debug_holidays()
returns string
language python
runtime_version = '3.8'
packages = ('python-dateutil', 'hijri-converter', 'convertdate', 'korean_lunar_calendar')
imports=('@~/holidays.zip')
handler='x'
as $$
import holidays
def x():
    return holidays.__version__
$$;

select debug_holidays();
-- 0.14.2

That’s it: We now have python-holidays 0.14.2 working on our Snowflake account.

Bonus attempt: Import precedences

The above solution solves our dependency problem, but I wanted to share one more finding: what happens if we mix our own packages with Anaconda’s?

create or replace function debug_holidays()
returns string
language python
runtime_version = '3.8'
packages = ('holidays')
imports=('@~/holidays.zip')
handler='x'
as $$
import holidays
def x():
    return holidays.__version__
$$;

select debug_holidays();
-- 0.14.2

Notice there are two versions of python-holidays above: the one provided by Anaconda, and the newer .zip file provided by us. Asking for both works seamlessly, giving us:

  • Importing the Anaconda-provided python-holidays brings in all its dependencies, without us needing to specify them manually.
  • Our .zip version of python-holidays takes precedence over the one provided by Anaconda, so we get the latest version running.
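This precedence matches how plain Python resolves duplicate module names: the first match on sys.path wins. A local sketch (the module name greet is made up for illustration):

```python
import sys
import tempfile
import pathlib

# Two directories each provide a module named 'greet'; the directory
# earlier on sys.path shadows the later one, just like the staged
# holidays.zip shadows the Anaconda-provided package.
newer_dir = tempfile.mkdtemp()
older_dir = tempfile.mkdtemp()
pathlib.Path(newer_dir, 'greet.py').write_text("VERSION = 'newer'")
pathlib.Path(older_dir, 'greet.py').write_text("VERSION = 'older'")

sys.path[:0] = [newer_dir, older_dir]  # newer_dir comes first
import greet
print(greet.VERSION)  # newer
```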

Should we use these hacks?

Since Python UDFs in Snowflake are in Public Preview, I expect python-holidays to eventually work without these workarounds. However, these tricks will remain useful whenever you run into a similar problem or want to apply your own patches to existing packages.

Designing a Python table UDF to generate Holidays

This table UDF can generate a list of Holidays for any country, sub-region, and range of years:

create or replace function holidays(
    country_region string, years array
)
returns table(day date, holiday string)
language python
runtime_version = '3.8'
imports=('@~/holidays.zip')
packages = ('python-dateutil', 'hijri-converter', 'convertdate', 'korean_lunar_calendar')
handler='X'
as $$
import holidays

class X:
    def __init__(self):
        pass

    def process(self, country_region, years):
        # Split 'US-CA' into country 'US' and subdivision 'CA'
        parsed_country_region = country_region.split('-')
        country = parsed_country_region[0]
        region = (
            parsed_country_region[1]
            if len(parsed_country_region) > 1
            else None
        )
        # Expand the input years into the full inclusive range
        range_years = range(min(years), 1 + max(years))
        holidays_dict = holidays.country_holidays(
            country
            , subdiv=region
            , years=range_years
        )
        # Each (date, name) pair becomes an output row
        return ((k, v) for k, v in holidays_dict.items())

    def end_partition(self):
        pass
$$;

select * from table(holidays('US', [2022]));
select * from table(holidays('US-CA', [2023]));
select * from table(holidays('US', [2010, 2030]));
select * from table(holidays('Chile', [2007]));
select * from table(holidays('UnitedKingdom', [2007]));

Some design notes:

  • The function receives a country code (like ‘US’ or ‘USA’), or full name (like ‘UnitedKingdom’ or ‘Chile’), and an array of years to cover. The country code can also specify a sub-region using a dash.
  • Find the full list of covered countries and sub-regions on GitHub.
  • python-holidays wants the sub-region as an additional parameter, but for SQL usability I decided to mix both with a - (like in ‘US-CA’), and split them manually in the Python UDF.
  • python-holidays wants each year to generate in the input array, but for SQL usability I decided to look at the min/max year in the array and generate a range of years in between.
  • python-holidays also supports a Holiday calendar for the European Central Bank and the New York Stock Exchange — but I have not added support for them on the UDF above (yet).
  • At the end of process() the UDF returns a generator that Snowflake will use to output the desired rows.
  • The most efficient use of the above tabular UDF would be to create a table with its results and then use the static results thereafter.
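The parsing and year-range choices from the notes above are easy to verify outside Snowflake. A plain-Python sketch of the same logic (the helper names are mine, for illustration):

```python
def parse_country_region(country_region):
    # Split 'US-CA' into ('US', 'CA'); a bare 'US' yields ('US', None)
    parts = country_region.split('-')
    return parts[0], parts[1] if len(parts) > 1 else None

def year_range(years):
    # [2010, 2030] expands to every year between its min and max, inclusive
    return range(min(years), 1 + max(years))

print(parse_country_region('US-CA'))   # ('US', 'CA')
print(list(year_range([2010, 2012])))  # [2010, 2011, 2012]
```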

A simple UDF: is_holiday()

Instead of generating a table of Holidays, we can also create a UDF that simply returns whether a date is a Holiday:

create or replace function is_holiday(
date date, country_region string
)
returns boolean
language python
runtime_version = '3.8'
packages = ('holidays')
imports=('@~/holidays.zip')
handler='x'
as $$
import holidays
def x(date, country_region):
    parsed_country_region = country_region.split('-')
    country = parsed_country_region[0]
    region = (
        parsed_country_region[1]
        if len(parsed_country_region) > 1
        else None
    )
    return date in holidays.country_holidays(country, subdiv=region)
$$;

select is_holiday('2021-01-01', 'US');
-- TRUE

This function follows the same principles as the tabular UDF above to get a ‘country-region’ as input.
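The membership test works because python-holidays objects behave like dicts keyed by date. A plain dict shows the same semantics (the entries below are just a stand-in for what the library generates):

```python
from datetime import date

fake_holidays = {
    date(2021, 1, 1): "New Year's Day",
    date(2021, 7, 4): "Independence Day",
}

print(date(2021, 1, 1) in fake_holidays)  # True
print(date(2021, 1, 2) in fake_holidays)  # False
```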

Next steps

Check out the posts from Dash Desai and Miles Adkins for details on how to use GitHub Actions to deploy custom Python code to Snowflake and solve dependency issues.

Want more?

I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter and LinkedIn. And subscribe to reddit.com/r/snowflake for the most interesting Snowflake news.
