How to Use Python to Turn iCalendar Data into a CSV of Dates

Leah Cole
5 min readFeb 17, 2021

--

The Problem

In summer 2020, I presented my interns with the task of writing a tutorial with a data ingestion component. We had many talks about appropriate data sources in the effort to find data that were relevant to our tutorial, easy to obtain, had a license appropriate for our use case, and would not introduce bias in a trained Machine Learning model. However, I made a crucial oversight: I failed to consider what would happen if one of those data sources suddenly disappeared.

In early 2021, I discovered that a Kaggle dataset featuring a CSV of United States Federal Holidays from 1966-present was no longer available. I still needed this data for the model training portion of the tutorial I was writing. The good news: this information is public knowledge and was still available in other formats; I just had to figure out a new source and how to convert it to the format my tutorial was expecting.

The Options

  1. Azure has a worldwide public holidays dataset covering multiple countries (including the USA). This option initially looked like it would be the best option (and I’m hoping to incorporate it into some future multi-cloud tutorials) but because of the approvals I would need to use it (I work on Google Cloud Platform), combined with my total inexperience using Azure, it was not a viable option for this tutorial.
  2. A different Kaggle dataset that had the same information and was actually derived from the original dataset I had been using. This option had two drawbacks: it only covered from 2004–2021, and it also came from an individual user, so it had the same risk that the link may stop working at any time
  3. An iCalendar file from the United States Office of Personnel Management website that had all US Federal Holidays from 1997–2021. At first, this option seemed like it was the most ridiculous. It is a random calendar file intended to be used to tell you if a certain day is or is not a federal holiday. It is not at all in the format I needed. It does not contain as much data as the original source (starts in 1997 instead of 1966.) However, this option was the best option for me for two reasons: it came from an official government source, and it had no licensing restrictions. I also did not actually need the data from 1966–1996, so that 30 year omission did not matter. Was this the absolute BEST solution for my use case? Probably not, but no solution is perfect, and this one was absolutely good enough.

Why Would I Ever Want to Do This?

Dear reader, I hope you do not have to do this. I also know that data works in mysterious ways, and sometimes finding the right data for your use case requires some creativity. So, if you find yourself in a conundrum like mine, please proceed.

The Data

Now that I had a valid data source, I needed to convert it from iCalendar format into a CSV file.

The Data in iCalendar Format

iCalendar files have sections that begin with BEGIN:tags and end with END: tags. The VCALENDAR section includes all other sections. In this data, the most important sections are events, which are contained in VEVENT sections.

Within an event, there are two additional fields important for my use case: DTSTART and SUMMARY . DTSTART is the date when the event begins. There is an additional DTEND field, but I ignored that because there are no Federal Holidays in the United States that last longer than one day, so the duration is not relevant. The SUMMARY field is the name of the holiday.

This example (also shown below) is a snippet of the data used in this post (and is additionally used in the tests for this code.) It is in iCalendar format and contains one calendar event — New Year’s Day 1997. Note the VCALENDAR, VEVENT , DTSTART , and SUMMARY tags.

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Telerik Inc.//NONSGML RadScheduler//EN
METHOD:PUBLISH
BEGIN:VEVENT
DTSTART:19970101
DTEND:19970102
UID:20210211T185111Z-05fe6751-ec1d-4532-b3a8-60e48e0eb064
DTSTAMP:20210211T185111Z
SUMMARY:New Year’s Day
DESCRIPTION:
END:VEVENT
END:VCALENDAR

The Data in CSV Format

For my purposes, the only information I needed from the events in this iCalendar file is the date of the holiday (the DTSTARTfield) and the name (the SUMMARY field.) I needed to write code that would convert the previous snippet into the following CSV format:

Date,Holiday
1997-1-1,New Year’s Day

The Conversion

Luckily, there exists a package on PyPI called ics that is specifically made to read and write iCalendar data that respects the original rfc5545 specification, and reading from and writing to CSVs is so common in Python that the functionality is part of the core library. I just needed to figure out how to put these two tools together. I broke the conversion down into three steps:

  1. Use Python I/O functionality to read the original contents of the holiday file and put it into a big long string.
  2. Use the ics library to convert that string into event objects , then create a list of events, where each event is a list containing the date of the holiday and the name.
  3. Write the list of events to a CSV file

Here’s what output of each step would look like on the example snippet:

Step 1: a multi-line string that looks just like the input iCalendar file

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Telerik Inc.//NONSGML RadScheduler//EN
METHOD:PUBLISH
BEGIN:VEVENT
DTSTART:19970101
DTEND:19970102
UID:20210211T185111Z-05fe6751-ec1d-4532-b3a8-60e48e0eb064
DTSTAMP:20210211T185111Z
SUMMARY:New Year’s Day
DESCRIPTION:
END:VEVENT
END:VCALENDAR

Step 2: an object containing the event(s), then a list of events (in this case, the list contains a single event)

{<all-day Event 'New Year’s Day' begin:1997-01-01 end:1997-01-02>}
[['Date', 'Holiday'], ['1997-1-1', 'New Year’s Day']]

Step 3: a CSV with two columns: Date and Holiday

Date,Holiday
1997-1-1,New Year’s Day

Step 4: CELEBRATE. This is exactly what I needed.

The Code

I could have made one long, untested Python script to do this conversion once, but I chose to break up this script into multiple parts and write it as cleanly as possible including tests both as a gift to my future self and to anyone else who may like to use it. The code, tests, and test data can be found in this repository. Please feel free to adapt it for your own iCalendar-to-CSV conversion needs.

Lessons Learned

  • Always consider the availability of your source data and think about how it would affect your use case if it disappeared
  • When unfamiliar code is well-documented, using it is much less frustrating! (thanks, ics !!)
  • Small projects like these are always a good opportunity to practice using new-to-you technologies (Hello, GitHub Actions!)

--

--

Leah Cole

DPE @GCPcloud. @GEDTLP alum, Boston sports fan. Previously @GE_Digital, @MITLL, @CarletonCollege. Currently unapologetically awesome. she/her