Parsing iTunes XML Data with F#
I’m kind of a music nerd, and I’m a fan of all kinds of music, from Hip-Hop to Classical. As a developer, I always listen to music while I work, and I strive for the highest audio quality and the best listening experience I can find.
I currently live in Lagos, Nigeria, and as far as music services go, we don’t have Spotify or Google Play Music. We only have Apple’s iTunes and Microsoft’s Groove Music. That said, iTunes is my go-to service for buying and listening to music.
Recently, Nigeria entered a recession and Dynamic Currency Conversion (DCC) was banned. Some banks put a $100 limit on debit cards for native dollar transactions, while others banned them completely. Apple recently switched to using DCC, which means that I can no longer pay for my iTunes subscription. (Talk about having a first-world problem in a third-world country. *laughs*)
All this led to Apple cancelling my subscription and, with it, automatically yanking tracks off my playlists. However, I was able to recover my original playlist data from one of my machines by keeping it offline and then using the export function in iTunes.
I was able to export all of my iTunes data, but, to put it mildly, it is the most horrendous piece of XML ever created. Doubt it? Let’s take a look, shall we?
Explaining all the flaws of this XML format is worthy of its own blog post. What I will say is that enforcing the order of child elements in XML (inside a dict, a value is tied to its key only by coming immediately after it) is, to quote James Gosling, “the work of the Devil”.
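To make the ordering problem concrete, here is a minimal sketch using System.Xml.Linq. The helper valueFor is hypothetical (it is not from the original code); it finds a key's value in a plist dict the only way the format allows: by taking whichever element happens to follow the matching key element. The XML fragment is hand-written for illustration.

```fsharp
open System.Xml.Linq

/// Hypothetical helper: look up the value for `key` in a plist <dict>.
/// Nothing in the markup links a value to its key except position, so we
/// locate the matching <key> element and take the element right after it.
let valueFor (key: string) (dictElement: XElement) : XElement option =
    dictElement.Elements(XName.Get "key")
    |> Seq.tryFind (fun k -> k.Value = key)
    |> Option.bind (fun k -> k.ElementsAfterSelf() |> Seq.tryHead)

// A small hand-written fragment in the same shape as an iTunes track entry.
let trackXml = """<dict>
  <key>Name</key><string>Example Track</string>
  <key>Track ID</key><integer>42</integer>
</dict>"""

let track = XElement.Parse trackXml

match valueFor "Track ID" track with
| Some v -> printfn "<%s>%s</%s>" v.Name.LocalName v.Value v.Name.LocalName
| None -> printfn "not found"
```

Because the pairing is purely positional, a single missing or misplaced element silently pairs a key with the wrong element, and nothing in the markup itself flags the error.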
My goal was simply to extract all of my playlist information from the XML. That way, I could figure out how to migrate to another music service or even roll my own.
Any experienced programmer will tell you that data structures are central to solving problems. In fact, Linus Torvalds once said:
I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important.
This became painfully clear after I botched my initial domain model by trying to model the XML tags rather than the data structures they represented.
For starters, I called the discriminated union PList, which was wrong because Integer, Date, String… aren’t kinds of PList; they are kinds of value stored inside one. None was my catch-all for any tag I couldn’t understand, which is a lazy way of modelling the domain: a domain model should be as exhaustive as possible. Also, Key isn’t really part of the domain. It isn’t a type of value; it is only the label of a dictionary entry.
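The original snippet isn't reproduced here, so the following is only a rough, hypothetical reconstruction based on the description above; the exact case names and payload types are assumptions. It shows how a tag-shaped model lets meaningless values type-check.

```fsharp
/// Hypothetical approximation of the first, tag-shaped model.
/// RequireQualifiedAccess stops the None case from shadowing Option.None.
[<RequireQualifiedAccess>]
type PList =
    | Key of string          // a label, not a value, yet it is a case here
    | String of string
    | Integer of int64
    | Date of System.DateTime
    | Dict of PList list     // keys and values mixed in one flat list
    | Array of PList list
    | None                   // catch-all for any tag the model doesn't understand

// Type-checks, even though a dict of two keys and no values is meaningless.
let nonsense = PList.Dict [ PList.Key "Name"; PList.Key "Artist" ]

// Also type-checks: an array whose contents tell us nothing about the XML.
let mystery = PList.Array [ PList.None; PList.None ]
```

A model where a dictionary maps strings to values, and every value kind is spelled out, makes both of these unrepresentable.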
Because of this, coming up with an algorithm to parse the document turned out to be more difficult than I had imagined. Using this domain model, I came up with the following implementation for parsing the data.
Suffice it to say, I was heading nowhere fast. However, it had a few clever ideas, like using pattern matching on line 13 to pick the child elements in twos. I also tried treating dict and array as special PLists with their own match cases, which was just plain wrong.
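For readers who haven't seen the trick, here is a minimal sketch of picking children in twos with a list pattern. It is written against System.Xml.Linq, and the function name pairUp is hypothetical rather than taken from the original code.

```fsharp
open System.Xml.Linq

/// Hypothetical helper: walk the children of a plist <dict> two at a time,
/// pairing each <key> with the element that immediately follows it.
/// The accumulator keeps the recursion tail-recursive for large dicts.
let pairUp (children: XElement list) : (string * XElement) list =
    let rec loop (acc: (string * XElement) list) (remaining: XElement list) =
        match remaining with
        | [] -> List.rev acc
        | key :: value :: rest when key.Name.LocalName = "key" ->
            loop ((key.Value, value) :: acc) rest
        | _ -> failwith "Malformed <dict>: expected <key> followed by a value"
    loop [] children

let dictXml = "<dict><key>Name</key><string>Jazz</string><key>Visible</key><false/></dict>"

XElement.Parse(dictXml).Elements()
|> List.ofSeq
|> pairUp
|> List.iter (fun (k, v) -> printfn "%s -> <%s>" k v.Name.LocalName)
// Name -> <string>
// Visible -> <false>
```

The `key :: value :: rest` pattern consumes exactly two elements per step, so a dangling key with no value falls through to the error case instead of silently shifting every later pair.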
One of the reasons I decided to model the elements rather than the domain was the complexity of dicts and arrays. Dictionaries contain key-value pairs, and those values can themselves be other dictionaries (or arrays of dictionaries). The second image really shows this point.
That’s when it clicked that arrays and dictionaries are simply values, not PLists. Hence the following:
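A minimal sketch of a value-centred model, assuming the element set defined by Apple's property-list DTD (string, integer, real, date, data, true/false, array, dict). The case names are illustrative, not the author's, and the playlist value is made up. Note there is no Key case and no catch-all: a key only exists as the label of a dictionary entry.

```fsharp
open System

/// Every value a property list can hold. Arrays and dictionaries are just
/// values whose contents are themselves values, so nesting comes for free.
[<RequireQualifiedAccess>]
type Value =
    | String of string
    | Integer of int64
    | Real of float
    | Date of DateTime
    | Data of byte[]
    | Boolean of bool              // covers both <true/> and <false/>
    | Array of Value list
    | Dict of Map<string, Value>   // the <key> text becomes the map key

// Illustrative data: a playlist-shaped dict holding an array of dicts.
let trackRef (id: int64) = Value.Dict(Map.ofList [ "Track ID", Value.Integer id ])

let playlistFields =
    [ "Name", Value.String "Road Trip"
      "Playlist Items", Value.Array [ trackRef 1001L; trackRef 1002L ] ]

let playlist = Value.Dict(Map.ofList playlistFields)
```

RequireQualifiedAccess forces cases to be written as Value.String, Value.Array and so on, which avoids clashing with the String type and Array module from the standard library.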
With my data structure nailed down, I couldn’t think of a way to solve the value-parsing problem without also solving dictionary/array parsing, and vice versa. I immediately recognized this as a chicken-and-egg problem.
In F#, you can model this kind of chicken-and-egg problem using mutually recursive functions, declared together with let rec and joined by the and keyword. That way, I could defer the implementation of dictionary parsing (toDict) and then recursively call toValue inside it. And it worked like a charm!
Line 10 shows where I call toDict before its implementation exists, and then, in that implementation on line 17, I can call toValue for each of the child elements.
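Here is a minimal, self-contained sketch of that mutual recursion, assuming System.Xml.Linq and a Value type like the one described above (repeated so the block runs on its own). The names toValue and toDict come from the post; parsePlist and the error handling are assumptions for illustration, and this is not the author's final source.

```fsharp
open System
open System.Globalization
open System.Xml.Linq

[<RequireQualifiedAccess>]
type Value =
    | String of string
    | Integer of int64
    | Real of float
    | Date of DateTime
    | Data of byte[]
    | Boolean of bool
    | Array of Value list
    | Dict of Map<string, Value>

/// `let rec ... and ...` makes toValue and toDict visible to each other,
/// so each can call the other even though toDict is written second.
let rec toValue (e: XElement) : Value =
    match e.Name.LocalName with
    | "string" -> Value.String e.Value
    | "integer" -> Value.Integer(Int64.Parse(e.Value, CultureInfo.InvariantCulture))
    | "real" -> Value.Real(Double.Parse(e.Value, CultureInfo.InvariantCulture))
    | "date" -> Value.Date(DateTime.Parse(e.Value, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind))
    | "data" -> Value.Data(Convert.FromBase64String e.Value) // whitespace in base64 is ignored
    | "true" -> Value.Boolean true
    | "false" -> Value.Boolean false
    | "array" -> e.Elements() |> Seq.map toValue |> List.ofSeq |> Value.Array
    | "dict" -> Value.Dict(toDict e) // calls the function defined below
    | other -> failwithf "Unsupported plist element <%s>" other

/// Pairs each <key> with the following element and parses that element
/// with toValue, which may in turn call back into toDict for nested dicts.
and toDict (e: XElement) : Map<string, Value> =
    e.Elements()
    |> List.ofSeq
    |> List.chunkBySize 2
    |> List.map (function
        | [ key; value ] when key.Name.LocalName = "key" -> key.Value, toValue value
        | _ -> failwith "Malformed <dict>: expected <key> followed by a value")
    |> Map.ofList

/// Hypothetical entry point: the root <plist> element wraps exactly one value.
let parsePlist (xml: string) : Value =
    XDocument.Parse(xml).Root.Elements() |> Seq.exactlyOne |> toValue
```

For an exported library, the top-level dict's "Playlists" entry would come back as a Value.Array of playlist dicts, ready to be walked with ordinary pattern matching.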
It might seem obvious, but I’ll say it anyway: your choice of data structures and how you design your domain are crucial when writing code in F# (or in any other language). Screw it up, and you will be walking around in circles. Nail it, and your implementation will be concise, straightforward and probably even trivial.
Here is the final source code:
The full source, along with a sample XML file, is also available on GitHub.