We chose the Nottingham dataset because it is important and widely used in ML composition research, it is copyright-free and easy to obtain, it contains full songs with complete annotations, and it is reasonably varied while remaining stylistically consistent.
We experimented with existing solutions such as Neo4j and Graphhopper before deciding to use a custom graph store implementation. Graphhopper is a great project, but at the time its codebase was too specialized for automobile routing. Neo4j is ultimately designed to support mutable graphs, so it cannot be as efficient as a custom immutable implementation. We don’t need to support live editing of route data: updates are batched and the database is regenerated every few weeks. Because our graph is immutable, implementing a custom data store wasn’t too difficult. We use an adjacency list representation of the graph, modified so that everything fits in a single array indexed by edge ID. Graphhopper gives a good overview of a similar data structure. The graph store uses only 16 bytes per edge and 4 bytes per node.
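
To make the idea concrete, here is a minimal Python sketch of an immutable adjacency list packed into flat buffers. It is an illustration under stated assumptions, not the production code. The field names (`target`, `length_m`, `flags`, `name_id`), the `GraphStore` class, and the exact split of the 16 bytes are all hypothetical. The sketch assumes each node stores only the ID of its first outgoing edge (4 bytes) and that edges are sorted by source node, so a node's edges form a contiguous range of edge IDs.

```python
import struct

# Assumed layouts (hypothetical; the real field split is not described here).
NODE = struct.Struct("<I")     # 4 bytes: ID of the node's first outgoing edge
EDGE = struct.Struct("<IfII")  # 16 bytes: target, length_m, flags, name_id


class GraphStore:
    """Immutable graph built once from a batch of edges."""

    def __init__(self, node_count, edges):
        # edges: iterable of (source, target, length_m, flags, name_id)
        ordered = sorted(edges, key=lambda e: e[0])  # group edges by source
        self.node_count = node_count
        self.edge_count = len(ordered)

        # Count outgoing edges per node, then prefix-sum into first-edge IDs.
        counts = [0] * node_count
        for e in ordered:
            counts[e[0]] += 1
        nodes = bytearray(NODE.size * node_count)
        next_id = 0
        for node, count in enumerate(counts):
            NODE.pack_into(nodes, node * NODE.size, next_id)
            next_id += count

        # Pack every edge record at offset edge_id * 16.
        edge_buf = bytearray(EDGE.size * len(ordered))
        for edge_id, (_, target, length_m, flags, name_id) in enumerate(ordered):
            EDGE.pack_into(edge_buf, edge_id * EDGE.size,
                           target, length_m, flags, name_id)

        # Freeze both buffers: the store is never edited after construction.
        self._nodes = bytes(nodes)
        self._edges = bytes(edge_buf)

    def _first_edge(self, node):
        # One past the last node maps to the end of the edge buffer.
        if node == self.node_count:
            return self.edge_count
        return NODE.unpack_from(self._nodes, node * NODE.size)[0]

    def edge(self, edge_id):
        """Return (target, length_m, flags, name_id) for an edge ID."""
        return EDGE.unpack_from(self._edges, edge_id * EDGE.size)

    def out_edges(self, node):
        """Return the contiguous range of edge IDs leaving `node`."""
        return range(self._first_edge(node), self._first_edge(node + 1))

    def size_bytes(self):
        return len(self._nodes) + len(self._edges)


store = GraphStore(3, [
    (0, 1, 120.0, 0, 7),
    (0, 2, 300.0, 0, 8),
    (2, 0, 300.0, 0, 8),
])
targets_from_0 = [store.edge(e)[0] for e in store.out_edges(0)]
```

Because nothing is edited after construction, there is no need for free lists, per-record locks, or spare capacity, which is where a layout like this can save space compared with a store built for mutation. Regenerating the database simply means building a new `GraphStore` from the latest batch.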