Helping a music publisher efficiently manage over a century’s worth of data
By Kevin Le
This is a case study of a data management product I developed in Spring 2018 when I was a college freelance developer. Looking back, I would’ve handled some things differently now that I have more knowledge under my belt, but I am grateful for the experience!
The product was officially deployed in June 2018 and remains in use, including after the company was acquired in May 2020.
I worked with a 100+ year old music publisher that handles the rights to songs ranging from classics to modern-day pop hits. At the start of this project, their catalog was as impressive as it was cumbersome. A single search alone could take several minutes — a major obstacle in a business reliant on royalty payments.
Here’s how I improved the way a music publisher manages song data by implementing a custom and robust data management system.
A music publisher's primary business revolves around managing an ever-growing catalog of songs. In managing song data, two elements are crucial: efficiency and trust. The music publisher's song management was inefficient because it relied on Windows File Explorer, but they at least trusted that all of their song files were definitely located there. The company did not trust moving and maintaining their century-old catalog in external, often frivolous, third-party software. Exacerbating all of this was the company's lack of technical expertise: only one employee was comfortable with SQL, and most of their IT was outsourced.
How could an efficient song data management system be created while maintaining approachability and trustworthiness for a non-technical user base?
For employees at this music publisher, trust outweighs efficiency. After extensive interviews, I realized that employees did not want a black-box solution — they wanted to fully understand the process and be able to easily make modifications at a root level. The music publisher actually already had a much faster search engine in place. Everyone knew about it, but no one used it: it was missing numerous songs, and adding new songs to it was a lengthy, manual process. So, I needed to make a system that no longer treated trust and efficiency as a zero-sum game.
I developed a file-system-based database with an accompanying software application written in Python. It supports instantaneous search across the music publisher's database, various sync and sorting functionalities, file metadata editing, file sharing, and custom tag making. For this case study, I first explain why I used a file system model and then break my solution down in terms of the three main functions of the application — Searching, Adding, and Syncing.
File System Database
Normally, I would implement a SQL or NoSQL database system. However, it was important that every employee and every incoming employee at the music publisher could use and maintain the system easily. The music publisher did have a SQL server, but only one person could use it. Every tech system the music publisher had was also maintained by an external IT consulting group.
Thus, I recognized that it was important that the music publisher could feel total autonomy over their database as it is central to all of their operations. Their data, their domain.
The file system model revolves around spreadsheets, which any employee can easily understand and manipulate. At the core of the model is an Excel workbook with multiple sheets. Each sheet represents a different attribute; the sheets are referenced by a master sheet and tied together by unique identification keys. There is also a configuration sheet which allows users to modify the accompanying program modularly. If the music publisher wanted to access the data at a root level, they could feel comfortable doing so, as spreadsheets are familiar to them (as opposed to a SQL database system).
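To make the key-based layout concrete, here is a minimal sketch of how the master sheet and one attribute sheet relate. The sheet names, column names, and sample rows are my own illustration, not the publisher's actual schema; the real workbook holds many more attribute sheets.

```python
import pandas as pd

# Hypothetical miniature versions of two sheets from the workbook,
# tied together by a shared SONG_ID key (names and rows are invented)
master = pd.DataFrame({
    "SONG_ID": [101, 102],
    "TITLE": ["Moonlight Rag", "Harbor Song"],
})
writers = pd.DataFrame({
    "SONG_ID": [101, 102],
    "WRITER": ["A. Composer", "B. Lyricist"],
})

# The master sheet references each attribute sheet through the key column
catalog = master.merge(writers, on="SONG_ID", how="left")
```

Because every attribute sheet carries the same identification key, any sheet can be joined back onto the master sheet on demand, and an employee can still open and edit each sheet by hand in Excel.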
This model takes advantage of the music publisher's pre-existing IT infrastructure. All files already exist in a shared server across all employee computers, which means that changes made to a file sync for everyone else. Additionally, numerous security measures are already in place to protect these servers.
I decided to go with a central Excel workbook — referred to as the MDrive Database (for 'Music Drive Database') — rather than multiple csv files for ease of use by employees at the music publisher. This workbook essentially acted as a one-stop shop for viewing data and modifying song information. While there is an accompanying program, it was again important that employees could get as low-level as they desired. I also realized that, with text information alone, the Excel workbook would remain at a size that could still be processed relatively quickly (certainly much faster than their previous system).
I also added two other directories of spreadsheets — a Batch directory and a Backup directory. The Backup directory contains copies of the Excel workbook, and these can be made from the accompanying program. The Batch directory contains csv files which are sorted by date and labelled by user. Each of these csv files log desired user changes which can then be synced up to the MDrive Database and to actual file metadata in the MDrive (this is where all of the company’s song mp3s, m4as, wavs, etc. are stored).
In data terms, this is known as a batch processing system, and is one method of tackling the issue of how to handle multiple users modifying a shared database simultaneously. In the context of this song database system, users select all desired csv files that they would like to import into the database, thus importing changes in one large ‘batch’ rather than in real-time.
I decided against doing a real-time update method because I wanted to make every change as deliberate as possible for the user. As the program interfaces directly with their actual song metadata, it is vital that any information added is accurate. However, only employees would know if information is incorrect or not. This allows for the option of a human check (as anyone can open up a csv file). Additionally, as each file is labelled by user, if there are significant mistakes, it is easy to identify who is accountable.
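To illustrate the batch idea, the sketch below applies a per-user change log to a stand-in for the MDrive Database. The csv column layout (`SONG_ID,FIELD,NEW_VALUE`) is my assumption for illustration, not the exact log format the system used.

```python
import io
import pandas as pd

# A hypothetical per-user change log, as a csv in the Batch directory might look
log_csv = io.StringIO(
    "SONG_ID,FIELD,NEW_VALUE\n"
    "101,TOPIC,Holiday\n"
    "102,WRITER,B. Lyricist\n"
)
changes = pd.read_csv(log_csv)

# A miniature stand-in for the MDrive Database, indexed by song ID
master = pd.DataFrame(
    {"SONG_ID": [101, 102], "TOPIC": ["", ""], "WRITER": ["", ""]}
).set_index("SONG_ID")

# Apply every logged change in one deliberate batch
for row in changes.itertuples(index=False):
    master.loc[row.SONG_ID, row.FIELD] = row.NEW_VALUE
```

Because the log sits in a plain csv until the user chooses to import it, anyone can open the file, eyeball the pending changes, and delete a bad row before it ever touches the database.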
Overall, the music publisher was extremely receptive to this model. It was a data model that they could finally understand and feel control over.
My goal with the accompanying program was to make it both robust and lightweight. A concern the music publisher had with much third-party software was that too many bells and whistles distracted from and diminished the user experience. Thus, I went for a system whose UI and UX had no frills.
To design the search input menu, I worked with employees at the music publisher to determine which attributes to include. Beyond ID3 attributes, I expanded what the music publisher could work with by adding fields such as TOPIC and LYRICS, which do not exist in ID3 data but are still useful to them. Again, this ties back to giving users at the music publisher as much control and modularity as possible.
On the more technical side, searches became instantaneous. As stated earlier, searches previously took the music publisher several minutes to run. To achieve this, the program reads in all data from the MDrive Database at initialization and stores it in pandas DataFrames, which can be traversed and manipulated very quickly and efficiently.
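A simplified sketch of that search path is below. The `search` helper, the attribute names, and the matching rules (case-insensitive substring match per field) are my own reconstruction of the idea, assuming the catalog is already loaded into a DataFrame at startup.

```python
import pandas as pd

# Hypothetical in-memory catalog, loaded once at program start
songs = pd.DataFrame({
    "TITLE": ["Moonlight Rag", "Harbor Song", "Moonlight Waltz"],
    "WRITER": ["A. Composer", "B. Lyricist", "A. Composer"],
})

def search(df, **criteria):
    """Case-insensitive substring match on each supplied attribute."""
    mask = pd.Series(True, index=df.index)
    for field, term in criteria.items():
        mask &= df[field].str.contains(term, case=False, na=False)
    return df[mask]

results = search(songs, TITLE="moonlight", WRITER="composer")
```

Since every query is just a boolean mask over an in-memory DataFrame, results come back in milliseconds even across tens of thousands of rows, with no disk access per search.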
The search output consists of a table view of all input matches within the stored dataframes. When a user double clicks on any entry, two actions occur — 1.) the Add menu will appear and 2.) the song file will start playing. A simple song player can be manipulated from the Add menu.
Should a satisfactory song be found, users can easily transfer one or multiple files at once by highlighting row(s) in the search output and selecting ‘Transfer Files.’ Here, a temporary directory pop-up is created with copies of the selected songs. These songs can then simply be dragged and dropped to emails, Google Drive, Dropbox, etc. When a transfer is complete, the temporary directory is automatically deleted to save space.
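The temporary-directory trick can be sketched with the standard library alone. The function name and directory prefix are my own; the real program would also hook the cleanup step to the end of the transfer.

```python
import shutil
import tempfile
from pathlib import Path

def stage_for_transfer(paths):
    """Copy selected song files into a temporary folder for drag-and-drop,
    returning the folder so it can be deleted once the transfer is done."""
    staging = Path(tempfile.mkdtemp(prefix="mdrive_transfer_"))
    for p in paths:
        shutil.copy2(p, staging / Path(p).name)
    return staging
```

Working on copies means a mis-drag or accidental deletion during a transfer never touches the originals in the MDrive, and removing the folder afterward keeps the shared server from filling with duplicates.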
As stated in the previous section, the Single Song Add menu appears when you double click on an entry in the search output. Every change committed through the Single Song Add menu is automatically added to a log file that is created at initialization. These log files are contained in the Batch directory and are the files used for the batch importing process. Again, I wanted to make adding as user-friendly as possible, thus the menu looks almost identical to the search menu for consistency and simplicity.
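A session log of that kind might be created along these lines. The file-naming scheme and csv header are my own illustration of "sorted by date and labelled by user," not the exact format the program wrote.

```python
import csv
from datetime import date
from pathlib import Path

def open_session_log(batch_dir, username):
    """Create or reopen a per-session change log in the Batch directory,
    named by date and user (naming and layout are hypothetical)."""
    path = Path(batch_dir) / f"{date.today():%Y-%m-%d}_{username}.csv"
    is_new = not path.exists()
    f = path.open("a", newline="")
    writer = csv.writer(f)
    if is_new:
        writer.writerow(["SONG_ID", "FIELD", "NEW_VALUE"])
    return f, writer
```

Embedding the user's name in the filename is what later makes accountability easy: if a batch import introduces mistakes, the offending log points directly to its author.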
Sometimes, employees at the music publisher need to add the same data to multiple songs at once (such as contact information). To do this, users simply need to highlight several entries in the Search Output and select ‘Batch Add’ (Batch referring to multiple in this case). Here, a blank menu appears, but changes applied here will be logged for every highlighted song.
Upon termination of the program, users are asked if they would like to keep their log files. This helps act as a check to ensure that only useful log files are added to the Batch directory.
There are two different types of data that can be synced — song file metadata and spreadsheet data. Song file metadata refers to the details that appear when a user views the properties of a song file. Such metadata is limited by the constraints of the file type (e.g. ID3).
Spreadsheet data refers to cell data that is added to the MDrive database in the File System Database. It includes all metadata and also custom tags specified by the music publisher. In other words, attributes are not limited solely to, for example, ID3 data.
Both types of data are modified whenever a Batch Import is called from the Sync Menu. Here, a user is asked to specify files from the Batch directory. Each file contains a set of logged changes from specific users, which are then merged into the spreadsheet data. If any of the logged changes refer to specific song metadata (e.g. ID3 data), then the metadata for that song will be automatically updated. That way, changes happen at a root level too. This is especially useful when the music publisher needs to transfer songs to clients, as clients can view this metadata.
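The routing decision during a Batch Import can be boiled down to a few lines. The set of ID3-backed fields below is my assumption (the real ID3 spec defines the authoritative frame list); custom tags like TOPIC only ever live in the spreadsheet.

```python
# Hypothetical split between fields that exist in real ID3 tags and
# custom, spreadsheet-only tags (the exact field sets are my assumption)
ID3_FIELDS = {"TITLE", "ARTIST", "ALBUM", "GENRE", "YEAR"}

def sync_targets(field):
    """Decide where a logged change must be written during a Batch Import."""
    targets = ["spreadsheet"]            # every change updates the MDrive Database
    if field.upper() in ID3_FIELDS:
        targets.append("file_metadata")  # ID3-backed fields also update the song file
    return targets
```

This split is what lets changes "happen at a root level too": a title edit reaches both the database and the mp3 itself, while a custom tag stays purely in the spreadsheet layer.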
However, syncing is not limited to batch files. It is also possible to sync pre-existing metadata in the MDrive to the MDrive database. Additionally, as new files come in, the Sort Files function allows users to organize files and have the metadata from those files automatically sync up with the MDrive Database. This allows for ease of scalability as the music publisher’s catalog continues to expand.
Ultimately, many changes occurred for the music publisher I worked with. Searches went from taking minutes to being instantaneous. All searches became exhaustive — the music publisher could trust that no data was missing because it was tied to their core data. Syncing data was no longer painful and could automatically keep up with the music publisher's scale without having to rearrange all of the music publisher's song files. The music publisher could transfer more detailed files to clients efficiently. Employees could easily add vast amounts of data both to a database and to song metadata at the same time — a functionality they never had previously. The music publisher could now feel control over their database. Users could maintain a balance between automation and human deliberation.
Overall, this music publisher now has a modular and approachable system that not only vastly improved the efficiency of their operations but also expanded their capabilities.
Thank you for reading! This is my blog where I share thoughts and experiences. You can learn more about me at my website here.