How Broken Metadata Affects the Music Industry (And What We Can Do About It)

Dmitry Pastukhov
Soundcharts
Aug 13, 2019


Originally published at soundcharts.com on July 16, 2019.

As a big data company tapping into dozens of data sources to build a data analytics tool for the music industry, Soundcharts deals with corrupted metadata every day. Over the years, we’ve developed several algorithms to keep our database as clean as it can be, but even now we have to keep up some form of manual maintenance. Soundcharts’ team learned the hard way how broken music metadata can be — by managing a database of 2M+ artists.

The fact is that metadata issues cause a lot of damage to the industry. Compromised metadata hurts the user experience on streaming services, cuts off a portion of publishing revenues and robs songwriters of well-deserved credit — and that’s just the beginning. But let’s start with the basics:

What is Music Metadata?

In general, the term “metadata” means the data that describes other, more complex classes of data. Attaching text tags to compound content (like video or images) is necessary to make that content accessible to users and digital algorithms. The same goes for music: every song has hundreds of text meta tags attached to it — from artist name and release date to associated genres and songwriters’ credits.
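For illustration, a song’s meta tags can be pictured as a simple set of key-value pairs. The field names and values below are purely illustrative, not an industry standard:

```python
# A minimal illustration of song metadata as text tags.
# Field names and values are made up for the example.
track_metadata = {
    # Descriptive metadata: what the recording is
    "title": "Example Song",
    "artist": "Example Artist",
    "release_date": "2019-07-16",
    "genre": "Indie Pop",
    # Ownership metadata: who gets paid and credited
    "songwriters": ["A. Writer", "B. Composer"],
    "producer": "C. Producer",
    # Recommendation metadata: how it sounds
    "mood": ["upbeat", "summer"],
}

def get_credits(meta):
    """Collect everyone credited on the track."""
    return meta["songwriters"] + [meta["producer"]]

print(get_credits(track_metadata))  # ['A. Writer', 'B. Composer', 'C. Producer']
```

Every query, royalty attribution, and recommendation downstream reads from tags like these — which is why a single wrong value propagates so far.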

Metadata might sound like an insignificant little thing at first, but consider this: every time a user searches for a song on Spotify; every time BMI attributes performance royalties; every time Pandora’s algorithm queues up a song — metadata is at play. It’s the oil that makes the cogs of the industry spin.

3 Types of Music Metadata

Generally, music metadata can be divided into three main types:

1. Descriptive Metadata

Descriptive Metadata details the contents of the recording, including song title, release date, performing artist, cover art, main genre and so on. It has about a million different applications. Descriptive metadata is used every time someone needs to query, organize, sort or present the music — whether it’s to put together an artist page on Spotify or identify and attribute a radio spin.

Descriptive metadata issues are the most visible to the end-user. If you’ve been using any streaming service long enough, chances are you’ve stumbled upon errors resulting from corrupted descriptions. Think of multiple artists’ songs mashed together on the same Spotify artist page, compound artist pages, misspelled song names, mixed-up release dates — all of those are consequences of compromised descriptive metadata. Such issues cause a great deal of confusion for the consumer — but that’s just the tip of the iceberg.

2. Ownership/Performing Rights Metadata

Whether we’re talking digital streams, airplay or movie synch, numerous parties will share the revenue, from performing artists to producers and songwriters. Hence the need for the ownership metadata, specifying the contractual agreements behind the release for the purposes of royalty calculation. Ownership metadata is there to make sure that each and every side is remunerated accordingly.

Given the complex nature of the splits behind most tracks (as well as differences in legislation across the globe), getting the consumer’s dollar divvied up correctly is no small feat in itself. Now, if we add missing or inconsistent metadata into the mix, the issue becomes ten times more complicated. Ownership metadata issues hit where it hurts the most: a human error here, a database glitch there — and a songwriter can miss out on tens of thousands of dollars.

However, that’s just part of the problem. The artist credits are also the primary way for songwriters, session musicians, and engineers to make themselves visible to the music industry, a kind of “b2b” promotion space. Corrupted ownership metadata robs musicians of both money and visibility.

3. Recommendation Metadata

The first two types of metadata tags are objective — there’s just one real song name, and just one list of credits. Recommendation metadata is different. At its core, it consists of subjective tags whose goal is to reflect the contents of the recording and describe how it sounds. Mood labels, generated genre tags, song similarity scores — recommendation metadata is used to make meaningful connections between tracks and power the recommendation engines.
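To illustrate the “song similarity scores” mentioned above, here is a minimal sketch of comparing tracks by their feature tags. The feature names, values, and the cosine-similarity approach are assumptions for the example, not any DSP’s actual method:

```python
import math

# Hypothetical recommendation metadata: each track as a vector of
# subjective audio features on a 0..1 scale (names are illustrative).
features = {
    "track_a": {"energy": 0.9, "valence": 0.8, "acousticness": 0.1},
    "track_b": {"energy": 0.85, "valence": 0.7, "acousticness": 0.2},
    "track_c": {"energy": 0.2, "valence": 0.3, "acousticness": 0.9},
}

def similarity(a, b):
    """Cosine similarity between two feature dicts with the same keys."""
    keys = sorted(a)
    dot = sum(a[k] * b[k] for k in keys)
    norm_a = math.sqrt(sum(a[k] ** 2 for k in keys))
    norm_b = math.sqrt(sum(b[k] ** 2 for k in keys))
    return dot / (norm_a * norm_b)

# track_a (energetic) should score closer to track_b than to track_c (acoustic)
print(similarity(features["track_a"], features["track_b"]))
print(similarity(features["track_a"], features["track_c"]))
```

A real engine would use far richer signals (listening behavior, curated tags, audio analysis), but the principle — tracks become comparable vectors — is the same.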

Of course, other types of metadata can be used to enhance music discovery as well. Release dates can help you detect music from the same era, and a matching producer’s name can help find similar recordings. However, the critical distinction is the origin of recommendation metadata.

Discovery is a huge differentiator between the streaming services. So recommendation metadata tags are usually proprietary data that doesn’t travel the industry as descriptive and ownership tags do. Instead, each platform will have its own approach to recommendation metadata generation and its own database behind the recommendation algorithm. So, while descriptive and ownership metadata is created on the artist’s end, the recommendation meta is produced by the DSPs (or their affiliates).

For example, Pandora takes a human classification approach with its Music Genome Project. Spotify, on the other hand, employs a mix of user-generated data and The Echo Nest discovery meta tags, produced through a combination of machine learning and human curation. If you want a sneak peek into the recommendation meta tag structure of The Echo Nest/Spotify and a general idea of what recommendation meta looks like, check out the Organize Your Music project.

Discovery metadata is the fastest-developing subset of the entire landscape. New tech keeps pushing the boundaries of discovery, calling for new solutions and approaches to recommendation.

Just think about how smart speakers will change the way we access and discover music. Voice-mediated music consumption shifts users from structured text queries to amorphous requests along the lines of “Alexa, play something I like”. That brings a new challenge to the recommendation engines of streaming platforms. It won’t be enough to find similar songs and generate radio-like song queues. Streaming platforms will have to figure out the best song to play for that exact person at that exact time. The way they approach this challenge will affect the livelihood of thousands of artists and music pros.

Compound Artist Errors in Action

The Problems of Music Metadata

At this point, a curious reader might ask: if music metadata is so essential to the music business, why haven’t we fixed it yet? Why are songwriters still losing money, and why is the iTunes database full of compound-artist errors? Well, the thing is that the music metadata management system has always lagged behind the audio distribution formats.

For example, when the CD first came about, it didn’t allow for any tags beyond the basic descriptive data — CD cases played the role of metadata carrier. Then, Napster took over, and chaos followed. The MP3 files ripped from CDs had next to no metadata attached, and even less made it over the P2P network, creating a massive archive of poorly tagged recordings.

1. Lack of Database Standardization

Then, digital music entered the stage and replaced the physical recording formats. All sides of the industry began to store and exchange data, but back in the day, no one saw the need for unified metadata standards. All the different digital storefronts, labels, publishers, PROs and distributors set up their own databases — and the streaming services followed in their footsteps.

To this day, there’s no unified database structure. Metadata runs through an unstandardized tangle of databases across the industry: from labels to distributors, from distributors to DSPs, from DSPs to PROs, from PROs to publishers.

All those parties are exchanging data, but the columns and fields of their databases don’t always match up. Imagine that a database receives a value in the field “Back Vocalist” — when its own column for that is called “Back Vocals”. Algorithms won’t be able to make that match (unless there’s a specific match rule), and in 99% of cases, the data will just get scrapped. A big chunk of metadata gets lost on its way through the music data chain.
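Here is a minimal sketch of that field-matching problem. All field names and the match-rule mechanism are hypothetical — the point is that without an explicit mapping, the value is silently dropped:

```python
# Explicit match rules: incoming field name -> our column name.
# Without an entry here, an unrecognized field is lost on ingest.
FIELD_MAP = {
    "Back Vocalist": "Back Vocals",
    "Release": "Release Date",
}

OUR_COLUMNS = {"Title", "Artist", "Back Vocals", "Release Date"}

def ingest(record):
    """Map an incoming record to our schema; return (kept, dropped)."""
    kept, dropped = {}, []
    for field, value in record.items():
        column = field if field in OUR_COLUMNS else FIELD_MAP.get(field)
        if column:
            kept[column] = value
        else:
            dropped.append(field)  # data lost in transit
    return kept, dropped

kept, dropped = ingest({"Title": "Song", "Back Vocalist": "J. Doe", "Mixer": "K. Roe"})
# "Back Vocalist" is rescued by the match rule; "Mixer" has no rule and is dropped.
print(kept, dropped)
```

Multiply this by every pair of databases in the chain — label to distributor, distributor to DSP, DSP to PRO — and the scale of silent data loss becomes clear.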

Furthermore, for each music company, there’s rarely a single database. Instead, the data is stored across multiple internal sources in different formats — and so it needs to be adjusted and validated to set up a proper exchange with external databases.

The current metadata management system was created in the wake of digital music, when no one really knew how the landscape was going to develop. Then, data production grew exponentially. Now, 20,000 songs are released every day and pushed through an intricate system of not-all-that-compatible databases — spawning thousands of errors.

Don’t worry, Spotify is not any better

2. Multiplicity of Music Data

The problem is not just that there are 20,000 daily releases — it’s also that those songs can be different variations of the same musical work. Let’s get a bit technical for a moment. Every song is structured through three layers of abstraction:

  1. A musical work, or composition — the result of the creative thought process of songwriters and producers, and the bread and butter of music publishers.
  2. A sound recording of the musical work, produced and recorded by performing artists. A sound recording is a particular expression of the musical work.
  3. A release — a specific, packaged manifestation of the sound recording.

So, it all starts with songwriters and composers creating a musical work. Then, that composition can be expressed in a hundred different ways — think of cover versions, remixes, radio edits, and so forth. Furthermore, each of these sound recordings can be released as a single, a part of an album, a part of a deluxe edition, a part of a compilation, etc.
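As a rough sketch, the three layers might be modeled like this. The field names, ID formats, and data below are all made up for illustration:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Work:                # the composition (the publishers' side)
    iswc: str
    title: str
    songwriters: List[str]

@dataclass
class Recording:           # one expression of the work
    isrc: str
    work: Work
    version: str           # "original", "remix", "radio edit", ...

@dataclass
class Release:             # one packaged manifestation
    upc: str
    title: str
    recordings: List[Recording]

work = Work("T-000000000-0", "Example Song", ["A. Writer"])
original = Recording("AA-AAA-19-00001", work, "original")
remix = Recording("AA-AAA-19-00002", work, "remix")
album = Release("000000000000", "Example Album", [original, remix])

# One composition fans out into many recordings and releases:
versions = {r.version for r in album.recordings if r.work is work}
print(versions)
```

In this toy model the recording carries an explicit link to its work; in reality, as the next section explains, that link is exactly what is often missing.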

In the end, a single composition can create hundreds (if not thousands) of separate metadata entities, which hugely complicates the landscape. Music companies need to match all those different layers of abstraction. For example, if ASCAP gets a report of a radio spin for a specific release, it needs to pair it with the underlying composition to locate songwriters.

3. Shortcomings of the Music ID Standards

The industry’s main identifier for recordings is the ISRC (International Standard Recording Code). But based on the ISRC alone, you won’t be able to tell what the original musical work behind a particular recording is. The standard doesn’t allow aggregating entries to a higher level of abstraction to compile all the versions of the same track or composition. The limits of the ISRC standard make it really hard for music companies to fix broken metadata. To make sense of the incoming data, industry players have to rely on descriptive metadata tags to match the ISRC with other persistent IDs, like UPC for releases or ISWC for compositions. That results in all sorts of errors, duplicates, and conflicts across the music data chain.
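A minimal sketch of that fallback matching, with made-up data and a deliberately naive title-normalization rule — real systems use far more elaborate heuristics, and still fail often:

```python
import re

recordings = [  # ISRC-level entries, as a DSP might report them
    {"isrc": "AA-AAA-19-00001", "title": "Example Song (Radio Edit)",
     "artist": "Example Artist"},
]
works = [       # ISWC-level entries, from a publisher's catalog
    {"iswc": "T-000000000-0", "title": "Example Song", "writers": ["A. Writer"]},
]

def normalize(title):
    """Strip version qualifiers and punctuation for a fuzzy title match."""
    title = re.sub(r"\((radio edit|remix|live|acoustic)\)", "", title.lower())
    return re.sub(r"[^a-z0-9 ]", "", title).strip()

def match_work(recording, works):
    """Link a recording to its composition via descriptive tags only."""
    for w in works:
        if normalize(w["title"]) == normalize(recording["title"]):
            return w["iswc"]
    return None  # unresolved -> royalties may go unpaid

print(match_work(recordings[0], works))  # T-000000000-0
```

One typo in either title and the match returns `None` — which is exactly how money falls through the cracks.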

There have been several attempts to create a global music reference database, but to this day there is no ultimate source of truth that would allow resolving the metadata conflicts. Currently, the most notable public music databases are the open-source platforms MusicBrainz and Discogs, and IFPI’s ISRC code catalog — but, unfortunately, all of them are far from complete.

The shortcomings of the ID system mean that music companies have to jump through hoops every time they encounter a metadata error. Trying to connect the dots by cross-referencing databases that are full of inconsistencies themselves is a daily routine of music data management. That is, if the company cares enough.

Spotify Song Credits Section Highlights the Ownership Metadata Issues

4. Human Errors

Last, but not least, we have what some would call the weakest link of any system. Most of the descriptive and ownership metadata is created and filled in manually. Considering the scale, that inevitably leads to all sorts of typos: misspelled names and titles, wrong release dates — or even straight-up missing data.
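One common mitigation is flagging suspiciously similar entries for human review. Here is a minimal sketch using Python’s standard library — the names, threshold, and approach are illustrative:

```python
from difflib import SequenceMatcher

# Made-up catalog entries: one is a manual-entry typo of another.
names = ["Example Artist", "Exmaple Artist", "Other Band"]

def near_duplicates(names, threshold=0.85):
    """Flag pairs of names that are very similar but not identical."""
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if a.lower() != b.lower() and ratio >= threshold:
                flagged.append((a, b))
    return flagged

print(near_duplicates(names))  # flags ('Example Artist', 'Exmaple Artist')
```

At the scale of 20,000 releases a day, even a simple filter like this catches errors that would otherwise create duplicate artist pages and orphaned royalties.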

Take the ownership data, for example. A song’s credits can become extremely complicated, with dozens of different songwriters, engineers, session musicians and producers working on the same release. At the same time, the deadlines won’t wait — and so the ownership agreements and splits often get overlooked when the team is trying to push out a new track on time. Quite often, splits will be decided after the fact — and once the song is already out, it’s extremely hard to add or edit the meta.

All those different factors, from human errors and database incompatibility to flawed ID standards and the multi-sided nature of music copyrights, create the grim reality of modern music metadata. The backbone of the music industry is perhaps the biggest mess the data world has ever seen.

How Do We Fix the Metadata?

The current lose-lose situation is calling for a change. Clean metadata could help a session musician get the next gig, pay a songwriter’s rent, optimize the user experience for streaming services — and save the industry millions along the way. However, there’s no clear answer as to how we fix the problem. Don’t get too pessimistic, though — there are several companies, initiatives, and organizations working towards a better system.

Metadata Clean Up, Management and Administration Solutions

The first type of metadata company works on putting together music databases and then cleaning, mending, and expanding the meta tags. Companies like Gracenote, Musicstory and, to some extent, The Echo Nest are the metadata providers for various DSPs across the industry. Such companies are primarily concerned with descriptive and recommendation metadata. They use a mix of metadata cleaning algorithms and audio-recognition tech to empower search, playlisting, and music discovery while ensuring correct display for the digital storefronts.

On the other side of the spectrum, VivaData and TuneRegistry are working out solutions for independent record labels and artists. They aim to optimize internal metadata management and streamline outgoing data flows at the foundation of the music data system.

However, all those companies are treating the symptoms, not the cause of the issue. Don’t get me wrong, it’s vital to try and clean up the existing mess — but that won’t solve the systemic problems.

New Database Standards

The most visible player in that space is DDEX, an international organization developing and promoting new data standards and protocols to optimize the digital data chain. Offering solutions covering the entirety of the music data system, DDEX has already made significant progress, counting some of the biggest names in the industry as its members. DDEX standards aim to facilitate metadata management in the studio, harmonize metadata transfers between content owners and DSPs, and much, much more.

Essentially, the goal of the organization is to build a full-circle pipe for the music metadata, from the point where it’s created to the final destination. Putting in place standard exchange protocols for music metadata could potentially cleanse the industry from thousands of incompatibility errors. However, while the DDEX initiatives can help develop a better system, that won’t fix all of the metadata issues.

Not to sound cliché, but when it comes to fixing the metadata, you have to start with yourself. A considerable portion of errors is due to a lack of awareness amongst music professionals — which is part of the reason we wrote this article in the first place.

What Can You Do to Help Fix the Music Metadata?

The simple rule of thumb is to make sure that all of the song’s metadata is adequately filled out and verified before the song (or album) release. That’s not as easy as it sounds. Here’s some advice to help you make sure that you won’t contribute to the pile of corrupted music metadata:

1. Keep Track of the Metadata From the Get-Go

Sound Credit and Auddly’s Creator Credits (the feature was still in development at the time of writing) can help you there. Those solutions allow embedding credits and other metadata directly into the DAW files that fly around the studio. That way you can keep credits in the same place and maintain a consistent record of all song versions and collaborators.

2. Finalize Agreements and Define Splits Before Song Leaves the Studio

Music rights tend to get very complicated, and the rush to meet the release deadline can often leave the ownership metadata incomplete. However, incomplete ownership metadata can mean that some, or even all, of the collaborators will miss out on the paycheck completely. To make the contractual side of things easier, consider using Splits, a free application created to track and manage a song’s collaborators and, well… splits.
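The most basic sanity check here is that the shares attributed to a song’s collaborators account for the whole work. A minimal sketch, with made-up data:

```python
# Hypothetical ownership splits for one composition, in percent.
splits = {"A. Writer": 50.0, "B. Composer": 30.0, "C. Producer": 15.0}

def splits_valid(splits, tolerance=0.01):
    """Check that the declared shares sum to 100%."""
    return abs(sum(splits.values()) - 100.0) <= tolerance

print(splits_valid(splits))  # False: 5% of the work is unaccounted for
```

A gap like that missing 5% is precisely the kind of error that surfaces years later as unclaimed royalties.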

3. Make Sure That the Metadata is Filled Correctly

The typos might seem insignificant, but they have a real impact. A track’s descriptive metadata is used to make matches across databases, and corrupted data has a tendency to break things. Be sure to double- and triple-check the song’s metadata before you send it out — or set up a two-step vetting system. Once the song is out there, fixing the typos becomes very problematic.
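Part of that vetting can be automated. Here is a minimal sketch of a pre-release check — the required fields and format rules are illustrative, not a distributor’s actual requirements:

```python
import re

# Fields assumed mandatory for this example's purposes.
REQUIRED = ["title", "artist", "release_date", "songwriters"]

def validate(meta):
    """Return a list of problems; an empty list means the meta passes."""
    errors = []
    for field in REQUIRED:
        if not meta.get(field):
            errors.append(f"missing: {field}")
    date = meta.get("release_date", "")
    if date and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        errors.append("release_date must be YYYY-MM-DD")
    return errors

print(validate({"title": "Song", "artist": "Artist",
                "release_date": "13/08/2019", "songwriters": []}))
```

Running a check like this before every delivery costs seconds; untangling the same errors after release can take months.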

4. Follow Metadata Guidelines

To make sure that not only the contents but also the formats are correct, you can use metadata guidelines. Stick with the distributor’s guide — most of them are easy to follow. If the distributor’s instructions don’t have all the answers, check in with general guidelines like the one provided by the Music Business Association.

5. Spread the Word

Following those steps, of course, won’t fix all the metadata issues in the industry. The problem itself is too complicated, and we can only solve it if the entire music industry is on the same page. In that sense, the first step is to raise awareness amongst music professionals.

Metadata is at the core of the music industry, and right now it’s broken. Musicians miss out on royalties. Songwriters and engineers don’t get the credit they deserve. The streaming services have developed algorithms to make sure that the top of the catalog looks clean, but once you take a deep dive into the long tail, all sorts of errors start to surface. We need to start moving towards a better system.
