The Deluge of Data

Published in Platform & Stream
8 min read, Oct 12, 2023

Guest post: Craig Crafton

As the owner and operator of a small label and publishing company, I find that one of the least-known aspects of the modern music business is the deluge of data that must be managed.

Granted, having this data is great. Knowing when and where a song is streamed gives precise information about audience demographics, geographic areas, performance on specific streaming services, and songs being used in user-generated content (UGC), each an insight into what music is finding an audience and how.

This information can be used to target advertising, develop touring strategies, seek partnerships with brands, and pitch sync proposals aimed at specific audiences.

But the challenge is that labels and publishers are now faced with a deluge of data that must be processed and understood in order to pay royalties. Even for the smallest of labels, it is a cumbersome task, hence the array of vendors marketing various types of services to perform this “back office” work.

Sources of Data

For a label that uses an aggregator to deliver sound recordings to streaming services, the aggregator is key. Aggregators may pay royalties in bulk, and then offer a separate set of data that identifies streams (performances) by artist. Some aggregators may break down the royalties by contributor (such as band members or producers) based on information provided by the label.

Still others may go further and provide song-specific information about streams from a particular platform, a geographic location (typically by city), or other listener demographics, such as age, sexual identity or even general income bracket.

And of course, data on specific playlist performances can identify similar artists, from which a community or cluster of fanbases can grow into a musical movement. But the aggregator is getting data from dozens of global streaming services, which all report their streaming activity in different formats using different terminology. Even an aggregator with great service can only go so far in harmonizing the data, and this is only for the sound recording side of the business.
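To make the harmonization problem concrete, here is a minimal sketch in Python. The column names, the alias table and the sample rows are all invented for illustration; real streaming service and aggregator reports use their own layouts and are far wider. The idea is simply to map each report's own terminology onto one common set of field names before combining anything.

```python
import csv
import io

# Hypothetical column names; real service and aggregator reports differ.
COLUMN_ALIASES = {
    "isrc": {"isrc", "track_isrc", "recording code"},
    "streams": {"streams", "plays", "quantity"},
    "territory": {"territory", "country", "region"},
}

def harmonize(report_text):
    """Map one report's columns onto a common schema, row by row."""
    reader = csv.DictReader(io.StringIO(report_text))
    rows = []
    for raw in reader:
        clean = {}
        for target, aliases in COLUMN_ALIASES.items():
            for name, value in raw.items():
                if name.strip().lower() in aliases:
                    clean[target] = value.strip()
        clean["streams"] = int(clean["streams"])  # counts arrive as text
        rows.append(clean)
    return rows

# Two made-up reports describing the same recording in different terms.
service_a = "ISRC,Plays,Country\nUSABC2300001,120,US\n"
service_b = "Recording Code,Quantity,Region\nUSABC2300001,45,DE\n"

combined = harmonize(service_a) + harmonize(service_b)
total_streams = sum(row["streams"] for row in combined)
print(combined)
print(total_streams)
```

In practice the alias table is the part that needs constant upkeep, since every new service or report revision can introduce another name for the same field.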

Data regarding streaming use of compositions is even more opaque. Aggregators do not live in the world of compositions and music publishing, except to collect songwriter names, typically without any other identifying information such as an IPI or CAE number, an affiliated publisher, or an ISWC (International Standard Musical Work Code, an alphanumeric identifier for each unique composition, assigned by the performing rights organizations, or PROs). That leaves the music publishing side of the business to reconcile the information and pay the appropriate parties. This is currently the “black box” or, as I term it, the black hole of the industry.

Music publishers must compile and process data from their affiliated PRO (ASCAP or BMI), the Harry Fox Agency (HFA), the Mechanical Licensing Collective (MLC), sync uses and direct licenses. Only recently, with the Music Modernization Act, did the digital streaming providers (DSPs, but I like to just refer to them as the Streamers) begin to care about this data.

My understanding is that the Streamers report composition use in a bulk, unsegregated manner to the PROs and MLC, leaving it up to those organizations to process and delineate which compositions were performed how many times and who should be paid for it (hence, the black hole).

Let’s look at the organizations involved and the data they report. I distinguish between the Sound Recording side of the business and the Composition side of the business:

Sound Recordings

Aggregators aka Digital Distributors: These organizations, such as DistroKid, The Orchard, CD Baby, TuneCore, Symphonic and many more, ingest Sound Recordings from independent creators, labels, unsigned artists and AI music generators (e.g. Fruits Music) and distribute the releases to the Streamers (major labels have direct distribution, but many significantly sized indies use a third-party aggregator). The ingestion data is typically Artist, Song Title, ISRC (International Standard Recording Code), Composer information and not much more (some Aggregators create a unique UPC for that release). Aggregators then collect the payout from Streamers and redistribute it to the party that uploaded (label or artist).

The data provided by Aggregators, again compiled from the Streamers, is typically:

Number of Streams by DSP
Number of Streams by Artist
Number of Streams by release/UPC
Number of Streams by ISRC
Streaming length (which impacts payout by some DSPs)
Performing artist name
Period (which can be different for DSPs because they report differently)
Geographic Location of streams
Royalty Rate — usually in some fractional percentage of an unknown factor
UGC uses on social media or YouTube
Royalty rate for UGC usage, which is different for each service
Deductions owed to the Aggregator (for other services like marketing)
Total Royalty Rate average
Royalty Amount
Gross Amount
Commission Rate
Commission Amount
Foreign Currency conversion factor (many DSPs are not located in the US)
Payable Amount
And in some cases the payout is split by performing artist member, or entitled party (such as managers)

Or it all may be very broad and generic, depending on the aggregator’s service. Think of it in the physical context as getting a detailed sales report from Best Buy, Tower and Wal-Mart, each providing a differently formatted report. Harmonizing those would be a similar challenge, but at per-song volume.
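The money columns in the list above (Gross Amount, Commission Rate, Commission Amount, currency conversion factor, deductions, Payable Amount) are related by simple arithmetic. The sketch below shows one plausible way they fit together. The order of operations, the function name and the figures are assumptions made for illustration; each aggregator defines its own calculation, so a real statement should be checked against that aggregator's documentation.

```python
from decimal import Decimal, ROUND_HALF_UP

def payable_amount(gross, commission_rate, fx_factor, deductions=Decimal("0")):
    """Return (commission, payable) for one statement line.

    Assumed order of operations: convert the gross to the payout currency,
    take the commission on the converted gross, then subtract deductions.
    """
    cent = Decimal("0.01")
    converted = gross * fx_factor
    commission = (converted * commission_rate).quantize(cent, rounding=ROUND_HALF_UP)
    payable = (converted - commission - deductions).quantize(cent, rounding=ROUND_HALF_UP)
    return commission, payable

commission, payable = payable_amount(
    gross=Decimal("200.00"),          # in the service's reporting currency
    commission_rate=Decimal("0.15"),  # made-up 15% aggregator commission
    fx_factor=Decimal("1.10"),        # reporting currency to payout currency
    deductions=Decimal("5.00"),       # e.g. marketing services
)
print(commission, payable)
```

Decimal is used rather than floating point so that rounding happens only where the code says it does, which matters once thousands of small per-stream amounts are summed.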

SoundExchange: Collects royalties under a statutory blanket license from certain non-interactive digital services (mainly Sirius, Pandora and Music Choice (cable TV)) for digital performances of sound recordings. SoundExchange collects data from these services and pays, by statute, the Sound Recording Copyright Owner, Recording Artist, and Side Artists (which includes producers) for the digital use of the sound recording. The SoundExchange report contains typical entries for artist name, track name, album name, label name, identifying information such as ISRC and release UPC, alternative identifying information, and then various performance information, such as broadcast date and number of performances, payment information in several breakouts, and other categories specific to SoundExchange.

Compositions

On the composition side, the PROs, ASCAP and BMI (and the much smaller SESAC and GMR), play an integral role in focusing on the data needed to identify the composition performed and its writers and publishers, and in getting those parties paid their proper royalties. ASCAP and BMI may be more sophisticated in this space, as they have historically dealt with this issue. The PROs' modern struggle is getting effective data from the Streamers, and the Streamers blame ingestion by the Aggregators (and to be fair, I personally know that some Aggregators did not even request songwriter info until recently). The other new development for the PROs is the volume driven by the large number of new composers publishing music, as both PROs have relaxed their membership requirements so that any person can sign up as a composer.

In short, the data on composition plays in the streaming world has greatly improved and, as a result, has become cumbersome. Typical columns on reports from ASCAP include the time period of the report; ASCAP publisher, sub-publisher and writer identifying data, such as ISWC; performance dates, type, territory and services, which includes a breakdown by streaming service and by film or television performance; duration; number of plays; recording artist; and music genre. An ASCAP report might also contain information about the ASCAP survey that identified the performance, legal identifying information for parties entitled to royalties, writer or publisher shares, and various credits and adjustments made by ASCAP.

Mechanical Licensing Collective: The newest player, the MLC, was created by the Music Modernization Act and is funded by the Streamers. The MLC issues a statutory blanket license to Streamers, collects the mechanical royalty from them and remits it to music publishers. The MLC obtains data regarding the performance of compositions from the Streamers.

Again, this data comes in a myriad of formats and terminology. The MLC has done a great job of harmonizing this data and providing it to the music publishers that control the compositions. The publishers can then sort and process the data to determine which songwriters are owed what royalties.
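The last step, working out which songwriters are owed what, can be pictured as grouping statement lines by composition and applying each writer's share. The sketch below is a deliberately simplified illustration: the work identifiers, writer names, amounts and shares are invented, and it ignores the publisher's own share, administration fees, recoupment and anything else a real songwriter agreement would specify.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical statement lines: (work identifier, royalty received).
statement = [
    ("WORK-001", Decimal("100.00")),
    ("WORK-002", Decimal("40.00")),
    ("WORK-001", Decimal("20.00")),
]

# Hypothetical writer shares per work; each work's shares sum to 1.
writer_shares = {
    "WORK-001": {"Writer A": Decimal("0.50"), "Writer B": Decimal("0.50")},
    "WORK-002": {"Writer A": Decimal("1.00")},
}

def royalties_by_writer(lines, shares):
    """Total the amount owed to each writer across all statement lines."""
    owed = defaultdict(Decimal)  # Decimal() starts each writer at zero
    for work_id, amount in lines:
        for writer, share in shares[work_id].items():
            owed[writer] += amount * share
    return dict(owed)

owed = royalties_by_writer(statement, writer_shares)
print(owed)
```

A statement line whose work identifier is missing from the shares table raises a KeyError here, which is intentional in a sketch like this: an unmatched work is exactly the kind of line that needs a human to look at it.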

The MLC reports typically contain MLC writer, sub-publisher and administrator identifying information, such as ISWC; territory; use period and type; streaming service identification; detailed information on the number of streams and the rate owed by the various streaming services (which varies considerably), spread over many columns of data; and royalty and payment information.

The MLC data on performance of the composition and the aggregator data on performance of the sound recording are not going to be consistent, even for a song with one performer and one songwriter. The composition information provided to aggregators typically comes from labels, while the corresponding information provided to the PROs comes from publishers, and labels and publishers may not have the same information.

That is the crux of the problem: too many different sources and different formats, and data that should be the same (or at least similar) can be wildly disparate. With so many variables, you can see how data can become corrupted to the disadvantage of labels and publishers.
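One way to see the disparity is to line up the two sides and look for gaps. The sketch below assumes a label or publisher maintains its own mapping from recordings to compositions; the identifiers and counts are invented. It totals the recording-side streams for each composition and reports where the composition-side count differs.

```python
# Hypothetical stream counts for one period, keyed by recording.
aggregator_streams = {"REC-1": 1000, "REC-2": 500, "REC-3": 250}

# Hypothetical stream counts for the same period, keyed by composition.
mlc_streams = {"WORK-A": 1400, "WORK-B": 250}

# Assumed in-house mapping from each recording to its composition.
recording_to_work = {"REC-1": "WORK-A", "REC-2": "WORK-A", "REC-3": "WORK-B"}

def reconcile(recordings, works, mapping):
    """Return {work: composition-side count minus recording-side count}."""
    expected = {}
    for rec, count in recordings.items():
        work = mapping.get(rec)  # None marks a recording with no known work
        expected[work] = expected.get(work, 0) + count
    gaps = {}
    for work in sorted(set(expected) | set(works), key=str):
        gap = works.get(work, 0) - expected.get(work, 0)
        if gap != 0:
            gaps[work] = gap
    return gaps

gaps = reconcile(aggregator_streams, mlc_streams, recording_to_work)
print(gaps)
```

A non-zero gap is a prompt to investigate rather than proof of an error, since the two reports may cover different periods, territories or definitions of a stream.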

Harry Fox Agency: Lastly, the Harry Fox Agency (HFA) issues mechanical licenses and collects and remits mechanical royalties for physical reproduction of compositions. Most large publishers work directly with labels for cover songs; hence HFA tends to work with indies, unsigned artists doing cover songs and European labels selling compact discs. HFA also still has a role for downloads sold by Apple Music (the old iTunes), Beatport, Traxsource and high-quality digital download services (Qobuz and Hi-Fi).

The general columns for HFA reports include period, territory, physical configuration, licensee, composition and administrator information, composition identification data, such as ISWC, title and performing artist, writer identifying data, various internal HFA codes and detailed information about royalty rates, units manufactured, downloads, and calculation of royalties owed. If HFA is authorized to collect for foreign physical reproduction or usage of a composition, additional data fields will be included.

Other entities, such as Songtrust and some aggregators, offer publishing administration services, wherein they register with the PROs, MLC and HFA, manage the data, and collect and remit royalties for a fee.

In sum, in addition to identifying, financing and developing artists, and recording and marketing sound recordings, labels need strong data processing and analysis skills, as even a modest-sized label likely has many thousands of spreadsheet lines of information to process.

And for music publishers, tracking composition usage, collecting the associated royalties, and calculating and paying songwriters has become a massive undertaking.

Taken together, the deluge of data yields more precise and actionable information, but only for those who can effectively manage it.

** By Craig Crafton **

Contact: craig.crafton@gmail.com
