Be A Data Maestro, Part II

More Lennon/Simon & Garfunkel, Less Eponine/Faith No More

Pie and Donut Analytics
Away From Towards Data Science
4 min read · Sep 19, 2021


In Part I of this series, I gave some background on the motivation behind this project and also began to walk you through putting my database together. I briefly touched on the need for data cleaning (e.g. “Simon and Garfunkel” <> “Simon & Garfunkel” <> “Simon And Garfunkel”) and having a critical mass of data points to work with. In this article, I will talk about devising interesting metrics, with assists from Lea Salonga and Mike Patton.

How to replace a comma followed by a space with a carriage return LIKE A BOSS
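For readers without the screenshot handy, here is a minimal Python sketch of the same trick. It assumes the "carriage return" in the caption means an ordinary line break, and both the function name `split_favorites` and the sample string are made up for illustration; this is not necessarily the tool I used.

```python
def split_favorites(raw: str) -> str:
    """Put each comma-separated artist on its own line."""
    # Assumption: entries are separated by exactly a comma plus one space.
    return raw.replace(", ", "\n")


# Hypothetical sample input, not taken from the real dataset.
favorites = "Green Day, Manic Street Preachers, Simon & Garfunkel"
print(split_favorites(favorites))
```

Entries separated by a comma with no space (or with several spaces) would need a more forgiving pattern, which is one more reason the data cleaning step matters.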

My first new metric — the “Come Together” Score

If you are a musician, I am guessing that unless you want to be a Nowhere Man making all his nowhere plans for nobody, you will want as many people as possible to hear and enjoy your music. Thus, I invented my soon-to-be-patented Come Together Score (CTS), which will be calculated for every respondent in my data gathering phase. It ranges from a low of On My Own (you favorited only ONE artist, AND no other respondent cared about that artist) to a perfect score of Come Together (every other respondent has also favorited each and every one of your favorited artists).

Les Miz “On My Own” = BAD (only in this specific context!), Beatles “Come Together” = GOOD (in absolutely all contexts)

My nephew Chaser loves Green Day. Let’s say that Chaser and I are the only two people who contributed data to my dataset. (Technically Chase didn’t directly participate since I had to get that info from his mom. Oh you Gen Z whippersnapper!) Green Day is the only band that Chase favorited, according to my cousin.

Let’s also say, for simplicity, that I favorited only one band: Manic Street Preachers, who actually ARE my favorite band in the world if I could only pick one. (Obviously, though, I listed scores more bands, artists, and musicals.)

I guarantee Chase has never heard of the Manics in his life, while I actually like Green Day and even paid to see them in concert during their Dookie era (I did not list them among my favorites, however). So, the raw Chase-Green_Day CTS = 3 points (2 points because Green Day is Chase’s favorite band, plus 1 point because someone else also likes that band). The raw Pie_and_Donut-Manic_Street_Preachers CTS = 2 points (2 points because they’re my favorite, plus 0 points because no one else has even heard of them).
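The arithmetic in this toy example can be sketched in a few lines of Python. To be clear, this is only an illustration of the two-person example above, not the actual CTS algorithm. It assumes 2 points for a favorited artist plus 1 point for each other respondent who likes that artist (the example only ever shows one such person), and the names `raw_cts`, `favorites`, and `likes` are hypothetical.

```python
from typing import Dict, Set

FAVORITE_POINTS = 2    # from the example: the artist is the respondent's favorite
ALSO_LIKES_POINTS = 1  # assumption: one point per other respondent who likes the artist


def raw_cts(respondent: str, artist: str,
            favorites: Dict[str, Set[str]],
            likes: Dict[str, Set[str]]) -> int:
    """Raw score for one respondent-artist pair in the toy example."""
    if artist not in favorites.get(respondent, set()):
        return 0  # assumption: only favorited artists earn a score
    everyone = set(favorites) | set(likes)
    others = sum(
        1
        for person in everyone
        if person != respondent
        and artist in (favorites.get(person, set()) | likes.get(person, set()))
    )
    return FAVORITE_POINTS + ALSO_LIKES_POINTS * others


# The two-person dataset described above.
favorites = {
    "Chase": {"Green Day"},
    "Pie_and_Donut": {"Manic Street Preachers"},
}
# Artists a respondent likes without having listed them as a favorite.
likes = {"Pie_and_Donut": {"Green Day"}}

print(raw_cts("Chase", "Green Day", favorites, likes))
print(raw_cts("Pie_and_Donut", "Manic Street Preachers", favorites, likes))
```

Run on the two-person dataset, this gives 3 for Chase and Green Day and 2 for me and the Manics, matching the numbers above.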

RIP Richey Edwards :’(

My second new metric — the “Bridge Over Troubled Water” Score

Next, I am also surmising that musicians would prefer to have their listeners span people of all genders, geographies, ages, and races. As Bob Marley would say, “One Love/One Heart/Let’s get together and feel all right.” Therefore, I derived another metric (also soon-to-be-copyrighted), the Bridge Over Troubled Water Score (BOTWS), which is calculated for each artist named in my data collection phase. It ranges from a low of Falling to Pieces (your fans are exclusively concentrated within a very niche splinter group) to a perfect score of Bridge Over Troubled Water (EVERYONE all around the world, young and old, girls and boys, gay and straight, loves you). As Michael Jackson would say, “It don’t matter if you’re Black or White”: you are an artist who is universally adored. I cannot think of a reason why a musician would not want as high a BOTWS as possible.

Wait until Part III drops!

I am keeping my algorithms and methodology completely confidential and proprietary at this point, but rest assured I am hard at work doing all the necessary coding and trying out slick ways to do the data visualization. For the next article in this series, I plan on sharing some initial results and formulating some possible business use cases, so stay tuned!
