Exploring Similarities Between Artists by Their Track’s Audio Traits

My Tran
INST414: Data Science Techniques
Mar 11, 2024

When a friend asks you to recommend a music artist based on one they already like, you will most likely recommend someone within the same genre. If the given artist is a rap-focused hip-hop artist, we often suggest other artists who mainly rap and do hip-hop. What we do not do is recommend based on other traits of an artist’s music, such as the beats per minute, the energy, or even the length of the songs. It’s harder to gauge similarities in audio because so many traits and pieces go into music, and you don’t really listen to a song while counting the number of beats per minute. This made me wonder: how do we compare artists on numerical traits?

Music streaming platforms like Spotify or Apple Music are always interested in analytics that help them recommend more music to their users. Since trait-based comparison is not the usual way people think about recommendations, stakeholders may be curious about identifying artists whose songs exhibit similar traits to popular artists. People who listen to those popular artists might get recommendations that are similar in genre, but not in other traits. By understanding the similarities between artists based on their tracks’ audio traits, these platforms can better tailor their recommendations to individual user preferences. Informing these recommendation decisions can help stakeholders increase user retention and ultimately increase profits.

A dataset that can answer this question needs the basic information about each song, such as the title, artist, genre, and year, along with specific audio-related traits. The dataset I used comprises audio statistics for the top 2000 tracks on Spotify from 1956 to 2019, by many renowned artists like Queen and the Beatles. The data is sourced from http://sortyourmusic.playlistmachinery.com/ by @plamere, which utilizes Spotipy (a Python client for Spotify’s Web API) to extract audio features from tracks based on the Spotify playlist URIs. The audio attributes include:

  • Beats per Minute (BPM): The tempo of the song
  • Energy: The energy of a song — the higher the value, the more energetic the song
  • Danceability: The higher the value, the easier it is to dance to this song
  • Loudness: The higher the value, the louder the song
  • Valence: The higher the value, the more positive mood for the song
  • Length: The duration of the song
  • Acoustic: The higher the value, the more acoustic the song is
  • Speechiness: The higher the value, the more spoken words the song contains

These features are useful for my analysis because they are traits people don’t often use when comparing artists and recommending music.

Before diving into the analysis, I checked the data for integrity and consistency. I handled missing values by turning them into N/A or 0, and removed the columns that were not necessary.
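The cleaning step can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from my notebook: the sample rows, the column names in `NUMERIC_COLUMNS` and `DROP_COLUMNS`, and the helper `clean_rows` are all hypothetical, and it assumes the "fill with 0" option for missing numeric values.

```python
# Hypothetical rows standing in for the dataset; the values are made up.
rows = [
    {"Title": "Song A", "Artist": "Artist X", "Index": 1,
     "BPM": 120, "Energy": 80, "Danceability": None},
    {"Title": "Song B", "Artist": "Artist Y", "Index": 2,
     "BPM": None, "Energy": 55, "Danceability": 70},
]

# Assumed column choices, for illustration only.
NUMERIC_COLUMNS = ["BPM", "Energy", "Danceability"]
DROP_COLUMNS = ["Index"]


def clean_rows(rows, numeric_columns, drop_columns):
    """Drop unneeded columns and replace missing numeric values with 0."""
    cleaned = []
    for row in rows:
        # Keep every column that is not explicitly dropped.
        kept = {key: value for key, value in row.items()
                if key not in drop_columns}
        for column in numeric_columns:
            value = kept.get(column)
            # Treat None and empty strings as missing.
            if value is None or value == "":
                kept[column] = 0.0
            else:
                kept[column] = float(value)
        cleaned.append(kept)
    return cleaned


cleaned = clean_rows(rows, NUMERIC_COLUMNS, DROP_COLUMNS)
print(cleaned[0])
```

One thing to keep in mind with this choice: a 0 is a real number as far as later averages are concerned, so filling with 0 pulls an artist's average for that trait downward.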

To measure the similarity between artists, I focused on all of the given audio traits extracted from each song. These traits, such as BPM, energy, and danceability, are all numerical. By normalizing these attributes and computing pairwise cosine similarity scores between the artists, I can quantify how similar their music is based on those traits rather than their genres.
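To make the idea concrete, here is a small sketch of normalizing and then comparing two artists. It rests on assumptions that are not spelled out above: each artist is represented by the average of their tracks' traits, and normalization is min-max scaling of each trait to the range 0 to 1. The sample tracks and the helper names `artist_means` and `min_max_normalize` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical track-level data: (artist, [BPM, energy, danceability]).
tracks = [
    ("Artist X", [120.0, 80.0, 60.0]),
    ("Artist X", [100.0, 70.0, 50.0]),
    ("Artist Y", [90.0, 40.0, 30.0]),
    ("Artist Z", [130.0, 90.0, 70.0]),
]


def artist_means(tracks):
    """Average each trait over all tracks by the same artist (an assumption)."""
    grouped = defaultdict(list)
    for artist, features in tracks:
        grouped[artist].append(features)
    return {artist: [sum(column) / len(column) for column in zip(*rows)]
            for artist, rows in grouped.items()}


def min_max_normalize(vectors):
    """Rescale each trait to [0, 1] across artists (one possible normalization)."""
    names = list(vectors)
    columns = list(zip(*(vectors[name] for name in names)))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return {
        name: [(value - low) / (high - low) if high > low else 0.0
               for value, low, high in zip(vectors[name], lows, highs)]
        for name in names
    }


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for an all-zero vector
    return dot / (norm_a * norm_b)


profiles = min_max_normalize(artist_means(tracks))
score = cosine_similarity(profiles["Artist X"], profiles["Artist Z"])
print(score)
```

Since every trait is non-negative after this kind of scaling, the cosine scores always land between 0 and 1.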

After computing the similarity scores, I identified the top 10 most similar artists to the first three extremely popular musicians I could think of: Beyoncé, Taylor Swift, and the Backstreet Boys. These results offer insights into artists whose music shares similar audio characteristics with these iconic figures, which is a new way to discover artists and give new music recommendations.

The results are as follows:

Top 10 most similar artists to Beyoncé:
First Aid Kit: 0.9817711365169072
OneRepublic: 0.9816519079895775
Journey: 0.9805372165925234
David Guetta: 0.9796908414724502
Lady Antebellum: 0.9796425940456925
The Chainsmokers: 0.9795746870714249
Cher: 0.9794744032662097
George Harrison: 0.9790194555044555
The Fray: 0.9789536040632573
The Cult: 0.9787689272611512

Top 10 most similar artists to Taylor Swift:
Iron Butterfly: 0.9965645706639197
Pharrell Williams: 0.9948758777427686
Patrick Hernandez: 0.9945018592418253
Barry White: 0.9944493418943747
Luis Fonsi: 0.9935181546908086
The Trammps: 0.9934971905778708
The Shadows: 0.9925010998281452
Mud: 0.9921205814439888
Traveling Wilburys: 0.9919629378813106
Gigi D’Agostino: 0.9917956139230645

Top 10 most similar artists to Backstreet Boys:
Jamiroquai: 0.9945499047474489
First Aid Kit: 0.9940147516952118
Boney M.: 0.9936712441463216
Survivor: 0.9931686181098808
OneRepublic: 0.993141972798605
Natasha Bedingfield: 0.9931344502445689
Traveling Wilburys: 0.9928623289543261
Level 42: 0.9926811285842041
Carly Simon: 0.9925078448872253
Calvin Harris: 0.9923173460670365
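Rankings like the ones above come from scoring one artist against every other artist and keeping the highest scores. Here is a minimal sketch of that step, assuming each artist already has a normalized trait vector. The `profiles` values are made up and `top_similar` is a hypothetical helper, not the function from my notebook.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_similar(target, profiles, n=10):
    """Return the n artists most similar to target, best match first."""
    target_vector = profiles[target]
    scores = [(artist, cosine_similarity(target_vector, vector))
              for artist, vector in profiles.items()
              if artist != target]  # skip comparing the artist to themselves
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


# Made-up normalized trait vectors, for illustration only.
profiles = {
    "Artist X": [0.5, 0.7, 0.6],
    "Artist Y": [0.1, 0.9, 0.2],
    "Artist Z": [1.0, 1.0, 1.0],
    "Artist W": [0.5, 0.7, 0.5],
}

ranking = top_similar("Artist X", profiles, n=2)
for artist, score in ranking:
    print(f"{artist}: {score}")
```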

The results highlighted artists whose tracks exhibited similar audio traits to the artists of interest. For example, First Aid Kit, Iron Butterfly, and Jamiroquai emerged as the most similar to Beyoncé, Taylor Swift, and the Backstreet Boys respectively. This means that the music First Aid Kit had in the top 2000 has the BPM, energy, danceability, liveness, valence, length, acousticness, speechiness, and popularity most similar to Beyoncé.

While analyzing artist similarities based on audio features can provide some unique and fun insights, there are definitely some limitations and potential biases in the analysis. One key limitation is the reliance on audio features almost exclusively, which may overlook other important factors like lyrics, instrumentation, and production. Additionally, certain audio features, such as danceability, are subjective, so how similar two artists feel can vary from listener to listener. Moreover, the dataset is a top 2000 list, so it is skewed towards popular artists and mainstream styles. Finally, I included one outlier trait in my analysis, ‘popularity’, which does not have anything to do with audio features. To address these limitations, future analyses should consider incorporating additional data sources to gain a more comprehensive understanding of how the elements that go into a track relate to one another.

In conclusion, exploring similarities between artists based on their tracks’ audio traits opens up a new approach for music recommendation systems. By comparing artists on a wider range of features beyond genre, these systems can provide more personalized and unique recommendations to users. Moving forward, further research and experimentation with more advanced similarity metrics could improve the effectiveness and accuracy of music recommendation algorithms.

The code I used for this can be found in my GitHub repository: https://github.com/vitamyon/INST414/blob/74939807912b1093f9b270ec5a389aad8abeddbf/spotifyassignment.ipynb
