Visual Analysis of Indonesia’s Music Taste Using the Spotify API and Seaborn

A Spotify dataset can be used to gain insights into music taste.

Krisna Renaldi
Sep 6, 2021
Photo by Yvette de Wit on Unsplash

Note: this is the last article in a three-part series about Spotify. You can read my previous articles here:

This article is also available in Bahasa Indonesia.

In 2003, the band Project Pop released a song titled “Dangdut is the Music of My Country”. Eighteen years later, is dangdut still the music of my country? To answer this, I used a Spotify dataset.

Data Acquisition:

I scraped Spotify Charts just as I did in the first article of this series, this time covering 1 Jan 2020 through the end of Q2 2021 (30 June 2021). Luckily for me, Spotify Charts allows all crawlers to access all of its content. You can read about the scraping process here.
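The scraping covers one daily chart per day in that range. A minimal sketch of enumerating the days and building one download URL per day; the URL pattern below (Indonesia's regional daily chart, `id`) is a hypothetical stand-in, not necessarily what the scraper from the first article used:

```python
from datetime import date, timedelta
from typing import List

# Hypothetical URL pattern for Indonesia's ("id") daily regional chart;
# the real scraper from the first article may use a different one.
URL_TEMPLATE = "https://spotifycharts.com/regional/id/daily/{day}/download"


def daily_chart_urls(start: date, end: date) -> List[str]:
    """Return one chart URL per day from start to end, inclusive."""
    n_days = (end - start).days + 1
    return [
        URL_TEMPLATE.format(day=(start + timedelta(days=i)).isoformat())
        for i in range(n_days)
    ]


urls = daily_chart_urls(date(2020, 1, 1), date(2021, 6, 30))
print(len(urls))  # 547 daily charts
print(urls[0])
```

If each daily chart lists up to 200 tracks (an assumption), 547 charts give at most 547 × 200 = 109,400 rows, which is close to the 109,199 URLs reported below.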

In the end, after cleaning the data, I had 109,199 URLs, 504 artists (solo or group), and 1,258 songs.
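The exact cleaning steps are not described, so this is only a minimal sketch assuming "cleaning" means dropping duplicate and incomplete rows, followed by counting unique songs and artists. Column names follow the `charts` table queried later; the rows are made up:

```python
import pandas as pd

# Toy chart rows (made-up values); the last two rows are exact duplicates.
charts = pd.DataFrame(
    {
        "chart_date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"],
        "position": [1, 2, 1, 2, 2],
        "track_id": ["t1", "t2", "t2", "t3", "t3"],
        "artist": ["Artist A", "Artist B", "Artist B", "Artist A", "Artist A"],
        "stream": [1000, 900, 1100, 800, 800],
    }
)

# Assumed cleaning: drop exact duplicate rows and rows missing a track ID.
clean = charts.drop_duplicates().dropna(subset=["track_id"])

n_rows = len(clean)
n_songs = clean["track_id"].nunique()  # the same song appears on many days
n_artists = clean["artist"].nunique()
print(n_rows, n_songs, n_artists)  # 4 3 2
```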

I used the Spotify API to extract each song’s metadata and each artist’s info. Sign up here as a Spotify developer to get a client_id and client_secret.
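A minimal sketch of turning one artist JSON object into a flat row for the `artists` table used later. The dict mirrors the shape of a Spotify Web API artist object (as returned by spotipy’s `sp.artist(...)`), but its values are made up, and the rule for `one_genre` is an assumption:

```python
from typing import Any, Dict

# Fetching a real artist object with spotipy would look like:
# import spotipy
# from spotipy.oauth2 import SpotifyClientCredentials
# sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id="...", client_secret="..."))
# artist = sp.artist("<artist id>")

# Trimmed, made-up example with the same shape as an artist object.
sample_artist = {
    "id": "abc123",
    "name": "Example Artist",
    "followers": {"href": None, "total": 1350000},
    "genres": ["dangdut", "indonesian pop"],
    "popularity": 62,
    "external_urls": {"spotify": "https://open.spotify.com/artist/abc123"},
}


def artist_to_row(artist: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten an artist object into one row for an `artists` table."""
    genres = artist.get("genres", [])
    return {
        "name": artist["name"],
        "followers": artist["followers"]["total"],
        "genre": ", ".join(genres),
        # Assumption: `one_genre` keeps only the first listed genre.
        "one_genre": genres[0] if genres else None,
        "popularity": artist["popularity"],
        "url": artist["external_urls"]["spotify"],
    }


print(artist_to_row(sample_artist))
```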

Fig 1: artist JSON (left) and song JSON (right)

Data is usually saved in CSV format, but I stored it in a MySQL database.
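A minimal sketch of the save-and-load round trip with pandas. To keep it runnable without a MySQL server, it uses an in-memory SQLite database as a stand-in; with MySQL you would pass a SQLAlchemy engine instead of `conn`. The table contents are made up:

```python
import sqlite3

import pandas as pd

# Stand-in for the MySQL database: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

artists = pd.DataFrame(
    {
        "name": ["Artist A", "Artist B"],
        "followers": [1200, 3400],
        "genre": ["dangdut", "indonesian pop"],
    }
)

# Write the DataFrame to a table, replacing it if it already exists.
artists.to_sql("artists", conn, if_exists="replace", index=False)

# Read it back with plain SQL.
loaded = pd.read_sql("SELECT name, followers FROM artists ORDER BY name ASC", conn)
print(loaded)
```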

Analysis: Spotify Chart

Let us load the data with the help of the Python toolkit SQLAlchemy:

import pandas as pd
from sqlalchemy import create_engine
my_conn = create_engine("mysql+mysqlconnector://root:root@localhost:8889/spotifyfinal")  # connect to the db
sql = "SELECT position, track_id, track, artist, url, stream, chart_date FROM charts c ORDER BY chart_date, position"
df = pd.read_sql(sql, my_conn, parse_dates=["chart_date"])  # read_sql ignores parse_dates=True; name the column

Top 20 songs based on number of streams.

Fig 2
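The ranking method is not spelled out, so here is a plausible sketch assuming “number of streams” means the sum of the daily `stream` values per song over the whole period. The toy rows use the same columns as the `charts` query above:

```python
import pandas as pd

# Toy rows with the same columns as the `charts` query (made-up numbers).
df = pd.DataFrame(
    {
        "track": ["Song A", "Song B", "Song A", "Song C", "Song B"],
        "artist": ["Artist A", "Artist B", "Artist A", "Artist C", "Artist B"],
        "stream": [500, 300, 700, 900, 400],
    }
)

# Sum daily streams per song, then keep the 20 largest totals.
top_songs = (
    df.groupby(["track", "artist"], as_index=False)["stream"]
    .sum()
    .nlargest(20, "stream")
)
print(top_songs)
```

Grouping by `"artist"` alone instead of `["track", "artist"]` gives the corresponding artist ranking. Grouping by `track_id` would be safer if two songs share a title.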

Now, the artists.

Fig 3

And last, the top 20 featured songs.

Fig 4

The top featured songs and the top songs by streams are not necessarily the same.

Let us look at the percentage of streams from foreign songs versus domestic songs.

Fig 5
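How domestic songs were identified is not stated, so this sketch relies on a hypothetical boolean column `is_local`. Given such a flag, the split is each group’s stream total as a percentage of all streams:

```python
import pandas as pd

# Toy data; `is_local` is a hypothetical flag marking Indonesian songs.
df = pd.DataFrame(
    {
        "track": ["Song A", "Song B", "Song C", "Song D"],
        "stream": [600, 200, 150, 50],
        "is_local": [False, True, False, True],
    }
)

# Total streams per group, as a percentage of all streams.
share = df.groupby("is_local")["stream"].sum() / df["stream"].sum() * 100
share.index = share.index.map({False: "foreign", True: "domestic"})
print(share.round(1))
```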

Analysis: The Artist

Loading the data:

sql = "SELECT name, followers, genre, one_genre, popularity, url FROM artists ORDER BY name ASC"
df = pd.read_sql(sql, my_conn)
Fig 6: I focused on the “followers” and “genre” fields

I plotted each artist against their number of followers.

Fig 7: there were no local artists

These are the local artists with followers in the millions:

Fig 8
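A minimal sketch of the two selections behind these charts: the overall ranking by followers, and the local artists with at least one million followers. The `is_local` column is hypothetical, since the article does not say how local artists were marked:

```python
import pandas as pd

# Toy artist rows (made-up numbers); `is_local` is a hypothetical flag.
artists = pd.DataFrame(
    {
        "name": ["Artist A", "Artist B", "Artist C", "Artist D"],
        "followers": [45_000_000, 3_200_000, 800_000, 1_500_000],
        "is_local": [False, True, True, True],
    }
)

# Overall ranking by followers.
top_by_followers = artists.nlargest(20, "followers")

# Local artists with at least one million followers, largest first.
local_millions = artists[artists["is_local"] & (artists["followers"] >= 1_000_000)]
local_millions = local_millions.sort_values("followers", ascending=False)
print(local_millions["name"].tolist())  # ['Artist B', 'Artist D']
```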

Every artist brings their own genre. Luckily, Spotify has mapped genres to every artist (Fig 6). This chart shows genre versus followers.

Fig 9
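The article does not say how followers were aggregated per genre. One plausible reading is to sum the followers of every artist tagged with each genre. This sketch assumes the `genre` column stores an artist’s genre list as a comma-separated string:

```python
import pandas as pd

# Toy artist rows. Assumption: `genre` holds Spotify's genre list joined
# into one comma-separated string.
artists = pd.DataFrame(
    {
        "name": ["Artist A", "Artist B", "Artist C"],
        "followers": [5_000_000, 2_000_000, 1_000_000],
        "genre": ["dance pop, pop", "pop, k-pop", "dangdut"],
    }
)

# One row per (artist, genre) pair, then total followers per genre.
per_genre = (
    artists.assign(genre=artists["genre"].str.split(", "))
    .explode("genre")
    .groupby("genre")["followers"]
    .sum()
    .sort_values(ascending=False)
)
print(per_genre)
```

With this approach, an artist tagged with several genres counts toward each of them. Using `one_genre` instead would count every artist exactly once.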

It seems “dance pop”, “pop”, and “k-pop” are the most popular genres in Indonesia. Now, let us look at the genres of domestic music.

Fig 10

Finally, I found the genre “dangdut”, with 1.35 million followers, compared to “indonesian pop” with 22.26 million followers, roughly 16 times as many. Indonesia’s music taste is starting to become clear.

Analysis: The Song

Spotify provides audio features for every song as metadata. Here is a description of each feature:

Fig 11

All feature values lie in the range 0–1, except “loudness” (in decibels) and “tempo” (in beats per minute). I extracted the audio features for each song using Spotipy.
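A minimal sketch of fetching audio features in batches with a spotipy client. It assumes the endpoint accepts at most 100 track IDs per request. spotipy’s `audio_features` returns one dict per ID, or `None` when no features exist, so those entries are skipped:

```python
from typing import Any, Dict, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_audio_features(sp: Any, track_ids: Sequence[str], batch_size: int = 100) -> List[Dict[str, Any]]:
    """Fetch audio features batch by batch, skipping IDs without features."""
    rows: List[Dict[str, Any]] = []
    for batch in chunks(track_ids, batch_size):
        for features in sp.audio_features(list(batch)):
            if features is not None:
                rows.append(features)
    return rows


# Usage with a real client (requires credentials):
# import spotipy
# from spotipy.oauth2 import SpotifyClientCredentials
# sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id="...", client_secret="..."))
# rows = fetch_audio_features(sp, track_ids)
```

Passing the client in as a parameter also makes the function easy to test with a fake object in place of a real API connection.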

sql = "SELECT track_id, uri, danceability, energy, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo FROM audio_features af ORDER BY id"
df_track = pd.read_sql(sql, my_conn)  # reuses the engine created earlier
Fig 12

Then I plotted all the features against each other to look for correlations.

import seaborn as sns
sns.pairplot(df_track)
Fig 13

Please forgive the image quality. You can save the image to your computer and zoom in to see more detail. I noticed several things:

  • “energy” and “valence” appear to have a linear relationship.
  • “mode” could be dropped from this analysis.
  • “danceability”, “energy”, and “loudness” are positively correlated.
  • “acousticness” and “energy” are negatively correlated, which makes sense.
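A pairplot shows these relationships only visually. As a complement, here is a minimal sketch that quantifies them with a Pearson correlation matrix over the numeric columns. The values are made-up toy data using the same column names as `df_track`:

```python
import pandas as pd

# Toy audio features (made-up values) with column names from `df_track`.
df_track = pd.DataFrame(
    {
        "track_id": ["t1", "t2", "t3", "t4"],
        "energy": [0.2, 0.4, 0.6, 0.8],
        "valence": [0.25, 0.45, 0.65, 0.85],
        "acousticness": [0.9, 0.7, 0.4, 0.1],
        "mode": [1, 0, 1, 1],
    }
)

# Pearson correlation between the numeric columns only.
corr = df_track.select_dtypes("number").corr()
print(corr.round(2))

print(corr.loc["energy", "valence"])       # close to +1 in this toy data
print(corr.loc["energy", "acousticness"])  # negative in this toy data
```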

To get more insight, I took the top 100 songs by number of streams (as in Fig 2) and calculated the average of each audio feature. Then I calculated the same averages for all songs and plotted both on a radar chart.

Fig 14
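A minimal sketch of the averages plotted on the radar chart: merge total streams with audio features on `track_id`, take the top N tracks, and compare their feature means with the means over all tracks. The data is made up, N is 2 instead of 100 to fit the toy data, and only three of the 0–1 features are shown for brevity:

```python
import pandas as pd

FEATURES = ["danceability", "energy", "valence"]

# Toy data (made-up values): total streams per track and its audio features.
streams = pd.DataFrame({"track_id": ["t1", "t2", "t3"], "stream": [900, 500, 100]})
features = pd.DataFrame(
    {
        "track_id": ["t1", "t2", "t3"],
        "danceability": [0.7, 0.5, 0.3],
        "energy": [0.6, 0.6, 0.3],
        "valence": [0.5, 0.7, 0.3],
    }
)

merged = streams.merge(features, on="track_id")

# Top N tracks by streams (N = 100 in the article; 2 here for the toy data).
top_n = merged.nlargest(2, "stream")

comparison = pd.DataFrame({"top": top_n[FEATURES].mean(), "all": merged[FEATURES].mean()})
print(comparison.round(2))
```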

There was almost no difference in scores between the top 100 songs and all songs; both scored around 0.55 to 0.6. This is in line with Fig 9: Indonesia’s Spotify users tend to listen to songs with a positive mood (“valence” score ~0.6) that are made for dancing (“danceability” score ~0.6) and have high energy.

A tip for composers and song producers: make songs with “danceability”, “valence”, and “energy” scores above 0.5, because this is what Indonesia wants!

Conclusion

After analyzing the dataset, I concluded several things:

  • The streaming share of foreign songs is larger than that of domestic songs.
  • The band Sheila on 7 has a massive fan base even though they have not released any new single or album.
  • The top songs by streams are not always the top featured songs, and vice versa.
  • Indonesia’s Spotify users have a great interest in the “dance pop”, “pop”, and “k-pop” genres, while the top local genres are “indonesian pop” and “indonesian jazz”.

Final Wrap

Playing with this dataset was fun and helped me build a strong foundation in how to analyze data. Still, a Spotify dataset alone is not a valid basis for identifying a country’s musical taste. In Indonesia’s case, there are limitations such as geographic and economic conditions, so not everyone can access the Spotify platform. At least, it shows us the current state of the music industry in one country.

As a bonus, here is a word cloud for several domestic music genres.

Thank you for your time.

Let data inspire you.


Web developer who loves back-end development, math, physics, and data science