Analyzing genres of artists in LastFM dataset
By MSXH (Mario Becerra, Saif Ismail Hameed, Xian Ji, Huijing Zheng)
In our previous blog post, we described the data we’re going to be working with. We mentioned that we are using a dataset about the listening behavior of 992 users of the website LastFM, and that we could enrich this dataset using Spotify’s API. This blog post briefly describes how we used Spotify’s API to get the genres associated with each artist.
One can use the API to find the genres associated with each artist. For example, we can call the API to find information about The Doors, and we would get the following result: acid rock, album rock, classic rock, piano rock, psychedelic rock, rock. We repeated this call for each of the roughly 176 thousand artists in our dataset. In the end, the API returned information for only about 22 thousand of them. That is just 12.5% of the artists, but they account for 76% of all the playbacks in our dataset.
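The lookup described above can be sketched in Python as follows. This is a minimal illustration, not necessarily the code we ran: it assumes the artist is looked up by name through the search endpoint of Spotify's Web API and that the first hit is the right artist. The helper names (`spotify_search`, `extract_genres`, `genres_for_artists`) are hypothetical, and `token` stands for an access token obtained separately.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.spotify.com/v1/search"


def spotify_search(name, token):
    """Search for one artist by name and return the parsed JSON response."""
    query = urllib.parse.urlencode({"q": name, "type": "artist", "limit": 1})
    request = urllib.request.Request(
        f"{SEARCH_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_genres(payload):
    """Return the genres of the first matching artist, or None if no match."""
    items = payload.get("artists", {}).get("items", [])
    if not items:
        return None
    # Assumption: the first search hit is the artist we were looking for.
    return items[0].get("genres", [])


def genres_for_artists(names, search):
    """Map each artist name to its genres, skipping artists without any."""
    found = {}
    for name in names:
        genres = extract_genres(search(name))
        if genres:
            found[name] = genres
    return found
```

With a valid token, `genres_for_artists(names, lambda n: spotify_search(n, token))` would run the lookup over a list of artist names. Passing the search function as an argument keeps the loop easy to try out on canned responses without calling the API.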
Now, with the genres associated with each artist, we can see which are the most popular ones among the users in our dataset. The following plots show the 50 most popular genres, ranked in two different ways:
- one counts the total number of playbacks of each genre, and
- the other counts how many different users listened to at least one song by an artist associated with each genre.
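The two rankings can be sketched as follows. This is only an illustration under assumed data shapes: `plays` is taken to be a list of `(user, artist)` pairs, one per playback, and `artist_genres` a dictionary from artist name to its list of genres. The function name `rank_genres` is hypothetical. A playback of an artist with several genres counts once for each of them, and artists without genre information are ignored.

```python
from collections import Counter, defaultdict


def rank_genres(plays, artist_genres, top=50):
    """Rank genres by total playbacks and by number of distinct listeners."""
    playback_counts = Counter()
    listeners = defaultdict(set)
    for user, artist in plays:
        # Artists missing from the mapping contribute nothing.
        for genre in artist_genres.get(artist, []):
            playback_counts[genre] += 1
            listeners[genre].add(user)
    user_counts = Counter({genre: len(users) for genre, users in listeners.items()})
    return playback_counts.most_common(top), user_counts.most_common(top)
```

The first returned list corresponds to the playback ranking and the second to the distinct-user ranking. A genre played heavily by a few users will rank high in the first list but lower in the second.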
The first plot shows that the most popular genres are rock-related, with some elements of pop and electronica.
The second plot agrees with the first, although electronica ranks much higher when counting users than when counting playbacks.
There is more information that can be extracted using the Spotify API, such as the duration of a particular song, its key, and other metrics related to its rhythm. We intend to extract more of this information and show the results in future blog posts.