Description of the project

By Mario Becerra, Mohammed Saif Ismail Hameed, Xian Ji, Huijing Zheng

Mario Becerra
Mar 27, 2020

This post describes our project for the “Data Visualization in Data Science” course.

Background information on what the data is about

The data describes the listening behavior of 992 users of the website LastFM. The website records what each user listens to, whether through Internet radio stations, the user’s computer, or portable music devices. The data ranges from February 14th 2008 to September 29th 2013 and includes the timestamp at which each user listened to each song.

Technical description of the data

The data consists of two files:

  1. The first one has information about each user: user id, gender, age, country, and the date on which the user registered. This file has 992 lines.
  2. The second one has the streaming information about each user. The columns are user id, timestamp (DMY HMS), artist id, artist name, song name. This file has 19,150,868 lines.
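As a rough sketch of how the streaming file could be read, the snippet below iterates over it line by line, so the roughly 19 million rows never need to be held in memory at once. The tab separator, the missing header row, the exact timestamp pattern, and the names `Stream`, `read_streams`, and `TIMESTAMP_FORMAT` are all assumptions made for illustration; only the column order and the day-first timestamp come from the description above.

```python
import csv
import io
from datetime import datetime
from typing import Iterator, NamedTuple, TextIO

# Assumed layout: tab-separated, no header row, day-first timestamps.
# The exact pattern is hypothetical and must be adjusted to the real file.
TIMESTAMP_FORMAT = "%d-%m-%Y %H:%M:%S"


class Stream(NamedTuple):
    """One listening event, with fields in the column order described above."""
    user_id: str
    timestamp: datetime
    artist_id: str
    artist_name: str
    song_name: str


def read_streams(source: TextIO) -> Iterator[Stream]:
    """Yield one Stream per line, lazily."""
    # QUOTE_NONE keeps quote characters inside song titles as plain text.
    reader = csv.reader(source, delimiter="\t", quoting=csv.QUOTE_NONE)
    for user_id, ts, artist_id, artist_name, song_name in reader:
        yield Stream(
            user_id,
            datetime.strptime(ts, TIMESTAMP_FORMAT),
            artist_id,
            artist_name,
            song_name,
        )


# Made-up sample rows, only to show the expected shape of the input.
sample = io.StringIO(
    "u1\t14-02-2008 09:30:00\ta1\tSome Artist\tSome Song\n"
    "u1\t15-02-2008 21:05:00\ta1\tSome Artist\tOther Song\n"
)
streams = list(read_streams(sample))
```

For the real file, `sample` would be replaced by an open file handle, and the rows would typically be aggregated on the fly rather than collected into a list.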

Additionally, by combining the artist and song information with the Spotify API and other sources, it is possible to get more information about the artists and the songs. For artists, it is possible to know the associated genres, nationality, and popularity, while for songs it is possible to know their duration, release date, and lyrics.

Spotify also provides other metrics such as key, time signature, danceability, energy, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, and tempo. For more information about these metrics, visit https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/ .

Research Questions

  1. What are the continental differences in frequency of use per user, measured for example in minutes per day or another metric?
  2. How does song popularity behavior change over time in a particular continent?
  3. How does listening behavior change over time for different genders?
  4. What are some of the synergistic and antagonistic relationships between artists and genres?
  5. What are the keyword preferences by age of the user?
  6. What are the weekend and weekday differences in listening time for a group of top 20 artists?
  7. How does the popularity of artists differ across users?
  8. How does the number of accounts opened during festive seasons compare with the off season? (marketing/advertising related)
  9. What are the continental differences between morning and evening listening periods (minutes per day)?
  10. Who are the most regular and least regular customers? That is, who used the app consistently every day or week, and who behaved more erratically?
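To make the last question more concrete, here is one possible way to quantify regularity: for each user, the share of days with at least one play, counted over the span between that user's first and last play. This metric and the function name `active_day_share` are our own illustrative assumptions, not a settled choice for the project, and the sample plays are made up.

```python
from collections import defaultdict
from datetime import datetime


def active_day_share(plays):
    """Map each user id to the share of days with at least one play,
    counted over the span between that user's first and last play."""
    days_by_user = defaultdict(set)
    for user_id, timestamp in plays:
        days_by_user[user_id].add(timestamp.date())

    shares = {}
    for user_id, days in days_by_user.items():
        # Inclusive number of calendar days between first and last play.
        span = (max(days) - min(days)).days + 1
        shares[user_id] = len(days) / span
    return shares


# Made-up plays as (user id, timestamp) pairs.
plays = [
    ("u1", datetime(2008, 2, 14, 9, 30)),
    ("u1", datetime(2008, 2, 15, 21, 5)),
    ("u1", datetime(2008, 2, 16, 8, 0)),
    ("u2", datetime(2008, 2, 14, 10, 0)),
    ("u2", datetime(2008, 2, 23, 10, 0)),
]
shares = active_day_share(plays)
```

In this sample, `u1` listens on every day of a three-day span, while `u2` listens on only two days of a ten-day span, so `u1` gets the higher share. A weekly version would work the same way with ISO weeks instead of dates.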

Preliminary look at the data

The following images show what the data looks like and some plots summarizing user behavior.

User data: 992 data points
Streaming data: 19,150,868 data points
Number of users by country
Table of streaming activity for each artist
Plot of streaming activity for each artist
