Analyzing the regularity of users in the LastFM dataset

By MSXH (Mario Becerra, Saif Ismail Hameed, Xian Ji, Huijing Zheng)

Mario Becerra
Apr 8, 2020

In our first blog post, we described the data we’re working with and what we plan to analyze. In our second blog post, we described how we used Spotify’s API to get the genres associated with each artist.

Today, we’ll analyze how regular users are in their daily listening habits. We use the ideas and the model presented in this paper, which is implemented in the R package BTYDplus.
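A model of daily listening habits works with events per user rather than individual plays. Here is a minimal sketch of that preprocessing step. It assumes the listening log is a list of (user_id, timestamp) pairs with ISO 8601 timestamps, and that any day with at least one play counts as one use of the app. The helper `daily_events` and the sample records are hypothetical, and BTYDplus has its own input conventions, which are described in its documentation.

```python
from collections import defaultdict
from datetime import date


def daily_events(records):
    """Collapse (user_id, timestamp) listening records into the sorted,
    distinct calendar days on which each user listened at least once."""
    days = defaultdict(set)
    for user_id, timestamp in records:
        # Assumes ISO 8601 timestamps ("YYYY-MM-DDThh:mm:ss..."), used as written
        # with no time-zone conversion. Only the date part is kept, so several
        # plays on the same day count as a single event.
        days[user_id].add(date.fromisoformat(timestamp[:10]))
    return {user_id: sorted(user_days) for user_id, user_days in days.items()}


# Hypothetical records, made up for illustration.
records = [
    ("user_000001", "2009-03-07T10:15:00Z"),
    ("user_000001", "2009-03-07T22:40:00Z"),
    ("user_000001", "2009-03-09T08:05:00Z"),
    ("user_000002", "2009-03-08T12:00:00Z"),
]
events = daily_events(records)
print({u: [d.isoformat() for d in ds] for u, ds in events.items()})
# {'user_000001': ['2009-03-07', '2009-03-09'], 'user_000002': ['2009-03-08']}
```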

The model is a Bayesian multilevel model that assumes the time between consecutive uses of the app follows a Gamma distribution with shape parameter k. The Erlang distribution is the special case where k is an integer, and the estimates below are not integers. A regular user has a higher value of k, while a clumpier user has a lower one. At k = 1 the gaps are exponentially distributed, meaning uses occur completely at random. Values of k above 1 mean the gaps are more regular than random, and values below 1 mean they are clumpier. For example, a user who consistently uses the app every three days, or every day, will have a higher value of k. A user who uses the app for several days in a row and then stops for a while will have a lower value. For more details, see the paper or the package documentation.
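To build some intuition for k, here is a small simulation. It assumes the gaps between uses follow a Gamma distribution with shape k and a fixed mean gap. It only illustrates what k means and is not the model fitting that BTYDplus performs. The helper `simulate_gaps` is hypothetical. For a Gamma variable, the coefficient of variation (standard deviation divided by mean) is 1/sqrt(k). So for the same average gap, a larger k gives more evenly spaced uses, while a k below 1 gives bursts of activity separated by long breaks.

```python
import random
import statistics


def simulate_gaps(k, mean_gap=1.0, n=20_000, seed=0):
    """Draw n inter-event times (in days) from a Gamma distribution with
    shape k, scaled so that the mean gap equals mean_gap."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) has mean alpha * beta, so using
    # beta = mean_gap / k keeps the mean fixed while the shape k varies.
    return [rng.gammavariate(k, mean_gap / k) for _ in range(n)]


cvs = {}
for k in (0.3, 1.0, 5.0):
    gaps = simulate_gaps(k, mean_gap=3.0)
    cvs[k] = statistics.pstdev(gaps) / statistics.fmean(gaps)
    # Theory: the coefficient of variation of a Gamma(k) variable is 1 / sqrt(k).
    print(f"k={k}: mean gap {statistics.fmean(gaps):.2f} days, "
          f"CV {cvs[k]:.2f} (theory {k ** -0.5:.2f})")
```

Holding the mean gap fixed isolates the effect of k. Two users can both listen every three days on average, yet the one with the larger k does so on a much steadier schedule.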

The following plots show examples of the users’ listening behavior. On the left are the 30 users with the highest estimated k, and on the right the 30 users with the lowest. Each user’s k value is shown as well. The users on the left clearly follow a more regular pattern than those on the right.
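Picking the two groups of users for such plots comes down to ranking users by their estimated k. Here is a minimal sketch. It assumes each user has already been reduced to a single point estimate of k, for example the posterior median of that user’s MCMC draws (that choice is an assumption). The helper `most_and_least_regular` and the sample values are hypothetical.

```python
def most_and_least_regular(k_estimates, n=30):
    """Return the n users with the highest and the n with the lowest estimated k.

    k_estimates maps a user id to a point estimate of k (for example, the
    posterior median of that user's draws; this choice is an assumption).
    """
    ranked = sorted(k_estimates.items(), key=lambda item: item[1], reverse=True)
    highest = ranked[:n]          # most regular users first
    lowest = ranked[::-1][:n]     # clumpiest users first
    return highest, lowest


# Hypothetical estimates, made up for illustration.
k_estimates = {"u1": 0.22, "u2": 0.95, "u3": 0.14, "u4": 0.37, "u5": 0.29}
highest, lowest = most_and_least_regular(k_estimates, n=2)
print(highest)  # [('u2', 0.95), ('u4', 0.37)]
print(lowest)   # [('u3', 0.14), ('u1', 0.22)]
```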

In the following plot, we show the listening behavior of the most regular user, whose estimated k is 1.19. This user used the app every day from 2009-03-07 to 2009-04-29. Because that pattern is perfectly regular, the user has the highest k of all.
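A user who is active every day has a gap of exactly one day between consecutive uses. A gap series with no spread at all is the most regular pattern possible, which fits this user having the highest estimated k. The sketch below reproduces the date range from the plot. The helper `gaps_in_days` is hypothetical.

```python
from datetime import date, timedelta


def gaps_in_days(active_days):
    """Return the number of days between consecutive active days."""
    days = sorted(active_days)
    return [(later - earlier).days for earlier, later in zip(days, days[1:])]


# A user active on every day from 2009-03-07 to 2009-04-29.
start, end = date(2009, 3, 7), date(2009, 4, 29)
every_day = [start + timedelta(days=i) for i in range((end - start).days + 1)]
gaps = gaps_in_days(every_day)
print(len(every_day), set(gaps))  # 54 {1}
```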

The overall distribution of the estimated k values can be seen in the following plot. Most of the values are between 0.16 and 0.4.
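One way to put a number on "most" is to compute the share of users whose estimated k falls in a given interval. Here is a minimal sketch with hypothetical values that are not the results shown in the plot. The helper `summarize_k` is hypothetical, and its default interval simply mirrors the range quoted above. The share below 1 counts users whose gaps are clumpier than exponentially distributed (completely random) gaps.

```python
import statistics


def summarize_k(k_values, low=0.16, high=0.4):
    """Summarize estimated k values: the share inside [low, high], the share
    below 1 (clumpier than exponential gaps), and the median."""
    n = len(k_values)
    share_in_range = sum(low <= k <= high for k in k_values) / n
    share_below_one = sum(k < 1 for k in k_values) / n
    return share_in_range, share_below_one, statistics.median(k_values)


# Hypothetical estimates, made up for illustration (not the post's results).
k_values = [0.12, 0.18, 0.21, 0.24, 0.27, 0.30, 0.33, 0.38, 0.55, 1.05]
print(summarize_k(k_values))
```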

Mario Becerra

Data scientist. PhD student in statistics at KU Leuven.