Correlating Musical Genre to Geographic Location

Published in Modeling Music

3 min read · May 25, 2016

by Robert Crimi and Tae Kim

To develop algorithms that help predict musical genre, it is important to know which genres are concentrated around specific geographical locations. From these relationships, we may be able to predict whether an artist from a particular region will fall under certain genres, as well as whether they are likely to be successful. Record labels could in turn use such predictors to develop marketing strategies for different regions. Because these relationships are of interest to everyone from curious individuals to the music industry, it is worth applying geographical clustering and correlation tests to a large corpus of music.

In our study, we used the Million Song Dataset in conjunction with the Tagtraum and musiXmatch datasets for our cluster analysis. The resulting subset contained 924 tracks and is summarized in Figure 1 below:

We plotted all coordinates on a map using the ggmap and ggplot2 libraries in R, as shown in Figure 2. The majority of points are in the USA and the EU. This may be because the Tagtraum genre annotations were generated from the beaTunes label submissions of English-speaking users, as noted in the paper “Improving genre annotations for the million song dataset”.

For geographical clustering, we used the K-means clustering algorithm with 15 centers, run for 5000 iterations. The results of the clustering are shown in Figure 3.
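To make the clustering step concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the code used for the article: the function name `kmeans`, the sample coordinates, and the choice to treat latitude and longitude as flat Euclidean coordinates are all assumptions made for illustration. The article's settings would correspond to `k=15` and `iterations=5000`.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on (lat, lon) pairs using plain Euclidean distance.

    Assumption: coordinates are treated as points on a flat plane, which
    ignores the curvature of the Earth.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k of the input points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
    return labels, centers


# Made-up coordinates: three points near New York, three near London.
points = [
    (40.7, -74.0), (40.8, -73.9), (40.6, -74.1),
    (51.5, -0.1), (51.6, -0.2), (51.4, 0.0),
]
labels, centers = kmeans(points, k=2)
```

With data this well separated, the two groups of points end up in different clusters regardless of which points are drawn as initial centers. On real artist locations the result depends on the initialization, so running several seeds and comparing is a reasonable precaution.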

To correlate genres with the computed geographical clusters, we used the Chi-Squared test implemented in Python’s SciPy package. This test produced the following results:

Test Statistic: 935.7293

P-Value: 8.4739e-97

Degrees of Freedom: 196
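For readers who want to see what sits behind these numbers, the following is a small sketch of the Pearson Chi-Squared test of independence in plain Python. The function name `chi_squared_independence` and the example counts are made up for illustration; SciPy's `chi2_contingency` reports the statistic, degrees of freedom, and expected counts along with a p-value. The sketch also counts expected cells below 5, a common rule of thumb for when the Chi-Squared approximation becomes shaky. Note that 196 degrees of freedom is consistent with a 15 by 15 table, since (15 - 1) * (15 - 1) = 196.

```python
def chi_squared_independence(observed):
    """Pearson chi-squared statistic for a table of counts (rows x columns).

    Assumes every row and every column has a nonzero total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Number of cells whose expected count falls below the rule-of-thumb value 5.
    low_cells = sum(e < 5 for row in expected for e in row)
    return statistic, dof, expected, low_cells


# Made-up counts: 2 geographical clusters (rows) x 3 genres (columns).
table = [[30, 10, 5], [10, 25, 20]]
statistic, dof, expected, low_cells = chi_squared_independence(table)
```

A large statistic relative to the degrees of freedom points toward dependence between rows and columns, while a high `low_cells` count is a signal to treat the p-value with caution.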

At first glance, these results suggest a strong dependence between geographical cluster and musical genre. On closer inspection, however, the table of co-occurrences contained many values below 5. Sparse cells like these usually mean small expected counts, a condition under which the results of a Chi-Squared test may be skewed. This co-occurrence table, converted to a frequency table, is shown in Figure 4 below:

Because the Chi-Squared results may be skewed by these low values, Figure 4 conveys more information about the relationships between genre and location. For example, we can read off that the probability that an artist from geographical cluster 1 falls under Rap is 42.9%. We can also tell which clusters have the highest diversity of genres: clusters 1, 2, and 14 appear to be the most diverse.
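Converting a co-occurrence table into a frequency table amounts to dividing each row by its total, as sketched below. The function name `row_frequencies`, the genre labels, and the counts are hypothetical; the first row was chosen so that 3 of 7 tracks are Rap, which gives a 42.9% share purely for illustration and says nothing about the actual counts behind Figure 4.

```python
def row_frequencies(counts):
    """Convert each row of counts into proportions that sum to 1.

    Rows with a zero total are returned as all zeros.
    """
    result = []
    for row in counts:
        total = sum(row)
        if total:
            result.append([c / total for c in row])
        else:
            result.append([0.0] * len(row))
    return result


genres = ["Rap", "Rock", "Jazz"]  # hypothetical genre labels (columns)
counts = [[3, 3, 1], [2, 8, 10]]  # made-up counts, one row per cluster
freqs = row_frequencies(counts)

# Share of the first cluster's tracks that are Rap, as a percentage.
rap_share = round(100 * freqs[0][genres.index("Rap")], 1)
```

Each row of `freqs` can be read as the estimated probability of each genre given the cluster, which is the kind of statement made about cluster 1 and Rap above.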

Overall, this research attempts to lay the foundation for further exploration. We have mined a musical corpus from the Million Song Dataset, Tagtraum, and musiXmatch datasets, which will help foster further exploration of the relationships between musical attributes. Furthermore, we have implemented K-means clustering for geographical locations and a Chi-Squared correlation test. These implementations will help us explore the relationships between other musical attributes.
