Genre Learning: A Machine Learning Approach

Robert Crimi
Published in Modeling Music · May 26, 2016

Robert Crimi, Calvin Hicks, Charles Luckhardt, and Camille Noufi

In an effort to build a model that classifies genre from musical attributes, this initial research lays the groundwork for genre classification in Python. We compiled a musical corpus from the songs that overlap across the Million Song Dataset, Tagtraum, and musiXmatch datasets. Using this corpus, we conducted an initial Principal Component Analysis (PCA) and then trained a Logistic Regression model with Python’s Scikit-Learn package. Our hope was that these two methods would yield accurate genre predictions from a relatively small number of musical features.

In previous posts, we discussed how our musical corpus was compiled and improved: one post covered how we used a topic model to group songs by lyrical topic, and another covered how we clustered artists by geographical location. Both groupings were added to the feature list of our musical corpus to improve the classification model.

We used PCA as an analytical method to identify the most important dimensions of the data: PCA finds the orthogonal directions (principal components) along which the data’s variance is best explained. Fortunately, the complicated math behind PCA is already implemented in Scikit-Learn. However, running our musical features through PCA did not yield significant results, most likely because our data contains many categorical features. Despite attempts to convert these categorical features to continuous variables, we were unable to obtain significant results.
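The Scikit-Learn workflow described above can be sketched in a few lines. The feature matrix below is random stand-in data (the real corpus features are not reproduced here), and standardizing before PCA is our assumption about a sensible preprocessing step, not a detail from the post:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in feature matrix: rows are songs, columns are musical
# features (e.g. tempo, loudness, duration) -- random placeholder data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))

# Standardize so no single feature dominates the variance
X_scaled = StandardScaler().fit_transform(X)

# Keep the components that explain the most variance
pca = PCA(n_components=4)
pca.fit(X_scaled)

# Fraction of total variance captured by each principal component
print(pca.explained_variance_ratio_)
```

If the first few components captured most of the variance, the corpus could be described with far fewer dimensions; in our case no such structure emerged.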

We were more successful with training a Logistic Regression model. Using Logistic Regression in Scikit-Learn, we achieved a recall of 53% and a test accuracy of 53%. The code was written in Python 3 and is readily available in our GitHub repo. Per-genre prediction accuracy and the specific predictions can be viewed in Figures 1 and 2:

Figure 1
Figure 2
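Figures of this kind can be generated directly from Scikit-Learn’s metrics. The sketch below uses random stand-in features and genre labels (the real corpus and our exact model settings are in the GitHub repo, not here): `classification_report` gives a Figure 1-style per-genre summary, and `confusion_matrix` gives a Figure 2-style table of predictions per true genre.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in corpus: random features and genre labels (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
y = rng.choice(["rock", "latin", "rap"], size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Per-genre precision/recall (a Figure 1-style summary)...
print(classification_report(y_test, y_pred))
# ...and counts of predicted vs. true genre (a Figure 2-style table)
print(confusion_matrix(y_test, y_pred, labels=clf.classes_))
```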

Our model’s accuracy was likely skewed by the high prevalence of rock music within the dataset, which caused the algorithm to falsely predict rock for most of the songs. This table also calls the model’s prediction accuracy into question: since roughly 50% of the songs in the dataset are rock, the algorithm could reach about 50% accuracy simply by labeling every song as rock. Accuracy and validity would therefore likely improve with a more balanced dataset.
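The majority-class baseline described above is easy to check with Scikit-Learn’s `DummyClassifier`. The label counts below are hypothetical (half the songs labeled rock, mirroring the ~50% figure from the post):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical labels: half the corpus is rock
y = np.array(["rock"] * 500 + ["jazz"] * 250 + ["latin"] * 250)
X = np.zeros((len(y), 1))  # features are ignored by this baseline

# Always predicts the most frequent class ("rock")
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print(baseline.score(X, y))  # -> 0.5
```

A real model should beat this baseline by a comfortable margin; ours barely did.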

When training and testing on a dataset, one generally splits the data into a training set and a test set. In our case, the training set contained 70% of the data (886 songs) and the test set contained 30% (380 songs). This random split reflects the genre distribution of the Million Song Dataset, where roughly 50% of the music is classified as “rock”. Figure 3, pulled from a paired publication on topic modeling of the same dataset, shows the distribution of genres.

Figure 3
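The 70/30 split above can be reproduced with Scikit-Learn’s `train_test_split`; the song counts come from the post, while the feature matrix here is an empty placeholder:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 1266 songs total (886 train + 380 test, per the post);
# the features and labels here are placeholders
X = np.zeros((1266, 5))
y = np.zeros(1266)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(len(X_train), len(X_test))  # -> 886 380
```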

It was not surprising that rock yielded the second-highest prediction accuracy at 88%. A more interesting result was the high accuracy (89%) for the Latin genre, which, unlike rock, cannot be explained by over-representation in the corpus. However, the sixth column of the table shows many songs of other genres falsely predicted as Latin — more false predictions than for any other genre. The genres most commonly misclassified as Latin were Pop, Reggae, Rock, and RnB. Intuitively, these genres are more closely related to Latin than Metal, Folk, or New Age are, which leads us to question how the musical elements are weighted in our model. Furthermore, false classifications were consistently related to the true genre: besides the large number of songs misclassified as Rock, RnB was only ever misclassified as Latin or Blues, Country only as Rock or Blues, and Metal, usually considered a sub-genre of rock, was always misclassified as Rock.

To look into this further, we analyzed the log-odds coefficients of our model’s decision function. Each coefficient gives the change in the log-odds of a genre for a one-unit change in the corresponding independent variable; increasing the log-odds increases the predicted probability of that genre. Figure 4 shows these coefficients in a table.

Figure 4
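A coefficient table like this can be read straight off a fitted model’s `coef_` attribute. The feature names and data below are hypothetical stand-ins, not our actual corpus features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature names and stand-in data
feature_names = ["tempo", "loudness", "duration"]
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = rng.choice(["rock", "latin", "jazz"], size=300)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# coef_ holds one row of log-odds coefficients per genre;
# a positive value pushes predictions toward that genre
for genre, coefs in zip(clf.classes_, clf.coef_):
    print(genre, dict(zip(feature_names, coefs.round(3))))
```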

A positive coefficient means the feature contributes toward predicting that genre. This data did not give strong answers to the questions stated above, but it does offer more insight into the reasons behind our model’s shortcomings and what could be improved in the future. Many of the musical elements used in this model had to be treated as continuous variables even though they are more naturally categorical (like time signature and mode). In our coefficient analysis, we had to remove categorical data, which is often a better classifier of style and genre. In addition, elements like timbre and pitch data, encoded in the Million Song Dataset as arrays, are more likely to correlate with genre and style but were much more difficult to process using our chosen computational methods. Inclusion of these parameters might improve our model’s accuracy.
