[Week #4— Rock or Not? ♫]

☞ This sure does.

Defne Tunçer
bbm406f18
4 min readDec 23, 2018

--

We are Defne Tunçer & Kutay Barçin and this is our fourth article of series of our Machine Learning Course Project about Music Genre Classification. May the fourth be with you!

GitHub

RELATED WORKS

This week we finally dig into some related works and classification methods! We discussed and applied several models in order to figure out which approaches best suited for our problem.

K-Nearest Neighbor Classifier is a non-parametric method used for classification and regression. Although K-NN is an easy implemented algorithm and performs well in a large number of classification problems, K-NN classification suffers from the curse of dimensionality.

Our model has a dimension space of 518 features which makes K-NN vulnerable. In order to overcome this, we planned to apply Principal Component Analysis (PCA) to our matrix which reduces the input to a lower desired dimension. Below visualization is the scatter plot of two genres Rock and Classical after applying PCA to reduce the feature dimensions to three dimensions.

Logistic regression is a technique from the statistics field and provides a probability score for observations. It is a go-to method for binary classification problems, however for our multiclass problem, we applied Multinomial Logistic Regression through the one-vs-rest (OvR) scheme with a loss function.

Normalized Confusion Matrix

Support Vector Machine is a supervised learning method that can be used for classification. SVM works efficiently with high dimensional features even if the number of dimensions is greater than the number of samples. We preprocessed the data before applying Linear SVC using standard scaler to represent standard normally distributed data. Apart from Logistic Regression, SVM uses one-against-one approach for multi-class classification. This method is consistent, which is not true for one-vs-rest classification.

Since we have an unbalanced data, we were expecting our non-normalized confusion matrix to be appeared as this. We considered that having an unbalanced training and test data as reflecting the population, we would also be able to model the probability of appearing in real life situations.

As we observe our normalized confusion matrix, we figure out that as we are trying to preserve the percentage of the population we also accidentally miss-predicted most of the minority genres. As a future work we consider building genre-specific models for minority genres.

Furthermore, we are also planning to expand our implementation with Neural Networks and improve our current models.

Linear SVM, Logistic regression and KNN appear to have difficulties in capturing the non-linearities of the data, thus they achieve less accuracy than expected.

As for the related works, we view various research papers and real-word applications. Most of the studies done by exploring the timbre texture, the rhythmic content, the pitch content, or their combinations.

Among these, we have a curious report on the properties of the auditory human perception system proposed as a music genre
classifier.

DeepSound

DeepSound has done a music genre recognition system using visual representations of frequency distribution over time a Convolutional Neural Network. Image processing on audio! We may also be interested in that topic in the future!

--

--