Week #7 — Rock or Not? ♫

☞ This sure does.

Defne Tunçer
bbm406f18
4 min read · Jan 13, 2019

--

We are Defne Tunçer & Kutay Barçın, and this is the sixth article in our Machine Learning Course Project series on Music Genre Classification.

GitHub

Can’t wait! Here’s the video we promised last week.

Neural Network

A neural network, a.k.a. a multi-layer perceptron, can learn a non-linear function approximator for either classification or regression. We apply cross-entropy (log-loss) as the loss function and softmax as the activation function of the output layer, with a batch size of 200. For the hidden layers we tested tanh, relu and sigmoid (logistic) activations. We also have options for the solver: lbfgs, sgd and adam.
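The options above match scikit-learn's MLPClassifier, so a minimal sketch of our setup might look like the following; the synthetic data is just a stand-in for the real FMA feature matrix, and the layer sizes are illustrative:

```python
# Sketch of the MLP setup described above (assumed: scikit-learn's
# MLPClassifier, which uses a softmax output with log-loss for
# multi-class problems). Synthetic data replaces the FMA features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in for the real feature matrix (518 attributes, 15 genres).
X, y = make_classification(n_samples=2000, n_features=50,
                           n_informative=30, n_classes=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# activation can be 'tanh', 'relu' or 'logistic';
# solver can be 'lbfgs', 'sgd' or 'adam'.
clf = MLPClassifier(hidden_layer_sizes=(100, 100),
                    activation='logistic',
                    solver='adam',
                    batch_size=200,
                    max_iter=300,
                    random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

Swapping the `solver` and `activation` strings is all it takes to reproduce the comparisons discussed below.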

We have already met lbfgs and sgd. Adam, on the other hand, is designed specifically to work well on large datasets. From our observations, it converges much faster than the other solvers. Tested on all features, with the logistic activation and 2 hidden layers, adam converges to an accuracy of 66.16% after 10 epochs, while sgd converges to an accuracy of 67.23% after 500 epochs.

Figure 1. adam (left); Figure 2. sgd (right)

For computational efficiency, we conduct the rest of our tests with the adam solver.

Tested on MFCC alone with no hidden layer, we obtained an accuracy of 63.16%. Adding one hidden layer of 100 neurons quickly raised the accuracy to 65.13%.

Tested on all features as input with no hidden layer, we immediately reached an accuracy of 65.0%. It appears that while a large feature set increases accuracy, it also overfits the training data, as expected. To reduce overfitting, the alpha parameter, i.e. the L2 penalty (regularization) term, is tuned with k-fold cross-validation. Tested with one hidden layer of 250 neurons, we obtained an accuracy of 66.14%. With one hidden layer of 250 neurons followed by a second fully-connected layer, we reached our highest score (Figure 2): 67.23%.
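The alpha tuning could be sketched with a k-fold grid search; the alpha grid, fold count and synthetic data below are illustrative assumptions, not our actual experiment settings:

```python
# Hedged sketch of tuning the L2 penalty (alpha) with k-fold
# cross-validation via GridSearchCV. The grid values are examples.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Placeholder data standing in for the FMA feature matrix.
X, y = make_classification(n_samples=600, n_features=40,
                           n_informative=25, n_classes=4,
                           random_state=0)

search = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(250,), solver='adam',
                  max_iter=150, random_state=0),
    param_grid={'alpha': [1e-4, 1e-3, 1e-2]},  # candidate L2 penalties
    cv=3)                                      # 3-fold cross-validation
search.fit(X, y)
print("best alpha:", search.best_params_['alpha'])
```

The best alpha found on the folds is then used to refit the model on the full training set, which GridSearchCV does automatically.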

As this is our last blog article, we are getting ready to say goodbye to our project.

What have we done so far?

From the beginning, we were interested in predicting the genres of music tracks.

In the literature this task appears under different names: Music Genre Recognition (Classification) and Audio Tagging.

It benefits content-based music recommendation, automatic audio tagging and automatically generated playlists.

The task can be approached in different ways: from the audio itself or from audio metadata. We decided to use only audio as our input.

We then chose our dataset as a subset of the most recently published dataset, FMA, which consists of 15 genres: Rock, Electronic, Hip-Hop, Folk, Pop, Instrumental, International, Jazz, Classical, Old-Time / Historic, Country, Spoken, Soul-RnB, Blues and Easy Listening.

We use features extracted from our dataset with librosa: each track contains 518 attributes grouped into 11 audio features: Mel-Frequency Cepstral Coefficients (mfcc), Chroma Features (chroma cens, chroma cqt, chroma stft), Spectral Features (spectral bandwidth, spectral centroid, spectral contrast, spectral rolloff), RMS Energy (rmse), Tonal Centroids (tonnetz) and Zero-Crossing Rate (zcr). Each of these features is stored as summary statistics, including kurtosis, max, mean, median, min, skew and std.
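The per-track summary statistics can be computed like this; `frames` below is a random placeholder for one librosa feature matrix (feature dimensions × time frames), e.g. 20 MFCC coefficients over a track:

```python
# Collapse a (n_dims x n_frames) feature matrix into the 7 summary
# statistics listed above, yielding one fixed-length vector per track.
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 1290))   # placeholder for librosa output

stats = {
    'kurtosis': kurtosis(frames, axis=1),
    'max':      frames.max(axis=1),
    'mean':     frames.mean(axis=1),
    'median':   np.median(frames, axis=1),
    'min':      frames.min(axis=1),
    'skew':     skew(frames, axis=1),
    'std':      frames.std(axis=1),
}
track_vector = np.concatenate(list(stats.values()))
print(track_vector.shape)   # 7 statistics x 20 dims -> (140,)
```

Concatenating such vectors across all 11 audio features yields the 518-attribute representation of a track.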

We then implemented baseline methods for classification: Nearest Neighbors, Logistic Regression and Support Vector Machines. A Ridge Classifier and a Stochastic Gradient Descent classifier were also discussed. After applying feature and model selection, we reached our final scores.
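A minimal sketch of that baseline comparison could look like this; synthetic data again stands in for the FMA features, and the hyperparameters are illustrative defaults rather than our tuned settings:

```python
# Fit and score each baseline classifier in one loop. Scaling is
# applied first, since kNN and SVMs are sensitive to feature scale.
from sklearn.datasets import make_classification
from sklearn.linear_model import (LogisticRegression, RidgeClassifier,
                                  SGDClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=40,
                           n_informative=25, n_classes=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

baselines = {
    'kNN':    KNeighborsClassifier(n_neighbors=10),
    'LogReg': LogisticRegression(max_iter=1000),
    'SVM':    SVC(),
    'Ridge':  RidgeClassifier(),
    'SGD':    SGDClassifier(random_state=0),
}
scores = {}
for name, model in baselines.items():
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_tr, y_tr)
    scores[name] = pipe.score(X_te, y_te)
    print(f"{name}: {scores[name]:.3f}")
```

On the real features, each of these was then refined with feature and model selection before we settled on the final scores.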

Our final approach was the neural network, and with this last blog article we are done! You can find our course page on:

Our special thanks to Assoc. Prof. Aykut Erdem and Necva Bölücü. And thank you for staying with us till the very end!
