[Week #5 — Rock or Not? ♫]

☞ This sure does.

Defne Tunçer
bbm406f18
4 min read · Dec 30, 2018

--

We are Defne Tunçer & Kutay Barçin and this is the fifth article in the series about our Machine Learning Course Project on Music Genre Classification.

GitHub

METHODS AND PARAMETER TUNING

Last week we discussed our baseline methods: Nearest Neighbors, Logistic Regression and Support Vector Machines. This week we discuss how to improve our accuracies.

Having high-dimensional data makes it more easily linearly separable, so simple classifiers are indeed a good choice to begin with.

Starting with Logistic Regression, the test accuracy obtained with the default settings (one-versus-rest (ovr) scheme, solver=liblinear, C=1 and penalty=l2) was 62.55% using MFCC alone and 65.13% with all the features.
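As a rough sketch of what this baseline looks like in scikit-learn (the synthetic data below is only a stand-in for our real feature matrix and genre labels):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the extracted audio features (MFCC statistics etc.)
X, y = make_classification(n_samples=2000, n_features=100, n_informative=40,
                           n_classes=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardizing the features helps the linear solvers converge
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Default-style settings: ovr scheme, liblinear solver, C=1, l2 penalty
clf = LogisticRegression(multi_class='ovr', solver='liblinear', C=1.0, penalty='l2')
clf.fit(X_train, y_train)
print('test accuracy:', clf.score(X_test, y_test))
```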

One of our next options is the multinomial approach, which minimizes the loss over the entire probability distribution, whereas ovr fits a separate binary problem for each label.

As for the solvers we have a couple of options:

  • A Library for Large Linear Classification (liblinear): a coordinate descent algorithm; it only supports the one-versus-rest scheme
  • Newton (newton-cg): Newton's method with a conjugate-gradient solver, based on a quadratic approximation of the loss
  • Stochastic Average Gradient (sag) and (saga): ‘sag’ uses Stochastic Average Gradient descent, while ‘saga’ uses its improved, unbiased version and also supports the l1 penalty. This is therefore the solver of choice for sparse multinomial logistic regression. Both are often faster than the other solvers when the numbers of samples and features are large. Which is exactly our case!
  • Limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (lbfgs): for small datasets

An overview of the solvers:

Figure 2. Logistic Regression Solvers

While testing parameters, ConvergenceWarning kept occurring, which can mean one or more of the following: the data needs normalization, more iterations are required, or the data simply can’t be fitted by a logistic model! We reached a test accuracy of 64.87% with the newton-cg solver and the l2 penalty.
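A minimal tuning sketch along these lines, with a parameter grid that is our assumption rather than the exact one we searched (liblinear is left out because it does not support the multinomial scheme):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the audio features; replace with the real X, y
X, y = make_classification(n_samples=2000, n_features=100, n_informative=40,
                           n_classes=8, random_state=0)

# Assumed grid: solver, regularization strength and ovr vs. multinomial scheme
param_grid = {
    'logisticregression__solver': ['newton-cg', 'lbfgs', 'saga'],
    'logisticregression__C': [0.1, 1.0, 10.0],
    'logisticregression__multi_class': ['ovr', 'multinomial'],
}

# Scaling inside the pipeline also helps silence the ConvergenceWarning
pipe = make_pipeline(StandardScaler(), LogisticRegression(penalty='l2', max_iter=1000))
search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```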

As we move on to Support Vector Machines, we have fewer options to work with. The test accuracy obtained by applying a support vector classifier (with default settings: one-versus-rest (ovr) scheme, C=1, kernel=linear and penalty=l2) was 61.77% using MFCC alone and 63.35% with all the features.

Generally, LinearSVC offers more flexibility in the choice of penalties and loss functions, and it is also expected to scale better to large numbers of samples. As shown in Figure 1, the rbf kernel captures the non-linearities of the data better: training an SVM with the Radial Basis Function (RBF) kernel reaches a test accuracy of 66.61%! The train accuracy is 82.57%, so we will need to work on the overfitting problem.
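A minimal sketch of the linear-versus-RBF comparison on the same synthetic stand-in data; the train/test gap printed at the end is what exposes the overfitting:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=2000, n_features=100, n_informative=40,
                           n_classes=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for name, clf in [('LinearSVC', LinearSVC(C=1.0, max_iter=5000)),
                  ('SVC (rbf)', SVC(kernel='rbf', C=1.0, gamma='scale'))]:
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_train, y_train)
    # A large train/test accuracy gap signals overfitting
    print(name, 'train:', model.score(X_train, y_train), 'test:', model.score(X_test, y_test))
```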

Parameter tuning is time-consuming work, since most parameter combinations take a couple of hours to converge. Dimensionality reduction will be examined next week.

BONUS!

Stochastic Gradient Descent with ‘log’ loss fits a logistic regression model, while with ‘hinge’ loss it fits a linear SVC. SGD is faster, more flexible and requires less memory, but it needs more tuning. The test accuracies obtained are 63.01% and 62.83%, respectively.
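A small sketch of this with scikit-learn's SGDClassifier, again on stand-in data (note that newer scikit-learn versions spell the logistic loss 'log_loss' instead of 'log'):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=100, n_informative=40,
                           n_classes=8, random_state=0)

# 'log' loss ~ logistic regression, 'hinge' loss ~ linear SVM
for loss in ['log', 'hinge']:
    clf = make_pipeline(StandardScaler(),
                        SGDClassifier(loss=loss, penalty='l2', max_iter=1000,
                                      tol=1e-3, random_state=0))
    print(loss, cross_val_score(clf, X, y, cv=5).mean())
```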

The Ridge Classifier appears to be the fastest method: it converges to a solution in seconds, with cross-validation of the regularization parameter, using a one-versus-all approach. The test accuracy is 62.47%.
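A short sketch using RidgeClassifierCV, which cross-validates alpha internally and handles multiclass one-versus-all; the alpha grid here is our assumption:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=100, n_informative=40,
                           n_classes=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)

# Cross-validates over the given grid of regularization strengths (alphas)
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 13))
clf.fit(scaler.transform(X_train), y_train)
print('chosen alpha:', clf.alpha_,
      'test accuracy:', clf.score(scaler.transform(X_test), y_test))
```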

The parameter C (or alpha in the case of Ridge) controls the regularization strength, which improves the conditioning of the problem and reduces the variance of the estimates (note that C is the inverse of the regularization strength, whereas alpha increases with it). RidgeClassifier, LinearSVC and Logistic Regression all require tuning of this parameter.

Well… That was all! See you next year!
