Tackling class imbalance with SVM-SMOTE

This is part of an experiment studying the applicability of neural networks.

The trade-off

Suppose we help someone grade wines in order to price them. Good wines are hard to find and form the minority class. This imbalanced distribution of wine quality makes grading, i.e. classification, difficult. False positives in grading cost our clients their reputation, while false negatives cost them money. Tuning a classifier to produce fewer false positives typically produces more false negatives, and vice versa, so we always face this trade-off.

SVM-SMOTE

In this experiment we use SVM-SMOTE, an over-sampling technique, and investigate how well it handles the trade-off. Its ancestor SMOTE is a popular over-sampling technique that balances the class distribution by synthetically generating new minority class instances along the directions from existing minority class instances towards their nearest neighbours. SVM-SMOTE concentrates the new minority class instances near the class borderline, which it locates with an SVM, to help establish the boundary between classes.
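The interpolation step that SMOTE is built on can be sketched in a few lines of plain Python. This is only an illustration of the basic idea, not the implementation used in the experiment: the function name `smote_sample`, its parameters and the toy points are all assumptions made for the example. It also does not implement the SVM-SMOTE refinement, which additionally chooses base points near the borderline found by an SVM.

```python
import random

def smote_sample(minority, k=5, n_new=10, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical two-dimensional minority instances, not the wine data.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority, k=2, n_new=5)
```

Every synthetic point lies on a segment between two existing minority instances, so the new instances stay inside the region the minority class already occupies.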

Experiment results

We first balance the class distribution with SVM-SMOTE.

smote = SMOTE( kind='svm' )
X_smote, Y_smote = smote.fit_transform( X_data, Y_data )

[Figure: Two-dimensional illustration of the class distribution before (left) and after (right) applying SVM-SMOTE. Blue instances are the majority class and red ones are the minority class.]

A neural network with two hidden layers and a dropout layer is trained with categorical cross-entropy as the objective and Adam as the optimizer.

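from keras.models import Sequential
from keras.layers import Dense, Dropout

# Keras 1.x API: `init` was renamed `kernel_initializer` in Keras 2.
m = Sequential()
# Two fully connected hidden layers of 512 ReLU units each.
m.add( Dense( 512, input_dim=X_smote.shape[ 1 ], init='glorot_normal', activation='relu' ) )
m.add( Dense( 512, init='glorot_normal', activation='relu' ) )
# Dropout with rate 0.5 before the two-class softmax output.
m.add( Dropout( .5 ) )
m.add( Dense( 2, init='glorot_normal', activation='softmax' ) )
m.compile( loss='categorical_crossentropy', optimizer='adam' )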

The network classifies 96.68% of majority instances and 90.40% of minority instances correctly.

                | predicted majority | predicted minority
=========================================================
actual majority |             96.68% |              3.32%
actual minority |              9.60% |             90.40%
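Each row of the table is normalised within its actual class, so the two percentages in a row sum to 100%. A minimal sketch of that computation follows, assuming class labels are available as plain lists. The helper `row_normalised_confusion` and the toy labels are hypothetical and are not taken from the experiment.

```python
def row_normalised_confusion(actual, predicted, labels):
    """Return {actual_label: {predicted_label: percentage}} where the
    percentages are taken within each actual class (rows sum to 100)."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    table = {}
    for a in labels:
        total = sum(counts[a].values())
        table[a] = {
            p: (100.0 * counts[a][p] / total if total else 0.0) for p in labels
        }
    return table

# Hypothetical toy labels: 8 majority and 2 minority instances.
actual = ["majority"] * 8 + ["minority"] * 2
predicted = ["majority"] * 7 + ["minority"] + ["minority", "majority"]
table = row_normalised_confusion(actual, predicted, ["majority", "minority"])
```

Reading the diagonal of such a table gives the per-class rate of correct classification, which is how the 96.68% and 90.40% figures are reported.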

In contrast, without SVM-SMOTE, a network with the same configuration classifies 100% of majority instances and 0% of minority instances correctly: it achieves a high overall score simply by classifying every instance as majority.

                | predicted majority | predicted minority
=========================================================
actual majority |            100.00% |              0.00%
actual minority |            100.00% |              0.00%
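To see why a network that only ever predicts the majority class can still look good on overall accuracy, consider the small calculation below. The 95:5 class split is a made-up assumption for illustration, since the actual class ratio of the data is not stated here, and the helper functions are hypothetical.

```python
def accuracy(actual, predicted):
    """Fraction of all instances classified correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def recall(actual, predicted, label):
    """Fraction of instances of the given class classified correctly."""
    hits = [p == label for a, p in zip(actual, predicted) if a == label]
    return sum(hits) / len(hits)

# Hypothetical 95:5 split; the real class ratio is not given in the text.
actual = ["majority"] * 95 + ["minority"] * 5
predicted = ["majority"] * 100  # always predict the majority class

acc = accuracy(actual, predicted)
maj_recall = recall(actual, predicted, "majority")
min_recall = recall(actual, predicted, "minority")
# Mean of the per-class recalls, which weights both classes equally.
balanced = (maj_recall + min_recall) / 2
```

Under this assumed split, overall accuracy is 0.95 while the mean of the per-class recalls is only 0.5, which is why the per-class view in the tables above is more informative than a single score.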

References

Nguyen, Hien M., Eric W. Cooper, and Katsuari Kamei. “Borderline over-sampling for imbalanced data classification.” International Journal of Knowledge Engineering and Soft Data Paradigms 3.1 (2011): 4–21.

Chawla, Nitesh V., et al. “SMOTE: synthetic minority over-sampling technique.” Journal of Artificial Intelligence Research (2002): 321–357.

Cortez, Paulo, et al. “Modeling wine preferences by data mining from physicochemical properties.” Decision Support Systems 47.4 (2009): 547–553.

UnbalancedDataset. https://github.com/fmfn/UnbalancedDataset