How we solved multilabel classification and segmentation with very little data - PART 1

Vishal Rajput
Jun 20 · 4 min read

Multi-label classification is considered one of the tougher problems in ML and AI. First, we need to understand how it differs from multi-class problems: in a multi-class problem we predict exactly one class per image, whereas in multi-label each image can carry a different number of classes. You can’t even apply AUC-ROC directly to pick a single best threshold for the classifier, because an image contains multiple objects and each object may need a different threshold. Essentially, it’s a two-fold problem: first identify how many classes are present in a given image, and then determine which classes they are.

Let’s dive straight into the problem: we had to develop a model that generalizes well over the given data. We used only 750 images from PASCAL-VOC to identify 20 different classes in an image. Given the small size of the dataset, it was extremely tough to train a model that generalizes well on the test set. After a lot of trials, we settled on a pre-trained EfficientNetB6 (trained on ImageNet) to extract deep features from our training data. A small dataset generally doesn’t work well even with transfer learning if the test-set distribution differs from the training set. Whenever you have a small dataset, a good recipe is to use a DL model to extract features and an SVM to classify on top of those extracted features.

For our purpose, we used EfficientNetB6 to extract the features, adding a global average pooling layer at the end to obtain the deep features. Please keep in mind that we are not doing any kind of transfer learning here; we are simply using the pretrained model to get deep features for our dataset.

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.keras.backend as K
!pip install tensorflow_addons
import tensorflow_addons as tfa
from matplotlib import pyplot as plt
import cv2
import random
from sklearn.svm import LinearSVC
import sklearn
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import multilabel_confusion_matrix, ConfusionMatrixDisplay
from sklearn.svm import SVC
# input_shape was truncated in the original; EfficientNetB6's native
# resolution is (528, 528, 3), and any (H, W, 3) works with include_top=False
backbone_model = tf.keras.applications.EfficientNetB6(include_top=False,
                                                      weights='imagenet',
                                                      input_shape=(528, 528, 3))
backbone_model.trainable = False

# Global average pooling collapses the final feature maps into one deep-feature vector
x = tf.keras.layers.GlobalAveragePooling2D(name='g_avg_pool')(backbone_model.output)
model_b6 = tf.keras.Model(backbone_model.input, x)
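The `feature_extraction` helper called later in the post isn’t shown; here is a minimal sketch of what it could look like, assuming the images are already loaded, resized, and stacked into one NumPy batch (the function name matches the later call, but the signature and batching logic are our assumption, not the author’s code):

```python
import numpy as np

def feature_extraction(model, images, batch_size=32):
    """Push images through the frozen backbone in batches and
    collect the pooled deep features as one array.

    model  : a Keras model ending in GlobalAveragePooling2D (e.g. model_b6)
    images : float array of shape (n, H, W, 3), already preprocessed
    """
    feats = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        # verbose=0 silences the per-batch progress bar
        feats.append(model.predict(batch, verbose=0))
    return np.concatenate(feats, axis=0)
```

Batching keeps memory bounded: only `batch_size` images are resident on the GPU at a time, while the output grows to one feature vector per image.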

You can try different image augmentation techniques, but since we are not training anything here we don’t require them. We will need image augmentation for the segmentation task, which we will discuss in the next blog of this series.

Now we need to feed our features to an SVM, but an SVM can’t do multilabel classification on its own, so we are going to use a multi-output classifier with an RBF SVM as its base classifier. Don’t use the default SVM parameters; hyper-tune them and you will see a huge jump in the performance of the model.

extracted_features = feature_extraction(model_b6, input_image)
svm_model = SVC(random_state=42, probability=True, C=1, gamma=0.002)

# Make it a multilabel classifier
mlc = MultiOutputClassifier(svm_model, n_jobs=-1)
mlc_op =, labels)

For our test set, we were getting around 85% correct results. In the multilabel setting we can’t use plain accuracy as a measure; use the Dice score, top-k precision, or mean IoU to measure the performance of your model. Understand one thing: you need to calibrate your model very precisely to get above 80% results. So use SVM’s predict_proba() method and then threshold the probability scores to convert them into class labels. When we use classifier.predict(), it implicitly sets the threshold at 0.5, which might not be optimal for your case.
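The thresholding step can be sketched as follows. `MultiOutputClassifier.predict_proba` returns one `(n_samples, 2)` array per label, with column 1 holding P(label present); the helper name and per-class threshold vector are our illustration, not code from the post:

```python
import numpy as np

def threshold_predictions(mlc, X, thresholds):
    """Turn per-class probabilities into multi-hot labels.

    mlc        : fitted MultiOutputClassifier (base SVC with probability=True)
    thresholds : one cutoff per label, replacing predict()'s fixed 0.5
    """
    probas = mlc.predict_proba(X)                        # list of (n, 2) arrays
    pos = np.stack([p[:, 1] for p in probas], axis=1)    # (n, n_labels)
    return (pos >= np.asarray(thresholds)).astype(int)
```

Sweeping each class’s threshold on a validation split (e.g. maximizing per-class F1) is what makes the calibration class-specific, which a single global 0.5 cutoff cannot be.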

def dice_coef(y_true, y_pred, smooth=1):
    # Cast to float so the element-wise product and sums are well-defined
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    # smooth keeps the ratio defined when both inputs are all zeros
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred)

This marks the end of our classifier. Note that using such a small dataset leads to an extremely sensitive model, so always plot the confusion matrix for each individual class, and then adjust the distribution of each class by adding or removing images for that particular class.
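Scikit-learn already ships the per-class breakdown needed for that check: `multilabel_confusion_matrix` returns one 2×2 matrix per label. A minimal sketch (the toy labels and placeholder class names are ours):

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Toy multi-hot ground truth and predictions for three classes
y_true = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0]])
y_pred = np.array([[1, 0, 1], [0, 1, 0], [0, 1, 0]])

class_names = ["aeroplane", "bicycle", "bird"]  # placeholder PASCAL-VOC names
cms = multilabel_confusion_matrix(y_true, y_pred)  # shape (n_classes, 2, 2)
for name, cm in zip(class_names, cms):
    tn, fp, fn, tp = cm.ravel()  # sklearn layout: [[tn, fp], [fn, tp]]
    print(f"{name}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Classes with many false negatives are the ones that need more images added; classes the model over-predicts (many false positives) may need rebalancing the other way.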

Part 2:

The actual code is much more complex than what has been explained here; there are a ton of other steps, which include combining the deep features from EfficientNetB5, B6, and B7. We wrote custom functions to add or remove images from a particular class, and other tasks include plotting the confusion matrix for each class individually. We’ll also discuss one trick to increase the accuracy in part 2 of this blog.


Analytics Vidhya
