Deep One-Class Classification

A viable solution to the growing demand for one-class classification in many applications

Makoto TAKAMATSU
Analytics Vidhya

--

In this story, Learning Deep Features for One-Class Classification, by Johns Hopkins University, is presented. The paper was published as a journal article in IEEE Transactions on Image Processing (IEEE TIP). It proposes a novel deep-learning-based approach to one-class transfer learning, in which labelled data from an unrelated task is used for feature learning in one-class classification. Experiments on various datasets show that the proposed deep one-class classification (DOC) method achieves significant improvements over state-of-the-art classification methods.

They even made the code available for everyone on their GitHub!
Let’s see how they achieved that.

Fig. 1 One-class classification

Outline

  1. What is one-class classification?
  2. Deep one-class classification (DOC) algorithm
  3. Experimental results

1. What is one-class classification?

One-class classification trains a classifier on samples from a single class so that it can identify out-of-class objects at test time [1]. One-class classification is encountered in many real-world computer vision applications [2, 3, 4], including novelty detection, anomaly detection, medical imaging and mobile active authentication.
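To make the setting concrete, here is a minimal scikit-learn sketch (not from the paper): a one-class SVM is fitted only on samples from a single "normal" class, and at test time it flags out-of-class points it has never seen.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Train only on samples from a single "normal" class.
normal_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(normal_train)

# At test time the classifier must flag out-of-class points it never saw.
normal_test = rng.normal(loc=0.0, scale=1.0, size=(5, 2))
outliers = rng.normal(loc=6.0, scale=0.5, size=(5, 2))  # far from the class

print(clf.predict(normal_test))  # mostly +1 (in-class)
print(clf.predict(outliers))     # -1 (out-of-class)
```

The key point is that only one class is available during training, which is what makes standard discriminative training inapplicable.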

2. Deep one-class classification (DOC) algorithm

What is this paper about? What exactly did the researchers do?

Despite the promise of one-class classification for many applications, existing one-class classification schemes trained on a given concept alone have failed to produce promising results on real datasets. The authors argue that more effective feature representations can be learned by taking compactness into account along with descriptiveness.

In this study, the one-class classification problem is framed in terms of transfer learning. The authors address it by designing deep features that optimize a one-class classification task, a setting they name one-class transfer learning.

The paper’s introduction to the DOC algorithm

The proposed method operates on top of a convolutional neural network (CNN) of choice and produces descriptive features while maintaining a low intra-class variance in the feature space for the given class. For this purpose, two loss functions, compactness loss (lC) and descriptiveness loss (lD), are proposed along with a parallel CNN architecture.

The proposed method (Fig. 2) freezes the initial feature layers g_s of the pre-trained deep model and learns g_l and h_c, where g_l refers to the feature extraction sub-network and h_c to the classification sub-network.

Fig. 2 Method for one-class feature learning

Based on the output of the classification sub-network (h_c), the network is optimized with two losses: compactness loss (lC) and descriptiveness loss (lD). The overall loss function treats the compactness loss (lC) as a regularization term while minimizing the descriptiveness loss (lD).

Loss function

The proposed method learns the weights of g_l and h_c through backpropagation from the overall loss function.

Fig. 3 Overview of the proposed method

Compactness loss (lC) encourages the feature representations of different images from the given one-class dataset to be similar to one another. The loss is computed from the average similarity between the samples of a given batch.
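As a sketch of one plausible batch-wise formulation (written from the paper's description; the exact constants here are illustrative), each sample's feature vector can be compared against the mean of the remaining samples in the batch, so the loss shrinks as the batch becomes more compact:

```python
import numpy as np

def compactness_loss(features):
    """Batch-wise compactness loss (lC): each sample's feature vector is
    compared against the mean of the *other* samples in the batch, so the
    loss shrinks as the batch becomes more similar (more compact)."""
    n, k = features.shape
    total = features.sum(axis=0)
    m = (total - features) / (n - 1)   # m_i: mean of the batch excluding sample i
    z = features - m
    return (z * z).sum() / (n * k)

# Identical features give zero loss; spread-out features give a larger loss.
tight = np.ones((4, 3))
loose = np.array([[0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0],
                  [2.0, 2.0, 2.0],
                  [3.0, 3.0, 3.0]])
print(compactness_loss(tight))  # 0.0
print(compactness_loss(loose))  # larger, since the features are spread out
```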

Descriptiveness loss (lD) is a multi-class classification loss that uses an external multi-class reference dataset to keep the feature representations of different classes well separated. It is implemented as the cross-entropy loss on the ImageNet dataset.
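Putting the two together, the overall objective adds the compactness term to the cross-entropy as a regularizer, l = lD + λ·lC. A NumPy sketch (the value of λ and all numbers are illustrative, not from the paper):

```python
import numpy as np

def cross_entropy(logits, labels):
    """Descriptiveness loss (lD): multi-class cross entropy on the
    external reference dataset (ImageNet in the paper)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Overall objective: l = lD + lam * lC, with the compactness loss acting
# as a regularizer on the descriptiveness loss.
lam = 0.1                  # balancing weight (illustrative value)
logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
labels = np.array([0, 1])
l_D = cross_entropy(logits, labels)
l_C = 0.37                 # compactness loss of the one-class batch (placeholder)
total = l_D + lam * l_C
```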

Fig. 4 Proposed architecture: (a) Training, and (b) testing frameworks of the proposed DOC method.

The architecture shown in Fig. 4(a) is used to train the DOC algorithm, and the one in Fig. 4(b) is used for testing. Let the common feature extraction sub-architecture be g(⋅) and the common classification sub-architecture be h_c(⋅). AlexNet and VGG16 are tested as the reference and secondary networks.

During training, two CNNs, a reference network and a secondary network, process the reference data and the target data respectively. The weights of the reference and secondary networks are tied across corresponding layers. Learning is driven by the two losses, descriptiveness loss and compactness loss, computed from the features extracted by the reference and secondary networks.
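Weight tying simply means the two branches share one set of parameters. A toy NumPy sketch (shapes and weights are illustrative): the same g and h_c process both the reference batch and the target batch, so gradients from both losses would update the same weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared (tied) parameter set serves both branches.
W_g = rng.normal(size=(8, 4)) * 0.1   # feature extraction sub-network g
W_h = rng.normal(size=(4, 3)) * 0.1   # classification sub-network h_c

def forward(batch):
    feats = np.maximum(batch @ W_g, 0.0)  # g with a ReLU non-linearity
    return feats, feats @ W_h             # h_c output (logits)

reference_batch = rng.normal(size=(5, 8))  # external multi-class data
target_batch = rng.normal(size=(5, 8))     # one-class target data

# The same weights process both batches: the descriptiveness loss is
# computed on the reference-branch outputs, the compactness loss on the
# target-branch outputs, and both gradients update the shared weights.
_, ref_logits = forward(reference_batch)
_, tgt_out = forward(target_batch)
```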

Testing consists of two phases: template generation and matching. The activation maps learned during training are used as features to generate templates from a small set of in-class samples. A query is then classified against the saved templates with a one-class SVM, SVDD, a k-nearest neighbour classifier, etc.
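A minimal sketch of the matching phase using a nearest-neighbour rule (the features and threshold here are stand-ins; the paper also evaluates one-class SVM and SVDD as matchers):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)

# Template generation: features of a small set of in-class samples,
# extracted with the trained network (random stand-ins here).
templates = rng.normal(loc=0.0, scale=1.0, size=(50, 4))

# Matching: score a query by its distance to the nearest template;
# beyond a threshold, declare it out-of-class.
nn = NearestNeighbors(n_neighbors=1).fit(templates)

def is_in_class(query, threshold=2.0):
    dist, _ = nn.kneighbors(query.reshape(1, -1))
    return dist[0, 0] < threshold

print(is_in_class(np.zeros(4)))       # near the templates: in-class
print(is_in_class(np.full(4, 10.0)))  # far from all templates: out-of-class
```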

How have they done that?

The authors compare the performance of DOC against three baseline strategies for applying deep learning to one-class classification. The dataset consists of normal and abnormal images of chairs. As Fig. 5 shows, none of the three strategies produces features that are both compact and descriptive.

  1. Extracting deep features
  2. Fine-tune a two-class classifier using an external dataset
  3. Fine-tune using a single class data

Fig. 5 Possible strategies for one-class classification in abnormal image detection. (a) Normal and abnormal image samples. (b) Feature space obtained using AlexNet features. (c) Feature space obtained by training a two-class CNN against external objects represented by ImageNet data samples. (d) Feature space obtained by fine-tuning using only normal objects. (e) Feature space obtained using the proposed method.

Fig. 5(b) shows Strategy 1, which extracts deep features from an existing pre-trained CNN architecture. Since deep features are descriptive, samples of the same class are expected to cluster together in the extracted feature space. The feature space in Fig. 5(c) comes from Strategy 2, which trains a two-class classifier on normal chair images versus images from the ImageNet dataset. Fig. 5(d) shows Strategy 3, which fine-tunes a pre-trained AlexNet network using only the normal chair class. Fig. 5(e) is the feature space obtained using the DOC method.

In Fig. 5(b) and 5(c), the normal and abnormal samples are not sufficiently separated. In Fig. 5(b), the pre-trained CNN cannot separate normal and abnormal chairs into distinct clusters. In Fig. 5(c), although there are subtle differences between normal and abnormal chair images, they are far more similar to each other than to the other ImageNet object images, which prevents them from being separated into distinct clusters. In Fig. 5(d), because all training labels are identical, the network loses its discriminative ability when an abnormal chair is presented, and both normal and abnormal samples are projected to nearly the same point.

Fig. 5(e) shows a clear separation of normal and abnormal samples. The proposed method demonstrates that an effective representation can be learned by augmenting the Strategy 1 approach with loss terms that enforce descriptiveness and compactness.

3. Experimental Results

In abnormal image detection using the 1001 Abnormal Objects Dataset, the nature of the anomaly is unknown a priori, so training is performed on a single class. With the proposed framework, the performance of AlexNet features improved by about 14%, and the proposed method performed best on this dataset.

Next, for one-class novelty detection, the Caltech256 dataset is used: the task is to judge whether a new sample is novel relative to previously observed samples. Compared with existing methods, the DOC method yields a significant improvement, raising the performance of AlexNet features by about 13%.

The nature of the misclassifications in the one-class novelty detection problem is very similar to that of multi-class CNN-based classification. The majority of false negatives are cases where the American flag is in the background of the image or too close to the camera for its features to be clearly identified. False positives are often images with the colours of the American flag or a flag-like texture.

Fig. 6 Sample false detections for the one-class problem of novelty detection

Reference

[1] H. He and Y. Ma, "Imbalanced Learning: Foundations, Algorithms, and Applications," Wiley-IEEE Press, 1st edition, 2013.

[2] M. Markou and S. Singh, "Novelty detection: a review — part 1: statistical approaches," Signal Processing, 83(12):2481–2497, 2003.

[3] V. M. Patel, R. Chellappa, D. Chandra, and B. Barbello, "Continuous user authentication on mobile devices: Recent progress and remaining challenges," IEEE Signal Processing Magazine, 33(4):49–61, July 2016.

[4] P. Perera and V. M. Patel, "Efficient and low latency detection of intruders in mobile active authentication," IEEE Transactions on Information Forensics and Security, 13(6):1392–1405, 2018.

[ArXiv][GitHub] Learning Deep Features for One-Class Classification

Past Paper Summary List

Data Uncertainty Learning

2020: [DUL]

One-Class Classification

2019: [DOC]

2020: [DROC]

Biomedical Image Segmentation

2018: [UOLO]

Image Clustering

2020: [DTC]

--


Medical AI Engineer. My main interest is to improve the efficiency of medical diagnosis. My LinkedIn: https://www.linkedin.com/in/makoto-takamatsu-8b1884141/