Removing Undesirable Feature Contributions Using Out-of-Distribution Data

Published in SNU AIIS Blog · Mar 26, 2022

By Seyeon An

The reverse image search function in Google Images shows that neural networks are capable of recognizing and classifying images. But are they always accurate? Not really. You may have experienced something like this: you enter a picture of a dog into Google Images and, in return, receive hundreds of panda pictures. Why does this happen?

Deep neural networks (DNNs) have developed to the point where they perform outstandingly on many tasks, such as computer vision and natural language processing. Development is ongoing, and much of the current effort goes into training these DNNs to be more accurate. A major obstacle to accuracy is that neural networks are highly vulnerable to adversarial perturbations.

For instance, take a look at the image below. The two images, before and after adding the noise, look identical to us. But to the neural network, the image on the right appears to be a completely different object: a panda. Here, the noise added to the image is an adversarial perturbation, and the training method that attempts to solve this problem by making neural networks less vulnerable to such perturbations is referred to as adversarial training.

The image recognition process of the neural network is extremely vulnerable to perturbation.
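
To make the idea concrete, here is a minimal sketch of how such a perturbation can be crafted with the fast gradient sign method (FGSM), one simple attack of this kind; the method, the PyTorch classifier `model`, and the labeled batch `(x, y)` are illustrative assumptions rather than details from the original post.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=8 / 255):
    """Create an adversarially perturbed copy of x inside an L-infinity ball of radius eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # loss of the prediction w.r.t. the true label
    loss.backward()
    # Step in the direction that increases the loss, then clamp back to a valid image range.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

To a human, the perturbed image is indistinguishable from the original, yet the classifier's prediction can flip entirely.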

A deep-seated problem of adversarial training, the lack of training data, has been partially resolved by recently proposed data augmentation methods that use unlabeled in-distribution (UID) data. Yet drawbacks remain: such data are not always available, and the methods depend on the accuracy of the pseudo-label generator.

To compensate for these drawbacks and to improve generalization in both adversarial and standard learning, we propose a data augmentation method using out-of-distribution (OOD) data: out-of-distribution data augmented training (OAT).

What is Adversarial Training?

To understand why out-of-distribution data augmented training is needed for better accuracy and efficiency of DNNs, we have to understand what adversarial training is and why it is important.

Adversarial training refers to a training process that includes adversarially attacked images in its training data set. The goal of adversarial training is to make DNNs more robust, that is, to make machine learning models less vulnerable to perturbations.
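
As a rough illustration, one adversarial training step built around a PGD attack (the projected gradient descent attack that also appears in the experiments later in this post) might look like the sketch below. It assumes an existing PyTorch `model`, `optimizer`, and batch `(x, y)`; the hyperparameter values are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, step=2 / 255, iters=10):
    """Projected gradient descent: iteratively increase the loss inside an L-infinity eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)  # random start
    for _ in range(iters):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + step * x_adv.grad.sign()   # gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)              # keep a valid image range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Train on adversarially attacked images instead of the clean ones."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The inner loop searches for the worst-case perturbation of each training image, and the outer step updates the model on those worst-case inputs.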

The Semi-Supervised Learning Method

Adversarial training needs much more data than standard training. Since labeled data alone are insufficient, we use a mixture of labeled and unlabeled data, which is known as semi-supervised learning.

  • Supervised Learning : uses only labeled data as its data set
  • Semi-Supervised Learning : uses a small amount of labeled data together with a large amount of unlabeled data as its data set
  • Unsupervised Learning : uses only unlabeled data as its data set
We use the semi-supervised learning method as our adversarial training methodology.

Robust and Non-Robust Features

As the main task of artificial intelligence is to simulate human intelligence, the image recognition process should also simulate that of humans. Here, it is integral to distinguish between robust features and non-robust features, the two types of useful features in an image.

  • Robust Features : features humans can perceive; they correlate strongly with the image label.
  • Non-Robust Features : features humans cannot perceive; they correlate only weakly with the image label.

It has been demonstrated that there is a trade-off between adversarial robustness and standard accuracy. Adversarial training attempts to resolve this trade-off by preventing non-robust features from being used in image classification.

Out-Of-Distribution Data

The classifier's algorithm should be able to recognize perturbed or otherwise unusual examples, because (1) such examples are likely to be classified incorrectly and (2) those misclassifications are made with high confidence.

Out-of-distribution (OOD) data are often very close to normal data; to the human eye, many of them look identical. OOD data may lie near the normal data distribution (for example, blurry or adversarially attacked inputs) or may even belong to a new class that the DNN has not yet been trained to classify.

Why is this essential? For example, trained DNNs are often used to identify bacteria from genomic sequences, which in turn supports the diagnosis and treatment of fatal diseases. New classes of bacteria have been discovered throughout the past decades, and we want to use our DNNs to classify them. However, even a high-performing classifier might assign a sample to the wrong class when it is OOD data from a completely new class that the classifier has not been trained to classify.

Unlike images of dogs or pandas, whose misclassification usually does not cause great problems, wrongly classified genomes and bacteria could cause serious harm. Such real-life applications show the importance of OOD detection: as recent studies show, OOD detection is the first step toward building classifiers that “fail gracefully.”

Out-Of-Distribution Data Augmented Training

We propose Out-of-distribution Augmented Training (OAT), which trains on the union of the target dataset D_t and the OOD dataset D_o.

Setup of OOD Dataset

Out-of-distribution Augmented Training

Our OAT algorithm is a data-augmentation-based robust training algorithm built around a loss that is carefully designed to benefit from the additional OOD data. The OOD data are fed into the training procedure with the uniform distribution label, as described below.

  1. A uniform distribution label is assigned to all OOD data samples. Through this process, we can leverage OOD data for supervised learning at no additional labeling cost, which makes OOD data much less restrictive than unlabeled in-distribution (UID) data.
  2. Although OOD data can be used to improve both the standard and the robust generalization of neural networks, our ultimate goal is to enhance classification accuracy on the target dataset. Moreover, adversarial training on pairs of OOD data samples and the uniform distribution label affects the weights for robust features as well as non-robust features. The balance between the losses from the target dataset D_t and the OOD dataset D_o is therefore essential in OAT. We introduce a hyperparameter α ∈ R^+ to control this balance and train the following two variants (a minimal code sketch follows the notation list below):
  • Out-of-distribution Augmented Adversarial Training (OAT-A)
  • Out-of-distribution Augmented Standard Training (OAT-S)

(x_t, y) : an image-label pair from the target dataset D_t
L : the loss function
S : the set of allowed adversarial perturbations
t_unif : the uniform distribution label
θ : the network parameters
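
The following is a minimal sketch of how an OAT-A training step could be implemented in PyTorch, based on our reading of the description above rather than on the authors' released code. The helper names (`soft_xent`, `pgd`, `oat_a_step`) and the hyperparameter values are illustrative assumptions; `alpha` is the balancing hyperparameter α and `t_unif` is the uniform distribution label.

```python
import torch
import torch.nn.functional as F

def soft_xent(logits, target_probs):
    """Cross-entropy against a (possibly soft) probability target such as t_unif."""
    return -(target_probs * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

def pgd(model, x, target_probs, eps=8 / 255, step=2 / 255, iters=10):
    """Inner maximization: PGD on the soft cross-entropy within an L-infinity eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(iters):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = soft_xent(model(x_adv), target_probs)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + step * x_adv.grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

def oat_a_step(model, optimizer, x_t, y_t, x_o, num_classes, alpha=1.0):
    """One OAT-A step: adversarial loss on target data plus alpha times the
    adversarial loss on OOD data paired with the uniform distribution label."""
    y_probs = F.one_hot(y_t, num_classes).float()
    t_unif = torch.full((x_o.size(0), num_classes), 1.0 / num_classes, device=x_o.device)

    x_t_adv = pgd(model, x_t, y_probs)   # inner maximization on target images
    x_o_adv = pgd(model, x_o, t_unif)    # inner maximization on OOD images

    optimizer.zero_grad()
    loss = soft_xent(model(x_t_adv), y_probs) + alpha * soft_xent(model(x_o_adv), t_unif)
    loss.backward()
    optimizer.step()
    return loss.item()
```

OAT-S, the standard-training variant, would simply skip the inner maximization and feed x_t and x_o directly into the same α-weighted loss.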

Experiments

Our theoretical results show that this way of feeding OOD data helps to remove the dependency on non-robust features and hence improves robustness.

We created OOD datasets from the 80 Million Tiny Images dataset (80M-TI). We also resized ImageNet to 64×64 and 160×160 and divided it into datasets containing 10 and 990 classes, named ImgNet10 and ImgNet990, respectively.

We resized Places365 and VisDA-17 for the experiments on ImgNet10, and cropped the Simpson Characters (Simpson) and Fashion Product (Fashion) datasets to 32×32 for the experiments on CIFAR10 and CIFAR100.
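
For illustration only (the exact preprocessing is documented in the paper and its GitHub repository), matching an auxiliary OOD dataset to the target resolution can be done with standard torchvision transforms; the sizes below are assumptions for the 32×32 and 64×64 settings.

```python
from torchvision import transforms

# Hypothetical pipeline for 32x32 targets (CIFAR10 / CIFAR100): resize, then crop to 32x32.
ood_transform_32 = transforms.Compose([
    transforms.Resize(36),
    transforms.CenterCrop(32),
    transforms.ToTensor(),
])

# Hypothetical pipeline for the 64x64 ImgNet10 experiments.
ood_transform_64 = transforms.Compose([
    transforms.Resize(72),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
])
```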

Results below show that OAT improves the robust generalization of all adversarial training methods tested regardless of the target dataset.

Accuracy (%) comparison of the OAT model with Standard, PGD, and TRADES on CIFAR10, CIFAR100, and ImgNet10 (64×64) under different threat models.

Standard : the model normally trained on the target dataset.
PGD : the model trained using PGD-based adversarial training on the target dataset.
TRADES : the model trained using TRADES on the target dataset.
OAT_PGD : the model adversarially trained with OAT based on the PGD approach.
OAT_TRADES : the model adversarially trained with OAT based on TRADES.
OAT_D_o : the model normally trained with OAT using the OOD dataset D_o.

Results below show that OAT still improves robust generalization even when many pseudo-labeled data are used.

Comparison of OAT using various OOD datasets for improving robust generalization on CIFAR10 and ImgNet10 (64 × 64).

*Results that show the effectiveness of OAT in standard learning can be found in the paper.

Future Directions

Experiments performed on various OOD datasets have shown that even OOD data with little correlation to the target dataset from the human perspective can benefit robust and standard generalization through the proposed method. This implies that diverse datasets share a common undesirable feature space. We can also conclude that, when extra UID data are available, OAT can improve generalization performance even when substantial amounts of pseudo-labeled data are used.

It is a meaningful discovery that training with OOD data can remove undesirable feature contributions. Judging from the experimental results, it still seems difficult to implement strong adversarial attacks during adversarial training, which could be one point to work on. Moreover, if we could quantify the degree to which undesirable features are shared between the target and OOD datasets and construct strong adversarial attacks using OOD data, we would get a step closer to building a neural network that recognizes a dog as a dog even after heavy perturbation has been added.

Acknowledgements

We thank Saehyung Lee and the co-authors of the paper “Removing Undesirable Feature Contributions Using Out-Of-Distribution Data” for their contributions and discussions in preparing this blog. The views and opinions expressed in this blog are solely those of the authors.

This post is based on the following paper:

  • Removing Undesirable Feature Contributions Using Out-Of-Distribution Data, Saehyung Lee, Changhwa Park, Hyungyu Lee, Jihun Yi, Jonghyun Lee, Sungroh Yoon∗, International Conference on Learning Representations (ICLR) 2021, arXiv, GitHub.

Originally posted on our Notion blog on Mar 15, 2021.

SNU AIIS Blog

AIIS is an intercollegiate institution of Seoul National University, committed to integrating and supporting AI-related research at Seoul National University.