A Paper a Day: #5 Adversarial Adaptation of Synthetic or Stale Data

Today we discuss another domain adaptation paper, by Young-Bum Kim, Karl Stratos, and Dongchan Kim. In Episode #3, we discussed a domain adaptation technique based on data selection. This paper, however, uses an adversarial training approach for domain adaptation.

The paper focuses on two different types of domain adaptation problems: transferring from synthetic data to live user data (a deployment shift), and transferring from stale data to current data (a temporal shift). Both cause a distribution mismatch between training and evaluation, leading to a model that overfits the flawed training data and performs poorly on the test data.

This paper builds on several recent advances in neural domain adaptation, such as adversarial training and domain separation networks, and proposes an effective new adversarial training scheme for both supervised and unsupervised adaptation scenarios.

Proposed Approach

The proposed model extends the architecture of Ganin et al. (2016) for adversarial training in unsupervised domain adaptation. The model parameters are partitioned into two parts: one inducing domain-specific (or private) features and the other domain-invariant (or shared) features. The domain-invariant parameters are adversarially trained, via a gradient reversal layer, to be poor at domain classification; as a consequence, they produce representations that are domain-agnostic.
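
To make the adversarial component concrete, here is a minimal sketch of a gradient reversal layer in PyTorch. This is my own illustration of the standard construction from Ganin et al., not the authors' code; the names and the lambd scaling factor are assumptions.

```python
import torch


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) the gradient
    on the backward pass, so the encoder beneath it is trained to
    fool the domain classifier sitting on top of it."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing into the encoder;
        # lambd itself receives no gradient.
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```

During training, the shared encoder's output is passed through grad_reverse before the domain classifier, so minimizing the classifier's loss simultaneously pushes the shared features toward domain invariance.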

The paper proposes approaches for both supervised and unsupervised adaptation. The architecture makes heavy use of BiLSTMs for encoding feature relationships, with three BiLSTM encoders (a minimal sketch follows the list):

  1. Θ-src: induces source-specific features
  2. Θ-tgt: induces target-specific features
  3. Θ-shd: induces domain-invariant features
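
Here is a minimal PyTorch sketch of that three-encoder layout. The class and parameter names (enc_src, enc_tgt, enc_shd, the hidden sizes) are my own illustrative choices, not the paper's implementation.

```python
import torch.nn as nn


class ThreeEncoderModel(nn.Module):
    """Three parallel BiLSTM encoders over a shared embedding layer."""

    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        lstm_args = dict(input_size=emb_dim, hidden_size=hidden_dim,
                         bidirectional=True, batch_first=True)
        self.enc_src = nn.LSTM(**lstm_args)  # Θ-src: source-specific
        self.enc_tgt = nn.LSTM(**lstm_args)  # Θ-tgt: target-specific
        self.enc_shd = nn.LSTM(**lstm_args)  # Θ-shd: domain-invariant

    def forward(self, tokens, domain):
        x = self.embed(tokens)
        shared, _ = self.enc_shd(x)
        private_enc = self.enc_src if domain == "src" else self.enc_tgt
        private, _ = private_enc(x)
        # Downstream heads (tagger, reconstructor, domain classifiers)
        # consume these shared and private representations.
        return shared, private
```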

The main idea is to define a suitable loss function for each of these encoders to learn the source-specific, target-specific, and domain-invariant features. These loss functions include:

  1. Source Side Tagging Loss
  2. Reconstruction Loss
  3. Adversarial Domain Classification Loss
  4. Non-Adversarial Domain Classification Loss
  5. Orthogonality Loss

I’ll leave out the details of each loss function, but the names should be self-explanatory. The model is trained jointly using the above five loss functions, and can be extended to handle the supervised domain adaptation case.
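
As a rough illustration of how such a joint objective might look, here is a sketch that combines the five terms into one weighted sum, with the orthogonality term written out following the domain separation networks formulation. The weights and function names are placeholders, not values from the paper.

```python
import torch


def orthogonality_loss(shared, private):
    # Penalize overlap between the shared and private spaces: squared
    # Frobenius norm of the correlation between the two feature sets,
    # as in domain separation networks.
    s = shared.reshape(-1, shared.size(-1))
    p = private.reshape(-1, private.size(-1))
    return (s.t() @ p).pow(2).sum()


def joint_loss(tag_loss, recon_loss, adv_loss, nonadv_loss, ortho_loss,
               weights=(1.0, 0.1, 0.1, 0.1, 0.01)):
    # Weighted sum of the five terms; these weights are placeholders,
    # not values reported in the paper.
    terms = (tag_loss, recon_loss, adv_loss, nonadv_loss, ortho_loss)
    return sum(w * t for w, t in zip(weights, terms))
```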

Experiments

Experiments consider two possible domain adaptation (DA) scenarios:

  1. adaptation of an engineered dataset to a live user dataset, and
  2. adaptation of an old dataset to a new dataset.

In both cases, the approach yields clear improvement over strong baselines.
