Published in Geek Culture

Review — Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction

Split-Brain Auto for Self-Supervised Learning, Outperforms Jigsaw Puzzles, Context Prediction, ALI/BiGAN, L³-Net, Context Encoders, etc.

Proposed Split-Brain Auto (Bottom) vs Traditional Autoencoder, e.g. Stacked Denoising Autoencoder (Top)
  • A network is split into two sub-networks, each trained to perform a difficult task: predicting one subset of the data channels from another.
  • By forcing the network to solve cross-channel prediction tasks, feature learning is achieved without using any labels.

Outline

  1. Split-Brain Autoencoders (Split-Brain Auto)
  2. Experimental Results

1. Split-Brain Autoencoders (Split-Brain Auto)

Split-Brain Autoencoders applied to various domains

1.1. Cross-Channel Encoders

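  • First, the input data X is divided along the channel dimension into two subsets, X1 and X2.
  • Then, X1 goes through network F1 to predict X2, so the prediction is F1(X1).
  • Similarly, X2 goes through network F2 to predict X1, so the prediction is F2(X2).
  • Left: In the Lab color space, X1 can be L, which holds the luminance information, and X2 can be ab, which holds the color information.
  • Right: For an RGB-D image, X1 can be the RGB values, and X2 can be D, which is the depth information.
  • An l2 loss can be used to train the prediction as a regression problem, penalizing the squared error between the prediction F1(X1) and the target X2 at every pixel.
  • For the graphics task of automatic colorization, a classification (cross-entropy) loss has been found to be more effective than l2 regression. In that case the target X2 is quantized into bins and the network predicts a distribution over them.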
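The channel split and the two loss options can be illustrated with a small pure-Python sketch. It is only a toy under stated assumptions: the image is a nested list of (L, a, b) pixels, the helper names (`split_lab`, `l2_loss`, `quantize`, `cross_entropy_loss`) are hypothetical, and the uniform single-channel binning with a hard one-hot target is a simplification, not the quantization scheme used in the paper.

```python
import math

def split_lab(image):
    """Split rows of (L, a, b) pixels into X1 = L and X2 = ab."""
    x1 = [[(L,) for (L, a, b) in row] for row in image]
    x2 = [[(a, b) for (L, a, b) in row] for row in image]
    return x1, x2

def l2_loss(pred, target):
    """Half the summed squared error over all pixels and channels."""
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += sum((pc - tc) ** 2 for pc, tc in zip(p, t))
    return 0.5 * total

def quantize(value, lo=-110.0, hi=110.0, bins=10):
    """Map a continuous value to a bin index (assumed uniform binning)."""
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)  # clamp values on or past the edges

def cross_entropy_loss(pred_probs, target_bins):
    """Summed negative log-probability of the true bin at each pixel."""
    total = 0.0
    for prob_row, bin_row in zip(pred_probs, target_bins):
        for probs, true_bin in zip(prob_row, bin_row):
            total -= math.log(probs[true_bin])
    return total

# Toy 1x2 image: the luminance channel is the input, the ab channels the target.
image = [[(50.0, 10.0, -20.0), (70.0, 0.0, 5.0)]]
x1, x2 = split_lab(image)
print(l2_loss(x2, x2))  # a perfect prediction incurs zero regression loss
```

In a real setup F1 would be a convolutional network mapping `x1` to a prediction of `x2`; here the losses are simply evaluated on hand-written predictions to show what each one measures.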

1.2. Split-Brain Autoencoders as Aggregated Cross-Channel Encoders

  • Two cross-channel encoders, F1 and F2, are trained on opposite prediction problems with loss functions L1 and L2, respectively. Their representations are then concatenated layer by layer to form the full representation F.
  • Example split-brain autoencoders in the image and RGB-D domains are shown in the above figure (a) and (b), respectively.
  • The network is modified to be fully convolutional and is trained for a pixel-prediction task.
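Written out, the two cross-channel objectives and the layer-wise concatenation could be expressed roughly as follows. This is a reconstruction from the definitions in this section, so the notation (the starred optima and the layer index l) is an assumption rather than a quotation.

```latex
\begin{aligned}
F_1^{*} &= \arg\min_{F_1} L_1\big(F_1(X_1),\, X_2\big) \\
F_2^{*} &= \arg\min_{F_2} L_2\big(F_2(X_2),\, X_1\big) \\
F^{l}   &= \{F_1^{l},\, F_2^{l}\}
\end{aligned}
```

Each sub-network sees only its own channel subset, yet the concatenated representation F covers the full input X.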

1.3. Alternative Aggregation Technique

  • One alternative, used as a baseline, is to train a single representation F to perform both mappings simultaneously, so that F predicts X2 from X1 and X1 from X2.
  • A further variant adds a third term in which F also takes the full input tensor X.
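Under the same notation as section 1.2, these two baselines might be written as below. This is a hedged reconstruction: the symbol L3 for the full-input term is an assumption introduced here, and the exact form of that term is not given in this review.

```latex
\begin{aligned}
F^{*} &= \arg\min_{F}\; L_1\big(F(X_1),\, X_2\big) + L_2\big(F(X_2),\, X_1\big) \\
F^{*} &= \arg\min_{F}\; L_1\big(F(X_1),\, X_2\big) + L_2\big(F(X_2),\, X_1\big) + L_3\big(F(X),\, X\big)
\end{aligned}
```

The difference from the split-brain design is that a single network F is shared across all terms, instead of two disjoint sub-networks that are concatenated afterwards.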

2. Experimental Results

2.1. ImageNet

Task Generalization on ImageNet Classification
  • The proposed split-brain autoencoder learns unsupervised representations from large-scale ImageNet image data.
  • Lab color space is used to train the split-brain autoencoder.
  • All weights are frozen, and the feature maps of each layer are spatially resized to about 9000 dimensions before being used for classification.
  • All methods use AlexNet variants.
  • The 1.3M-image ImageNet dataset is used for training without labels, except for the supervised ImageNet-labels baseline.
  • In brief, several autoencoder variants are compared.
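The "about 9000 dimensions" step is simple arithmetic: for a layer with C channels, pick a spatial side length s so that C × s × s lands near 9000. The sketch below assumes the channel counts of the original AlexNet convolutional layers and a round-to-nearest rule. Both are illustrative assumptions, and `resize_side` is a hypothetical helper, not something taken from the paper.

```python
import math

def resize_side(channels, target_dims=9000):
    """Side length s such that channels * s * s is close to target_dims."""
    return max(1, round(math.sqrt(target_dims / channels)))

# Channel counts of the original AlexNet conv layers (assumed for illustration).
alexnet_channels = [("conv1", 96), ("conv2", 256), ("conv3", 384),
                    ("conv4", 384), ("conv5", 256)]

for name, channels in alexnet_channels:
    side = resize_side(channels)
    # Resulting feature dimensionality after resizing to side x side.
    print(name, side, channels * side * side)
```

Keeping the dimensionality roughly equal across layers makes the per-layer classification results easier to compare, since no layer gains an advantage simply from having more features.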

2.2. Places

Dataset & Task Generalization on Places Classification
  • Here both the dataset and the task (Places scene classification) differ from those used for pretraining (ImageNet).

2.3. PASCAL VOC

Task and Dataset Generalization on PASCAL VOC
  • To further test generalization, classification, detection, and segmentation performance are evaluated on PASCAL VOC.
