Review: RiR — ResNet in ResNet (Image Classification)

In this story, RiR (ResNet in ResNet) is briefly reviewed. RiR attempts to generalize the ResNet block by splitting the input into a residual stream and a transient stream, so as to improve the accuracy. It was published on arXiv in 2016 with about 80 citations. I treat RiR as a kind of sidetrack paper for ResNet. (SH Tsang @ Medium)

RiR is designed so that the input signal can go through a network that lies in between a ResNet and a standard ConvNet.


What Are Covered

  1. ResNet in ResNet
  2. Results
  3. Further Analyses

1. ResNet in ResNet

A generalized residual architecture is introduced to combine residual networks and standard convolutional networks, using parallel residual and non-residual (transient) streams.

(a) 2-layer ResNet block. (b) 2 generalized residual blocks (ResNet Init). (c) 2-layer ResNet block from 2 generalized residual blocks (grayed-out connections are 0). (d) 2-layer RiR block

(a) This is the conventional ResNet block, which consists of a convolution path and a skip-connection path.

(b) This is one generalized residual block (ResNet Init), which is formulated as below:

ResNet Init Equations for Residual Stream and Transient Stream
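
In the paper's notation, with r_l the residual stream, t_l the transient stream, and σ a composition of batch normalization and ReLU, the two updates take roughly this form (reconstructed here in LaTeX from the paper's formulation):

    r_{l+1} = \sigma\big( \mathrm{conv}(r_l, W_{l,\, r \to r}) + \mathrm{conv}(t_l, W_{l,\, t \to r}) + \mathrm{shortcut}(r_l) \big)
    t_{l+1} = \sigma\big( \mathrm{conv}(r_l, W_{l,\, r \to t}) + \mathrm{conv}(t_l, W_{l,\, t \to t}) \big)

The residual stream r keeps the identity shortcut of a ResNet, while the transient stream t has no shortcut, so it can discard information from earlier layers like a standard ConvNet.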

Thus, it can combine residual networks and standard convolutional networks in parallel.

(c) When the grayed-out connections in ResNet Init are set to 0, it becomes a 2-layer ResNet block. Therefore, RiR adds flexibility to the network, in between a ResNet and a standard ConvNet.

(d) By cascading 2 ResNet Init blocks, a 2-layer RiR block is obtained.
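
To make the two-stream update concrete, below is a minimal PyTorch sketch of one generalized residual block and a 2-layer RiR block built from two of them. The class and attribute names (ResNetInitBlock, conv_rr, and so on) are my own illustration, not the paper's code, and the placement of batch normalization simply follows the equations above.

    import torch
    import torch.nn as nn

    class ResNetInitBlock(nn.Module):
        """One generalized residual block (ResNet Init), as a sketch."""
        def __init__(self, channels):
            super().__init__()
            # Same-stream and cross-stream 3x3 convolutions
            self.conv_rr = nn.Conv2d(channels, channels, 3, padding=1)  # r -> r
            self.conv_tr = nn.Conv2d(channels, channels, 3, padding=1)  # t -> r
            self.conv_rt = nn.Conv2d(channels, channels, 3, padding=1)  # r -> t
            self.conv_tt = nn.Conv2d(channels, channels, 3, padding=1)  # t -> t
            self.bn_r = nn.BatchNorm2d(channels)
            self.bn_t = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, r, t):
            # Residual stream: same-stream conv + cross-stream conv + identity shortcut
            r_next = self.relu(self.bn_r(self.conv_rr(r) + self.conv_tr(t) + r))
            # Transient stream: same-stream conv + cross-stream conv, no shortcut
            t_next = self.relu(self.bn_t(self.conv_rt(r) + self.conv_tt(t)))
            return r_next, t_next

    class RiRBlock(nn.Module):
        """A 2-layer RiR block: two ResNet Init blocks cascaded."""
        def __init__(self, channels):
            super().__init__()
            self.init1 = ResNetInitBlock(channels)
            self.init2 = ResNetInitBlock(channels)

        def forward(self, r, t):
            r, t = self.init1(r, t)
            return self.init2(r, t)

    # Example: split a feature map into the two streams (hypothetical shapes)
    x = torch.randn(1, 96, 32, 32)
    r, t = x[:, :48], x[:, 48:]
    r, t = RiRBlock(48)(r, t)

In this sketch, zeroing conv_tr, conv_rt and conv_tt recovers the plain 2-layer ResNet block of (c), while keeping only the transient stream gives a standard ConvNet, which is exactly the flexibility described above.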

The table below summarizes the conditions under which the residual and transient connections are on/off:

The 18-layer + wide RiR used for experiments:

18-layer + wide RiR

2. Results

2.1. CIFAR-10

CIFAR-10
  • 18-layer + wide RiR obtains better results than the 110-layer ResNet.
  • It obtains a competitive accuracy of 94.99% when compared with fractional max-pooling.

2.2. CIFAR-100

CIFAR-100
  • 18-layer + wide RiR obtains the best accuracy of 77.10%.

3. Further Analyses

3.1. Effect of Zeroing Learned Connections

Accuracy Change When Zeroing Learned Connections of One Layer in Each Stream
  • There are consistent drops in accuracy when zeroing either stream at different layers. This shows that both the residual and transient streams at each layer are important for the accuracy (see the sketch below).
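
A minimal sketch of how such an ablation could be run, assuming the ResNetInitBlock sketch above and a hypothetical model.blocks list (neither is from the paper's code):

    import torch

    @torch.no_grad()
    def zero_stream_connections(model, layer_idx, stream):
        """Zero the learned convolutions feeding one stream at one layer."""
        block = model.blocks[layer_idx]  # hypothetical list of ResNetInitBlock
        if stream == "residual":
            convs = (block.conv_rr, block.conv_tr)  # inputs to the residual stream
        else:
            convs = (block.conv_rt, block.conv_tt)  # inputs to the transient stream
        for conv in convs:
            conv.weight.zero_()
            if conv.bias is not None:
                conv.bias.zero_()
        # Re-evaluate on the test set afterwards to measure the accuracy drop.

Note that the identity shortcut is left intact; only the learned connections are zeroed, matching the ablation described above.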

3.2. Adding More Layers Within a Residual Block

Accuracy of RiR and ResNet with different numbers of layers per block on CIFAR-10
  • There is a large accuracy drop for the original ResNet when the number of layers per block increases from 8 to 10.
  • RiR stays roughly constant across different numbers of layers per residual block.

3.3. Study on Number of Blocks and Layers Per Block

Number of Blocks and Layers Per Block
  • 9 blocks with 3 layers per block perform best for RiR.

Besides RiR, there is also another sidetrack paper, ResNet of ResNet (RoR), which first appeared on arXiv in 2016 and was recently published in 2018 TCSVT. Hope I can cover it in the future.