Review: DMRNet / DFN-MR — Merge-and-Run Mappings (Image Classification)

Outperforms or Comparable With ResNet, Pre-Activation ResNet, Stochastic Depth, WRN, FractalNet, DenseNet, ResNeXt, PyramidNet, IGCNet / IGCV1

Sik-Ho Tsang
6 min read · May 17, 2019

In this story, DMRNet / DFN-MR, by Zhejiang University, University of Science and Technology of China, Chinese Academy of Sciences, UC San Diego, and Microsoft Research, is briefly reviewed. The paper introduces the Merge-and-Run block, in which two parallel residual branches share an averaged (merge-and-run) skip path. The idea first appeared in a 2016 arXiv report, "On the Connection of Deep Fusion to Ensembling", under the name DFN-MR, and was later published at IJCAI 2018 as "Deep Convolutional Neural Networks with Merge-and-Run Mappings", i.e. DMRNet, with 10 citations at the time of writing. The Microsoft Research page mentions DFN-MR but links to the DMRNet paper; both describe the same Merge-and-Run idea. Thus, in this story, I mainly review the DMRNet version. (Sik-Ho Tsang @ Medium)

Outline

  1. From Residual Block to Merge-and-Run Block
  2. Analyses of Merge-and-Run Block
  3. Experimental Results

1. From Residual Block to Merge-and-Run Block

(a) Residual Block, (b) Vanilla-Assembly Block, (c) Merge-and-Run Block (Dotted Circle: Average Operation, Solid Circle: Sum Operation)
The networks formed by (a) Residual Blocks, (b) Vanilla-Assembly Blocks, (c) Merge-and-Run Blocks (Dotted Circle: Average Operation, Solid Circle: Sum Operation)

1.1. Residual Block (ResNet)

  • A residual block in ResNet computes the familiar x_{t+1} = H_t(x_t) + x_t, where x_t is the input to the t-th residual block, H_t(x_t) is the output of its convolutional path, and the sum of the two gives the block output x_{t+1}.
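As a concrete illustration, a minimal residual block might look like the sketch below (PyTorch; the channel count and layer choices are illustrative assumptions, not the exact block used in the paper):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: x_{t+1} = H_t(x_t) + x_t."""
    def __init__(self, channels: int):
        super().__init__()
        # H_t: a small convolutional path (layer choices are illustrative only).
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + x  # convolutional path plus identity mapping
```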

1.2. Vanilla-Assembly Block (DVANet)

  • Before introducing the Merge-and-Run block, consider the Vanilla-Assembly block.
  • It is a ResNeXt-like block, but with only 2 convolutional paths: x_{t+1} = H_t^a(x_t) + x_t and y_{t+1} = H_t^b(y_t) + y_t, i.e., two residual branches assembled in parallel with no interaction between them. (If interested, please read my review about ResNeXt.)

1.3. Merge-and-Run Block (DMRNet)

  • A Merge-and-Run block is formed by assembling two residual branches in parallel and coupling them with a merge-and-run mapping (a minimal code sketch follows this list):
  • Merge: average the inputs of the two residual branches.
  • Run: add this average to the output of each residual branch; the result becomes the input of the corresponding branch in the next block.
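A minimal sketch of how this merge-and-run coupling could be implemented in PyTorch (the branch design, channel count, and the way the two streams are initialized and finally combined are illustrative assumptions, not the exact configuration from the paper):

```python
import torch
import torch.nn as nn

def conv_branch(channels: int) -> nn.Sequential:
    # One residual branch H_t (layer choices are illustrative only).
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
    )

class MergeAndRunBlock(nn.Module):
    """Two parallel residual branches coupled by a merge-and-run mapping."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch_a = conv_branch(channels)
        self.branch_b = conv_branch(channels)

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        merged = 0.5 * (x + y)              # Merge: average the two branch inputs
        x_next = self.branch_a(x) + merged  # Run: add the average to each branch output
        y_next = self.branch_b(y) + merged
        return x_next, y_next

# Toy usage (how the two streams are initialized and finally aggregated
# is an assumption here, not necessarily the paper's exact design):
blocks = nn.ModuleList(MergeAndRunBlock(16) for _ in range(3))
x = y = torch.randn(2, 16, 32, 32)
for block in blocks:
    x, y = block(x, y)
out = 0.5 * (x + y)
```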

2. Analyses of Merge-and-Run Block

2.1. Information Flow Improvement

  • From the definition above, the Merge-and-Run block is:
    x_{t+1} = H_t^a(x_t) + ½(x_t + y_t),
    y_{t+1} = H_t^b(y_t) + ½(x_t + y_t).
  • In matrix form:
    [x_{t+1}; y_{t+1}] = [H_t^a(x_t); H_t^b(y_t)] + M [x_t; y_t], with M = ½ [I I; I I].
  • Since M is idempotent (M^k = M for any k ≥ 1), unrolling the recursion over L blocks gives:
    [x_L; y_L] = [H_{L-1}^a(x_{L-1}); H_{L-1}^b(y_{L-1})] + M Σ_{t=0}^{L-2} [H_t^a(x_t); H_t^b(y_t)] + M [x_0; y_0].
  • This shows that during the forward pass there are quick paths that send the input and the intermediate outputs directly to later blocks.
  • A similar conclusion can be drawn for gradient back-propagation.
  • Thus, merge-and-run mappings improve both forward and backward information flow.
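The key algebraic fact is that the transition matrix M is idempotent, so the skip signal is never attenuated no matter how many blocks it passes through. A quick numerical check (NumPy, with an arbitrary small dimension chosen for illustration):

```python
import numpy as np

d = 4                          # per-branch dimension (arbitrary, for illustration)
I = np.eye(d)
M = 0.5 * np.block([[I, I],
                    [I, I]])   # transition matrix of the merge-and-run mapping

print(np.allclose(M @ M, M))   # True: M is idempotent, so M^k = M for all k >= 1
```

Because M^k = M, unrolling over L blocks leaves the term M [x_0; y_0] intact, which is exactly the quick path carrying the input directly to any later block.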

2.2. Shorter Paths

Comparing the distributions of the path lengths for three networks.
  • All three networks are mixtures of paths, where a path is a sequence of connected residual branches, identity mappings, and possibly other layers (e.g., the first convolution layer, the FC layer) from the input to the output (a toy counting example follows this list).
  • The paths of the proposed networks are concentrated at shorter lengths, which potentially helps them perform better.
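As a toy illustration of this mixture-of-paths view, consider a plain chain of L residual blocks: each unrolled path either enters or skips each residual branch, so the number of paths of length k follows a binomial pattern. This simplified count ignores the stem and classifier layers and is not the exact counting used for DVANet/DMRNet in the paper:

```python
from math import comb

L = 9                                       # number of residual blocks (arbitrary example)
path_counts = {k: comb(L, k) for k in range(L + 1)}
print(path_counts)                          # most paths have length around L / 2, far below L
```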

2.3. DVANet and DMRNet are Wider

  • For the Vanilla-Assembly block (DVANet), the matrix form is:
    [x_{t+1}; y_{t+1}] = [H_t^a(x_t); H_t^b(y_t)] + [x_t; y_t],
    i.e., the transition matrix is simply the identity.
  • Each block contains two parallel residual branches, so DVANet is wider than a ResNet of the same depth.
  • The Merge-and-Run block (DMRNet) also contains two parallel branches, hence DMRNet is also wider.
  • In DMRNet, however, the two residual branches are not independent, since they are coupled by the merge-and-run mapping.

3. Experimental Results

3.1. Merge-and-Run Mapping

Comparison between merge-and-run mappings and identity mappings.
  • Networks with merge-and-run mappings consistently perform better than the corresponding networks with identity mappings only.

3.2. Comparison with Wide ResNet

Average Classification Error from 5 Runs on CIFAR-10, CIFAR-100, SVHN
  • DMRNet performs the best on CIFAR-10.
  • The superiority of DVANets over ResNets stems from their shorter paths and greater width.
  • On CIFAR-100 and SVHN, when the network is deep enough, DMRNet performs the best.
  • But when the network is not deep enough, ResNet and WRN are better. The authors believe that in such shallow networks the paths in DVANet and DMRNet are already quite short, and having too many very short paths lowers the performance.

3.3. Combination with ResNeXt

Average Classification Error from 5 Runs on CIFAR-10, CIFAR-100
  • ResNeXt supports K > 2 convolutional paths.
  • Applying the merge-and-run mapping to ResNeXt gives DMRNeXt, which outperforms ResNeXt and shows the effectiveness of the merge-and-run mapping.

3.4. Combination with Xception

Average Classification Error from 5 Runs on CIFAR-10, CIFAR-100
  • Here, each DMRNet block is built from two Xception blocks (one per branch).
  • Again, it outperforms Xception, which shows the effectiveness of the merge-and-run mapping.

3.5. Comparison with State-of-the-art Approaches

Classification error comparison with state-of-the-arts

There are more experiments and analyses in the paper. If interested, please feel free to read the paper.

References

[2016 arXiv] [DFN-MR]
On the Connection of Deep Fusion to Ensembling

[2018 IJCAI] [DMRNet]
Deep Convolutional Neural Networks with Merge-and-Run Mappings

My Previous Reviews

Image Classification
[LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet]

Object Detection
[OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]

Semantic Segmentation
[FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3]

Biomedical Image Segmentation
[CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet]

Instance Segmentation
[SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]

Super Resolution
[SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet]

Human Pose Estimation
[DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM]
