TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

ResNet: The Most Popular Network in the Computer-Vision Era

4 min read · Jan 20, 2020


Classifying images with a computer algorithm seems challenging. Astonishingly, recent work in computer vision achieves a 1.3% top-5 error on the ImageNet dataset. In 2020, the state of the art in image classification shifted to EfficientNet, published by the Google Research team. Before that, however, the network called ResNet led image classification for a long period, and many researchers still use ResNet as the backbone of their networks to improve performance. This article will help you understand what ResNet is and how it is motivated intuitively.

Link: https://paperswithcode.com/sota/image-classification-on-imagenet

Degradation Problem

Deep neural networks suffer from many difficulties during training. Computer-vision researchers have proposed solutions to some of them, such as Batch Normalization for the vanishing/exploding gradient problem (https://arxiv.org/pdf/1502.03167.pdf). The ResNet paper introduces a further challenge, named the “Degradation Problem.” Before reading on, let’s think about the question below.

More layers, better accuracy?

It seems quite intuitive that adding layers to a network enlarges the set of functions it can express. If every added layer is an identity mapping, the new network can output exactly the same values as the original network. Thus, it is persuasive that a well-trained network with more layers should achieve higher classification accuracy. Unfortunately, that is not the reality.

When you measure accuracy with plain networks (the designs that preceded ResNet), accuracy degrades rapidly as model complexity increases. This is the degradation problem. It is not an overfitting problem; nevertheless, the network’s performance drops as model complexity increases. The authors claim that plain networks are poorly suited to approximating identity mappings; thus, adding layers does not guarantee that the deeper network can express everything the shallower network could. The motivation of ResNet is to build a network that is well suited to identity mapping.

Shortcut-Connection

To build a network suited to identity mapping, the authors used a method named the Shortcut Connection. The main intuition of this method is that, rather than learning a function F(x), each block learns F(x) + x. This makes an identity mapping easier to learn: if the layer weights are all driven to zero, the block produces an identity mapping instead of a zero mapping. Moreover, the addition is differentiable, so the network remains end-to-end trainable.
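
To make this concrete, here is a minimal sketch of a residual block in PyTorch. It is an illustration under my own class and variable names, not the authors’ code; I assume the common 3×3 convolution + batch normalization layout for F(x).

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A minimal residual block: output = ReLU(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions with batch normalization.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Shortcut connection: add the input back before the final ReLU.
        # If the convolution weights collapse to zero, the block reduces to the identity.
        return self.relu(out + x)
```

If the stacked layers learn nothing useful, the block simply passes x through, which is exactly the identity mapping the authors wanted to be easy to represent.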

Another consideration of the Shortcut Connection is whether to add a projection to the identity. Since the dimensions can differ between shortcut-connected layers, there are three options: (A) zero-padding on the increased dimensions, (B) projection shortcuts used only where the dimensions change, and (C) all shortcuts are projections. The table below compares each case. (A, B, and C after ResNet-34 mean options A, B, and C applied to ResNet-34.)

Focus on the second group of rows in the table.

The results reveal that using projections on the identity shortcuts does not seriously impact performance, while it changes the number of parameters and makes the comparison with plain networks harder. Thus, the authors simply used identity mappings in the network.
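
As a hedged sketch of these options, here is one common way the shortcuts are implemented in PyTorch when the dimensions change; the function names are mine, not the paper’s.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def padded_shortcut(x, out_channels, stride):
    """Option (A): parameter-free shortcut that subsamples spatially
    and zero-pads the extra channels."""
    x = x[:, :, ::stride, ::stride]            # strided spatial subsampling
    extra = out_channels - x.size(1)
    return F.pad(x, (0, 0, 0, 0, 0, extra))    # zero-pad the channel dimension

def projection_shortcut(in_channels, out_channels, stride):
    """Options (B)/(C): a learned 1x1 convolution that matches the
    main path's channels and spatial size."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )
```

In option (B) the projection is applied only to the shortcuts whose dimensions change; in option (C) every shortcut uses a projection, which adds parameters without a matching gain in accuracy.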

Overall Backbone

For the detailed structure of the network, refer to the paper.

Link: https://arxiv.org/pdf/1512.03385.pdf
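
If you just want the exact architectures from the paper rather than re-implementing them, torchvision ships them; the snippet below assumes a reasonably recent torchvision version (the weights argument replaced the older pretrained flag).

```python
import torch
import torchvision

# ResNet-34 with the architecture described in the paper (randomly initialized here).
model = torchvision.models.resnet34(weights=None)

# The backbone maps a 224x224 RGB image to 1000 ImageNet logits.
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 1000])
```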

Experiments

They compared two networks: a plain network and ResNet. The two networks used the same layers; only ResNet has shortcut connections. They experimented on two datasets: ImageNet and CIFAR-10. The graphs below show the results of the experiments.

(The thin curves denote training errors, and the bold curves denote validation errors)

Performance of Plain-Network on ImageNet

As you can see from the graph, the training error increased as the number of layers increased, which means the plain network suffers from the degradation problem. How about ResNet?

Performance of ResNet on ImageNet

No more degradation problem: as the number of layers increases, the training error decreases.

Result of experiment on ImageNet

The authors added more layers to ResNet to build more complex models. As expected, increasing the number of layers improved performance. The tendency was similar when the experiment was run on CIFAR-10.

Result of experiment on CIFAR-10

However, we can observe that with 1202 layers the performance drops significantly. The paper argues that this is due to overfitting. Even with this drop, the 1202-layer network still outperforms the earlier methods.

Conclusion

ResNet was motivated by the degradation problem. Through an intuitive approach, the authors designed the network to be well suited to approximating identity mappings. The experiments show that ResNet addresses the degradation problem excellently; however, it works poorly for extremely deep networks.

I appreciate any feedback about my articles. For any discussion, you are welcome to email me. If something is wrong or misunderstood, please tell me. :)

Contact me: jeongyw12382@postech.ac.kr

For further reading

D2-Net matching explanation:

https://medium.com/towards-artificial-intelligence/d2-net-matching-problem-among-images-under-an-extreme-appearance-changes-9f059f33a2ef
