The Revolution of Depth
Deep Residual Networks (ResNet) were recently proposed by a Microsoft team, with an immediate impact on the machine learning community. They surpassed human-level performance on the ImageNet competition (human top-5 error lies somewhere around 5%) with a top-5 error rate of only 3.57%. Code is available on GitHub.
ResNet is based on a simple idea: pass the input through two successive convolutional layers and also let it bypass them, adding the input x_l of layer l to the output h_{l+N} of layer l+N before the sum is fed to the next layers. With this simple trick the team was able to train networks up to 1000 layers deep with remarkable results, largely lifting the vanishing-gradient curse.
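The bypass described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' code: small dense layers stand in for the convolutions, N is fixed to 2, a ReLU is applied after the addition, and the names relu, linear, residual_block, w1 and w2 are hypothetical.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Matrix-vector product; a stand-in (assumption) for a conv layer."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F is two stacked layers."""
    f = linear(w2, relu(linear(w1, x)))             # F(x): two successive layers
    return relu([fi + xi for fi, xi in zip(f, x)])  # add the bypassed input


# With all-zero weights F(x) = 0, so each block passes its input through
# unchanged, even when 1000 blocks are stacked.
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]
for _ in range(1000):
    x = residual_block(x, zeros, zeros)
print(x)  # [1.0, 2.0]
```

The point of the sketch is that each block only has to learn a correction F(x) on top of its input: a block that has learned nothing still behaves like the identity (for non-negative inputs here), which is one intuition for why very deep stacks remain trainable.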
ResNet uses a 7x7 conv layer at the input, followed by a pooling layer with stride 2, in contrast with the more complex stems used by the Google team in Inception V3 and V4. ResNet can be seen as both parallel and serial: the input of the layers is fed to many modules in parallel, while the outputs of the modules are connected in series. It can therefore be thought of as an ensemble of parallel and serial machines built from smaller-depth modules.
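To see how quickly the input stage described above (a 7x7 convolution followed by pooling) shrinks the image, here is a small arithmetic sketch. The concrete numbers are assumptions that the text does not state: a 224x224 input, stride 2 and padding 3 for the convolution, and a 3x3 pooling window with stride 2 and padding 1. The helper out_size is a hypothetical name.

```python
def out_size(n, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (floor convention)."""
    return (n + 2 * padding - kernel) // stride + 1


size = 224                                            # assumed input resolution
size = out_size(size, kernel=7, stride=2, padding=3)  # 7x7 conv    -> 112
size = out_size(size, kernel=3, stride=2, padding=1)  # 3x3 pooling -> 56
print(size)  # 56
```

Under these assumptions the two input layers alone reduce each spatial dimension by a factor of 4, so the residual modules that follow operate on much smaller feature maps.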
This very simple idea proved to be extremely powerful, and within 6 months their publication already had more than 200 citations.
Recently they presented a video demo of the network classifying objects in the streets (watch the YouTube video) with mind-blowing accuracy.