Week 4: Waste Classification

Hasan Akalp
Published in bbm406f19 · Dec 23, 2019

Hello everyone, we are Hasan Akalp, Umut Piri and Dilara Iseri. Last week we told you about transfer learning; this week we will talk about the pre-trained models we can use. Happy reading!

ResNet

Residual Neural Network (ResNet), introduced by Kaiming He et al., is an architecture built around “skip connections” and heavy use of batch normalization. These skip connections are closely related to the gating mechanisms (such as gated recurrent units) that have proven successful in RNNs. Thanks to this technique, the authors were able to train a neural network with 152 layers while keeping its complexity lower than VGGNet’s. It achieves a top-5 error rate of 3.57% on the ImageNet dataset, which beats human-level performance.

ResNets solve the well-known vanishing gradient problem. When a network is too deep, the gradients flowing back from the loss function shrink toward zero after repeated applications of the chain rule. As a result, the weights never update their values and no learning takes place.

With ResNets, the gradients can flow directly through the skip connections backward from later layers to initial filters.
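A minimal sketch of such a residual block in PyTorch (a simplified version of the basic block, not the exact code from the paper): the input `x` is added back to the block’s output, so gradients have a direct path around the convolutions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic ResNet block: two 3x3 convs with batch norm, plus an identity skip."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + x  # the skip connection: gradients flow straight through here
        return self.relu(out)

block = ResidualBlock(64)
x = torch.randn(1, 64, 56, 56)
y = block(x)
print(y.shape)  # torch.Size([1, 64, 56, 56])
```

Because the skip is an identity addition, the block’s output keeps the same shape as its input, which is what lets dozens of these blocks be stacked.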

Taken from https://pytorch.org/hub/pytorch_vision_resnet/

VGG-16

VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman of the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images whose classification challenge covers 1000 classes. It was one of the models submitted to ILSVRC-2014. It improves over AlexNet by replacing the large kernel-sized filters (11×11 and 5×5 in the first and second convolutional layers, respectively) with multiple 3×3 filters stacked one after another.

VGG16 consists of 16 weight layers (13 convolutional and 3 fully connected) and is very appealing because of its uniform architecture: like AlexNet it stacks convolutions, but it uses only 3×3 kernels, with many filters. It is a very popular choice in the community for extracting features from images. The weight configuration of VGGNet is publicly available and has been used in many other applications and challenges as a baseline feature extractor. However, VGGNet has 138 million parameters, which can be a bit challenging to handle.

Taken from https://neurohive.io/

DenseNet

DenseNet is a network architecture in which, within each dense block, every layer is directly connected to every subsequent layer in the forward direction. Each layer receives the feature maps of all preceding layers as separate inputs, and passes its own feature maps on to all subsequent layers. DenseNet achieves accuracy similar to ResNet while using less than half as many parameters. The biggest advantage of DenseNets is the improved information and gradient flow, which makes them easy to train. Each layer has direct access to the gradients from the loss function and to the original input signal, resulting in a form of implicit deep supervision. This helps in training deeper network architectures.
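The “every layer sees all preceding feature maps” idea can be sketched in a few lines of PyTorch (a simplified dense block, not the exact DenseNet code, which also uses bottleneck layers):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One layer of a dense block: BN -> ReLU -> 3x3 conv producing `growth` maps."""
    def __init__(self, in_channels, growth):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth, 3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(torch.relu(self.bn(x)))

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth, growth) for i in range(n_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # each layer receives the concatenation of ALL preceding feature maps
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=64, growth=32, n_layers=4)
y = block(torch.randn(1, 64, 28, 28))
print(y.shape)  # torch.Size([1, 192, 28, 28]) -> 64 + 4*32 channels
```

Note that the channel count grows by the `growth` rate at every layer; this concatenation, rather than ResNet’s addition, is what distinguishes the two architectures.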

Taken from https://peltarion.com/

AlexNet

AlexNet was much larger than previous CNNs used for computer vision tasks (e.g. Yann LeCun’s LeNet, 1998). It has 60 million parameters and 650,000 neurons. AlexNet consists of 5 convolutional layers and 3 fully connected layers.

Multiple Convolutional Kernels (a.k.a filters) extract interesting features in an image. In a single convolutional layer, there are usually many kernels of the same size. For example, the first Conv Layer of AlexNet contains 96 kernels of size 11x11x3. Note the width and height of the kernel are usually the same and the depth is the same as the number of channels.
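This first layer can be written directly in PyTorch; inspecting the weight tensor confirms that the kernel depth matches the number of input channels:

```python
import torch
import torch.nn as nn

# First conv layer of AlexNet: 96 kernels of size 11x11 over 3 input channels,
# applied with stride 4 as in the original paper.
conv1 = nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4)

img = torch.randn(1, 3, 227, 227)   # 227x227 RGB input
out = conv1(img)
print(conv1.weight.shape)  # torch.Size([96, 3, 11, 11]) -> depth = input channels
print(out.shape)           # torch.Size([1, 96, 55, 55]) -> one map per kernel
```

Each of the 96 kernels produces one 55×55 feature map ((227 − 11) / 4 + 1 = 55), so the layer outputs a 96-channel volume.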

Taken from https://neurohive.io/

GoogLeNet

The winner of the ILSVRC 2014 competition was Google’s GoogLeNet (also known as Inception V1). It reached a top-5 error rate of 6.67%, very close to human-level performance. As it turns out, beating GoogLeNet’s accuracy is actually quite difficult for a person and requires some training. After several days of practice, a human expert (Andrej Karpathy) was able to achieve top-5 error rates of 5.1% (single expert) and 3.6% (ensemble of experts). The network is a CNN inspired by LeNet and built around a new component called the Inception module. Batch normalization, image distortions, and RMSprop were used in training. The Inception module relies on several very small convolutions to greatly reduce the number of parameters. The resulting architecture is a 22-layer deep CNN, yet it reduces the number of parameters from 60 million (AlexNet) to 4 million.
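A simplified Inception module in PyTorch (the real GoogLeNet adds ReLU and batch-norm details we omit here): parallel 1×1, 3×3 and 5×5 branches plus a pooling branch, with 1×1 “bottleneck” convolutions that keep the parameter count small.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Simplified Inception module: four parallel branches, concatenated."""
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c3_reduce, 1), nn.ReLU(),
                                nn.Conv2d(c3_reduce, c3, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, c5_reduce, 1), nn.ReLU(),
                                nn.Conv2d(c5_reduce, c5, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1))

    def forward(self, x):
        # branches run in parallel; outputs are concatenated channel-wise
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

# channel sizes of the first Inception module in GoogLeNet ("inception 3a")
m = InceptionModule(192, 64, 96, 128, 16, 32, 32)
y = m(torch.randn(1, 192, 28, 28))
print(y.shape)  # torch.Size([1, 256, 28, 28]) -> 64 + 128 + 32 + 32 channels
```

The 1×1 reductions before the 3×3 and 5×5 convolutions are where most of the parameter savings come from.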

Taken from https://developer.ridgerun.com/

SqueezeNet

SqueezeNet is a convolutional neural network that is trained on more than a million images from the ImageNet dataset. The network is 18 layers deep and can classify images into 1000 object categories, such as a keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 227-by-227.

SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques, we are able to compress SqueezeNet to less than 0.5MB (510× smaller than AlexNet).
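Those savings come from SqueezeNet’s Fire module, sketched here in PyTorch (channel sizes below follow the “fire2” module from the paper): a 1×1 “squeeze” convolution feeds a mix of 1×1 and 3×3 “expand” convolutions, replacing full 3×3 layers.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet Fire module: 1x1 squeeze conv, then parallel 1x1/3x3 expand convs."""
    def __init__(self, in_ch, squeeze, expand1, expand3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze, 1)
        self.expand1 = nn.Conv2d(squeeze, expand1, 1)
        self.expand3 = nn.Conv2d(squeeze, expand3, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))
        # the two expand branches are concatenated along the channel axis
        return torch.cat([self.relu(self.expand1(s)),
                          self.relu(self.expand3(s))], dim=1)

fire = Fire(in_ch=96, squeeze=16, expand1=64, expand3=64)
y = fire(torch.randn(1, 96, 55, 55))
print(y.shape)  # torch.Size([1, 128, 55, 55]) -> 64 + 64 channels
```

By squeezing down to few channels before the 3×3 convolutions, each Fire module uses far fewer parameters than a plain 3×3 layer with the same output width.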

SqueezeNet has some variants that improve the accuracy of the model. The variants and their accuracies on the ImageNet dataset are given in the table below.

Taken from https://www.semanticscholar.org/paper/SqueezeNet%3A-AlexNet-level-accuracy-with-50x-fewer-Iandola-Moskewicz/969fbdcd0717bec06228053788c2ff78bbb4daac

Thank you for reading, see you next week.
