Week 4— DeepNutrition: Be Aware Of Your Macronutrients

Muhammed Aydoğan
Published in bbm406f19 · 6 min read · Dec 26, 2019

Facilitating the dietary assessment process by detecting multiple foods from a single image

Finding the Best Model for the Problem

This week I will explain CNNs and compare different CNN architectures, with their advantages and disadvantages.

Understanding Neural Networks:

A neural network is a learning mechanism modeled loosely after the human brain, designed to recognize patterns and to make predictions based on them. It is not a directly programmed algorithm; it is a little like a human brain, but less complicated. It learns by training on labelled examples: with each new example it adjusts its neuron connections, using a method called backpropagation. We then expect it to predict the class of an unseen input. The prediction part of the algorithm, called the forward pass, can only generalize from patterns it has seen during training.

A neural network has layers with nodes in them. It consists of an input layer, one or more hidden layers, and an output layer.
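The structure above can be sketched as a tiny forward pass. The layer sizes and input values here are arbitrary, chosen only for illustration:

```python
import numpy as np

def relu(x):
    # Non-linear activation: negative values become zero
    return np.maximum(0, x)

rng = np.random.default_rng(0)

# A tiny network: 4 inputs -> 5 hidden nodes -> 3 outputs
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # input -> hidden
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # hidden -> output

x = np.array([0.5, -1.2, 3.0, 0.1])             # one input example
hidden = relu(W1 @ x + b1)                       # hidden layer activations
output = W2 @ hidden + b2                        # raw class scores
print(output.shape)  # (3,) -- one score per class
```

Training would adjust `W1`, `b1`, `W2`, `b2` via backpropagation; this sketch shows only the forward pass.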

https://www.utells.com/The%20Hacker%20News/posts/14211553

Since pictures have many pixels (we feed each pixel into a node of the input layer) and the network must process them all, the complexity of a plain neural network is always very high for operations like image recognition. So we cannot use plain neural networks with large-scale image datasets. This is where CNNs come in: a CNN reads the picture and produces a much smaller matrix, which is then classified by the classic neural network part of the CNN.
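A quick back-of-the-envelope calculation shows the problem. The image size and layer widths below are illustrative, not from any particular model:

```python
# Why plain (fully connected) networks struggle with images:
# connecting every pixel of a modest 224x224 RGB image to just
# 1000 hidden nodes already needs over 150 million weights.
pixels = 224 * 224 * 3
dense_weights = pixels * 1000
print(dense_weights)  # 150528000

# A convolutional layer instead shares a small filter across the
# whole image: 64 filters of size 3x3 over 3 channels need only
# 1792 parameters (including one bias per filter).
conv_params = (3 * 3 * 3 + 1) * 64
print(conv_params)  # 1792
```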

1. Introduction to Convolutional Neural Networks

The general purpose of a Convolutional Neural Network (CNN or ConvNet) is to process large pictures efficiently. A typical CNN consists of these layers, in order:
1. Convolutional Layer
2. Activation Layer
3. Pooling Layer
4. Flattening Layer
5. Classic Neural Network Layer (or Fully Connected Layer)
First of all, the convolutional layer slides filters (kernels) over the picture to create feature maps. Then comes the activation layer, which applies a non-linearity such as ReLU; this is said to improve both the accuracy and the speed of the algorithm. The pooling layer's aim is to downsample the feature maps, discarding the less important features.

https://towardsdatascience.com/the-most-intuitive-and-easiest-guide-for-convolutional-neural-network-3607be47480

As you can see in the picture above, the flattening layer rearranges the feature maps coming from the previous layers into a one-dimensional vector, so that we can use the flattened data as the input layer of the classic neural network part of the algorithm.
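The whole pipeline can be sketched in a few lines of NumPy. The image and filter values are made up, and real CNNs learn the filter values rather than hard-coding them:

```python
import numpy as np

def conv2d(image, kernel):
    # Valid convolution (really cross-correlation, as in most CNN libraries)
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool2x2(x):
    # Keep the maximum of every non-overlapping 2x2 window
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2*h, :2*w].reshape(h, 2, w, 2).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "picture"
kernel = np.array([[-1., 0.], [0., 1.]])           # toy 2x2 edge filter

feature_map = conv2d(image, kernel)     # convolutional layer -> 5x5
activated = np.maximum(0, feature_map)  # activation layer (ReLU)
pooled = max_pool2x2(activated)         # pooling layer -> 2x2
flattened = pooled.ravel()              # flattening layer -> vector of 4

print(feature_map.shape, pooled.shape, flattened.shape)
```

The flattened vector is what the fully connected part of the network receives as its input layer.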

https://towardsdatascience.com/the-most-intuitive-and-easiest-guide-for-convolutional-neural-network-3607be47480

2. Variations of CNN

Now that we know how neural networks and CNNs work, we will investigate several well-known CNN architectures to find which model best fits our problem.

1. LeNet-5

It was built to recognize handwritten and machine-printed characters, and it is the simplest CNN architecture here. Its difference from a generic CNN is that it has 2 convolutional layers and 3 fully connected layers.

https://engmrk.com/lenet-5-a-classic-cnn-architecture/

As you can see in the picture above, 5x5 kernels are used in the convolutional layers and 2x2 kernels in the pooling layers. The last fully connected layers have sizes 84 and 10.
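These layer sizes imply a parameter count we can tally by hand. This sketch follows the classic description (2 convolutional layers plus 3 fully connected layers, 32x32 grayscale input):

```python
def conv_params(k, c_in, c_out):
    # k x k filter over c_in channels, plus one bias per output filter
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    # Every input connects to every output, plus one bias per output
    return (n_in + 1) * n_out

total = (
    conv_params(5, 1, 6)      # C1: six 5x5 filters on the grayscale input
    + conv_params(5, 6, 16)   # C3: sixteen 5x5 filters
    + fc_params(400, 120)     # 5x5x16 = 400 pooled features -> 120
    + fc_params(120, 84)      # -> 84 (as in the figure)
    + fc_params(84, 10)       # -> 10 output classes
)
print(total)  # about 62 thousand trainable parameters
```

Roughly 62K parameters is tiny by modern standards, which is part of why LeNet-5's capacity is limited.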

This architecture belongs to the stone age of neural network history; it was created in 1998. LeNet-5's accuracy is not enough for us, so to get better accuracy the network might need to be a little deeper. Let's try a model with bigger kernels and more layers: AlexNet.

2. AlexNet

https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d#c5a6

As you can see in this picture, there are 5 convolutional layers interleaved with pooling layers, followed by 3 fully connected layers. The ReLU activation is also used in this model. AlexNet has a few more layers than LeNet, but we still haven't found what we want; let's dive deeper.

From now on we will look at the serious models.

3. ResNet

Residual Neural Network (ResNet) was the winner of the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), with a top-5 error rate of 3.57%. In this competition ResNet used 152 layers. (See the conclusion at the bottom of the page for more information.)

ResNet is more complicated than the previous models I described.
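ResNet's key idea is the skip (shortcut) connection: each block learns a residual F(x) and adds the original input back before the activation. A toy NumPy sketch, with made-up weights and sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.1, size=(8, 8))  # toy weights for the block
W2 = rng.normal(scale=0.1, size=(8, 8))

def residual_block(x):
    # F(x): two small transformations with ReLU in between
    fx = W2 @ np.maximum(0, W1 @ x)
    # The shortcut: add the input back, then activate.
    # Even if F(x) learns nothing (all zeros), the block still
    # passes x through, which keeps very deep stacks trainable.
    return np.maximum(0, fx + x)

x = rng.normal(size=8)
out = residual_block(x)
print(out.shape)  # (8,)
```

The `+ x` term is the whole trick: gradients can flow through the shortcut even when the learned path is weak.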

ResNet Architecture from this source

ResNet reaches good accuracy even with a lower number of layers, as does VGGNet, which I will explain later. We need lower complexity and better accuracy.

Error rates depending on layers from this source

Considering the error-rate table given above, ResNet looks good, but we should still examine other CNN models.

4. VGGNet

VGGNet is a modern CNN model which is still very popular. It was the first runner-up of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014.

Noteworthy VGGNet variations from this source

As you can see, up to 19 layers the error rate decreases as the number of layers increases. The research results say that two VGGNet variations are worth considering for us: VGG-16 and VGG-19.

VGG-16, with 13 convolutional layers and 3 fully connected layers, has 138 million parameters.

VGG-19, with 16 convolutional layers and 3 fully connected layers, has 144 million parameters and slightly worse accuracy than VGG-16.
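The 138 million figure for VGG-16 can be checked by hand, assuming the standard configuration (3x3 convolutions throughout, 224x224x3 input, five 2x2 poolings):

```python
def conv_p(c_in, c_out):
    # 3x3 filters plus one bias per output channel
    return (3 * 3 * c_in + 1) * c_out

def fc_p(n_in, n_out):
    return (n_in + 1) * n_out

# VGG-16 channel progression: 2+2+3+3+3 = 13 conv layers
channels = [(3, 64), (64, 64),
            (64, 128), (128, 128),
            (128, 256), (256, 256), (256, 256),
            (256, 512), (512, 512), (512, 512),
            (512, 512), (512, 512), (512, 512)]

conv_total = sum(conv_p(i, o) for i, o in channels)
# After five 2x2 poolings: 224 / 32 = 7, so 7*7*512 = 25088 features
fc_total = fc_p(7 * 7 * 512, 4096) + fc_p(4096, 4096) + fc_p(4096, 1000)

print(conv_total + fc_total)  # 138357544 -- the famous ~138M
```

Note that the three fully connected layers alone account for the large majority of the parameters.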

5. GoogLeNet (or Inception)

This is a model of considerable complexity. The GoogLeNet architecture won ILSVRC 2014. It is built from Inception modules, and its accuracy came very close to human-level performance.

GoogLeNet Architecture from this source

An Inception module is a concatenation of the results of filtering the same input with 3 different filter sizes. To make it easier to understand, check out the picture below.

Inception Module from this source

Or maybe we can add max pooling to the inception:

GoogLeNet incarnation of the Inception architecture from this source

Max pooling, average pooling and ReLU activations are all used in this model.
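The parallel branches and the channel-wise concatenation can be sketched with NumPy. The branch here is a stand-in for a real convolution (only a 1x1-style channel mix), and all sizes are toy values:

```python
import numpy as np

# Toy input: one 8x8 feature map with 4 channels (channels first)
x = np.random.default_rng(2).normal(size=(4, 8, 8))

def branch(x, out_channels):
    # Stand-in for a conv branch: a 1x1-style channel mix, so every
    # branch keeps the 8x8 spatial size (the real module pads the
    # 3x3 and 5x5 branches to achieve the same thing).
    w = np.random.default_rng(out_channels).normal(size=(out_channels, x.shape[0]))
    return np.einsum('oc,chw->ohw', w, x)

b1 = branch(x, 2)                    # "1x1 conv" branch
b2 = branch(x, 3)                    # "3x3 conv" branch
b3 = branch(x, 1)                    # "5x5 conv" branch
b4 = x.max(axis=0, keepdims=True)    # "max pooling" branch

# The module's output stacks all branches along the channel axis
out = np.concatenate([b1, b2, b3, b4], axis=0)
print(out.shape)  # (7, 8, 8): 2 + 3 + 1 + 1 channels
```

The next module then sees all branch outputs at once, which is how Inception mixes filter sizes at every depth.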

GoogLeNet stacks 9 Inception modules linearly. Its developers were inspired by LeNet. The model uses several very small kernels, which dramatically reduces the number of parameters (weights).

GoogLeNet is 22 layers deep, somewhat like AlexNet, but it has only about 4 million parameters, while AlexNet has 60 million.

You can find in-depth information about GoogLeNet's Inception design in the original paper ([12]).

Conclusion

VGGNet's problem is that it has no Inception modules and no shortcut relations to previous layers like GoogLeNet's. Also, the complexity of both VGG variations is very high.

ResNet, as the winner of ILSVRC 2015, seems like an old-school thing.

It's hard to choose one model, and I still cannot say that one of them is the best. So, to understand this clearly, next week we will implement and test VGGNet, GoogLeNet, and maybe ResNet with our food dataset (we mentioned it in our earlier blogs 1, 2, 3), or with the MNIST dataset, although MNIST is very basic.

REFERENCES

  1. https://towardsdatascience.com/the-most-intuitive-and-easiest-guide-for-convolutional-neural-network-3607be47480
  2. https://www.quora.com/What-do-you-mean-by-introducing-non-linearity-in-a-neural-network
  3. https://teknofesor.com/cnn-rcnn-fast-rcnn-faster-rcnn-nedir/
  4. https://rubikscode.net/2018/02/26/introduction-to-convolutional-neural-networks/
  5. https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d#c5a6
  6. https://engmrk.com/lenet-5-a-classic-cnn-architecture/
  7. https://towardsdatascience.com/cnn-architectures-a-deep-dive-a99441d18049
  8. https://adventuresinmachinelearning.com/introduction-resnet-tensorflow-2/
  9. https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/googlenet.html
  10. https://www.analyticsvidhya.com/blog/2018/10/understanding-inception-network-from-scratch/
  11. https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  12. https://arxiv.org/pdf/1409.4842v1.pdf
  13. http://yann.lecun.com/exdb/mnist/
  14. https://medium.com/analytics-vidhya/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5?

Muhammed Aydoğan, Computer Science Student at Hacettepe University