[Week2 — Eat & Count]

This week we researched convolutional neural networks (ConvNets), which we will use in the food recognition part of our project. We examine the differences between convolutional neural networks and regular neural networks, and briefly explain the layers in ConvNets.

In this article we use the notes from the Stanford class CS231n: Convolutional Neural Networks for Visual Recognition [1] and Jie Xu's Quora answer to “What is a convolutional neural network?” [2].

Next week we will try out a framework for our first experiments. In our next blog post we will discuss why we chose that framework and the first architecture of our network.

Difference Between ConvNets and Regular Nets

Convolutional neural networks are a type of neural network widely used in areas such as image recognition, video analysis, and natural language processing, and they have been very successful in these areas.

Figure 1. Multi-Layer Perceptron [3]

It is useful to compare convolutional neural networks with regular neural networks. A regular neural network, or multi-layer perceptron, consists of an input layer, an output layer, and a series of hidden layers. Each layer is fully connected to the previous layer: each neuron in a layer is connected to all neurons in the previous layer.
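To make the full connectivity concrete, here is a minimal NumPy sketch of a forward pass through a multi-layer perceptron. The layer sizes, the random weights, and the ReLU activation on the hidden layer are our own assumptions for illustration; they do not come from the sources above.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass through a stack of fully connected layers."""
    a = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        # W has shape (n_out, n_in): every output neuron has one weight
        # for every neuron in the previous layer.
        z = W @ a + b
        is_last = i == len(weights) - 1
        # Assumption: ReLU on hidden layers, raw scores on the output layer.
        a = z if is_last else np.maximum(z, 0.0)
    return a

rng = np.random.default_rng(0)
sizes = [8, 5, 3]  # hypothetical: 8 inputs, 5 hidden neurons, 3 outputs
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

scores = mlp_forward(rng.standard_normal(8), weights, biases)
print(scores.shape)
```

The weight matrix of each layer has one row per neuron and one column per neuron in the previous layer, which is exactly what "fully connected" means here.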

This model leaves us with some problems. First, regular neural nets do not scale well to images. For an image of size [256x256x3], each neuron in the first hidden layer has 256*256*3 = 196,608 connections, one per input value (every pixel in every color channel). Each additional neuron adds that many parameters again. This full connectivity is wasteful, and the huge number of parameters quickly leads to overfitting.
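The arithmetic can be checked with a few lines of Python. The layer width of 100 neurons and the 5x5 filter size are hypothetical values we picked for the comparison, and the helper names are ours; the convolutional count anticipates the local connectivity described in the next section.

```python
def dense_params(input_shape, n_neurons):
    """Weights plus one bias per neuron for a fully connected layer."""
    h, w, c = input_shape
    return (h * w * c + 1) * n_neurons

def conv_params(filter_size, in_depth, n_filters):
    """Weights plus one bias per filter for a convolutional layer."""
    return (filter_size * filter_size * in_depth + 1) * n_filters

image_shape = (256, 256, 3)

connections_per_neuron = 256 * 256 * 3
dense_total = dense_params(image_shape, n_neurons=100)   # hypothetical width
conv_total = conv_params(5, in_depth=3, n_filters=100)   # hypothetical 5x5 filters

print(connections_per_neuron)  # 196608
print(dense_total)             # 19660900
print(conv_total)              # 7600
```

Under these assumptions the fully connected layer needs about 19.7 million parameters, while the same number of small filters needs 7,600, because a filter's parameter count depends on its own size and not on the image size.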

Regular nets also ignore the correlation between nearby features as a result of full connectivity. In images, for example, nearby pixels are more strongly correlated than distant pixels. Regular nets are also not robust to image transformations: a subtle change in the scale or position of the input can produce significant changes in the following layers.

ConvNet Architecture

Figure 2. Local Connectivity [4]

The architecture is very similar to that of regular nets: a ConvNet is also a sequence of layers. The difference lies in the structure of the layers. There are three main types of layers: Convolutional Layer, Pooling Layer, and Fully-Connected Layer.

  • Convolutional Layer: In this layer we apply filters using local connectivity, and the parameters of those filters are learned. Each filter is small along the width and height but always extends through the full depth of its input. As a filter slides over the width and height of the input, it produces a 2D output. Stacking the 2D outputs of all the filters in the layer along the depth dimension gives a 3D output volume.
  • Pooling Layer: This layer is implemented similarly to a convolutional layer, but the filter is a fixed function, so there are no learnable parameters. Its purpose is to reduce the spatial size of its input, and with it the number of parameters to learn in the following layers.
  • Fully-Connected Layer: The last layers in the network are generally fully connected. This layer is used to compute the class scores.

Figure 3. Representation of applying convolutional layers [5]
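
The three layer types can be sketched in plain NumPy as follows. This is only an illustration under our own assumptions, not the implementation of any framework: the convolution uses stride 1 and no padding, pooling is taken to be 2x2 max pooling (one common choice of fixed function), and all sizes and random weights are hypothetical. As in most ConvNet descriptions, the "convolution" slides the filter without flipping it.

```python
import numpy as np

def conv_layer(volume, filters, biases):
    """Slide K filters over an (H, W, D) volume; filters has shape (K, F, F, D)."""
    H, W, D = volume.shape
    K, F, _, filter_depth = filters.shape
    assert filter_depth == D, "each filter must span the full input depth"
    out = np.zeros((H - F + 1, W - F + 1, K))
    for k in range(K):                      # one 2D output map per filter
        for i in range(H - F + 1):
            for j in range(W - F + 1):
                patch = volume[i:i + F, j:j + F, :]   # local connectivity
                out[i, j, k] = np.sum(patch * filters[k]) + biases[k]
    return out

def max_pool(volume, size=2):
    """Non-overlapping max pooling; a fixed function with no parameters."""
    H, W, D = volume.shape
    H2, W2 = H // size, W // size
    trimmed = volume[:H2 * size, :W2 * size, :]
    return trimmed.reshape(H2, size, W2, size, D).max(axis=(1, 3))

def fully_connected(volume, W, b):
    """Flatten the volume and compute one score per class."""
    return W @ volume.ravel() + b

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8, 3))          # hypothetical tiny RGB input
filters = rng.standard_normal((4, 3, 3, 3))     # four 3x3 filters of depth 3

conv_out = conv_layer(image, filters, np.zeros(4))
pooled = max_pool(conv_out)
W_fc = rng.standard_normal((10, pooled.size))   # 10 hypothetical classes
scores = fully_connected(pooled, W_fc, np.zeros(10))

print(conv_out.shape, pooled.shape, scores.shape)
```

The four filters each yield one 2D map, and stacking them produces the 3D output volume; pooling then halves the width and height before the fully connected layer turns the volume into class scores.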

References

[1] Convolutional Neural Networks (CNNs / ConvNets) http://cs231n.github.io/convolutional-networks/

[2] What is a convolutional neural network? by Jie Xu https://www.quora.com/What-is-a-convolutional-neural-network/answer/Jie-Xu-5?srid=Kn9G

[3] Multilayer Perceptron http://deeplearning.net/tutorial/mlp.html

[4] Convolutional Neural Networks (LeNet) http://deeplearning.net/tutorial/lenet.html

[5] Convolutional Neural Networks (CNNs / ConvNets) http://cs231n.github.io/convolutional-networks/
