Week 4 — Eat & Count

bbm406f16
Dec 25, 2016

Last week, we talked about deep learning frameworks and their advantages and disadvantages, and tried to decide which one was the best fit for our project. As a result, we decided to use the TensorFlow framework, implemented a simple network by following TensorFlow’s own tutorial, and tested it on the MNIST handwritten-digit data set. This week we will talk about the results of testing this network on our Food-101 data set.

The first network we implemented consists of three layers followed by one fully-connected layer. Each layer contains one convolutional layer and one pooling layer. We used 5x5 filters with a stride of 2 and a padding of 2. The first layer has 32 channels, the second 224, and the third 64; these are then connected to the fully-connected layer. This is a very simple network, and we ran our first experiments with it.
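The network above can be sketched in tf.keras notation (our original code followed TensorFlow’s low-level MNIST tutorial instead; the 224x224x3 input size and the 2x2 max-pooling windows are assumptions, since they are not stated above):

```python
import tensorflow as tf

def build_simple_net(num_classes=3):
    """Sketch of the simple 3-layer network: each layer is one
    convolution (5x5 filters, stride 2, padding 2, ~"same" in keras
    terms) followed by one pooling layer, then a single dense layer."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(224, 224, 3)),   # assumed input size
        tf.keras.layers.Conv2D(32, 5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(224, 5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes),    # the one fully-connected layer
    ])
```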

We started with 3 food classes and tested the network on them. At the beginning, we got about 60% accuracy on these 3 classes. We then looked for ways to increase this accuracy and implemented some preprocessing approaches [1]: first we zero-centered the data and normalized it, then we tried another preprocessing approach, PCA whitening. After applying these techniques, our accuracy increased to about 80%. With this result, we decided to move on to the next step with 10 different classes.
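As a rough sketch, the zero-center/normalize and PCA-whitening steps described in [1] can be written in NumPy as follows (the epsilon constants and the flattened `(samples, features)` layout are our assumptions):

```python
import numpy as np

def zero_center_normalize(X):
    """X: (num_samples, num_features), e.g. flattened images."""
    X = X - X.mean(axis=0)           # zero-center each feature
    return X / (X.std(axis=0) + 1e-8)  # scale to unit variance

def pca_whiten(X, eps=1e-5):
    """Decorrelate the features, then scale each principal
    component to unit variance."""
    X = X - X.mean(axis=0)
    cov = X.T @ X / X.shape[0]       # feature covariance matrix
    U, S, _ = np.linalg.svd(cov)     # eigenvectors / eigenvalues
    Xrot = X @ U                     # rotate into the PCA basis
    return Xrot / np.sqrt(S + eps)   # whiten
```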

AlexNet layers [2]

When we used this network directly, we got poor results: the accuracy was about 20% on 10 different food classes, because the architecture was not powerful enough to classify well. We then implemented AlexNet’s layers [2] in our network. It has 5 convolutional layers and 3 fully-connected layers. The first layer has 96 channels and 11x11 filters with a stride of 4; the second has 256 channels and 5x5 filters with a stride of 1. The third and fourth layers have 384 channels and 3x3 filters with a stride of 1, and the last convolutional layer has 256 channels and 3x3 filters. These are then connected to 3 fully-connected layers. After testing this, we got about 30% accuracy on 10 classes.
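The AlexNet-style layer stack described above can be sketched like this (the 227x227 input size, the 3x3/stride-2 pooling positions, and the 4096-unit widths of the first two fully-connected layers are assumptions taken from the standard AlexNet recipe, not stated above):

```python
import tensorflow as tf

def build_alexnet_like(num_classes=10):
    """Sketch of the AlexNet-style network [2]: 5 convolutional
    layers followed by 3 fully-connected layers."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(227, 227, 3)),   # assumed input size
        tf.keras.layers.Conv2D(96, 11, strides=4, activation="relu"),
        tf.keras.layers.MaxPooling2D(3, strides=2),
        tf.keras.layers.Conv2D(256, 5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(3, strides=2),
        tf.keras.layers.Conv2D(384, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(384, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(256, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(3, strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(4096, activation="relu"),  # assumed width
        tf.keras.layers.Dense(4096, activation="relu"),  # assumed width
        tf.keras.layers.Dense(num_classes),
    ])
```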

In the next blog posts, we will talk about what we did to increase the accuracy, and how we collected our second data set, on the calories of foods.

References

[1] http://www.kdnuggets.com/2016/03/must-know-tips-deep-learning-part-1.html

[2] https://www.researchgate.net/publication/289928157_A_Taxonomy_of_Deep_Convolutional_Neural_Nets_for_Computer_Vision
