Week 6 — SeeFood

Okan ALAN
Published in bbm406f18
4 min read · Jan 7, 2019

Theme: Food Calorie Estimation

Team Members: Okan ALAN, Gökberk Şahin, Emre Yazıcı

This is our sixth blog post. Let’s look at what we did last week.

This week we again tried to train an object detection model on our dataset using the TensorFlow Object Detection API. We used the Faster R-CNN Inception model from the TensorFlow models GitHub repository. This time we stopped training at a loss value of around 0.2 and achieved satisfactory results, which you can see in figure 1. In the last two weeks we could only detect one object at a time, but now we can detect both of them simultaneously, and the results are good.

Figure 1: Detected objects
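For readers who want to reproduce the detection step, here is a minimal sketch of how inference with a frozen graph exported by the TensorFlow Object Detection API typically looks (TF 1.x style, matching the API at the time). The graph path is a placeholder; the tensor names are the API's standard exported names.

```python
import numpy as np
import tensorflow as tf  # TF 1.x, as used by the Object Detection API then
import cv2

# Path to the frozen inference graph exported after training (placeholder).
PATH_TO_FROZEN_GRAPH = 'exported_model/frozen_inference_graph.pb'

# Load the frozen detection graph once.
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

def detect(image_bgr, score_threshold=0.5):
    """Run the detector on one image and return boxes above the threshold."""
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    with tf.Session(graph=detection_graph) as sess:
        boxes, scores, classes = sess.run(
            ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0'],
            feed_dict={'image_tensor:0': np.expand_dims(image_rgb, axis=0)})
    keep = scores[0] >= score_threshold
    return boxes[0][keep], classes[0][keep], scores[0][keep]
```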

We need more trained models to compare results against, so we are training other models as well. We will share the comparison next week.

Training and detection were not the only things we did this week. We have also started on the next stage, which is estimating the volume of the food. We need some features for this estimation, and we extract them after running a few subprocesses on the image: detection and GrabCut.

How are these features created? First, we detect the objects. Detection gives us a bounding box for each object in the image. Once we have the bounding boxes, we pass the images to the GrabCut algorithm, which we use to delete the background pixels; deleting here means replacing non-object pixels with black, as in figure 2. We will use the object pixels to estimate the volume. After these steps, we obtain the width, height, and foreground pixel count of the object. To estimate the volume we need two images, one from the side and one from the top, so we apply these subprocesses to both images.

Figure 2: GrabCut example
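As an illustration, here is a small sketch of this segmentation step with OpenCV's GrabCut, initialized from a detector bounding box. The function name and the (x, y, w, h) box format are our own assumptions for the sketch, not the exact code of our pipeline.

```python
import cv2
import numpy as np

def extract_foreground_features(image, bbox):
    """Run GrabCut inside a detected bounding box and return simple features
    for volume estimation: the segmented image, width, height, and the
    foreground pixel count. bbox is (x, y, w, h) -- an assumed format."""
    mask = np.zeros(image.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image, mask, bbox, bgd_model, fgd_model,
                5, cv2.GC_INIT_WITH_RECT)
    # Definite/probable foreground becomes 1, everything else 0.
    fg_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                       1, 0).astype('uint8')
    # Black out the non-object pixels, as in figure 2.
    segmented = image * fg_mask[:, :, np.newaxis]
    ys, xs = np.nonzero(fg_mask)
    width = xs.max() - xs.min() + 1
    height = ys.max() - ys.min() + 1
    return segmented, width, height, int(fg_mask.sum())
```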

We also have the real weight and volume of each food. We combine these ground-truth values with our extracted features and send everything to the next step.
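A sketch of how such a combined feature table might be assembled; all column names and the example numbers below are made-up placeholders, with the image features in practice coming from the detection + GrabCut step run on the side and top views.

```python
import pandas as pd

# Hypothetical example rows: one row per food item.
features = pd.DataFrame({
    'side_width':     [212, 180],
    'side_height':    [145, 160],
    'side_fg_pixels': [24180, 22650],
    'top_width':      [230, 195],
    'top_height':     [228, 190],
    'top_fg_pixels':  [41020, 30110],
    'real_weight':    [150.0, 120.0],  # measured ground truth
    'real_volume':    [210.0, 170.0],  # measured ground truth (target)
})
features.to_csv('features.csv', index=False)
```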

Once we had extracted all the features, we wanted to examine the relationships between them to find potential correlations, so we created the correlation heat map shown in figure 3.

Figure 3: Heat Map
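Such a heat map can be produced in a few lines with pandas and seaborn; the sketch below assumes the hypothetical features.csv from the previous snippet.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise Pearson correlation between all feature columns.
df = pd.read_csv('features.csv')  # placeholder file name
corr = df.corr()

# Draw the correlation matrix as an annotated heat map.
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Feature correlations')
plt.tight_layout()
plt.show()
```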

As we can see, there is a high correlation between the number of foreground pixels and the food height/weight. This is natural, since more area means more pixels. However, we can't exclude a feature just because it is highly correlated with another: excluding one of these features would actually result in worse test accuracy, because the formula we're trying to mimic with this decision tree uses both of them. Building on this observation, we can try other combinations of these features, such as multiplying width and height and using the product as a single feature, or taking the ratio of the coin and using all the coin features as one, and so on.

Out of curiosity, we fed all the features to a random forest just to see how well it would learn, and we got better results than expected. The average error was 33, which is relatively good compared to our baseline work, considering we haven't done any feature engineering or other optimization. The authors of our baseline work got a mean error of 20 when estimating the volume. Figure 4 shows one of the decision trees in our random forest.

Figure 4: Decision Tree

In this random forest the maximum depth was 10. Increasing the depth caused overfitting, and decreasing it caused underfitting; both reduced our test accuracy.
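A minimal sketch of this step with scikit-learn, assuming the hypothetical features.csv from above with real_volume as the regression target; max_depth=10 matches the setting discussed, while n_estimators and the split are assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Load the (hypothetical) feature table; real_volume is the target.
df = pd.read_csv('features.csv')
X = df.drop(columns=['real_volume'])
y = df['real_volume']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# max_depth=10 as discussed above; n_estimators is our guess.
forest = RandomForestRegressor(n_estimators=100, max_depth=10,
                               random_state=42)
forest.fit(X_train, y_train)

print('mean absolute error:',
      mean_absolute_error(y_test, forest.predict(X_test)))
```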

Next week we’ll optimize our decision tree model, try to come up with the right combination of features, and train other models such as MLP and kNN to see how they perform. Then, based on the other models’ performance, we can use ensemble learning to boost our accuracy even further.

References:

https://docs.opencv.org/3.4/d8/d83/tutorial_py_grabcut.html
