Food Image Classification with Improved MobileNet Architecture and Data Augmentation

Sirawan Phiphiphatphaisit and Olarik Surinta

Olarik Surinta
MISL
May 20, 2020

--

Abstract

Classifying real-world food images is challenging because food can be photographed from different perspectives and appears in many patterns. The images also often contain objects other than food. In this paper, we propose a modified MobileNet architecture for recognizing food images. It applies global average pooling layers to avoid overfitting on the food images, together with batch normalization, rectified linear unit, and dropout layers, and uses softmax as the last layer. Both the state-of-the-art and the proposed MobileNet architectures are trained by fine-tuning. The experimental results show that the proposed version of the MobileNet architecture achieves significantly higher accuracies than the original MobileNet architecture. The proposed MobileNet architecture also significantly outperforms other architectures when it is combined with data augmentation techniques.

Keywords: Food Image Classification; Convolutional Neural Network; MobileNet Architecture; Data Augmentation.

Read article: https://dl.acm.org/doi/abs/10.1145/3388176.3388179

Conclusion

In this paper, we applied the state-of-the-art MobileNet architecture to a food image dataset. We also described a modified MobileNet architecture designed to address the overfitting problem. In the proposed architecture, the number of parameters is decreased by applying global average pooling (GAP) layers. Batch normalization (BN), rectified linear unit (ReLU), and dropout layers are also included, and the last layer is softmax. In addition, data augmentation techniques are applied to the images before they are passed to the training process.
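To make the parameter argument concrete, here is a minimal sketch in plain Python that counts classifier parameters with and without GAP. The 7x7x1024 feature map is what MobileNet v1 produces for a 224x224 input, and 101 is the number of Food-101 classes. The baseline of flattening the feature map before the classifier is an assumption chosen for illustration, not a configuration reported in the paper, and the helper names are hypothetical.

```python
import math

# Assumed shapes: MobileNet v1 final feature map for a 224x224 input,
# and the 101 classes of Food-101.
HEIGHT, WIDTH, CHANNELS = 7, 7, 1024
NUM_CLASSES = 101


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Hypothetical baseline: flatten every spatial position into the classifier.
flatten_head = dense_params(HEIGHT * WIDTH * CHANNELS, NUM_CLASSES)
# GAP first reduces each channel to one number, so the classifier sees 1024 inputs.
gap_head = dense_params(CHANNELS, NUM_CLASSES)


def global_average_pool(feature_map):
    """Average each channel over all spatial positions.

    feature_map is a nested list indexed as [row][column][channel].
    """
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [
        sum(feature_map[i][j][k] for i in range(h) for j in range(w)) / (h * w)
        for k in range(c)
    ]


def softmax(logits):
    """Turn class scores into probabilities (max subtracted for stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(flatten_head)  # 5067877
print(gap_head)      # 103525
```

Under these assumptions the GAP head needs about 49 times fewer classifier parameters, which is one way to read the overfitting argument above. BN, ReLU, and dropout are left out of the sketch because they add few or no parameters.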

In the experiments, we trained the MobileNet architectures by fine-tuning. The proposed MobileNet architecture is competitive with the original MobileNet architecture on the ETH Food-101 dataset. We also demonstrated the impact of data augmentation techniques (rotation, shift, flip, shear, zoom, and crop) applied to the images before they are passed to the proposed MobileNet architecture. The best performance was achieved when the various data augmentation techniques were combined with the proposed MobileNet architecture.
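As a rough illustration of what three of these augmentations do to an image, the sketch below implements flip, shift, and crop on a small grayscale image stored as a nested list. It is not the pipeline used in the paper: the function names, the zero fill value, the 50% flip probability, and the shift range are all assumptions, and rotation, shear, and zoom are omitted because they need interpolation that an image library would normally provide.

```python
import random


def horizontal_flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def shift(img, dx, dy, fill=0):
    """Move the image dx columns right and dy rows down.

    Pixels shifted outside the frame are dropped; vacated pixels get `fill`.
    """
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def crop(img, top, left, height, width):
    """Cut out a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img, rng, max_shift=1):
    """Combine a random flip with a random shift (hypothetical settings)."""
    out = img
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return shift(out, dx, dy)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(horizontal_flip(image))   # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(shift(image, 1, 0))       # [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
print(crop(image, 1, 1, 2, 2))  # [[5, 6], [8, 9]]
augmented = augment(image, random.Random(0))
```

Each call to `augment` yields a differently transformed copy with the same label, so the network sees more varied training images than the dataset contains.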

In future work, we plan to construct deep ensemble convolutional neural network (CNN) architectures that combine state-of-the-art deep CNN architectures. We are interested in extracting feature vectors from the convolutional layers, which may work better than an individual deep CNN architecture.
