Which AI is hungrier for food?

Key words: nutrition detection, food image recognition, artificial intelligence, deep learning, machine learning, food label, food apps

Purpose

Background

In our previous study, we collected images from internet sources and analyzed them with four leading image recognition services: Google Vision, Amazon Rekognition, Microsoft Computer Vision, and Instagaze. We concluded that Instagaze had the highest image precision and label precision, followed by Google Vision.

Given those findings, we tested the Google Vision API with an image of a cheese pizza taken with a smartphone. Surprisingly, Google Vision was unable to precisely detect the cheese pizza in the smartphone image, even though it had correctly recognized an extremely similar image from an internet source.

Figure 1: The image on the left is from the internet, and the image on the right was captured with a smartphone. Each is shown with the corresponding labels generated by Google Vision for a slice of pizza.

Food image recognition is challenging due to the nature of food items, and advances in food image label detection have been scant. Foods are typically deformable objects, which makes it difficult to define their structure. Furthermore, only limited information can be gained from a food image, such as the food's color, how well it is lit, and its density. Despite these obstacles, deep neural networks have outperformed traditional approaches, but they can become biased and unreliable in the real world if trained on professionally curated images.

To gain deeper insight, we tested 100 food images taken with a smartphone and benchmarked four services: Amazon Rekognition, Google Vision, Clarifai, and Instagaze. Clarifai and Instagaze both offer a specialized deep learning “Food” model that recognizes food items in images.

Experiment & Procedure

Figure 2: Personal images collected using a smartphone. Strawberry cupcake (upper left corner), avocado egg toast (on the right), vegetable pasta topped with cheese (bottom left corner).

For each image, the machine learning services returned a set of labels with their respective confidence scores. These were stored, together with the original image URL and the correct label, in separate datasets. The datasets along with the source code can be found here.
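The stored records described above could be represented roughly as follows. This is only an illustrative sketch: the `LabelRecord` class, the `flatten_response` helper, the field names, and the sample values are all assumptions made for this example and are not taken from the published datasets or source code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabelRecord:
    """One returned label for one image (hypothetical layout)."""
    service: str
    image_url: str
    correct_label: str
    label: str
    confidence: float


def flatten_response(
    service: str,
    image_url: str,
    correct_label: str,
    labels: List[Tuple[str, float]],
) -> List[LabelRecord]:
    """Turn one service response into one row per returned label."""
    return [
        LabelRecord(service, image_url, correct_label, name, score)
        for name, score in labels
    ]


# Made-up sample values, not taken from the study.
rows = flatten_response(
    "ExampleService",
    "https://example.com/pizza.jpg",
    "cheese pizza",
    [("pizza", 0.97), ("dish", 0.88)],
)
for row in rows:
    print(row)
```

Keeping one row per label, rather than one row per image, makes it straightforward to count acceptable labels and acceptable images later.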

Data Analysis

  • Acceptable Label Categorization
  • Label Precision
  • Image Precision

Acceptable Label Categorization

Figure 3: Acceptable and Not acceptable labels for Chicken Pho

Label Precision

Figure 4: Acceptable Labels vs Not acceptable Labels across all services

Label precision was calculated as below:

Total Label Precision = Total acceptable labels per image/Total labels generated
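As a rough illustration, the formula above can be computed as shown below. The sketch assumes that counts are pooled across all images for a service and that a label is acceptable when it appears in a manually curated set for that image. The function name and the sample data are hypothetical and do not come from the study.

```python
from typing import Iterable, Sequence, Set


def label_precision(
    labels_per_image: Sequence[Iterable[str]],
    acceptable_per_image: Sequence[Set[str]],
) -> float:
    """Share of all generated labels that were judged acceptable."""
    total_labels = 0
    acceptable_labels = 0
    for labels, acceptable in zip(labels_per_image, acceptable_per_image):
        for label in labels:
            total_labels += 1
            # Acceptable sets are assumed to hold lowercase labels.
            if label.lower() in acceptable:
                acceptable_labels += 1
    # Guard against a service that returned no labels at all.
    return acceptable_labels / total_labels if total_labels else 0.0


# Made-up example data, not taken from the study.
example_labels = [["Pizza", "Food", "Table"], ["Dish", "Plate"]]
example_acceptable = [{"pizza"}, set()]
example_label_precision = label_precision(example_labels, example_acceptable)
print(f"{example_label_precision:.2%}")  # 1 acceptable label out of 5
```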

Figure 5: Label precision across all services

We found that Instagaze had the highest label precision of 14.30% and Amazon Rekognition had the lowest label precision of 5.75%. Instagaze generated the most correct labels, followed by Google Vision, Clarifai, and Amazon Rekognition. Correct label generation is highly important for nutritional information and dietary management.

Image Precision

Image Precision = Total images detected with an acceptable label / Total number of images
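A minimal sketch of this calculation follows. It assumes an image counts as detected when at least one of its returned labels is acceptable, and that an image for which a service returned no labels stays in the denominator. The function name and the sample data are hypothetical.

```python
from typing import Iterable, Sequence, Set


def image_precision(
    labels_per_image: Sequence[Iterable[str]],
    acceptable_per_image: Sequence[Set[str]],
) -> float:
    """Share of images that received at least one acceptable label."""
    total_images = len(labels_per_image)
    if total_images == 0:
        return 0.0
    images_with_hit = sum(
        1
        for labels, acceptable in zip(labels_per_image, acceptable_per_image)
        # Acceptable sets are assumed to hold lowercase labels.
        if any(label.lower() in acceptable for label in labels)
    )
    return images_with_hit / total_images


# Made-up example data: the third image got no labels at all.
sample_labels = [["Pizza", "Food"], ["Dish", "Plate"], []]
sample_acceptable = [{"pizza"}, {"pasta"}, {"cupcake"}]
sample_image_precision = image_precision(sample_labels, sample_acceptable)
print(f"{sample_image_precision:.2%}")  # 1 of 3 images has an acceptable label
```

Unlike label precision, this metric does not penalize a service for returning many irrelevant labels, as long as one acceptable label is present for the image.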

Figure 6: Images with acceptable and not acceptable labels across all services. *Note: Google Vision and Instagaze were unable to detect one image.

Figure 7: Image Precision across all services

Across the four benchmarked image recognition technologies, Instagaze had the highest image precision of 85% and Amazon Rekognition had the lowest image precision of 39%. Precise image recognition is immensely helpful for creating workout plans, encouraging healthy eating and food nutrition calculations.

Conclusion
