Veggie Doneness Model For Smart Oven using CNNs and CUT-GANs

Sam Naji
Published in ILLUMINATION · Nov 7, 2022 · 6 min read

With the help of recent developments in Machine Learning and Data Science, companies have been trying to build smart solutions across many different industries to boost customer satisfaction. At Launchpad.ai, we help our clients quickly identify opportunities for AI projects and generate recommendations that maximize customer value. In one such case, a client asked our team at Launchpad to develop a smart oven. Our domain experts were responsible for training a CNN classifier that detects the doneness level of vegetables as they cook.

One of the major difficulties in this project was finding in-oven images of vegetable trays. Hence, it was necessary to develop a pipeline that creates synthetic data and inspects it. The project’s outline is as follows:

1- Data collection and cleaning

2- Synthetic data pipeline

3- Test set labeling and clustering

4- Training and testing on labeled and synthetic data

5- Conclusion

1. Data Collection and Cleaning

Firstly, our team scraped Google Images for keywords such as “tray of veggies uncooked unbaked raw”, “tray of veggies cooked baked”, and “tray of veggies burned overbaked”. Moreover, we collected frames from relevant YouTube videos. The images were grouped into three doneness classes:

  1. Raw — Veggies in the tray are uncooked, yet to be put in the oven, or baking has just started.
  2. Medium — Veggies are partially cooked: they show signs of baking and have begun to darken, but are not done yet.
  3. Dark — Veggies are well cooked and can be taken out of the oven.

After collecting the data, we carefully cleaned the images, removing duplicates and noise.

After data cleaning, each team member reviewed the images and ensured that the images were correctly labeled.

Issue:

It was challenging for us to classify some images, especially those belonging to the medium class.

Solution:

We developed a CNN classifier and trained it on our dataset, reducing the number of epochs to avoid overfitting. After training finished, we built the confusion matrix and retrieved its off-diagonal elements and the top-loss images. Each of these images was then either deleted, moved to the correct class, or kept where it was. After several such iterations, we ran FastAI’s Image Cleaner to remove duplicate or near-identical images, checked the top losses again, and did several more rounds of cleaning.
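This cleaning loop can be sketched in plain Python (a minimal illustration, not our actual FastAI code; the labels, predictions, and losses below are made up for the example):

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count how often true class i was predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def flag_for_review(y_true, y_pred, losses, k=3):
    """Indices of misclassified images, sorted by loss (worst first)."""
    mis = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
    return sorted(mis, key=lambda i: losses[i], reverse=True)[:k]

# Toy run with 3 classes (raw=0, medium=1, dark=2)
y_true = [0, 1, 1, 2, 2, 0]
y_pred = [0, 2, 1, 2, 1, 0]
losses = [0.1, 2.3, 0.2, 0.1, 1.7, 0.3]

cm = confusion_matrix(y_true, y_pred, 3)   # off-diagonal cells = confusions
review = flag_for_review(y_true, y_pred, losses)  # images to re-inspect
```

The images returned by `flag_for_review` are the ones a human then deletes, relabels, or confirms, before retraining and repeating.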

Samples of removed images

Finally, once the dataset was clean, we used it to form a test set.

2a. Synthetic: Image Processing

Due to the scarcity of in-oven images of vegetable trays, making synthetic data was essential.

Each veggie image enters this pipeline, and multiple synthetic images are created:

  1. The edges are trimmed, and images are transposed to ensure horizontal orientation.
  2. The veggies are placed onto trays drawn from a dataset of trays.
  3. Images from the pizza-doneness dataset are segmented, and their color is transferred to the veggies. Several color transfer algorithms were explored.
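One common family of color transfer algorithms matches per-channel statistics between images (in the spirit of Reinhard-style statistics transfer). A minimal single-channel sketch, with made-up pixel values standing in for a veggie channel and a cooked-pizza channel:

```python
import statistics

def transfer_channel(source, target):
    """Shift/scale `source` pixel values so their mean and standard
    deviation match those of `target` (statistics-matching transfer)."""
    s_mean, s_std = statistics.mean(source), statistics.pstdev(source)
    t_mean, t_std = statistics.mean(target), statistics.pstdev(target)
    scale = t_std / s_std if s_std else 1.0
    return [(x - s_mean) * scale + t_mean for x in source]

raw_veggies = [100, 110, 120, 130]   # hypothetical raw-veggie channel
cooked_pizza = [40, 60, 80, 100]     # hypothetical "dark" pizza channel
out = transfer_channel(raw_veggies, cooked_pizza)
```

In practice this is done per channel, often in a decorrelated color space such as LAB rather than RGB, so that the cooked look transfers without distorting hues.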

After going through the pizza-doneness dataset, we noticed that there are several distinct view angles the camera can capture in the oven. We transformed the view angle of the veggies using three methods:

  1. Matlab’s Geometric Transformation Warping
  2. OpenCV’s findHomography and warpPerspective
  3. FastAI’s symmetric warp

After examining each set of synthetic images, we agreed to use FastAI’s symmetric warp as the warping method.
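Under the hood, all three warping methods apply a 3×3 homography H to each pixel coordinate: a point (x, y) maps to (xh/w, yh/w) where (xh, yh, w) = H·(x, y, 1). A small sketch of that mapping, with a hypothetical homography that tilts the tray view (the matrix values are illustrative only):

```python
def apply_homography(H, x, y):
    """Map point (x, y) through a 3x3 homography H, the same projective
    mapping that cv2.warpPerspective applies to every pixel."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

# Hypothetical tilt: rows farther "into" the oven shrink toward the horizon.
H = [[1.0, 0.0,   0.0],
     [0.0, 0.8,   0.0],
     [0.0, 0.002, 1.0]]

corners = [(0, 0), (100, 0), (0, 100), (100, 100)]
warped = [apply_homography(H, x, y) for x, y in corners]
```

With OpenCV, `findHomography` estimates H from matched point pairs and `warpPerspective` applies it to the whole image; FastAI’s symmetric warp wraps a similar transform as a training-time augmentation.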

2b. Synthetic: Contrastive Unpaired Translation (CUT)

We used the CUT algorithm to create our own datasets. By training CUT on three labels: Raw, Medium, and Dark, we can model the key features of each class.

Training of CUT was based on real cooking sessions divided as follows:

Then, using the CUT inference pipeline, we can take a Raw image and generate the two other labels. Some results are shown below:

3. Test Set Labeling and Clustering

The test set consists of real images from cooking sessions. The data contained a lot of noise and was not labeled. Therefore, with the help of domain expertise, we set strict labeling guidelines and used unsupervised learning algorithms as shown below:
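As one illustration of how unsupervised learning can pre-group such images, here is a tiny k-means sketch clustering images by a single scalar feature; using mean brightness as that feature (a rough proxy for doneness) is an assumption for this example, not necessarily the feature we used:

```python
def kmeans_1d(values, k, iters=20):
    """Minimal 1-D k-means: group scalar features into k clusters."""
    # Spread initial centers across the sorted values.
    centers = sorted(values)[::max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            i = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[i].append(v)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical mean-brightness values: bright=raw, mid=medium, dark=done
brightness = [200, 195, 210, 130, 125, 140, 60, 55, 70]
centers, groups = kmeans_1d(brightness, k=3)
```

The resulting clusters give labelers a coarse Raw/Medium/Dark split to review against the strict guidelines, rather than labeling from scratch.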

4. Training and Testing on Labeled Data, and Synthetic Data

In order to compensate for the lack of data, we created three training datasets:

1- Training images

2- Training images cropped in grids instead of rescaling

3- Synthetic images
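The grid-cropping strategy in option 2 keeps the native resolution by tiling each photo into model-sized crops instead of downscaling the whole image. A minimal sketch (the 1024×1536 source size and 512 tile size are illustrative):

```python
def grid_crops(height, width, tile):
    """Return (top, left, bottom, right) boxes tiling an image in a
    non-overlapping grid, instead of rescaling it to the input size."""
    boxes = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((top, left, top + tile, left + tile))
    return boxes

# A hypothetical 1024x1536 tray photo split into 512x512 crops
boxes = grid_crops(1024, 1536, 512)   # yields a 2x3 grid of tiles
```

Each box is then cut out of the original image and fed to the network as its own training sample, so fine texture cues of doneness are not lost to rescaling.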

The results of testing on real images from the cooking session are attached below:

It can be seen that the highest accuracy was achieved using images of size 512, followed by synthetic images. The highest accuracy was 80%, and synthetic data gave an uplift of 10% in testing accuracy.

In order to avoid overfitting, we developed a quick testing pipeline built on the FastAI library (https://github.com/samnaji/Quick_FastAI_Test). We performed a quick test every 5 epochs and stopped training once we reached maximum test accuracy. The graph of test scores vs. epochs is attached below:
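In outline, that early-stopping loop could look roughly like this (a simplified sketch with a simulated accuracy curve and a patience heuristic for detecting the peak; the actual pipeline is in the linked repository):

```python
def train_with_periodic_tests(epochs, test_fn, every=5, patience=3):
    """Run a quick test every `every` epochs; stop once test accuracy
    has not improved for `patience` consecutive tests."""
    best_acc, best_epoch, stale = 0.0, 0, 0
    for epoch in range(every, epochs + 1, every):
        acc = test_fn(epoch)          # quick test on the held-out set
        if acc > best_acc:
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
            if stale >= patience:     # accuracy has peaked; stop training
                break
    return best_epoch, best_acc

# Simulated test-accuracy curve: improves, peaks at epoch 20, then overfits
curve = {5: 0.55, 10: 0.68, 15: 0.74, 20: 0.80, 25: 0.78, 30: 0.76, 35: 0.75}
best_epoch, best_acc = train_with_periodic_tests(
    40, lambda e: curve.get(e, 0.0), every=5)
```

Checkpointing the model at `best_epoch` then gives the weights from just before overfitting sets in.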

5. Conclusion

At first glance, this project seems like a simple computer vision task. However, the task is very specific, so the training data could not easily be found. First, the team collected data through web scraping and manually labeled the images using tools such as the FastAI Image Labeler. Then our team inspected the labeling accuracy by monitoring the loss for each image and by reviewing with domain experts. Once we exhausted web resources, we constructed a specific training strategy and different synthetic data pipelines. When the training data was complete, our team tested the model on the client’s real test set, where we also tackled overfitting to achieve the highest accuracy.

Check our website to know more about how Launchpad creates new solutions and insights to deliver next-generation experiences and grow businesses.
