Note: This post requires knowledge of Python, Deep Learning, Image preprocessing, and classification model evaluation. If you are not familiar with these concepts, please feel free to leave this post. Thank you…
Namaste everyone… In this post, we will classify Indian food images. India is known for its multiculturalism, and this multiculturalism also influences Indian food. We have a lot of variety in Indian food; dishes vary both across states and within states. We will apply deep learning algorithms to identify these food items. Our objective is to familiarize you with deep classification networks through a catchy example, so this post will be especially interesting for food lovers. In addition, we also apply the transfer learning technique to improve the classification performance.
We use the Food20 dataset for the experiments. The dataset is freely available on the Kaggle platform: dataset link. Download the zip file and extract it, keeping the extracted folder next to the Python notebook file. The dataset contains images of 20 different Indian food items, with 100 sample images per food item. The data is already stored in a train-test (train-validation) split with a 70:30 ratio. The images have varying resolutions, ranging from 200 x 150 to 5760 x 3840 pixels.
We will use the following Python built-in and third-party libraries to perform the classification experiments.
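The post does not list the exact imports; a plausible set, assuming a TensorFlow/Keras workflow with scikit-learn for the metrics and matplotlib for the plots, would be:

```python
# Assumed library set for this post (TensorFlow/Keras for the models,
# scikit-learn for the evaluation metrics, matplotlib for the plots).
# Adjust to your own setup.
import os

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix
```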
Data loading and Preprocessing:
First, we load all the images from secondary storage. Next, we resize each image to 256 x 256 pixels. We also normalize the pixel values. The following code snippet covers the preprocessing part.
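The original snippet is not reproduced here; below is a minimal sketch of the loading step, assuming the extracted dataset has one folder per class (the helper name `load_split` and the `Food20/train` / `Food20/val` folder names are my assumptions, not the post's):

```python
import os

import numpy as np
from tensorflow.keras.utils import load_img, img_to_array

IMG_SIZE = (256, 256)  # target resolution from the text


def load_split(split_dir):
    """Load every image under split_dir/<class_name>/, resize it to
    256 x 256, and normalize pixel values to the [0, 1] range."""
    images, labels = [], []
    class_names = sorted(
        d for d in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, d))
    )
    for idx, name in enumerate(class_names):
        class_dir = os.path.join(split_dir, name)
        for fname in sorted(os.listdir(class_dir)):
            img = load_img(os.path.join(class_dir, fname), target_size=IMG_SIZE)
            images.append(img_to_array(img) / 255.0)  # pixel normalization
            labels.append(idx)                        # integer class label
    return np.array(images, dtype="float32"), np.array(labels)


# Folder names below are assumptions; match them to the extracted dataset.
# x_train, y_train = load_split("Food20/train")
# x_val, y_val = load_split("Food20/val")
```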
x_train and y_train represent the image data and labels for the training part. x_val and y_val represent the image data and labels for the validation or testing part. Here, we use the validation part for testing as well; you can also create a separate test set.
Convolutional Neural Network Model:
After data preprocessing, we create our CNN model. I am not going to discuss the basics of convolutional neural networks here. If you are a beginner in deep learning, please go through the following links to learn about convolution: 1. Convolutional Neural Networks by Andrew Ng, 2. Introduction to CNN, How convolution layers work. The following code block defines our CNN model.
We apply a series of 2-D convolution and pooling operations to the input image data. Next, we flatten the output of the convolution and pooling operations and feed it into a dense network. Finally, we add a softmax layer with 20 nodes for the final classification (we have 20 types of food images). You can experiment with the filter sizes and the number of convolution and pooling layers. The following image shows the count of parameters to be learned at each layer of our CNN model.
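The post shows the model only as an image; a sketch of such a network in Keras might look like the following (the filter counts and the number of blocks are my assumptions, not the author's exact architecture):

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 20  # 20 Indian food categories


def build_cnn(input_shape=(256, 256, 3)):
    """A sketch of the simple CNN described in the text: stacked
    Conv2D + MaxPooling blocks, flattened into a dense head with a
    20-way softmax output."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    return model


# model = build_cnn()
# model.summary()  # prints the per-layer parameter counts
```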
We use the Adam optimizer to update the parameter values via backpropagation. For a detailed study of the Adam optimizer and backpropagation, follow these links: Adam Optimizer, Backpropagation. We train our CNN model for 200 epochs. You can repeat the experiment with a different number of epochs.
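A sketch of the compile-and-train step described above (the batch size and the use of a sparse categorical cross-entropy loss are my assumptions; the post only specifies Adam and 200 epochs):

```python
def compile_and_train(model, x_train, y_train, x_val, y_val, epochs=200):
    """Compile the model with the Adam optimizer and train it,
    tracking validation loss/accuracy each epoch."""
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"],
    )
    history = model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=epochs,
        batch_size=32,  # assumed batch size
    )
    return history
```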
After training the CNN model, we plot the training and validation loss. We also analyze the accuracy score at each epoch. The following code block generates the plots for this analysis.
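The plotting code could look like this sketch, which reads the per-epoch metrics from the Keras `History` object returned by `fit` (the helper name is mine):

```python
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot training vs. validation loss and accuracy per epoch."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    ax1.plot(history.history["loss"], label="train loss")
    ax1.plot(history.history["val_loss"], label="val loss")
    ax1.set_xlabel("epoch")
    ax1.set_ylabel("loss")
    ax1.legend()
    ax2.plot(history.history["accuracy"], label="train accuracy")
    ax2.plot(history.history["val_accuracy"], label="val accuracy")
    ax2.set_xlabel("epoch")
    ax2.set_ylabel("accuracy")
    ax2.legend()
    plt.show()
```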
After 20 epochs, the training loss and accuracy remain nearly constant, with only minor changes. On the other hand, the validation loss decreases for about ten epochs but starts increasing afterward. The validation accuracy varies within a small range after 10 epochs.
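The testing step the next paragraph refers to can be sketched as follows, using scikit-learn's `classification_report` (the helper name and the use of scikit-learn are my assumptions):

```python
import numpy as np
from sklearn.metrics import classification_report


def evaluate_model(model, x_val, y_val, class_names=None):
    """Predict on the validation set and print per-category
    precision, recall, and F1-score."""
    probs = model.predict(x_val)
    y_pred = np.argmax(probs, axis=1)  # most probable class per image
    print(classification_report(y_val, y_pred, target_names=class_names))
    return y_pred
```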
The above code block tests the trained CNN model and reports the performance in the form of precision, recall, and F1-score. For more details about these metrics, follow these links: Wikipedia, Accuracy_Precision_Recall. The following table shows the category-wise performance of the CNN model. We obtained an average F1-score of 54%. Our model performed worst on the butternaan and dosa categories. We can manually inspect the images of both categories to identify the reason, which may lead us to more useful insights.
A confusion matrix is a tabular representation of a classification model's performance. Don't let the name mislead you; it is not confusing. For more detail, please follow this link: Wikipedia. In simple words, in an ideal confusion matrix, all the non-diagonal values are zero. From the following confusion matrix, we can analyze the misclassified images. For example, most of the butternaan images are misclassified as biriyani, dosa, idly, or gulab jamun.
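A minimal sketch for computing and displaying such a confusion matrix, assuming scikit-learn and matplotlib (the helper name is mine):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix


def show_confusion(y_true, y_pred, class_names):
    """Compute the confusion matrix (rows: true class, columns:
    predicted class) and display it as a heatmap."""
    cm = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.imshow(cm, cmap="Blues")
    ax.set_xticks(range(len(class_names)))
    ax.set_xticklabels(class_names, rotation=90)
    ax.set_yticks(range(len(class_names)))
    ax.set_yticklabels(class_names)
    ax.set_xlabel("predicted")
    ax.set_ylabel("true")
    plt.show()
    return cm
```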
Using Transfer Learning (TL) for Improvement:
We achieved average performance with our simple CNN model. Let's use the power of TL to improve it. In TL, we store the knowledge gained while solving one problem and apply it to a different but related problem. TL helps when we have very little training data; in our case, we have only 70 training images per food item, so TL could be useful for improving the classification performance. Here, we use the pre-trained weights of a model trained on the ImageNet dataset, and we replace the input and output layers of the pre-trained model with our own.
We will not retrain the existing model weights; we will only train the last layer. You can see the counts of trainable and non-trainable parameters in the following image:
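The post does not name the pretrained backbone; as an illustration, here is a sketch using VGG16 with frozen ImageNet weights and a new softmax head (VGG16 is my assumption, and any ImageNet-pretrained `keras.applications` model could be swapped in):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16  # assumed backbone


def build_tl_model(num_classes=20, input_shape=(256, 256, 3),
                   weights="imagenet"):
    """Transfer-learning sketch: an ImageNet-pretrained backbone with
    its weights frozen, topped with a new trainable softmax head."""
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # keep the pretrained weights fixed
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),  # only this trains
    ])
    return model


# model = build_tl_model()
# model.summary()  # shows trainable vs. non-trainable parameter counts
```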
After training the TL-based model, we compare the training performance of our simple CNN model with that of the TL-based model. The following code block plots the training accuracy and loss values against epochs for both models.
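A sketch of this comparison plot, assuming the Keras `History` objects from both training runs are available (the helper name is mine):

```python
import matplotlib.pyplot as plt


def compare_histories(cnn_history, tl_history):
    """Overlay per-epoch training accuracy and loss for the simple
    CNN and the TL-based model."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    ax1.plot(cnn_history.history["accuracy"], label="simple CNN")
    ax1.plot(tl_history.history["accuracy"], label="TL-based")
    ax1.set_xlabel("epoch")
    ax1.set_ylabel("training accuracy")
    ax1.legend()
    ax2.plot(cnn_history.history["loss"], label="simple CNN")
    ax2.plot(tl_history.history["loss"], label="TL-based")
    ax2.set_xlabel("epoch")
    ax2.set_ylabel("training loss")
    ax2.legend()
    plt.show()
```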
The TL-based model scores higher accuracy on the validation part. Its training accuracy increases gradually, unlike that of the simple CNN model, and its loss values decrease continuously.
The above code block tests the TL-based model and prints the classification performance. The TL-based model achieved an average F1-score of 84%, and it classifies butternaan and dosa images far more accurately than the simple CNN model.
If we compare this confusion matrix with the previous one, we have more values on the diagonal. Only the bisibelebath category still has a large number of misclassified images.
Finally, we have completed our Indian food image classification task successfully, and we have seen the power of transfer learning in this experiment. You can replicate this experiment and play with the model hyper-parameters. We have provided the notebook file here: GitHub link. Please install the required libraries before executing the code. I suggest using the Google Colaboratory platform: there, you won't need to install any libraries, and you also get a free GPU. Another food image dataset is available on the TensorFlow website (food101). It is bigger and more versatile than the Food20 dataset, so you can try the above experiments with food101 to gain more useful insights.
Thank you for reading this post…