Exploring Few-Shot Learning Methods: A Case Study on Tomato Disease Detection

Joshua Sirusstara
Published in Python’s Gurus
10 min read · Jun 11, 2024

Introduction

Tomatoes are one of the most popular crops worldwide, with 170,750 kilotons produced in 2014 Dubey et al. [2020]. However, like all crops, tomatoes are susceptible to various diseases that greatly reduce their yield and quality, and these diseases are a serious threat to food security. Early detection and identification of these diseases are important for choosing the correct treatment and the conditions needed to prevent the disease from spreading.

Early detection involves monitoring and analyzing plant parts, mainly fruits, stems, and leaves, for any visible changes. This analysis requires considerable processing and technological resources, and current technological tools are not yet on par with human experts Durmuş et al. [2017]. However, identifying a disease manually is tedious, time-consuming, and labor-intensive because there is a large and diverse variety of diseases Brahimi et al. [2017a]. Hence, an automated model with expert-level performance that can run with the available resources is needed to recognize crop diseases.

As artificial intelligence advances in the field of computer vision, many of its applications in agriculture have shown strong results on image data and have mitigated several crop disease recognition problems Rangarajan et al. [2018], Zhang et al. [2018], Tm et al. [2018], Agarwal et al. [2020a].

Previous Works

Much of the research in this field has used CNN-based pre-trained models such as SqueezeNet, AlexNet, and VGG16. However, these models require considerable time and extensive data for tuning or training. Another drawback of a convolutional network alone is its limited ability to perform well in a dynamic environment, which is exactly where our leaf disease recognition will be applied.

To avoid retraining or fine-tuning for every new environment, we turn to few-shot learning methods. Few-shot learning requires only a handful of labeled samples per new class, and a few-shot model consists of two main parts: a feature extractor backbone and a similarity module. We won’t delve deeply into the concept, but when we refer to a few-shot learning method, we usually mean the similarity module used for inference.
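A toy sketch of this two-part structure, with both parts as hypothetical stand-ins: `toy_backbone` replaces a trained CNN with the per-channel mean of a flat list of RGB pixels, and `similarity_module` labels a query with the class of its nearest support embedding under Euclidean distance. Neither function is part of EasyFSL or of the methods compared later; they only show where the backbone ends and the similarity module begins.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B) in [0, 1]
Vector = List[float]

def toy_backbone(image: Sequence[Pixel]) -> Vector:
    """Hypothetical stand-in for a trained CNN: embeds a flat list of RGB
    pixels as its mean R, G and B intensities."""
    n = len(image)
    return [sum(pixel[c] for pixel in image) / n for c in range(3)]

def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_module(query: Vector, support: Dict[str, List[Vector]]) -> str:
    """Return the label of the support embedding closest to the query."""
    best_label, best_dist = "", math.inf
    for label, embeddings in support.items():
        for emb in embeddings:
            dist = euclidean(query, emb)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label

# Usage with made-up pixels: one support image per (hypothetical) class.
support_images = {
    "healthy": [[(0.1, 0.8, 0.1), (0.2, 0.7, 0.1)]],
    "late_blight": [[(0.5, 0.4, 0.2), (0.6, 0.3, 0.2)]],
}
support = {label: [toy_backbone(img) for img in imgs]
           for label, imgs in support_images.items()}
print(similarity_module(toy_backbone([(0.15, 0.75, 0.12)]), support))  # healthy
```

Swapping the nearest-neighbour rule for another comparison (prototypes, learned relation scores, transductive refinement) changes the few-shot method while the backbone stays the same.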

Several previous works have experimented with different few-shot methods on plant disease recognition tasks. To mention a few:

  • Metric-based methods such as Siamese networks and triplet loss Argüeso et al. [2020];
  • Distance-based methods such as Euclidean distance Li and Yang [2021], the Mahalanobis distance metric Nuthalapati and Tunga [2021], and MatchingNet Chen et al. [2021];
  • Prototype-based networks such as ProtoNet Chen et al. [2021];
  • Adaptive networks such as LFM-CNAPS Chen et al. [2021].
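The distance-based and prototype-based entries above differ mainly in how a query embedding is compared with a class's support embeddings. Below is a minimal pure-Python sketch of three such comparisons: squared Euclidean distance to the class prototype (the support mean, as in ProtoNet), cosine similarity, and a Mahalanobis distance simplified to a diagonal covariance estimated from the support set. The diagonal simplification and the `eps` variance floor are our assumptions; the cited works may estimate the covariance differently.

```python
import math
from typing import List

Vector = List[float]

def prototype(embeddings: List[Vector]) -> Vector:
    """Class prototype: element-wise mean of the support embeddings."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]

def squared_euclidean(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def diagonal_mahalanobis(query: Vector, embeddings: List[Vector],
                         eps: float = 1e-3) -> float:
    """Mahalanobis distance using only the per-dimension variances of the
    support embeddings (diagonal covariance); eps keeps them positive."""
    mu = prototype(embeddings)
    n = len(embeddings)
    var = [sum((e[i] - mu[i]) ** 2 for e in embeddings) / n + eps
           for i in range(len(mu))]
    return math.sqrt(sum((q - m) ** 2 / v for q, m, v in zip(query, mu, var)))

# Usage: hypothetical 2-d support embeddings of a single class.
support = [[1.0, 0.0], [3.0, 0.0]]
print(diagonal_mahalanobis([2.0, 0.1], support))  # about 3.16
print(diagonal_mahalanobis([3.0, 0.0], support))  # about 1.0
```

In the example, the first query is closer to the prototype in Euclidean terms, yet its Mahalanobis distance is larger because it deviates along a dimension where the support set shows almost no variance.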

What these methods share is a strategy for filling the knowledge gap caused by the scarcity of data: each transfers some form of representation learned from the examples to the inference phase. The previous works suggest that adaptive meta-learning methods yield better results on few-shot learning tasks.

These methods can also be categorized by the type of inference they use: inductive or transductive. Inductive inference aims to learn a general rule: it treats each query sample separately and relies solely on the support set, or its mean (the prototype), to avoid overfitting, similar to classical machine learning practice. Transductive inference, on the other hand, takes into account not only the support set but also the query set and the relations among its samples, creating a richer representation at the cost of more resources. It follows that an inductive method can predict each query on its own, resembling a real test scenario, while a transductive method processes the entire batch of query data simultaneously, typically yielding better results at the cost of more computation.
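A toy sketch of the difference, with nearest-prototype classification as the shared decision rule. `inductive_predict` builds prototypes from the support set only and labels each query independently; `transductive_predict` additionally refines the prototypes with the unlabeled queries through a few soft k-means style updates (softmax over negative squared distances), so every query can influence the others' labels. This refinement rule is a generic illustration chosen for brevity, not the exact update used by BD-CSPN or TIM.

```python
import math
from typing import List, Tuple

Vector = List[float]

def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nearest(query: Vector, prototypes: List[Vector]) -> int:
    return min(range(len(prototypes)), key=lambda k: sq_dist(query, prototypes[k]))

def inductive_predict(support: List[List[Vector]], queries: List[Vector]) -> List[int]:
    # Prototypes come from the support set only; each query is labeled on its own.
    prototypes = [mean(s) for s in support]
    return [nearest(q, prototypes) for q in queries]

def transductive_predict(support: List[List[Vector]], queries: List[Vector],
                         n_iter: int = 3, temperature: float = 1.0
                         ) -> Tuple[List[int], List[Vector]]:
    prototypes = [mean(s) for s in support]
    for _ in range(n_iter):
        # Soft-assign every query to every class using the current prototypes.
        weights = [softmax([-sq_dist(q, p) / temperature for p in prototypes])
                   for q in queries]
        refined = []
        for k, s in enumerate(support):
            # Weighted mean of the labeled support and the soft-labeled queries.
            total = [sum(col) for col in zip(*s)]
            mass = float(len(s))
            for q, w in zip(queries, weights):
                total = [t + w[k] * x for t, x in zip(total, q)]
                mass += w[k]
            refined.append([t / mass for t in total])
        prototypes = refined
    return [nearest(q, prototypes) for q in queries], prototypes
```

On well-separated data both functions agree; the transductive prototypes simply move toward the query samples assigned to them, which is where the extra computation per task comes from.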

Dataset

The dataset used in this case study is the Tomato Leaves Dataset from Kaggle, a curated and augmented collection drawn from multiple sources, with the PlantVillage dataset as the main source Huang and Chang [2020] and additional data from Geetharamani and Pandian [2019].

Table 1: Dataset label distribution

The dataset comprises 32,535 images of tomato leaves across 10 disease classes and 1 healthy class (Table 1). The images include photos from both lab scenes and in-the-wild scenes. Both sources were augmented: the selected PlantVillage images with clockwise rotation, horizontal and vertical mirroring, reduced and increased brightness, and more; the additional data with flipping, gamma correction, noise injection, PCA color augmentation, rotation, and scaling.
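Most of the geometric and photometric augmentations listed above are simple array operations. Below is a minimal sketch on a grayscale image stored as a nested list of intensities in [0, 1]; in practice an image library would be used, and the parameter values in the usage note are arbitrary examples rather than the settings used by the dataset authors.

```python
from typing import List

Image = List[List[float]]  # rows of grayscale intensities in [0, 1]

def flip_horizontal(img: Image) -> Image:
    return [row[::-1] for row in img]

def flip_vertical(img: Image) -> Image:
    return [list(row) for row in img[::-1]]

def rotate_clockwise(img: Image) -> Image:
    # Reverse the row order, then transpose: a 90-degree clockwise turn.
    return [list(col) for col in zip(*img[::-1])]

def adjust_brightness(img: Image, factor: float) -> Image:
    # Scale intensities and clip back into [0, 1].
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]

def gamma_correct(img: Image, gamma: float) -> Image:
    # gamma < 1 brightens mid-tones, gamma > 1 darkens them.
    return [[p ** gamma for p in row] for row in img]
```

For example, `rotate_clockwise([[1, 2], [3, 4]])` turns the 2×2 grid into `[[3, 1], [4, 2]]`, and `gamma_correct(img, 0.5)` maps an intensity of 0.25 to 0.5.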

Figure 1: Data samples

A quick look at the samples in Figure 1 shows that traditional techniques such as edge detection, SLIC segmentation, and thresholding are insufficient for capturing the complexity and subtlety of the features needed for correct classification.

The dataset was split into training and validation sets. We also applied simple augmentation, such as flipping and rotation, to the training set to avoid image leakage from duplicated images. Following the basic few-shot procedure, we used the first 6 classes as the base set, 2 classes for validation, and 3 classes for testing. We then applied an n-shot task sampler to evaluate the validation and test sets episodically.
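A sketch of the class split and the episodic sampling, assuming the 11 classes are held in a fixed order and that the query count is per class. `split_classes` and `sample_task` are hypothetical helpers written for illustration; EasyFSL provides its own task sampler, whose interface is not reproduced here. Each task draws `n_way` classes, then `n_shot` support and `n_query` query items per class without overlap.

```python
import random
from typing import Dict, List, Sequence, Tuple

def split_classes(classes: Sequence[str], n_base: int = 6, n_val: int = 2,
                  n_test: int = 3) -> Tuple[List[str], List[str], List[str]]:
    """Disjoint class split: the first n_base classes train the backbone,
    the next n_val are for validation, the remaining n_test for testing."""
    if n_base + n_val + n_test != len(classes):
        raise ValueError("split sizes must add up to the number of classes")
    base = list(classes[:n_base])
    val = list(classes[n_base:n_base + n_val])
    test = list(classes[n_base + n_val:])
    return base, val, test

def sample_task(items_by_class: Dict[str, List[str]], n_way: int, n_shot: int,
                n_query: int, rng: random.Random):
    """Sample one n_way-way n_shot-shot episode; n_query is per class."""
    classes = rng.sample(sorted(items_by_class), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(items_by_class[label], n_shot + n_query)
        support += [(item, label) for item in picked[:n_shot]]
        query += [(item, label) for item in picked[n_shot:]]
    return support, query

# Usage with placeholder class and file names.
all_classes = [f"class_{i}" for i in range(11)]
base, val, test = split_classes(all_classes)
test_items = {c: [f"{c}/img_{j}.jpg" for j in range(20)] for c in test}
support, query = sample_task(test_items, n_way=3, n_shot=5, n_query=10,
                             rng=random.Random(0))
```

Because the split is by class, no test class is ever seen while training the backbone, which is the condition few-shot evaluation is meant to simulate.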

Modelling

The experiments use EasyFSL, a library by Sicara. It is built on PyTorch and includes predefined few-shot methods, utilities, and backbone models. Each method requires a trained backbone: the method takes the embeddings output by the backbone and builds an informative metric from them to infer the label of a new target.

The models we use in this case study are the Relation Network, the Bias-Diminishing Cosine-Similarity Prototypical Network (BD-CSPN), and Transductive Information Maximization (TIM). Each framework represents a different combination of few-shot learning approaches:

  • Relation Network: metric-based, with inductive inference;
  • BD-CSPN: prototype-based, with a metric-based bias-diminishing approach and transductive inference;
  • TIM: adaptive, maximizing the mutual information between the predicted labels and the input data, with transductive inference.
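The quantity TIM maximizes can be computed directly from the softmax predictions on the query set: the entropy of the average (marginal) prediction minus the average entropy of the individual predictions. The sketch below computes only this mutual-information term; the full method described by Boudiaf et al. [2020] also includes a support-set cross-entropy term and weighting hyperparameters, and optimizes the classifier with gradient steps, none of which is shown here.

```python
import math
from typing import List

def entropy(p: List[float]) -> float:
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(x * math.log(x) for x in p if x > 0)

def mutual_information(query_probs: List[List[float]]) -> float:
    """H(Y) - H(Y|X) estimated from per-query class probabilities."""
    n = len(query_probs)
    n_classes = len(query_probs[0])
    # Marginal entropy is high when predictions are spread over all classes.
    marginal = [sum(p[k] for p in query_probs) / n for k in range(n_classes)]
    # Conditional entropy is low when each individual prediction is confident.
    conditional = sum(entropy(p) for p in query_probs) / n
    return entropy(marginal) - conditional
```

Confident and class-balanced predictions such as `[[1, 0, 0], [0, 1, 0], [0, 0, 1]]` reach the maximum of log 3, whereas confident predictions that all collapse onto one class, or uniformly uncertain predictions, both score 0. Maximizing the term therefore pushes toward predictions that are both confident and balanced.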

Result

For both TIM and BD-CSPN, the backbone was trained on the base set for 50 epochs with a batch size of 128. The Relation Network was trained episodically for 50 epochs with 500 tasks per epoch, each task containing 10 queries sampled from 5 randomly selected classes with 5 randomly selected support samples per class. The models were then validated on n-shot 2-way tasks from the validation set and tested on n-shot 3-way tasks from the test set. The validation score is the validation accuracy averaged across epochs, with 200 tasks per epoch, and the test score is the accuracy averaged over 1000 test tasks, each containing 10 queries in a 3-way n-shot setting. Table 2 indicates that the models perform better as the number of shots grows and the number of classes decreases, which is consistent with the behaviour reported in previous studies.
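Few-shot scores are averages over many sampled tasks. Below is a minimal sketch of turning per-task predictions into a mean accuracy with an approximate 95% confidence interval, using the normal approximation 1.96 · std / √n. Reporting the interval is our suggestion and not something the tables above include; the toy predictions in the usage lines are placeholders.

```python
import math
import statistics
from typing import List, Sequence, Tuple

def task_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of correctly labeled queries in one task."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def mean_with_ci95(accuracies: List[float]) -> Tuple[float, float]:
    """Mean accuracy and the half-width of a normal-approximation 95% CI."""
    mean = statistics.fmean(accuracies)
    half_width = 1.96 * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, half_width

# Usage with placeholder (predictions, labels) pairs, one pair per task.
tasks = [([0, 1, 2], [0, 1, 1]), ([2, 2, 0], [2, 2, 0]), ([1, 0, 0], [1, 0, 2])]
mean, half = mean_with_ci95([task_accuracy(p, y) for p, y in tasks])
print(f"{mean:.3f} +/- {half:.3f}")
```

With 1000 test tasks, the interval becomes narrow enough that differences of a few points between methods can be checked for overlap.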

Table 2: Accuracy of each method

Comparing the two transductive methods, TIM scores higher than BD-CSPN, which is consistent with the results in TIM’s original paper Boudiaf et al. [2020]. We ran a further experiment on the two methods using five folds with different class splits and different backbones, as shown in Table 3. The results show that TIM outperforms BD-CSPN in almost every validation class fold.

Table 3: Average five-fold accuracy on 3-way tasks

We also found that a bigger backbone does not necessarily produce a better outcome. As seen in Table 3, in the one-shot setting the bigger backbone struggles more than ResNet12 to form a correct latent space. We also tried increasing the number of backbone parameters by using ResNet50 and found that the score degrades in both one-shot and five-shot tasks, which further suggests that more parameters do not guarantee better performance in the transductive few-shot setting.

In addition, the t-SNE plot of the query and prototype embeddings for TIM and BD-CSPN in Figure 3 shows that TIM’s prototype embeddings, denoted by ‘x’, are more confidently placed, that is, more spread out, than BD-CSPN’s prototypes. We investigated the robustness of the two methods by evaluating them on the same test set with five different task samplers, calculating the variance of each model’s accuracy, and testing whether the difference is statistically significant with Levene’s test. We found that TIM’s results are more robust than BD-CSPN’s across the different task samples.
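Levene’s test compares group variances by running a one-way ANOVA on the absolute deviations of each value from its group’s center. Below is a pure-Python sketch of the W statistic using the median as the center (the Brown-Forsythe variant, which is also the default of `scipy.stats.levene`). Turning W into a p-value requires the F distribution with (k − 1, N − k) degrees of freedom, for which `scipy.stats.levene` can be used directly. The inputs in the usage note are illustrative, not the study’s measurements.

```python
import statistics
from typing import Sequence

def levene_w(*groups: Sequence[float], center: str = "median") -> float:
    """Levene's W statistic for k groups (median center = Brown-Forsythe)."""
    k = len(groups)
    centre_fn = statistics.median if center == "median" else statistics.fmean
    centres = [centre_fn(g) for g in groups]
    # Absolute deviations from each group's own center.
    z = [[abs(x - c) for x in g] for g, c in zip(groups, centres)]
    n_total = sum(len(g) for g in z)
    group_means = [statistics.fmean(g) for g in z]
    grand_mean = sum(sum(g) for g in z) / n_total
    # One-way ANOVA on the deviations: between-group vs within-group spread.
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    within = sum((x - m) ** 2 for g, m in zip(z, group_means) for x in g)
    return (n_total - k) / (k - 1) * between / within
```

For example, `levene_w([1, 2, 3], [1, 3, 5])` gives W = 0.8, while two groups with identical spread, such as `[1, 2, 3]` and `[11, 12, 13]`, give W = 0; in the experiment above, each group would hold one method’s accuracies across the five task samplers.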

Figure 3: t-SNE plot comparing the embeddings of a ten-query task and the three prototypes

Table 4 shows the inference time of each method, measured in the Kaggle Notebook environment with a GPU T4 x2 accelerator. TIM is approximately twice as slow as BD-CSPN and the Relation Network, which is consistent with the original paper’s note that TIM requires careful optimization. Apart from TIM, however, we found little to no difference between the inductive and transductive methods in the time needed to handle the same task.
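Per-task inference time can be measured with a wall-clock timer averaged over many tasks. Below is a minimal sketch with `time.perf_counter`, where `predict` is a placeholder for any of the three methods. On a GPU, kernels run asynchronously, so the device must be synchronized before reading the timer (in PyTorch, `torch.cuda.synchronize()`); this CPU-only sketch omits that step.

```python
import statistics
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def mean_inference_time(predict: Callable[[T], object], tasks: Sequence[T],
                        warmup: int = 1) -> float:
    """Average wall-clock seconds per task, excluding warm-up runs."""
    for task in tasks[:warmup]:
        predict(task)  # warm-up: caches, lazy initialization, etc.
    timings: List[float] = []
    for task in tasks:
        start = time.perf_counter()
        predict(task)
        timings.append(time.perf_counter() - start)
    return statistics.fmean(timings)
```

Warm-up runs matter because the first call often pays one-off costs that would otherwise inflate the average for whichever method happens to run first.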

Table 4: Average inference time for the 3-way 5-shot test set

Figure 4 also shows that most of the misclassified images contain noise or unusual environments that make it challenging for the model to extract useful features. A few possible ways to overcome this are:

  • Use a more contextual backbone model for feature extraction.
  • Add multi-modal information by adding text description of the environment (e.g., temperature or humidity) as suggested by Wang et al. [2021].
  • Implement image pre-processing techniques such as masking or segmentation to remove unwanted noise.

Figure 4: TIM image predictions on a 3-way 5-shot task
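As one example of the masking idea, a common heuristic for separating vegetation from background is the excess-green index ExG = 2g − r − b, computed on chromatic (sum-normalized) coordinates and thresholded to keep leaf pixels. This is a swapped-in illustration, not something evaluated in this study; the default threshold of 0.1 is an arbitrary assumption, and real field images would need a tuned threshold or a learned segmentation model.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)

def excess_green(pixel: Pixel) -> float:
    """ExG = 2g - r - b on chromatic coordinates (each channel / R+G+B)."""
    total = sum(pixel)
    if total == 0:
        return 0.0  # black pixel: no chromatic information
    r, g, b = (c / total for c in pixel)
    return 2 * g - r - b

def leaf_mask(image: List[List[Pixel]], threshold: float = 0.1) -> List[List[bool]]:
    # True where the pixel looks like vegetation.
    return [[excess_green(p) > threshold for p in row] for row in image]

def apply_mask(image: List[List[Pixel]], mask: List[List[bool]],
               fill: Pixel = (0.0, 0.0, 0.0)) -> List[List[Pixel]]:
    # Replace background pixels with a constant fill colour.
    return [[p if keep else fill for p, keep in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]
```

A green leaf pixel such as (40, 160, 40) scores an ExG of 1.0 and is kept, while a brown soil pixel such as (120, 90, 60) scores about 0 and is blanked. Note that diseased regions that lose their green colour could also be masked out, so a heuristic like this would need checking against the disease classes.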

Conclusion

In this experiment, we investigated few-shot learning methods for classifying images of diseased tomato leaves. We used the Tomato Leaves dataset from Kaggle, whose main source is the PlantVillage dataset, and applied three few-shot learning methods: the Relation Network, BD-CSPN, and TIM. Comparing their performance, we found that the methods with transductive inference performed better than the method with inductive inference, with TIM achieving the highest accuracy of 81% on the 3-way 5-shot test set. BD-CSPN had the fastest inference time of the three methods, while TIM had the slowest.

Overall, we recommend BD-CSPN as the more practical method: it reaches 79% accuracy on the test set while maintaining a relatively short inference time. Other practical recommendations include using a more contextual backbone model, integrating multi-modal information, and applying image pre-processing techniques to eliminate unwanted noise.

References

  • Ankit Dubey et al. Agricultural plant disease detection and identification. International Journal of Electrical Engineering and Technology, 11(3), 2020.
  • Halil Durmuş, Ece Olcay Güneş, and Mürvet Kırcı. Disease detection on the leaves of the tomato plants by using deep learning. In 2017 6th International Conference on Agro-Geoinformatics, pages 1–5, 2017. doi:10.1109/Agro-Geoinformatics.2017.8047016.
  • Mohammed Brahimi, Kamel Boukhalfa, and Abdelouahab Moussaoui. Deep learning for tomato diseases: Classification and symptoms visualization. Applied Artificial Intelligence, 31(4):299–315, 2017a. doi:10.1080/08839514.2017.1315516.
  • Aravind Krishnaswamy Rangarajan, Raja Purushothaman, and Aniirudh Ramesh. Tomato crop disease classification using pre-trained deep learning algorithm. Procedia Computer Science, 133:1040–1047, 2018. International Conference on Robotics and Smart Manufacturing (RoSMa2018). ISSN 1877–0509. doi:10.1016/j.procs.2018.07.070. URL https://www.sciencedirect.com/science/article/pii/S1877050918310159.
  • Keke Zhang, Qiufeng Wu, Anwang Liu, Xiangyan Meng, et al. Can deep learning identify tomato leaf disease? Advances in multimedia, 2018, 2018.
  • Prajwala Tm, Alla Pranathi, Kandiraju SaiAshritha, Nagaratna B Chittaragi, and Shashidhar G Koolagudi. Tomato leaf disease detection using convolutional neural networks. In 2018 eleventh international conference on contemporary computing (IC3), pages 1–5. IEEE, 2018.
  • Mohit Agarwal, Abhishek Singh, Siddhartha Arjaria, Amit Sinha, and Suneet Gupta. Toled: Tomato leaf disease detection using convolution neural network. Procedia Computer Science, 167:293–301, 2020a.
  • Mohammed Brahimi, Kamel Boukhalfa, and Abdelouahab Moussaoui. Deep learning for tomato diseases: classification and symptoms visualization. Applied Artificial Intelligence, 31(4):299–315, 2017b.
  • Mohit Agarwal, Suneet Kr. Gupta, and Kanad K. Biswas. Development of efficient cnn model for tomato crop disease identification. Sustain. Comput. Informatics Syst., 28:100407, 2020b.
  • Sai Vidyaranya Nuthalapati and Anirudh Tunga. Multi-domain few-shot learning and dataset for agricultural applications. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1399–1408, 2021.
  • David Argüeso, Artzai Picon, Unai Irusta, Alfonso Medela, Miguel G San-Emeterio, Arantza Bereciartua, and Aitor Alvarez-Gila. Few-shot learning approach for plant disease classification using images taken in the field. Computers and Electronics in Agriculture, 175:105542, 2020.
  • Yang Li and Jiachen Yang. Meta-learning baselines and database for few-shot classification in agriculture. Computers and Electronics in Agriculture, 182:106055, 2021.
  • Liangzhe Chen, Xiaohui Cui, and Wei Li. Meta-learning for few-shot plant disease detection. Foods, 10(10):2441, 2021.
  • Malik Boudiaf, Imtiaz Ziko, Jérôme Rony, José Dolz, Pablo Piantanida, and Ismail Ben Ayed. Information maximization for few-shot learning. Advances in Neural Information Processing Systems, 33:2445–2457, 2020.
  • Chunshan Wang, Ji Zhou, Chunjiang Zhao, Jiuxi Li, Guifa Teng, and Huarui Wu. Few-shot vegetable disease recognition model based on image text collaborative representation learning. Computers and Electronics in Agriculture, 184:106098, 2021.
  • Mei-Ling Huang and Ya-Han Chang. Dataset of tomato leaves. Mendeley Data, 1, 2020.
  • G Geetharamani and Arun Pandian. Identification of plant leaf diseases using a nine-layer deep convolutional neural network. Computers & Electrical Engineering, 76:323–338, 2019.
