Leaf Disease Dataset Image Classifier

Gauravthorat
17 min read · Apr 25, 2023


Image src.: https://www.kaggle.com/datasets/aryashah2k/mango-leaf-disease-dataset

GitHub : https://github.com/G-G-Thorat/Leaf_Image_Classification_DM_Final/blob/main/Img_classifier_GT_DM_to_submit.ipynb

Homepage : https://g-g-thorat.github.io/

Preface

In this blog post, we will guide you through the process of creating an image classification application for a leaf disease dataset. Specifically, we will be using the Leaf Disease Images dataset [5] available on Kaggle, which contains over 4,000 images spanning eight categories: seven diseases (Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Powdery Mildew, and Sooty Mould) plus a Healthy class. To build our image classifier, we will utilize various machine learning models, including Convolutional Neural Networks (CNN), Support Vector Machines (SVM), k-Nearest Neighbors (kNN), Random Forest, Decision Tree, and Naive Bayes classifiers.

To see a demo of this app in action, check out my Colab notebook and watch the video on YouTube. So, let’s begin with an introduction to the topic.

Prologue

In recent years, one of the most fascinating areas of research in the field of computer science has been image classification. The goal of image classification is to assign a suitable label to an image based on its content. This technology has numerous use cases across various industries such as e-commerce, healthcare, marketing, and many others. Image classification can be used to identify objects in images and classify them for later use.

Src: https://medium.com/@gauravthorat1998/facial-emotion-expressions-aab729bf69c0

Image classification technology can also be used in the automotive industry for identifying objects in the road and classifying them for autonomous driving systems. For instance, it can be used to distinguish between pedestrians, other vehicles, and obstacles in real-time images captured by a car’s cameras, enabling the car’s software to make appropriate decisions and avoid accidents.

In this blog post, we will be exploring the potential of six different machine learning models for our image classification application: CNN, Naive Bayes, kNN, SVM, Random Forest, and Decision Trees.

So, let’s dive deep into these models and see how they can help us achieve our objectives.

What is a Convolutional Neural Network (CNN)?

Convolutional neural networks (CNNs) are a type of artificial neural network (ANN) used most frequently in deep learning to interpret visual data, as stated in [8][11].

Src: https://medium.com/@gauravthorat1998/facial-emotion-expressions-aab729bf69c0

A CNN has three main layers: the convolution layer, the pooling layer, and the fully connected layer, as stated in [10].

  1. Convolution Layer:

The neurons within a convolutional layer execute the convolution operation on the inputs they are given. The usual hyperparameters associated with a convolutional layer are the filter and the stride.

2. Pooling Layer:

Pooling layers decrease the input size, which results in faster processing and analysis of the data. Typically, convolutional layers are followed by pooling layers, which reduce the spatial dimensions (width and height) of the input and thus the computational requirements. The hyperparameters associated with a pooling layer are the stride, the choice of max or average pooling, and the filter size.

3. Fully Connected Layer:

Fully connected layers are named as such because they connect each neuron in one layer to every neuron in the next layer. In these layers, every input dimension and output dimension work in tandem, resulting in complete interconnectivity between the two layers.[11]
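To make this concrete, here is a minimal Keras sketch of the three layer types stacked in order; the layer sizes here are illustrative only, not the final model used later in this post.

import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative only: a tiny CNN showing the three layer types in order.
model = models.Sequential([
    # Convolution layer: 8 trainable 5x5 filters slide over the input image.
    layers.Conv2D(8, kernel_size=5, activation="relu", input_shape=(200, 200, 3)),
    # Pooling layer: 2x2 max pooling halves the spatial dimensions.
    layers.MaxPooling2D(pool_size=2),
    # Fully connected layer: every flattened input connects to every neuron.
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(8, activation="softmax"),  # one output per leaf category
])
model.summary()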

What is Naive Bayes Classification?

The Naive Bayes classification family employs Bayes’ Theorem and probability theory to forecast the appropriate tag for a given text, such as a news article or a customer review, as noted in [11]. Since they are probabilistic, these algorithms calculate the probabilities of each tag for a given text and assign the tag with the highest probability [11]. Bayes’ Theorem is utilized to determine these probabilities.[12]

Src: https://towardsdatascience.com/logic-and-implementation-of-a-spam-filter-machine-learning-algorithm-a508fb9547bd

One example of NBC (Naive Bayes Classifier) in action is spam email filtering. In this case, the classifier is trained on a dataset of emails that have been labeled as either spam or not spam (ham).

The algorithm works by calculating the probability that a new email is spam based on the occurrence of certain words or features within the email. For example, if the word “free” appears frequently in known spam emails, the classifier will give a high probability that an email containing the word “free” is also spam.

The “naive” part of the algorithm comes from the assumption that each feature is independent of the others, which simplifies the probability calculations. Despite this simplification, NBC is a very effective algorithm for text classification tasks like spam filtering, sentiment analysis, and document categorization.
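As a rough illustration of this idea, here is a tiny scikit-learn sketch of a word-count based spam filter; the corpus and labels are made up for the example.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training corpus: word counts stand in for the "features" discussed above.
emails = ["free money now", "free offer click", "meeting at noon", "project report attached"]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

clf = MultinomialNB()  # the "naive" independence assumption lives here
clf.fit(X, labels)

# "free" appears only in the spam emails, so this one scores as spam.
print(clf.predict(vectorizer.transform(["free prize"])))  # -> ['spam']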

Then,

What are Support Vector Machines (SVM)?

The Support Vector Machine (SVM) is a powerful tool that can be used for both classification and regression problems, as highlighted in [10]. It can handle multiple categorical and continuous variables with ease. SVM creates a hyperplane to divide several classes in multidimensional space, iteratively constructing an ideal hyperplane to reduce error. The primary objective of SVM is to identify the maximum marginal hyperplane (MMH) that best classifies the dataset [10]. Here’s how it works:

First, SVM identifies the hyperplane that best separates the two classes with the largest margin. The margin is the distance between the hyperplane and the closest data points from each class. SVM then maximizes this margin while minimizing the classification error. In cases where the data cannot be linearly separated, SVM applies a technique called kernel trick to transform the input space to a higher-dimensional space where the classes can be separated by a hyperplane. This allows SVM to classify non-linearly separable datasets as well.

Src: https://pub.towardsai.net/fully-explained-svm-classification-with-python-eda124997bcd

SVM can be used for both linear and nonlinear classification tasks. In linear SVM, the goal is to find a linear hyperplane that separates the classes with the maximum margin. This means that the hyperplane should be as far away from the nearest data points of both classes as possible. The decision boundary of a linear SVM is a straight line or a hyperplane in higher dimensions.

On the other hand, polynomial SVM uses a kernel function to transform the input data into a higher-dimensional feature space, where a linear decision boundary can be used to separate the classes. The polynomial kernel function takes the dot product of two vectors and raises it to a certain power, which determines the degree of the polynomial. The higher the degree of the polynomial, the more complex the decision boundary can be.

In general, polynomial SVM is used when the classes cannot be separated linearly in the original feature space. However, it is important to choose the appropriate degree of the polynomial: a degree that is too high may result in over-fitting, while one that is too low may result in under-fitting.
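As a small scikit-learn sketch of this contrast, the snippet below fits a linear kernel and a degree-3 polynomial kernel on a toy two-moons dataset (not the leaf data); the polynomial boundary can curve around the interleaved classes while the linear one cannot.

from sklearn import svm
from sklearn.datasets import make_moons

# Non-linearly separable toy data: two interleaving half-circles.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# A linear SVM looks for a single separating straight line...
linear_clf = svm.SVC(kernel="linear").fit(X, y)

# ...while a polynomial kernel lets the boundary curve. The degree controls
# complexity: too high risks over-fitting, too low risks under-fitting.
poly_clf = svm.SVC(kernel="poly", degree=3).fit(X, y)

print("linear accuracy:", linear_clf.score(X, y))
print("poly accuracy:  ", poly_clf.score(X, y))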

Now,

What is the k-Nearest Neighbors (kNN) classifier?

The k-nearest neighbors (kNN) algorithm determines the likelihood of a data point belonging to a particular group by considering the closest neighboring data points.

kNN is a supervised, non-parametric, lazy-learning method that can address both classification and regression problems, although its main application is in classification.

https://www.youtube.com/watch?v=RwmttGrJs08

Suppose we have a dataset of fruits, where each fruit is described by its weight and color, and labeled as either “apple” or “orange”. We want to train a model to predict the label of a new fruit based on its weight and color.

We can use the KNN algorithm to solve this problem as follows:

  1. Choose a value for K (the number of nearest neighbors to consider).
  2. For a new fruit with weight w and color c, calculate its distance to each fruit in the dataset using a distance metric (such as Euclidean distance).
  3. Select the K fruits in the dataset that are closest to the new fruit based on the calculated distances.
  4. Determine the majority label of the K selected fruits (i.e., whether they are mostly apples or oranges).
  5. Assign the majority label as the predicted label for the new fruit.
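A minimal scikit-learn sketch of these five steps on the toy fruit data; the weights and color codes are invented for the example.

from sklearn.neighbors import KNeighborsClassifier

# Toy fruit dataset: [weight in grams, color encoded as 0=red/green, 1=orange].
X = [[150, 0], [170, 0], [140, 1], [130, 1], [160, 0], [120, 1]]
y = ["apple", "apple", "orange", "orange", "apple", "orange"]

# Step 1: choose K. Steps 2-5 (distances, nearest neighbors, majority vote)
# all happen inside predict(), using Euclidean distance by default.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

print(knn.predict([[145, 1]]))  # majority vote of the 3 nearest fruits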

So,

What do Decision Trees do?

The supervised learning algorithm family includes the decision tree algorithm, which can be used to solve classification and regression problems. With decision trees, the goal is to create a training model that can predict the target variable’s class or value by learning simple decision rules from historical data.

To predict a record’s class label using a decision tree, we start at the root of the tree and compare the record’s attribute values to those of the root attribute. Based on this comparison, we follow the branch that corresponds to that value to move to the next node. This process continues until we reach a leaf node that corresponds to a class label or regression value.

Here are the basic principles of the decision tree algorithm:

  1. Start with the root node, which includes the entire dataset.
  2. Find the best feature to split the dataset based on a certain criterion, such as information gain, Gini impurity, or chi-squared.
  3. Create a new node for each possible outcome of the selected feature.
  4. Repeat steps 2 and 3 for each new node until a stopping criterion is met, such as reaching a certain depth, achieving a certain accuracy, or having a minimum number of samples at each leaf node.

Src: https://www.researchgate.net/figure/Example-decision-tree-using-features-from-the-Linguistic-Inquiry-and-Word-Count-LIWC_fig3_325726942

Here’s an example of how the decision tree algorithm works in practice:

Suppose we have a dataset of patients and their corresponding diagnosis (cancer or not cancer) based on various features such as age, gender, smoking, and alcohol consumption. We want to build a model that can predict whether a new patient has cancer or not based on their features.

The decision tree algorithm would start by finding the feature that provides the most information gain, such as age. It would then create two new nodes for each possible age range, such as younger than 50 and older than 50. It would repeat the process for each new node until it reaches a leaf node that corresponds to a diagnosis.

The resulting decision tree could be used to predict a new patient’s diagnosis based on their age, as well as other features such as smoking and alcohol consumption.
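Here is a small scikit-learn sketch of this patient example; the records are made up, and export_text prints the learned rules so you can see the splits.

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy patient records: [age, smoker (0/1), alcohol (0/1)] with a diagnosis label.
X = [[35, 0, 0], [62, 1, 1], [58, 1, 0], [24, 0, 1], [70, 1, 1], [41, 0, 0]]
y = ["no cancer", "cancer", "cancer", "no cancer", "cancer", "no cancer"]

# 'entropy' corresponds to the information-gain criterion described above.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
tree.fit(X, y)

# The printed rules mirror the description above: the root splits on the
# most informative feature, then refines until it reaches leaf labels.
print(export_text(tree, feature_names=["age", "smoker", "alcohol"]))
print(tree.predict([[55, 1, 0]]))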

And finally,

What is Random Forest Classification?

Random forest is an ensemble learning algorithm that combines multiple decision trees to improve the accuracy of the model. In random forest classification, each decision tree in the ensemble is built on a randomly selected subset of the training data and a randomly selected subset of the features, which helps to reduce over-fitting and improve the generalization of the model.

Src: https://www.freecodecamp.org/news/how-to-use-the-tree-based-algorithm-for-machine-learning/

Here are the basic principles of the random forest algorithm:

  1. Randomly select a subset of the training data with replacement (bootstrap sample).
  2. Randomly select a subset of the features for each split in the decision tree.
  3. Build a decision tree on the selected data and features, using a splitting criterion such as information gain or Gini impurity.
  4. Repeat steps 1–3 to build a forest of decision trees.
  5. To classify a new sample, pass it through each decision tree in the forest, and use the majority vote of the predictions as the final output.

Here’s an example of how the random forest algorithm works in practice:

Suppose we again have the patient dataset from the decision tree example, with a diagnosis (cancer or not cancer) based on features such as age, gender, smoking, and alcohol consumption, and we want to predict whether a new patient has cancer.

The random forest algorithm would start by randomly selecting a subset of the training data and features for each decision tree in the ensemble. It would then build multiple decision trees using a splitting criterion such as information gain or Gini impurity.

To classify a new patient, the random forest algorithm would pass the patient’s features through each decision tree in the forest and use the majority vote of the predictions as the final output. For example, if five decision trees in the forest predict that the patient has cancer, and three predict that the patient does not have cancer, the final output of the random forest algorithm would be that the patient has cancer.
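A minimal scikit-learn sketch of this voting behavior, reusing the toy patient data from the decision tree example above.

from sklearn.ensemble import RandomForestClassifier

# Same toy patient records: [age, smoker (0/1), alcohol (0/1)].
X = [[35, 0, 0], [62, 1, 1], [58, 1, 0], [24, 0, 1], [70, 1, 1], [41, 0, 0]]
y = ["no cancer", "cancer", "cancer", "no cancer", "cancer", "no cancer"]

# Each of the 8 trees sees a bootstrap sample of the rows and a random subset
# of features at every split; predict() returns the trees' majority vote.
forest = RandomForestClassifier(n_estimators=8, max_features="sqrt", random_state=0)
forest.fit(X, y)

print(forest.predict([[55, 1, 0]]))        # majority-vote prediction
print(forest.predict_proba([[55, 1, 0]]))  # vote shares behind the decision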

About the Dataset and Model :

The dataset used here is the mango leaf disease dataset uploaded on Kaggle [5], which comprises about 4,000 images of mango leaves. Of these, around 1,800 are of distinct leaves, and the rest were prepared by adjusting and zooming as needed. Seven diseases are considered: Anthracnose, Bacterial Canker, Cutting Weevil, Die Back, Gall Midge, Powdery Mildew, and Sooty Mould, along with a Healthy category. Each of the eight categories contains about 500 images.

ML models are used for distinguishing healthy and diseased leaves (two-class prediction) as well as for differentiating among the various diseases (multi-class prediction). In this blog, I’ll show how I performed multi-class prediction on the dataset.

Moving forward,

Loading the dataset :

Mounted Drive and loaded the dataset from there using the Python ‘os’ library.

Imported all TensorFlow libraries

TensorFlow :

Using TensorFlow, it is possible to extract image data from various files, resize images, and convert multiple pictures at once.

Do a validation split on the dataset to segregate it into train data (80%) and test data (20%), and define the batch size, image height, and width.
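Roughly, that loading step can be reproduced as below; the Drive path is a placeholder, and the seed and folder layout are assumptions rather than the notebook’s exact settings.

import tensorflow as tf
# In Colab: from google.colab import drive; drive.mount('/content/drive')

data_dir = "/content/drive/MyDrive/leaf_disease_dataset"  # placeholder path
batch_size, img_height, img_width = 32, 200, 200

# 80/20 train/validation split, as described above.
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir, validation_split=0.2, subset="training", seed=123,
    image_size=(img_height, img_width), batch_size=batch_size)

val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir, validation_split=0.2, subset="validation", seed=123,
    image_size=(img_height, img_width), batch_size=batch_size)

print(train_ds.class_names)  # the eight leaf categories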

Everything now comes together in one place, but before moving ahead I would like to discuss a little about —

My Contribution :

I would like to share my input for an image classification system designed specifically for a dataset of leaf images.

  1. I conducted an experiment following the guidelines in tutorial [3], which involved exploring various hyper-parameters such as convolution-pooling pairs. This led to the creation of CNN model 2. Although I studied models like SqueezeNet and ResNet from [2], I did not have the opportunity to experiment with them.
  2. During my research, I thoroughly investigated the issue of over-fitting in CNN models and the various methods for avoiding it. While I did experiment with different hyper-parameters, this blog post does not include information about other models that I studied.
  3. Throughout my project, I faced several challenges related to constructing both the CNN and SVM models. However, I was able to overcome these obstacles and create a hyper-parameter-tuned version of the CNN sequential model.

Also, What is Over-fitting?

Over-fitting occurs when a model is trained on a dataset so well that it starts to capture the noise and details specific to the training data, which can lead to a decline in the model’s performance when presented with new data. In other words, the model learns not only the underlying patterns in the data but also the noise and fluctuations specific to the training dataset. As a result, the model may become too complex and fail to generalize well to new data.[11]

Src: https://medium.com/@gauravthorat1998/facial-emotion-expressions-aab729bf69c0

One such scenario can be: where a machine learning model is trained to identify the difference between cats and dogs using a dataset of images. If the model is trained on a small dataset and is too complex, it may memorize the images in the training dataset, including noise and irrelevant features specific to that dataset. As a result, the model may perform very well on the training data but may fail to generalize to new images of cats and dogs. This is because the model has not learned the underlying features that distinguish cats from dogs, but instead has learned the specific features present only in the training dataset.[11]

Moving ahead, I would like to show the models used for the analysis.

1. CNN_Model_1 Sequential :

This model is a basic Keras model built using TensorFlow and the Sequential API.

CNN_Model_1.summary()

The number of epochs used was 5.

The maximum accuracy found was 91% on the training dataset and 87% on the validation dataset.

The plots above show the accuracy and loss of CNN_Model_1.

Some terminology about CNNs:

In accordance with the guidelines outlined in [3], we can define the CNN model using the following terminology:

  • 8C5: A convolutional layer consisting of 8 trainable filters with a kernel size of 5x5.
  • P2: A pooling operation with a window size of 2x2 and a stride of 2.
  • 128: A fully connected layer (also referred to as a dense layer) containing 128 neurons.
  • D15–30%: A dropout layer with a drop rate between 0.15 and 0.30, i.e., one that randomly drops 15–30% of the input units during training.

The following terms and definitions were learned from tutorial [2]:

  • Filters (also known as kernels or cores) are trainable parameters.
  • Weights are the values of filters that the network learns during training.
  • Strides are the steps by which the filter window size moves through the input.
  • Padding is a 0-valued frame used to process the edges of the input.
  • Dropout is a regularization technique that helps to prevent over-fitting.
  • Kernel_size=5 sets the filter size to be 5x5.
  • Strides=1 is the default value.
  • Padding=’valid’ is the default value, which reduces the output size by kernel_size-1.
  • Padding=’same’ means that the output will be of the same spatial size as the input.
  • Activation=’relu’ sets the ReLU (Rectified Linear Unit) function as the activation function.
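Putting this notation together, here is my reading of an 8C5-P2-128-D30% stack in Keras; this sketches the notation itself, not the exact model from the notebook.

import tensorflow as tf
from tensorflow.keras import layers, models

model_2 = models.Sequential([
    # 8C5: 8 trainable filters with a 5x5 kernel ('same' keeps the spatial size).
    layers.Conv2D(8, kernel_size=5, strides=1, padding="same",
                  activation="relu", input_shape=(200, 200, 3)),
    # P2: a 2x2 pooling window with a stride of 2.
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Flatten(),
    # 128: a fully connected (dense) layer with 128 neurons.
    layers.Dense(128, activation="relu"),
    # D 30%: dropout randomly zeroes 30% of the inputs during training.
    layers.Dropout(0.3),
    layers.Dense(8, activation="softmax"),
])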

Moving towards the next section: before using the dataset with the other models, let’s prepare it accordingly.

Create a train_test split on it:

Import all the required models and modify the training dataset by reshaping each image batch from (32, 200, 200, 3) to (32, 120000), flattening every 200×200×3 image into a single vector.
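A sketch of that flattening step, assuming train_ds comes from the loading code earlier (batches of 32 images of 200×200×3, so each image flattens to 120,000 values).

import numpy as np

# Flatten image batches from the tf.data pipeline into feature vectors that
# scikit-learn models can consume: (32, 200, 200, 3) -> (32, 120000).
X_list, y_list = [], []
for images, labels in train_ds:
    X_list.append(images.numpy().reshape(images.shape[0], -1))
    y_list.append(labels.numpy())

X_train = np.concatenate(X_list)
y_train = np.concatenate(y_list)
print(X_train.shape)  # (n_samples, 120000)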

2. Naive Bayes Classifier :

After training the model on Naive Bayes Classification, the best accuracy found was 55% on the validation dataset.

I tried predicting on the test dataset using the NBC model, and it predicted correctly.

3. kNN Classifier :

After training the model on k-Nearest Neighbors Classification, the best accuracy found was 64% on the validation dataset.

I tried predicting on the test dataset using the kNN model, and it indeed predicted correctly.

4. Decision-Tree Classifier :

After training the model on Decision Tree Classification, the best accuracy found was 67% on the validation dataset.

Indeed, a correct prediction using the Decision Tree classifier.

5. Support Vector Machine Classification :

We will tune the hyper-parameters so that a linear SVM is built on top of the CNN Sequential model.

After compiling and training the model for 10 epochs -

The best training accuracy found was 12%, with 11% on validation.
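One common way to get a linear SVM out of a CNN is to swap the final softmax for a linear output layer trained with a squared hinge loss; the sketch below shows that idea and may differ from the notebook’s exact setup (labels would need to be one-hot encoded for this loss).

import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

# Sketch: a CNN whose head behaves like a linear SVM, via an L2-regularized
# linear output layer trained with squared hinge loss instead of softmax.
svm_cnn = models.Sequential([
    layers.Conv2D(8, 5, activation="relu", input_shape=(200, 200, 3)),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(8, activation="linear",
                 kernel_regularizer=regularizers.l2(0.01)),  # margin outputs
])
svm_cnn.compile(optimizer="adam", loss="squared_hinge", metrics=["accuracy"])
# svm_cnn.fit(train_ds_onehot, validation_data=val_ds_onehot, epochs=10)
# (train_ds_onehot / val_ds_onehot are hypothetical one-hot-labeled datasets)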

6. Random-Forest Classifier :

After training the model on Random Forest Classification, the best accuracy found was 88% on the validation dataset.

Perfect prediction!!!

Comparison :

As per our observations, the best models for this image classification task are the CNN Sequential model, with an accuracy of 91%, and the Random Forest classifier, with an accuracy of 88%.

Finally, we can see the models working through a web API on a webpage, where we can dynamically choose among the models, give an image as input, and get the output as a category along with its confidence.
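A minimal Gradio sketch of such a demo; trained_models and class_names are hypothetical names standing in for the fitted models and the eight categories, and each model is assumed to expose a Keras-style predict() on a (1, 200, 200, 3) array.

import gradio as gr
import numpy as np
import tensorflow as tf

def classify(image, model_name):
    # Resize the uploaded image and add a batch dimension.
    img = tf.image.resize(image, (200, 200))[tf.newaxis, ...] / 255.0
    probs = trained_models[model_name].predict(img)[0]
    return {name: float(p) for name, p in zip(class_names, probs)}

demo = gr.Interface(
    fn=classify,
    inputs=[gr.Image(type="numpy"),
            gr.Dropdown(["CNN", "NBC", "kNN", "Decision Tree", "Random Forest"])],
    outputs=gr.Label(num_top_classes=3),  # predicted category with confidences
)
demo.launch()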

In the following images, I’m going to try the same leaf image with the different models, and you can see the difference in their confidences.

For the demo, I’m using an image from the “Cutting Weevil” category.

Correctly predicted by CNN with confidence of 82%.

Correctly predicted by NBC with confidence of 55%.

Correctly predicted by kNN with confidence of 69%.

Correctly predicted by the Decision Tree classifier with a confidence of 27%.

Correctly predicted by the Random Forest classifier with a confidence of 82%, similar to that of the CNN.

Challenges Faced and their Solutions :

I encountered several challenges while building this image classifier app. Here’s a brief list of them and how I resolved them:

  1. The primary task was to get familiar with the workings of different ML models, such as SVM and Random Forest, in order to trust them enough to evaluate the data.
  2. The dataset given on Kaggle was not cleaned; it had some redundant data, such as 0 KB image files, which initially affected the analysis. Hence, I deleted those in the first pre-processing step, creating an entirely new dataset of cleaned images.
  3. As mentioned earlier, the dataset has 8 sections, i.e., subfolders/disease categories, with about 500 images per category. These were gathered with a mobile phone camera, and a few were created by zooming and other adjustments, which led to varying sizes. I therefore had to scale them accordingly to a single size of 200×200.
  4. Choosing the right ML model is as important as pre-processing the data. The wrong model can produce tons of garbage values with vanishingly small probabilities (on the order of e^-286). Hence, I worked with well-established and widely used ML models for my classification.
  5. Finally, I was also unsure which API to use to create an application for the project demo. Possible options were Flask, Gradio, Streamlit, etc., out of which I went with Gradio, as it is very user-friendly and simple to implement.

References :

[1] Image Recognition with Machine Learning, Educative, Image Processing — Introduction, https://www.educative.io/courses/image-recognition-ml/3w48QA4Q8Or

[2] Tutorial, Image Classification, TensorFlow: https://www.tensorflow.org/tutorials/images/classification

[3] Tutorial, Image Recognition with Machine Learning, Educative — Level Up Your Coding Skills, Link: https://www.educative.io/courses/image-recognition-ml.

[4] Medium blog, How to build your own neural network from scratch in Python, https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6

[5] Kaggle dataset — https://www.kaggle.com/datasets/aryashah2k/mango-leaf-disease-dataset

[6] Wikipedia, CNN — https://en.wikipedia.org/wiki/Convolutional_neural_network

[7] Comidor, Image Recognition Use cases, link: https://www.comidor.com/knowledge-base/machine-learning/image-recognition-use-cases/

[8] Machine Learning Mastery, Overfitting and Underfitting With Machine Learning Algorithms, https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/

[9] Blog on CNN — https://www.clarifai.com/blog/what-is-convolutional-networking

[10] Analytics Vidhya, Build an Image Classifier With SVM!, link: https://www.analyticsvidhya.com/blog/2021/06/build-an-image-classifier-with-svm/

[11] Image Classification — https://medium.com/@gauravthorat1998/facial-emotion-expressions-aab729bf69c0

[12] Text Classification — https://medium.com/@gauravthorat1998/text-classification-using-naive-bayes-classifier-nbc-from-scratch-e4b6c1cb6f4c
