Building a CNN to Detect Late Blight in Tomato Crops
Detecting the disease early in our favourite juicy red fruits to reduce harvest and economic losses
Personally, I think the tomato is a top-tier vegetable. And the stats can back me up on this: tomatoes make up 19% of all vegetable consumption in the US, second only to potatoes at 23%.
And yes, I classified the tomato as a vegetable. Botanically, tomatoes are fruits, but according to the Encyclopedia Britannica, nutritionists also classify them as vegetables for the way they are utilized.
Although they are a huge leader in vegetable consumption, two of the biggest restrictions on tomato harvests are the specific soil conditions required to grow them and their susceptibility to a variety of diseases.
For example, there is an annual economic loss of 20%–70% in India as a result of the blight disease, which is caused by a fungus-like pathogen (technically an oomycete): Phytophthora infestans.
If she doesn’t sound familiar to you, Phytophthora was also the cause of the devastating Irish potato famine in the 1840s.
She tends to show up where she isn’t wanted, in the form of irregularly shaped, water-soaked lesions (areas of abnormal tissue) on tomato plants’ leaves. As blight progresses, these lesions enlarge, causing leaves to brown, shrivel, and die.
Because of the significant economic losses this disease causes, I decided to build a Deep Learning model called a CNN (Convolutional Neural Network) to classify images of healthy and late-blight-affected plants.
Project Intention:
If we could combine Computer Vision infrastructure like field imaging with a neural network like a CNN to process and classify that data into healthy vs. diseased plants, many of the negative impacts of disease, including economic and produce loss, could be mitigated. 🌱
Although incentivizing adoption is a big problem with projects like these (more on that at the end), this was mostly a training exercise for me to understand the CNN framework by applying it to food sustainability.
If you’d like to follow along with my code here’s the Github repository! https://github.com/ashnanirula/tomato-disease-project
Let’s jump in.
CNNs are sounding pretty convoluted…
I used to agree, but to be honest they are one of the coolest types of networks you’ll ever meet.
A typical CNN looks something like this:
It looks incredibly complicated, but let’s break it down into small, bite-sized chunks.
First chunk: What’s a Convolutional Layer?
Before we get into the layer — let’s start at the input image.
Every image is a grid of pixels, and each pixel has a value representing its intensity (raw values run from 0–255; we’ll rescale them to 0–1 later).
Our convolutional layer takes this image and applies a special matrix called a filter (or kernel) that “scans” over these pixel values to look for specific features. The output of this process is called a feature map.
In our case, features are characteristics we can use to classify the tomato leaves: lesions, different curves, brown colour changes, etc.
The special “scanning” filter is applied to each region of the image: the overlapping pixel values are multiplied by the filter values, and the products are summed to produce each output value.
For the first pixel in this visual example, that’s 9 * 0, the first operation in the series of calculations:
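If you’d like to see the arithmetic spelled out, here’s a minimal NumPy sketch of a single convolution pass. The image and filter values below are made up for illustration, with a 9 in the top-left corner to mirror the 9 * 0 example above:
import numpy as np

#a toy 4x4 "image" of pixel values and a 3x3 filter (kernel)
image = np.array([[9, 1, 2, 0],
                  [4, 5, 6, 1],
                  [7, 8, 9, 2],
                  [1, 3, 5, 7]])
kernel = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]])

#slide the filter across the image: at each position, multiply the
#overlapping pixel values by the filter values and sum the products
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        region = image[i:i+3, j:j+3]
        feature_map[i, j] = np.sum(region * kernel)

print(feature_map)  #a 2x2 feature map of detected-feature strengths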
After each convolution operation, our CNN also applies a special function called ReLU (Rectified Linear Unit).
What’s ReLU?
Other than having a dope name, ReLU is an activation function that turns any negative value into 0 and keeps any positive value as-is.
So a value of -10920 becomes 0 after ReLU, while a value of 10920 stays 10920.
In this visual, the first (orange) array is the convolution’s output array (the feature map) and the green one is the newly produced ReLU feature map.
Ta-da!
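In code, ReLU is basically a one-liner. A quick NumPy sketch with made-up values:
import numpy as np

conv_output = np.array([[-10920, 3],
                        [7, -2]])

#ReLU: negatives become 0, positives pass through unchanged
relu_map = np.maximum(0, conv_output)
print(relu_map)  #[[0 3] [7 0]]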
Chunk 2: Pooling
Once we produce our beautiful feature map and run it through ReLU, a process called max pooling is then applied to downsize it and reduce the number of parameters being measured.
To do this, we use another filter (this one without weights) that “scans” each section of our new array. Whichever value is the largest in each scanned section is recorded on a newly pooled layer.
We repeat this cycle, convolving the array with a weighted kernel, applying ReLU, then downsizing with max pooling, multiple times to make our model more accurate and precise.
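Here’s what a single 2×2 max-pooling pass (the window size the model uses later) looks like in NumPy, again with made-up values:
import numpy as np

relu_map = np.array([[1, 5, 2, 8],
                     [3, 4, 0, 6],
                     [9, 2, 7, 1],
                     [0, 3, 4, 5]])

#scan 2x2 windows with a stride of 2 and keep only the largest value
pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = relu_map[2*i:2*i+2, 2*j:2*j+2]
        pooled[i, j] = window.max()

print(pooled)  #[[5. 8.] [9. 7.]], a quarter of the original size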
The Final Chunk: Classification!
Okay great. We now have a set of precise feature maps.
But it is only through a fully connected layer that our model is able to decide, for example, that a brown lesion feature = diseased and a green leaf feature = healthy.
The fully connected layer takes the feature maps we created, flattens them into a vector, and, through an intricate network of neurons, maps those features onto the final output layer of the model, which has one node per category.
In this example, the array’s rows were flattened into a vector, and the input was then classified as either a1 or a2.
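As a rough NumPy sketch of that flatten-and-classify step (the weights here are random placeholders standing in for learned values):
import numpy as np

feature_map = np.array([[5, 8],
                        [9, 7]])

#flatten the 2x2 map row by row into a 1D vector: [5, 8, 9, 7]
vector = feature_map.flatten()

#a dense (fully connected) layer computes one weighted sum per class
rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 4))  #one row of weights per output class
biases = np.zeros(2)
logits = weights @ vector + biases
print(logits)  #two scores: one for a1, one for a2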
And there we go — a high level overview of how a CNN classifies an input image (like this zebra!) into whichever category it was being measured by.
Hopefully the diagram is looking a little less convoluted to you.
Now — let’s jump into my code.
First step: importing the various libraries that allowed me to build this model quickly and easily.
import os #allows us to control the directories
import numpy as np #NumPy gives us fast array math for our images
import matplotlib.pyplot as plt #to plot my graphs at the end
import tensorflow as tf #for all the awesome TensorFlow AI utilities
I also imported the TensorFlow-specific utilities to build out the different layers and components of the model:
- examples included Dense, Conv2D, Flatten, Dropout, MaxPooling2D, BatchNormalization
- ImageDataGenerator was also imported in this section, aliased as “IDG”
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D, BatchNormalization
from tensorflow.keras.preprocessing.image import ImageDataGenerator as IDG
After I had all my libraries set up, I imported my dataset through Google Drive because of its simple integration with Google Colab.
I then created directories (groups) for both of my classes of data in training and validation.
Since I had two classes, healthy and late-blight-diseased plants, I made a total of 4 directories (2 training, 2 validation):
train_healthy_dir = os.path.join('/content/tomato_dataset/train/healthy')
train_blight_dir = os.path.join('/content/tomato_dataset/train/Late_blight')
val_healthy_dir = os.path.join('/content/tomato_dataset/val/healthy')
val_blight_dir = os.path.join('/content/tomato_dataset/val/Late_blight')
I also wanted to visualize the size of my data, so I used the len function to find the number of images in each directory.
num_tr_healthy = len(os.listdir(train_healthy_dir))
num_tr_blight = len(os.listdir(train_blight_dir))
num_val_healthy = len(os.listdir(val_healthy_dir))
num_val_blight = len(os.listdir(val_blight_dir))
print('total healthy training images: ', num_tr_healthy)
print('total diseased training images: ', num_tr_blight)
print(' ')
print('total healthy validation images: ', num_val_healthy)
print('total diseased validation images: ', num_val_blight)
I ended up having a total of around 3000 images for each of my training classes, and 800 for my validation classes.
The size of the dataset was important because it determined the next variable: my BATCH_SIZE, the number of training examples used in one iteration of training. I set mine to 1000. I also set up the IMG_SHAPE variable to eventually resize all my images to the same size (150 by 150).
BATCH_SIZE = 1000
IMG_SHAPE = 150
I then applied data augmentation to prevent overfitting, a common problem in ML where the model “memorizes” the training dataset and performs poorly on the validation dataset.
I applied transformations to my training dataset (like rotations, flips, and zooms) to mitigate this issue and diversify my data.
#my transformations to the training dataset!
image_gen_train = IDG(rescale=1./255,
                      rotation_range=45,
                      width_shift_range=.15,
                      height_shift_range=.15,
                      horizontal_flip=True,
                      zoom_range=0.5)
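One step that isn’t shown above is actually feeding images to this generator. Here’s a sketch of how that’s typically wired up with Keras’ flow_from_directory, using the directory layout from earlier (the variable names are mine); class_mode='sparse' produces integer labels, matching the SparseCategoricalCrossentropy loss used further down:
#feeding the generator its images; the two classes are inferred
#from the healthy/ and Late_blight/ subfolders
train_data_gen = image_gen_train.flow_from_directory(
    directory='/content/tomato_dataset/train',
    batch_size=BATCH_SIZE,
    target_size=(IMG_SHAPE, IMG_SHAPE),  #resizes everything to 150x150
    class_mode='sparse')

#validation images only get rescaled, no augmentation
image_gen_val = IDG(rescale=1./255)
val_data_gen = image_gen_val.flow_from_directory(
    directory='/content/tomato_dataset/val',
    batch_size=BATCH_SIZE,
    target_size=(IMG_SHAPE, IMG_SHAPE),
    class_mode='sparse')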
Finally, it was time for the best part — building out the actual model! Tensorflow made this a very easy and straightforward process.
I created my convolutional layers with a 3×3 filter size, a ReLU activation function, and padding set to “same” so that features at the edges of the image were accounted for. After each convolutional layer, I also added a MaxPooling layer with a 2×2 filter size.
model = Sequential()
model.add(Conv2D(32, (3,3), padding='same', activation='relu', input_shape=(IMG_SHAPE, IMG_SHAPE, 3)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3,3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3,3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
I also added a Dropout layer after flattening my feature maps, which randomly ignores selected neurons during training. This was another strategy to prevent overfitting, as it stops the network from becoming overly reliant on specific neuron patterns.
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(512,activation='relu'))
model.add(Dense(2))
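At this point it’s worth sanity-checking the architecture; Keras will print each layer’s output shape and parameter count:
model.summary()  #prints every layer and its number of parameters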
I then compiled the whole model with an optimizer (a tool that adjusts the model’s weights to reduce the loss) called ‘adam’.
In addition, to compute the loss between the labels and predictions, I used TensorFlow’s SparseCategoricalCrossentropy utility.
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
It was then time to train the model, and after a beautiful 120 epochs (full passes through the training data), I achieved a validation accuracy of 98.5% on this binary classification problem!
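The training call itself isn’t shown here; assuming the data generators sketched earlier, it would look something like this, with matplotlib plotting the accuracy curves at the end:
EPOCHS = 120

#train on the augmented data, checking against the validation set each epoch
history = model.fit(train_data_gen,
                    epochs=EPOCHS,
                    validation_data=val_data_gen)

#plot training vs validation accuracy across all 120 epochs
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()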
So, you built a CNN. What does this even mean?
Although building this CNN to classify images of diseased and healthy plants was an awesome way to learn about Machine Learning, there are many considerations before it could be used in a real tomato field.
For one, it would be extremely difficult to incentivize farmers to use drones and other Computer Vision infrastructure for crop field imaging because of high upfront costs, and lower breadth of input data. 💰
Diseases also occur in many other sections of the tomato plant, like the fruit and the stems, so the datasets would need to cover a much wider area of the plant, and the models themselves would need to grow in size and accuracy.
So even though implementing similar solutions in crop fields is a much more difficult endeavour, the “CNN-backed Computer Vision” concept is extremely beneficial in sectors where infrastructure already exists, like sorting recycling and waste.
Learning how to build this core type of neural network was a gratifying and interesting process nevertheless. I am extremely optimistic and excited to see how the CNN framework will continue to have a tremendous impact on the way we process items, make decisions, and uncover interesting correlations in the data we see and use every single day.