Creating an AI app that detects diseases in plants using Facebook’s deep learning platform: PyTorch

According to the Food and Agriculture Organization of the United Nations (FAO), transboundary plant pests and diseases affect food crops, causing significant losses to farmers and threatening food security.

The spread of transboundary plant pests and diseases has increased dramatically in recent years. Globalization, trade and climate change, as well as reduced resilience in production systems due to decades of agricultural intensification, have all played a part.

Transboundary plant pests and diseases can easily spread to several countries and reach epidemic proportions. Outbreaks and upsurges can cause huge losses to crops and pastures, threatening the livelihoods of vulnerable farmers and the food and nutrition security of millions at a time.

If you are into data science or machine learning, you’ve probably heard about platforms that crowdsource data challenges. The first that comes to my mind is Kaggle. Kaggle is a crowdsourced platform that attracts, nurtures, trains and challenges data scientists from all around the world to solve data science, machine learning, and predictive analytics problems. It enables data scientists and other developers to run machine learning contests, write and share code, and host datasets.

While looking for project ideas and datasets, I found another platform similar to Kaggle, but non-profit: crowdAI. crowdAI also hosts open data science challenges and helps universities, government agencies, NGOs, and businesses run and manage their data challenges. The crowdAI platform is an open source infrastructure that can immediately reach thousands of data scientists around the world to work on interesting data problems.

I wanted to mention crowdAI because that is where I found the “PlantVillage Disease Classification Challenge”. The goal of this competition was to develop algorithms that can accurately diagnose a disease based on an image.

This challenge has already ended, but I wanted to pursue the same goal using a different deep learning framework: PyTorch. So I developed an AI application using a deep learning model and the transfer learning technique.

For this challenge, I used the “PlantVillage dataset”. This dataset is an open-access repository of images on plant health, created to enable the development of mobile disease diagnostics. It contains 54,309 images. The images span 14 crop species: Apple, Blueberry, Cherry, Corn, Grape, Orange, Peach, Bell Pepper, Potato, Raspberry, Soybean, Squash, Strawberry, and Tomato. It contains images of 17 fungal diseases, 4 bacterial diseases, 2 mold (oomycete) diseases, 2 viral diseases, and 1 disease caused by a mite. 12 crop species also have images of healthy leaves that are not visibly affected by a disease.

The dataset contains 38 classes of crop-disease pairs, listed below:

1) Apple Scab, Venturia inaequalis
2) Apple Black Rot, Botryosphaeria obtusa
3) Apple Cedar Rust, Gymnosporangium juniperi-virginianae
4) Apple healthy
5) Blueberry healthy
6) Cherry healthy
7) Cherry Powdery Mildew, Podosphaera spp.
8) Corn Gray Leaf Spot, Cercospora zeae-maydis
9) Corn Common Rust, Puccinia sorghi
10) Corn healthy
11) Corn Northern Leaf Blight, Exserohilum turcicum
12) Grape Black Rot, Guignardia bidwellii
13) Grape Black Measles (Esca), Phaeomoniella aleophilum, Phaeomoniella chlamydospora
14) Grape Healthy
15) Grape Leaf Blight, Pseudocercospora vitis
16) Orange Huanglongbing (Citrus Greening), Candidatus Liberibacter spp.
17) Peach Bacterial Spot, Xanthomonas campestris
18) Peach healthy
19) Bell Pepper Bacterial Spot, Xanthomonas campestris
20) Bell Pepper healthy
21) Potato Early Blight, Alternaria solani
22) Potato healthy
23) Potato Late Blight, Phytophthora infestans
24) Raspberry healthy
25) Soybean healthy
26) Squash Powdery Mildew, Erysiphe cichoracearum, Sphaerotheca fuliginea
27) Strawberry Healthy
28) Strawberry Leaf Scorch, Diplocarpon earlianum
29) Tomato Bacterial Spot, Xanthomonas campestris pv. vesicatoria
30) Tomato Early Blight, Alternaria solani
31) Tomato Late Blight, Phytophthora infestans
32) Tomato Leaf Mold, Fulvia fulva
33) Tomato Septoria Leaf Spot, Septoria lycopersici
34) Tomato Two Spotted Spider Mite, Tetranychus urticae
35) Tomato Target Spot, Corynespora cassiicola
36) Tomato Mosaic Virus
37) Tomato Yellow Leaf Curl Virus
38) Tomato healthy

Defining transforms for the train and test dataset

The purpose of data augmentation is to increase the variety of images our model sees during training by applying random transformations to them. In my case, I applied transformations such as random rotation, random resized crop, random horizontal flip and center crop. Remember, we want our model to classify the images regardless of orientation.

Choosing the neural network architecture

I decided to use one of the pretrained models from torchvision.models to get the image features and build and train a new feed-forward classifier using those features.

The pretrained model I chose was Microsoft’s Residual Network architecture: ResNet-152. ResNet models were the basis of the entries that won 1st place in the ILSVRC and COCO 2015 competitions on five tasks: ImageNet classification, ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Architecture of a Resnet-152

I used a pretrained model because it saves time: this kind of model has already been trained on a large dataset to solve a problem similar to the one I wanted to solve.

Removing the original classifier

After loading the pretrained image classification model, I removed the original classifier, added a new one to identify plant diseases, and finally fine-tuned the model while keeping some of its parameters frozen.


I trained the model for 10 epochs using the Adam optimizer with a learning rate of 0.001.
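A training loop with those settings might look like the sketch below. The function name `train`, the use of `nn.NLLLoss` (which assumes the model outputs log-probabilities), and the per-epoch loss printout are assumptions for illustration. Only the 10 epochs, the Adam optimizer, and the 0.001 learning rate come from the text.

```python
import torch
from torch import nn, optim


def train(model, train_loader, epochs=10, lr=0.001, device="cpu"):
    """Train only the parameters that still require gradients."""
    model.to(device)
    criterion = nn.NLLLoss()  # assumes the model outputs log-probabilities
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = optim.Adam(trainable, lr=lr)

    history = []
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        seen = 0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            # Weight each batch loss by its size to get a per-image average.
            running_loss += loss.item() * images.size(0)
            seen += images.size(0)
        history.append(running_loss / seen)
        print(f"epoch {epoch + 1}/{epochs}  loss {history[-1]:.4f}")
    return history
```

Passing only the trainable parameters to Adam keeps frozen layers out of the optimizer state.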

I then ran the test function and got an accuracy of 0.961.
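A test function of this kind usually counts how often the top prediction matches the label. The following is a minimal sketch with a hypothetical `evaluate` helper, not the original implementation.

```python
import torch


def evaluate(model, test_loader, device="cpu"):
    """Return the fraction of test images whose top prediction is correct."""
    model.to(device)
    model.eval()  # disable dropout and use fixed batch-norm statistics
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total
```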

To make sure the model was working well, I tested it again with a sanity check.
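One common form of sanity check is to feed a single image to the model and inspect the most probable classes. The sketch below assumes a preprocessed image tensor, a model that outputs log-probabilities, and a hypothetical `predict` helper; none of these details are confirmed by the original code.

```python
import torch


def predict(model, image_tensor, class_names, topk=5):
    """Return the top-k (class name, probability) pairs for one image tensor."""
    model.eval()
    with torch.no_grad():
        # Add a batch dimension: (C, H, W) -> (1, C, H, W).
        log_probs = model(image_tensor.unsqueeze(0))
    probs = torch.exp(log_probs)  # assumes log-softmax outputs
    top_probs, top_idx = probs.topk(topk, dim=1)
    return [
        (class_names[i], p)
        for i, p in zip(top_idx[0].tolist(), top_probs[0].tolist())
    ]
```

Comparing the top class against the known label of a few hand-picked leaf images is a quick way to spot a mismatch between class indices and class names.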

It seems to work pretty well! I just need to deploy the model to get an application ready to use on a smartphone!

You can check out my full code on GitHub:

FAO article about Plant pests and diseases:

You can find the paper of the PlantVillage dataset here:

Data Driven Investor

from confusion to clarity, not insanity

Written by Viridiana Romero Martinez

Artificial Intelligence and Machine Learning enthusiast. Data Driven Investor writer. Healthy lifestyle lover ♥
