Plant Image Segmentation with Deep Learning

Anis Ismail
Oct 13 · 7 min read

How we tackled the plant image segmentation problem with Deep Learning.

Photo by Josefin on Unsplash

Identifying weed plants in a cultivated field is a time-intensive task. Automating it can drastically reduce the time needed to identify and remove weeds, and thus increase both yield and worker productivity. In our latest project, my colleague Rawane Madi and I worked on a plant segmentation task that uses Deep Learning to distinguish weeds and their stems from crops. In this article, we share our journey: finding relevant datasets, choosing a suitable neural network architecture, training the model, evaluating its performance, and finally launching it.

Domain Data Availability

Our first step was to find datasets that match the segmentation task at hand. After finding several options online, we decided to:

  • Find a sufficiently large labeled dataset, since our task is delicate and our labeling resources are limited.

We finally settled on the Sugarbeet dataset, which turned out to be the best fit for our task. The images were captured by a robot on a sugar beet farm in Germany over a period of three months. Crops were photographed from their emergence date until an advanced growth stage. The dataset was labeled for crops and weeds but not for stems. We first trained our models on the Sugarbeet dataset and then on a custom dataset whose images were labeled for crops, weeds, and stems.

Example pictures from the Sugarbeet dataset

Literature Review

During our literature review, we found two interesting image segmentation papers:

  • The first paper trains a neural network model for pose regression to generate a plant location likelihood map. The stems are then extracted from the resulting heat map with centimeter-level accuracy.
Model Architecture — Source: From Plants to Landmarks: Time-invariant Plant Localization that uses Deep Pose Regression in Agricultural Fields, Kraemer et al.
  • The second paper uses a novel joint model architecture based on FC-DenseNet. In a segmentation model, the encoder generally produces a compressed but information-rich representation of the input, while the decoder upsamples that representation back to the original input size to make pixel-wise predictions. In this case, however, the encoder output is fed to two decoders: the plant decoder produces plant features, determining whether a pixel is soil, plant, dicot weed, or grass weed, while the stem decoder performs stem detection for the plant-weed regions.
Joint Model Architecture — Source: Joint Stem Detection and Plant-Weed Classification for Plant-specific Treatment in Precision Farming, Lottes et al. (2018)

Training the Models

Mask Generation and Data Encoding for the Models

The datasets were annotated by color, and occluded objects were segmented as well. The annotations consisted of red polygons for crops, green polygons for weeds, and blue circles for stems. Generating a mask means producing a 2D matrix of the same size as the image, where each entry holds the label of one of the three classes (or background).

Example of a Sugarbeet image mask, where green polygons are weeds and red polygons are crops

To generate a mask, the following steps were applied:

  • Load the corresponding image
Examples from the Sugarbeet dataset with their corresponding generated masks
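The color-to-label conversion above can be sketched in a few lines of NumPy. This is a minimal sketch, not the project's actual code; the numeric label convention (0 = background, 1 = crop, 2 = weed, 3 = stem) and the channel thresholds are assumptions for illustration:

```python
import numpy as np

# Assumed label convention: 0 = background, 1 = crop, 2 = weed, 3 = stem.
LABELS = {"crop": 1, "weed": 2, "stem": 3}

def annotation_to_mask(annotation):
    """Convert an RGB annotation image (H, W, 3) into a 2D label mask:
    red polygons -> crop, green polygons -> weed, blue circles -> stem."""
    r, g, b = annotation[..., 0], annotation[..., 1], annotation[..., 2]
    mask = np.zeros(annotation.shape[:2], dtype=np.uint8)
    mask[(r > 128) & (g < 128) & (b < 128)] = LABELS["crop"]
    mask[(g > 128) & (r < 128) & (b < 128)] = LABELS["weed"]
    mask[(b > 128) & (r < 128) & (g < 128)] = LABELS["stem"]
    return mask

# Tiny toy annotation: one crop pixel, one weed pixel, one stem pixel.
ann = np.zeros((2, 2, 3), dtype=np.uint8)
ann[0, 0] = [255, 0, 0]  # red   -> crop
ann[0, 1] = [0, 255, 0]  # green -> weed
ann[1, 0] = [0, 0, 255]  # blue  -> stem
mask = annotation_to_mask(ann)
```

Boolean indexing keeps the whole conversion vectorized, so it scales to full-resolution annotation images without explicit loops.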

In the case of the joint model, labels were split across two masks. The first type of mask, corresponding to the first branch, contains labels 0, 1, and 2 with the same meaning as above. The second type, corresponding to the second branch, contains labels 0, 1, and 2, where 2 serves the same purpose as label 3 above (stems).
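The label split can be sketched as follows. This is an illustrative sketch only; in particular, how stem pixels are treated in the plant-branch mask is an assumption (they are mapped back to background here):

```python
import numpy as np

def split_mask(full_mask):
    """Split a 4-class mask (0=background, 1=crop, 2=weed, 3=stem) into the
    two per-branch masks of the joint model. Mapping stem pixels to
    background in the plant branch is an assumption for illustration."""
    plant_mask = full_mask.copy()
    plant_mask[plant_mask == 3] = 0      # branch 1: labels 0, 1, 2
    stem_mask = np.zeros_like(full_mask)
    stem_mask[full_mask == 3] = 2        # branch 2: label 2 marks stems
    return plant_mask, stem_mask

full = np.array([[0, 1], [2, 3]])
plant, stem = split_mask(full)
```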

Training Approach

We followed the same approach for both model architectures:

  • Feeding in a 3-channel RGB image of size 128x128
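Preparing a raw image for the 128x128 input can be sketched without any DL framework. The nearest-neighbour resize and the [0, 1] scaling below are illustrative assumptions, not necessarily the exact preprocessing used in the project:

```python
import numpy as np

def resize_nearest(img, size=(128, 128)):
    """Nearest-neighbour resize of an (H, W, C) image using index mapping."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols]

def preprocess(img):
    """Resize to the 128x128 model input and scale pixels to [0, 1]."""
    return resize_nearest(img).astype(np.float32) / 255.0

x = preprocess(np.full((256, 320, 3), 255, dtype=np.uint8))
```

In practice a library resize (e.g. from an imaging library) would be used instead; the sketch just makes the shape contract of the input pipeline explicit.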

Model Choice

Further research led us to a U-Net model trained on the Sugarbeet dataset (link). Even though that model was only trained for weed detection, we used its weights as a starting point and transferred the learning to our full task.
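The transfer step boils down to copying every compatible pretrained layer and leaving the new task-specific head freshly initialised. A framework-agnostic sketch, with models represented as plain name-to-array dicts (an assumption for illustration; in practice the framework's own load-weights utilities do this):

```python
import numpy as np

def transfer_weights(pretrained, target, new_head_layers):
    """Copy every pretrained layer whose name exists in the target model
    with a matching shape, skipping the freshly initialised head layers.
    Models are represented as {layer_name: ndarray} dicts for illustration."""
    for name, w in pretrained.items():
        if (name in target and name not in new_head_layers
                and target[name].shape == w.shape):
            target[name] = w.copy()
    return target

pre = {"enc1": np.ones((2, 2)), "head": np.ones((2, 4))}
tgt = {"enc1": np.zeros((2, 2)), "head": np.zeros((2, 4))}
out = transfer_weights(pre, tgt, new_head_layers={"head"})
```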

Illustration of the U-Net architecture from the U-Net paper — Source: deeplearning.net

We also found pretrained weights for a ResNet 101 model. We believed ResNet might be a good candidate because it uses residual blocks, which are known to preserve spatial information during the encoding-decoding process. This model was originally trained on the ImageNet dataset, so we replaced its last layers with convolutional layers and transferred the learning to our task on the custom dataset.

Model with Stems as an Additional Channel

We decided to go with the most straightforward approach by adding an extra channel for the stems. The output image then has 4 channels: background, crop, weed, and stem. Code for the U-Net and ResNet implementations can be found in our GitHub repository.
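With stems as a fourth channel, the model's head produces per-pixel logits over 4 classes, which a pixel-wise softmax turns into probabilities and an argmax into the final mask. A minimal NumPy sketch of that output stage (the real head is a convolutional layer, of course):

```python
import numpy as np

N_CLASSES = 4  # background, crop, weed, stem

def pixelwise_softmax(logits):
    """Per-pixel softmax over the last axis of an (H, W, 4) logit map."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_mask(logits):
    """Collapse class probabilities to a 2D label mask via per-pixel argmax."""
    return pixelwise_softmax(logits).argmax(axis=-1)

logits = np.random.default_rng(0).standard_normal((128, 128, N_CLASSES))
probs = pixelwise_softmax(logits)
```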

Joint model

This model implementation is inspired by the previously mentioned paper by Lottes et al. The model has one input and two outputs (see the previous diagram). The first output is the crop/weed mask, and the second output is the stem mask. Therefore, two types of masks need to be generated, one for each output branch.

The joint model approach separates the identification of crops/weeds from stem detection while still letting the first task inform the second. Instead of running the two tasks in sequence, it combines them by sharing a single encoder. Code for the U-Net, ResNet, and FC-DenseNet joint models is also available in our GitHub repository.
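The shared-encoder idea can be sketched with toy stand-ins for the convolutional blocks. The matrices below are hypothetical placeholders, not the real layers; the point is only that the encoder runs once and both decoder heads consume its output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the convolutional blocks: one shared encoder whose
# output feeds two independent decoder heads, mirroring the joint design.
W_enc = rng.standard_normal((3, 8))
W_plant = rng.standard_normal((8, 3))  # head 1: background / crop / weed
W_stem = rng.standard_normal((8, 2))   # head 2: background / stem

def joint_forward(x):
    """x: (n_pixels, 3) features -> per-pixel logits from both heads."""
    h = np.maximum(x @ W_enc, 0.0)      # encoder is computed once
    return h @ W_plant, h @ W_stem      # both decoders share its output

plant_logits, stem_logits = joint_forward(rng.standard_normal((5, 3)))
```

Training such a model simply sums (or weights) the two branch losses, so gradients from both tasks flow back into the one encoder.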

Performance Metrics

We relied on the following metrics to test the performance of our models:

  • Mean IoU, or Intersection over Union, is a common measure for image segmentation. It computes the Intersection over Union for each class separately and then averages over the number of classes. The higher the mean IoU, the better.
IoU formula — Source: Towards Data Science
  • We define stem accuracy as the ratio of the number of stems correctly predicted by the model to the actual number of existing stems.
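Both metrics can be sketched in a few lines. The stem-matching rule and the pixel tolerance below are assumptions for illustration; the article does not state the exact definition used in the project:

```python
import numpy as np

def mean_iou(y_true, y_pred, n_classes):
    """Per-class IoU = |intersection| / |union|, averaged over the classes
    present in either mask. Higher is better; 1.0 means a perfect match."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(y_true == c, y_pred == c).sum()
        union = np.logical_or(y_true == c, y_pred == c).sum()
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))

def stem_accuracy(true_stems, pred_stems, tol=5.0):
    """Fraction of ground-truth stem positions matched by a predicted stem
    within `tol` pixels (matching rule and tolerance are assumptions)."""
    if not true_stems:
        return 0.0
    hits = sum(
        1 for t in true_stems
        if any(np.hypot(t[0] - p[0], t[1] - p[1]) <= tol for p in pred_stems)
    )
    return hits / len(true_stems)
```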

Start Training

We trained the U-Net model for 360 epochs and the ResNet model for 200 epochs, with mean IoU and accuracy as training metrics. For roughly the same number of epochs, U-Net outperformed ResNet in both mean IoU and stem accuracy. The final weights of the models can be found here.

Results on Custom Dataset

After training the U-Net model for 360 epochs on the custom dataset, the mean intersection over union (mean IoU) evaluated on 90% of the custom dataset was 0.6, and the stem accuracy on the same split was 0.8.

Examples of U-Net predictions on the Sugarbeet dataset

Further Directions

To improve our model's performance even further, here are some directions we want to investigate:

  • Acquire more labeled images

NB: The work described in this blog was developed by Zaka as a project for a client. Zaka does not own the Intellectual Property for the ideas described in this blog.

Don’t forget to support with a clap!

Do you have a cool project that you need to implement? Reach out and let us know.

To discover Zaka, visit www.zaka.ai

Subscribe to our newsletter and follow us on our social media accounts to stay up to date with our news and activities:

LinkedIn · Instagram · Facebook · Twitter · Medium

Zaka

An intelligent transformation

Anis Ismail

Written by

Undergraduate Researcher | Machine Learning Engineer | Computer Engineering Student @ LAU | anisdismail.com

Zaka

Zaka

Zaka is on a journey to democratize Artificial Intelligence (AI) through sharing knowledge, building solutions, and connecting people!
