Weed Detection and Classification Using Computer Vision & Deep Learning

Anthony K. Mutura
Fintricity
Feb 4, 2021

Abstract

Object recognition is a general term for a collection of related computer vision tasks that involve identifying objects in digital photographs. Image classification involves predicting the class of one object in an image. Object localization refers to identifying the location of one or more objects in an image and drawing a bounding box around their extent. Object detection combines these two tasks: it localizes and classifies one or more objects in an image. We present a system for weed classification. Our goal is to produce a working model that can differentiate, with a reasonable degree of confidence, between weed species and different growth stages of the sugar beet plant. We achieved multi-class classification using deep learning methods and propose a pipeline for this problem. Finally, we produced a web-based application that can be used to classify sugar beet plants and detect weeds.

Figure 1: A basic pipeline for weed detection and classification

1. Introduction

Deep learning is a subfield of machine learning concerned with artificial neural networks, algorithms inspired by the structure and function of the brain. Deep learning methods can achieve state-of-the-art results on challenging computer vision problems such as image classification and object detection. Computer vision, often shortened to CV, is a field of study that seeks to develop techniques to help computers “see” and understand the content of digital images such as photographs and videos. It automates tasks such as identifying the crop in an image, which improves results, reduces human effort, and saves time.

We adopt a deep convolutional neural network for object classification. Convolutional neural networks (ConvNets or CNNs) are one of the main categories of neural network used for image recognition and image classification; object detection and face recognition are other areas where CNNs are widely used. A CNN image classifier takes an input image, processes it, and assigns it to one of a set of categories (e.g., dog, cat, tiger, lion). A computer sees an input image as an array of pixels whose size depends on the image resolution. During training and testing, each input image passes through a series of convolution layers with filters (kernels), pooling layers, and fully connected (FC) layers, and a softmax function is then applied to classify the object with probability values between 0 and 1.

Figure 2: Neural network with many convolutional layers
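The final softmax step described above can be illustrated with a few lines of plain Python. This is only a minimal sketch: the raw scores below are made-up values for three hypothetical classes, not outputs of the model discussed in this article.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # and does not change the result.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes (illustrative values only).
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the one with the highest probability.
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

Each value in `probs` lies between 0 and 1, and the class with the largest raw score also receives the largest probability.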

We propose a web-based application for sugar beet weed classification and detection using deep learning and computer vision. Our model takes an image of a plant as input, extracts its features, and classifies the plant. The scope of this project is to explore the feasibility of detecting sugar beet weeds at an early stage, based on close-up images of single plants grown under greenhouse conditions.

2. Weed Classification

The proposed pipeline for this model is shown in Figure 3. It consists of data preparation, data augmentation, training, and finally evaluation.

Figure 3: Pipeline of weed classification

2.1 Data Preparation

This is the first step of the project, also known as stage 1 of development, and it starts once the data has been retrieved from the client. In this stage, we process the raw data so that it can be run through our deep learning algorithms to make predictions. The client provided a data set of 270 images in total. After receiving the data, we divided it into training, validation, and testing sets, as shown in Figure 4.

A total of 18 images were set aside for testing. The remaining 252 images were used for training and were further divided into training and validation sets of 202 and 50 images respectively. The training images were divided into 3 classes (3 different plants), each with 3 subclasses that represent the growth stages of that plant, with 28 images in each subclass (3 × 3 × 28 = 252). After this stage, we proceed to the next phase of development, which is data augmentation.
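The split sizes above can be reproduced with a short sketch. The article does not say how the test images were chosen, so holding out 2 images per subclass (9 × 2 = 18) is an assumption made here, and the class names, stage names, and file paths are hypothetical placeholders.

```python
import random

PLANTS = ["plant_a", "plant_b", "plant_c"]   # hypothetical class names
STAGES = ["stage_1", "stage_2", "stage_3"]   # hypothetical growth stages
IMAGES_PER_SUBCLASS = 30                     # 9 subclasses x 30 = 270 images
TEST_PER_SUBCLASS = 2                        # assumption: 9 x 2 = 18 test images
VALIDATION_SIZE = 50

def split_dataset(seed=0):
    """Return (train, val, test) lists of hypothetical image paths."""
    rng = random.Random(seed)
    test, pool = [], []
    for plant in PLANTS:
        for stage in STAGES:
            files = [f"{plant}/{stage}/img_{i:02d}.jpg"
                     for i in range(IMAGES_PER_SUBCLASS)]
            rng.shuffle(files)
            # Hold out a few images per subclass for testing.
            test.extend(files[:TEST_PER_SUBCLASS])
            pool.extend(files[TEST_PER_SUBCLASS:])
    # Shuffle the remaining 252 images and carve out the validation set.
    rng.shuffle(pool)
    val, train = pool[:VALIDATION_SIZE], pool[VALIDATION_SIZE:]
    return train, val, test

train, val, test = split_dataset()
print(len(train), len(val), len(test))  # 202 50 18
```

Fixing the random seed keeps the split reproducible, so the same images stay in the test set across runs.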

2.2 Data Augmentation

This is the second stage of development, carried out once data preparation is complete. This step was important because the data set was small. Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models without actually collecting new data. Techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks, but most approaches to training neural networks use only these basic types of augmentation. While neural network architectures have been investigated in depth, less attention has been paid to discovering strong types of data augmentation and augmentation policies that capture data invariances. We used the following data augmentation operations: rotation, shearing, zooming, cropping, flipping, and changing the brightness level. The total number of images after data augmentation is around 700.

Figure 5: Data augmentation applied on the data set
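A few of the operations listed above can be sketched on a tiny grayscale image stored as a list of rows. This is an illustration only: it covers flipping, cropping, brightness changes, and a simplified 90-degree rotation, and it is not the augmentation code used in the project, which the article does not show.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise (a simplified rotation)."""
    return [list(row) for row in zip(*img[::-1])]

def adjust_brightness(img, factor):
    """Scale pixel intensities and clamp them to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def crop(img, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

# A made-up 3x3 grayscale image.
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

# Each operation yields a new training sample from the same source image.
augmented = [
    flip_horizontal(image),
    rotate_90(image),
    adjust_brightness(image, 1.5),
    crop(image, 0, 0, 2, 2),
]
print(len(augmented))  # 4
```

Applying several such operations to every source image is how a small data set can grow without new photographs being taken.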

2.3 Training

This is the third phase. Once data preparation and data augmentation are done, we choose a model and train it.

2.3.1 Models

For image classification, we use convolutional neural network (CNN) architectures. We tested two ConvNet models on the current dataset: ResNet18 and MobileNetV2. Many other models, such as VGG, AlexNet, and InceptionV3, would also work.

a. ResNet18:

ResNet18 is a residual neural network (ResNet), a convolutional neural network that is 18 layers deep. You can load a pre-trained version of the network trained on more than a million images from the ImageNet database. The pre-trained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224. The architecture of ResNet is shown in Figure 6.

Figure 6: building block diagram of resnet18
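The idea behind a residual building block is that the block's input is added back to its output through a shortcut connection, so the block computes relu(F(x) + x). The sketch below illustrates this with small linear layers standing in for F; the real ResNet18 block uses convolutions and batch normalization, so this is a simplification with made-up weights.

```python
def relu(values):
    """Apply ReLU to every element of a vector."""
    return [max(0.0, v) for v in values]

def linear(values, weights):
    """Multiply a vector by a square weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU between."""
    fx = linear(relu(linear(x, w1)), w2)
    # Shortcut connection: add the block's input to the transformed output.
    return relu([f + xi for f, xi in zip(fx, x)])

x = [1.0, 2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]

# With all-zero weights F(x) is zero, so the block passes x through unchanged.
print(residual_block(x, zero, zero))  # [1.0, 2.0]
```

Because the shortcut lets the input pass through even when F contributes nothing, stacking many such blocks does not block the signal, which is what makes deeper networks easier to train.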

b. MobileNetV2:

MobileNetV2 is designed to power the next generation of mobile vision applications. It is a significant improvement over MobileNetV1 and pushes the state of the art for mobile visual recognition, including classification, object detection, and semantic segmentation. MobileNetV2 is released as part of the TensorFlow-Slim Image Classification Library. It builds upon the ideas of MobileNetV1, using depthwise separable convolutions as efficient building blocks.

However, V2 introduces two new features to the architecture:

1) linear bottlenecks between the layers,

2) shortcut connections between the bottlenecks.

The basic structure is shown below.

Figure 7: basic building blocks of MobileNetV2
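To see why depthwise separable convolutions are considered efficient building blocks, it helps to count weights. The sketch below compares a standard convolution with a depthwise convolution followed by a 1×1 pointwise convolution. The layer sizes are illustrative and not taken from MobileNetV2 itself, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1x1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise

# Illustrative layer sizes (assumed for this example).
k, c_in, c_out = 3, 32, 64

std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep)  # 18432 2336
```

For these sizes the separable version needs roughly an eighth of the weights, which is the kind of saving that makes the design attractive on mobile hardware.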

2.3.2 Optimizer

Optimizers are algorithms or methods that adjust the attributes of a neural network, such as its weights and learning rate, in order to reduce the loss. They are responsible for minimizing the loss and producing the most accurate results possible. We used the Adam optimizer.

Adam (Adaptive Moment Estimation) works with estimates of the first and second moments of the gradient. The intuition behind Adam is that we do not want to roll so fast that we jump over the minimum; instead, we decrease the velocity a little for a more careful search. In addition to storing an exponentially decaying average of past squared gradients, as AdaDelta does, Adam also keeps an exponentially decaying average of past gradients, M(t). Adam can be used in place of classical stochastic gradient descent to train deep learning models. It combines the best properties of the AdaGrad and RMSProp algorithms to provide an optimization algorithm that can handle sparse gradients on noisy problems. Adam is also relatively easy to configure, and its default configuration parameters do well on most problems.
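The two running averages described above can be made concrete with a minimal sketch of the Adam update for a single parameter. The toy objective f(θ) = (θ − 3)², the step count, and the learning rate are assumptions chosen for illustration; they are not the settings used to train the models in this article.

```python
import math

def adam_minimize(grad, theta, steps=2000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a one-dimensional function with Adam, given its gradient."""
    m = 0.0  # decaying average of past gradients (first moment)
    v = 0.0  # decaying average of past squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for m and v starting at zero.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
result = adam_minimize(lambda th: 2 * (th - 3.0), theta=0.0)
print(result)  # close to the minimum at theta = 3
```

Dividing by the square root of the second-moment estimate is what scales the step size per parameter, while the first-moment estimate smooths the direction of travel.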

2.3.3 Activation Function

Activation functions are mathematical functions that determine the output of each neuron in a neural network. The function is attached to each neuron and determines whether it should be activated (“fired”) or not, based on whether that neuron’s input is relevant for the model’s prediction. Some activation functions, such as sigmoid and tanh, also normalize the output of each neuron to a range between 0 and 1 or between -1 and 1. Activation functions must also be computationally efficient, because they are calculated across thousands or even millions of neurons for each data sample. Modern neural networks are trained with a technique called backpropagation, which places additional computational strain on the activation function and its derivative.

We used ReLU as the activation function.

The Rectified Linear Unit (ReLU) is the most commonly used activation function in deep learning models. The function returns 0 for any negative input and returns any positive input x unchanged, so it can be written as f(x) = max(0, x).

Figure 8: Relu graph
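The definition f(x) = max(0, x) translates directly into code. The sketch below also includes the derivative that backpropagation needs; setting the derivative at exactly x = 0 to 0 is a common convention assumed here.

```python
def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)

def relu_derivative(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu(x) for x in inputs]
slopes = [relu_derivative(x) for x in inputs]

print(outputs)  # [0.0, 0.0, 0.0, 0.5, 2.0]
print(slopes)   # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Both functions need only a comparison, which is why ReLU is cheap to evaluate across millions of neurons.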

2.3.4 Weights and Bias

Weights and biases (commonly referred to as w and b) are the learnable parameters of a machine learning model. Neurons are the basic units of a neural network. In a fully connected ANN, each neuron belongs to a layer and is connected to every neuron in the next layer. When inputs are transmitted between neurons, the weights are applied to the inputs and the bias is added.

Figure 9: equation of the neuron

Weights control the signal (or the strength of the connection) between two neurons. In other words, a weight decides how much influence the input will have on the output.

A bias unit is an additional input into the next layer that always has the constant value 1. Bias units are not influenced by the previous layer (they do not have any incoming connections), but they do have outgoing connections with their own learnable weights. The bias unit means that a neuron can still be activated even when all of its inputs are zero.
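The role of weights and bias can be shown with a single neuron that computes a weighted sum of its inputs, adds the bias, and applies ReLU. The weight and bias values below are made up for illustration.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, followed by ReLU."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, z)

weights = [0.8, -0.4, 0.3]   # illustrative values
bias = 0.5                   # illustrative value

# A regular input: each weight scales the influence of its input.
active = neuron([1.0, 2.0, 3.0], weights, bias)

# All-zero input: the weighted sum vanishes, so only the bias remains.
zero_input = neuron([0.0, 0.0, 0.0], weights, bias)

print(active, zero_input)
```

With all inputs at zero the neuron still outputs the bias value, which matches the point made above about the bias unit.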

Written and Authored by Tarun Singh, Data Scientist with a keen interest in Computer Vision and Machine Learning.

