Fruits and Vegetables Classification with Fruits 360 dataset using Deep Learning

Arnab Bhakta
8 min read · Jan 5, 2021


How many different types of fruits and vegetables are present in this world? Among them some are known and some are unknown to us. Sometimes it is very difficult for humans to tell different types of fruits and vegetables apart. A computer can do this task very easily with the help of Machine Learning or Deep Learning.

What is Machine Learning?

To put it in simple terms, Machine Learning is teaching a machine (i.e. a computer) how to identify the properties of an object by training it on many different examples of a similar type. The machine takes in that data, identifies the similarities and dissimilarities in the patterns of each example, and thus predicts how close a new input is to a particular class.

So that was just a basic explanation of Machine Learning. In this blog we are going to train a model on a dataset using Deep Learning and a CNN (Convolutional Neural Network). Before that, let us first understand what Deep Learning is.

Deep Learning is a subset of Machine Learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks.

Here, in this blog, I am going to work on image classification on the Fruits 360 dataset using Deep Learning algorithms and explain how I improved my model.

Fruits 360 Dataset Preview

Things used in this project

  1. Python 3
  2. Python libraries like PyTorch, Matplotlib and opendatasets
  3. Kaggle kernels/notebooks and a dataset from Kaggle
  4. Jovian.ml (a project tracking and collaboration platform for data science and Machine Learning) to save and log the work

If you are new to this domain, NO WORRIES, go through this blog and you will get some idea about it. You can also check out the source code of this project, linked at the end of the blog.

Let’s Start. 😀

Preparing the data

Total number of images: 90483.

Training set size: 67692 images (one fruit or vegetable per image).

Test set size: 22688 images (one fruit or vegetable per image).

Multi-fruits set size: 103 images (more than one fruit (or fruit class) per image).

Number of classes: 131 (fruits and vegetables).

Image size: 100x100 pixels.

Here is the link to the dataset.
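As an illustration, here is roughly how the dataset can be pulled into a Kaggle or Colab notebook with the opendatasets library. The Kaggle dataset URL below is my assumption; use the link above for the exact source:

```python
import opendatasets as od

# Download the Fruits 360 dataset from Kaggle.
# This prompts for your Kaggle username and API key.
# The URL is assumed; replace it with the dataset link mentioned above.
od.download('https://www.kaggle.com/moltean/fruits')
```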

Explore dataset

Here we can see that the dataset is already loaded in my kernel and it contains a Test and a Training folder. Each of these folders contains 131 subfolders, one per class, and each subfolder contains 100 x 100 images.

We have divided the test dataset into two parts: a validation dataset and a test dataset.
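A minimal sketch of such a split using PyTorch's random_split; the 50/50 proportion, the folder path and the variable names are my assumptions, not necessarily the exact split used in the notebook:

```python
import torch
from torch.utils.data import random_split
from torchvision.datasets import ImageFolder
import torchvision.transforms as tt

# Load the held-out Test folder (in the full notebook the validation
# transforms with normalization, defined below, would be used here).
test_folder = ImageFolder('./fruits-360/Test', transform=tt.ToTensor())

# Split it into a validation part and a test part.
val_size = len(test_folder) // 2
test_size = len(test_folder) - val_size
val_ds, test_ds = random_split(test_folder, [val_size, test_size],
                               generator=torch.Generator().manual_seed(42))
```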

Transforming the dataset

For better training we need more data, but as we know we have a limited amount of it. So we have to diversify the dataset by applying techniques like Data Augmentation.

Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data. Data augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks.

We also apply Data Normalization. In this technique we normalize the image tensors by subtracting the mean and dividing by the standard deviation of the pixels across each channel. Normalizing the data prevents the pixel values from any one channel from disproportionately affecting the losses and gradients.

Let’s look at the code where we define and prepare the training and validation datasets.
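A rough sketch of that preparation, combining the augmentations and normalization described above; the normalization statistics, crop padding and folder path are my assumptions:

```python
import torchvision.transforms as tt
from torchvision.datasets import ImageFolder

# Per-channel mean and std for normalization (assumed values; ideally
# these are computed from the training set itself).
stats = ((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

train_tfms = tt.Compose([
    tt.RandomCrop(100, padding=4, padding_mode='reflect'),  # random crop with reflected padding
    tt.RandomHorizontalFlip(),                              # augmentation
    tt.ToTensor(),
    tt.Normalize(*stats)                                    # data normalization
])
valid_tfms = tt.Compose([tt.ToTensor(), tt.Normalize(*stats)])

train_ds = ImageFolder('./fruits-360/Training', train_tfms)
```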

We then prepare batches of 200 images (this can be set according to the availability of GPU memory) and print a batch. You can change the batch size according to your wish; if you are getting an out-of-memory error, reduce it.
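Creating the data loaders and visualizing one batch might look like this; show_batch is a small helper of my own, not a library function:

```python
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
import matplotlib.pyplot as plt

batch_size = 200  # reduce this if you run out of GPU memory

train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                      num_workers=2, pin_memory=True)
valid_dl = DataLoader(val_ds, batch_size * 2,
                      num_workers=2, pin_memory=True)

def show_batch(dl):
    """Plot the first 64 images of one batch in an 8x8 grid."""
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(12, 12))
        ax.set_xticks([]); ax.set_yticks([])
        ax.imshow(make_grid(images[:64], nrow=8).permute(1, 2, 0))
        break

show_batch(train_dl)
```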

First 64 images of a batch

In this batch of images we can see that some fruits and vegetables are cropped at the top or bottom; this happens because of the random crop. The other augmentations are generally harder to observe, but if one looks closely one can see that parts of some images are mirrored due to padding_mode = “reflect”.

Using a GPU

To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device & to_device) and a helper class DeviceDataLoader to move our model & data to the GPU as required.
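Their implementation can look like this (a sketch along the lines of the zerotogans course helpers):

```python
import torch

def get_default_device():
    """Pick the GPU if one is available, else fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device('cuda')
    return torch.device('cpu')

def to_device(data, device):
    """Move a tensor (or a list/tuple of tensors) to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a DataLoader so every batch is moved to the device on the fly."""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)
```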

Why do we need a GPU?

Training on a CPU is tiresome and time-consuming: it is a slow process, and the session may even crash under the load. This is why we need a GPU for better performance and speed.

Transfer the data to the device (in this case the device is the GPU)
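Putting those helpers to use might look like this (variable names follow the earlier sketches):

```python
device = get_default_device()
print(device)  # e.g. device(type='cuda')

# Wrap the data loaders so every batch lands on the GPU automatically;
# the model itself is moved with to_device once it is defined below.
train_dl = DeviceDataLoader(train_dl, device)
valid_dl = DeviceDataLoader(valid_dl, device)
```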

Preparing the Model

Now we will define some helper functions that are common in many deep learning models, like computing accuracy and loss.

I am using the cross_entropy loss function to measure the loss. Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. To know more about different loss functions click here.
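A sketch of these helpers in the style of the zerotogans course, bundling the loss and accuracy computation into a small base class that the model will inherit from (the evaluate helper is reused later when training and testing):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    """Fraction of predictions that match the true labels."""
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)
        return F.cross_entropy(out, labels)  # cross-entropy loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        acc = accuracy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        batch_accs = [x['val_acc'] for x in outputs]
        return {'val_loss': torch.stack(batch_losses).mean().item(),
                'val_acc': torch.stack(batch_accs).mean().item()}

@torch.no_grad()
def evaluate(model, val_loader):
    """Run the model over the validation loader and average the metrics."""
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)
```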

Let’s define our model.

Let us go through the functions part by part.

The function conv_block takes the inputs in_channels, out_channels and a boolean value pool. Its job is to convert the number of channels from in_channels to out_channels using the Conv2d function with a 3 x 3 kernel. We also apply Batch Normalization and the ReLU activation function after Conv2d. If pool is True, a MaxPool operation takes the maximum value from each 2 x 2 box and returns it as a single value, thus decreasing the size of the tensor. One call to conv_block can be referred to as a single layer of the model.
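A sketch of conv_block matching that description:

```python
def conv_block(in_channels, out_channels, pool=False):
    """3x3 convolution + batch norm + ReLU, optionally followed by a 2x2 max-pool."""
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)
```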

The next class, ResNet9 (I have just given it this name; I built this model from scratch), is the main part that defines the different layers of the model. I have played with many different approaches here and displayed the best one that worked for me (I will discuss all the other possibilities I tried and compare which one works best for this particular dataset). This model has 9 layers, with 5 convolutional layers, 4 residual layers and the last layer.

Finally, the forward method takes a batch of data from the dataset and passes it through all the layers of the model, thus training the model bit by bit.
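Here is what such a class might look like. The exact channel widths and pooling placement are my assumptions; an adaptive average-pool at the end keeps the classifier independent of the 100 x 100 input size:

```python
class ResNet9(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv1 = conv_block(in_channels, 64)
        self.conv2 = conv_block(64, 128, pool=True)   # 100x100 -> 50x50
        self.res1 = nn.Sequential(conv_block(128, 128),
                                  conv_block(128, 128))
        self.conv3 = conv_block(128, 256, pool=True)  # 50x50 -> 25x25
        self.conv4 = conv_block(256, 512, pool=True)  # 25x25 -> 12x12
        self.res2 = nn.Sequential(conv_block(512, 512),
                                  conv_block(512, 512))
        self.classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1),  # 12x12 -> 1x1
                                        nn.Flatten(),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out   # residual connection
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out   # residual connection
        return self.classifier(out)

# 3 input channels (RGB), 131 classes; moved to the GPU with the earlier helper.
model = to_device(ResNet9(3, 131), device)
```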

Let us have a look at the structure of the model I defined.

Training the Model

We finally reach the most important part of image classification, i.e. training the model. We use different hyper-parameters like the learning rate, weight decay, gradient clipping, the number of epochs (one epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly once) and the optimization function.

Note that the learning rate for our model is not fixed but rather changes with each epoch, increasing in the beginning to reach its peak value at nearly 30% of the total number of epochs and then gradually decreasing. This is called the ONE CYCLE METHOD. Know more about learning rate schedulers here.

Initially the val_acc is around 0.00342, which means 0.342 %. As the model is randomly initialized in the beginning, the val_score is this low.

At first we define the no_of_epochs, the maximum learning rate, the gradient clipping value (default = None), the weight decay (default = 0), and the optimization function (default = SGD).

Hyper Parameters

Training Process
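A sketch of what the one-cycle training loop can look like, built on PyTorch's OneCycleLR scheduler (whose default peaks the learning rate about 30% of the way in) and the evaluate helper defined earlier. The 7 epochs match the run described below; the other hyper-parameter values are my assumptions:

```python
def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # One-cycle schedule: the LR rises to max_lr, then decays again.
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_loader))

    for epoch in range(epochs):
        model.train()
        train_losses, lrs = [], []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            if grad_clip:
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            lrs.append(optimizer.param_groups[0]['lr'])  # record LR per batch
            sched.step()
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        history.append(result)
    return history

# Example call (values other than the 7 epochs are assumed):
history = fit_one_cycle(7, 0.01, model, train_dl, valid_dl,
                        grad_clip=0.1, weight_decay=1e-4,
                        opt_func=torch.optim.Adam)
```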

I have trained the model for 7 epochs and got 0.9929 accuracy, which means 99.29 % accuracy. The training completed in almost 50 minutes. This time can be reduced if you use pretrained models.

Plotting the Accuracy and Losses

Accuracy Graph

This is the graph of accuracy vs the number of epochs. The accuracy is computed on the validation set, which is fixed and not randomized. We can see the graph has a steep slope in the beginning and then slowly approaches a constant value, after which the accuracy stops improving.
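A small helper along these lines produces that plot from the training history returned by fit_one_cycle:

```python
def plot_accuracies(history):
    """Plot validation accuracy against the epoch number."""
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')

plot_accuracies(history)
```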

Loss Graph ( Training and validation loss )

This is the graph of the loss on the training and validation datasets vs the number of epochs. We can see the loss decreases at a high rate in the beginning, and gradually the rate of decrease slows. After a certain point in training, the training and validation losses almost coincide.
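The loss plot can be drawn the same way, assuming the history entries recorded by the training loop above:

```python
def plot_losses(history):
    """Plot training and validation loss against the epoch number."""
    train_losses = [x['train_loss'] for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs')

plot_losses(history)
```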

Learning rate graph

We will use this graph to visualize how the learning rate varies with one-cycle learning rate scheduling.

We can see from this graph that the learning rate starts from one point, reaches its peak value after a certain number of epochs (around 30% of the total number of epochs provided) and then decreases again to a small value. This pattern is observed in all learning rate schedules produced by the one cycle method.
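Since the training loop above stores the learning rate used for every batch, the schedule can be visualized like this:

```python
import numpy as np

def plot_lrs(history):
    """Plot the per-batch learning rate across the whole run."""
    lrs = np.concatenate([x['lrs'] for x in history])
    plt.plot(lrs)
    plt.xlabel('batch no.')
    plt.ylabel('learning rate')
    plt.title('Learning rate vs. batch no.')

plot_lrs(history)
```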

Prediction of image and testing of model on the test data

We will view some images from the dataset and see how our model performs at image classification.
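A sketch of a single-image prediction helper; the class names come from the ImageFolder created earlier, and the variable names follow the previous sketches:

```python
def predict_image(img, model):
    """Return the predicted class name for a single image tensor."""
    xb = to_device(img.unsqueeze(0), device)  # make a batch of one
    yb = model(xb)
    _, preds = torch.max(yb, dim=1)
    return train_ds.classes[preds[0].item()]

img, label = test_ds[0]
print('Label:', train_ds.classes[label],
      ', Predicted:', predict_image(img, model))
```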

Prediction 1
Prediction 2
Prediction 3

In these pictures we see that our model performs perfectly. 😊

So, what is the final accuracy on the test dataset?

We were able to achieve an accuracy of nearly 99 % even on the test dataset, which shows that the model we trained is not overfitted and works nicely even with images other than the training images.

Validation Accuracy on Test Dataset
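That final number comes from running the evaluate helper over a loader built from the held-out test split:

```python
# Wrap the test split in a device-aware loader and evaluate the model on it.
test_dl = DeviceDataLoader(DataLoader(test_ds, batch_size * 2), device)
result = evaluate(model, test_dl)
print(result)  # {'val_loss': ..., 'val_acc': ...}
```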

Saving the model weights and recording the metrics

After we finally train our model, we should save the trained weights to a file for future use, so that we don’t have to retrain the model every time we want to use it.

Here is how I have done it.

SAVING THE MODEL WEIGHTS
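A sketch of saving just the learned weights with torch.save (the filename is my choice):

```python
# Save only the state dict (the weights), not the whole model object.
torch.save(model.state_dict(), 'fruits360-resnet9.pth')

# Later, recreate the model and load the weights back:
model2 = to_device(ResNet9(3, 131), device)
model2.load_state_dict(torch.load('fruits360-resnet9.pth'))
```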

To log the values of all the hyper-parameters and metrics to the jovian.ml platform we use the following functions.
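With the jovian library this can look roughly as follows; the project name and the hyper-parameter values echoed here are assumptions matching the run above:

```python
import jovian

jovian.log_hyperparams({'arch': 'ResNet9', 'epochs': 7, 'lr': 0.01,
                        'grad_clip': 0.1, 'weight_decay': 1e-4,
                        'opt_func': 'Adam'})
jovian.log_metrics({'val_loss': history[-1]['val_loss'],
                    'val_acc': history[-1]['val_acc']})
jovian.commit(project='fruits-360-classification',
              outputs=['fruits360-resnet9.pth'])
```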

Different approaches to train the model

To get the best results from this model you can try different hyper-parameters and model architectures. The things we can change are as follows:

  1. Changing the model structure or using a different pre-trained model (VGG, ResNet or a plain CNN approach).
  2. Changing the optimization function (Adam or SGD).
  3. Changing the learning rate.
  4. Using a different activation function (ReLU or Sigmoid).
  5. Using pretrained weights.

All the source code for my project can be found here: Fruits 360 project.

If you want to learn more about deep learning, follow the course zerotogans, which made this project possible. The course is available for free on the freeCodeCamp.org YouTube channel (click here for the online lectures on YouTube).

Also follow freeCodeCamp, where many resources are available for free.

The course is taught by Instructor Aakash N S (Jovian ID), the founder of Jovian.ml. I would like to express my gratitude to Aakash Sir and his team for this wonderful course.

Connect with me on Medium —

Connect with me on LinkedIn — linkedin@arnab

THANK YOU ….😀
