Building a Deep Learning model with PyTorch to classify fruits and vegetables

Published in The Startup

Jul 1, 2020

In this post we will build and train a Convolutional Neural Network (CNN) based deep learning model with PyTorch that can classify 131 different fruits and vegetables. We will use the Fruits 360 dataset from Kaggle to train and test our model. The dataset can be found here.

A CNN architecture is a natural fit for our model since this is an image classification problem, i.e., given an image of a fruit or vegetable as input, our model should identify which fruit or vegetable is in the image.

Let us start our project by exploring the dataset. First, let us import the required Python modules.

Exploring the data

It is always good practice to explore and understand the dataset before building and training a model. So let us take a look at the Fruits 360 dataset using a few standard Python tools.

As we can see, the dataset has 3 sub-folders, of which the Training and Test folders are relevant to this project. The Training folder contains the images for training our model, separated into one folder per class, i.e., an apple folder for images of apples, a peach folder for images of peaches, etc.
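A minimal sketch of this exploration with the standard library `os` module is shown below. It is not the author's original code: the path `./fruits-360` and the helper names `list_subfolders` and `list_classes` are assumptions, so point `data_dir` at wherever you extracted the Kaggle archive.

```python
import os

# Assumed location of the extracted Kaggle archive (hypothetical path).
data_dir = "./fruits-360"

def list_subfolders(path):
    """Return the sorted names of the directories directly under `path`."""
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )

def list_classes(data_dir):
    """Each sub-folder of Training/ holds the images of one class."""
    return list_subfolders(os.path.join(data_dir, "Training"))

# Only runs when the dataset is actually present at the assumed path.
if os.path.isdir(data_dir):
    print(list_subfolders(data_dir))
    classes = list_classes(data_dir)
    print(len(classes), "classes, e.g.", classes[:5])
```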

Let us take a look at the frequency distribution of the training images.

As we can see, some classes contain more images than others. Therefore, after training on this data, our model might be better at predicting some classes than others.
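The per-class counts behind such a frequency plot can be computed with a short helper. This is a sketch under the same assumed folder layout (one sub-folder per class under Training); `count_images_per_class` is a hypothetical name, and the resulting counts could be passed to a bar chart to reproduce the distribution plot.

```python
import os
from collections import Counter

def count_images_per_class(split_dir):
    """Map each class folder under `split_dir` to the number of files it holds."""
    counts = Counter()
    for name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts

split_dir = "./fruits-360/Training"  # hypothetical path, adjust as needed
if os.path.isdir(split_dir):
    counts = count_images_per_class(split_dir)
    print("Most images:", counts.most_common(3))
    print("Fewest images:", counts.most_common()[-3:])
    print("Total:", sum(counts.values()))
```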

Before proceeding further, we have to convert the image data into PyTorch tensors so that our model can work with them. We can use the ImageFolder class from torchvision, together with the ToTensor transform, to load the images as PyTorch tensors.

Let us take a look at a sample image tensor from the dataset

The tensor has shape 3x100x100, which means that the images are RGB (3 color channels) with a resolution of 100x100 pixels.

Now we have successfully converted the images into tensors. Let us look at some of the images from the dataset using matplotlib. First, we define a helper function to view images from the dataset.
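One possible version of such a helper is sketched below; `show_example` and `to_hwc` are hypothetical names. PyTorch stores images channels-first, while matplotlib's imshow expects height x width x channels, so the tensor is permuted before plotting.

```python
def to_hwc(img):
    """Reorder a C x H x W image tensor to H x W x C for matplotlib."""
    return img.permute(1, 2, 0)

def show_example(img, label, classes):
    # Imported lazily so to_hwc can be used without matplotlib installed.
    import matplotlib.pyplot as plt
    print("Label:", classes[label], f"({label})")
    plt.imshow(to_hwc(img))
    plt.axis("off")
    plt.show()
```

With an ImageFolder dataset named `dataset`, a call such as `show_example(*dataset[0], dataset.classes)` would display the first training image.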

Now let us take a look at some of the training images

As we can see, the images are clear, and anyone familiar with these fruits could easily tell them apart. So it shouldn't be a hard problem for our model to solve.

Splitting dataset into training and validation sets

Before moving further, let us split our dataset into training and validation sets. Creating a proper validation set is very important for measuring the performance of our model as we train it.

We use a fixed seed value to make sure we get the same validation set every time we run the split. This helps in evaluating different model architectures against the same validation set.

The training dataset has a total of 67,692 images. The size of the validation set is at your discretion: a larger validation set gives a more reliable estimate of performance but leaves fewer images for training. Here we choose a 5% validation size, i.e., we set aside 5% of the images from the training dataset for validating the model as we train it.

So our validation set contains 3384 images (5% of 67,692, rounded down), leaving 64,308 images for training.
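The split can be sketched with PyTorch's random_split and a seeded generator. The seed value 42 and the helper name `split_train_val` are assumptions (the original seed is not stated); the last line reproduces the arithmetic for the 5% split.

```python
import torch
from torch.utils.data import random_split

def split_train_val(dataset, val_fraction=0.05, seed=42):
    """Hold out a fixed, reproducible fraction of `dataset` for validation."""
    val_size = int(val_fraction * len(dataset))
    train_size = len(dataset) - val_size
    # A seeded generator makes random_split return the same subsets on every run.
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, val_size], generator=generator)

# 5% of 67,692 training images: int(3384.6) = 3384 for validation, the rest for training.
print(int(0.05 * 67692), 67692 - int(0.05 * 67692))  # 3384 64308
```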

Creating dataloaders

Let us now define dataloaders for loading the training data into our model in batches. This is easy to do with the DataLoader class from PyTorch.

I have chosen a batch size of 128 for the training loader and 256 for the validation loader. We set shuffle=True for the training loader so that each batch contains a varied mix of images and the order changes from epoch to epoch. Without shuffling, the model would see the images in the same fixed order every epoch.
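A sketch of the two loaders follows. `make_loaders` is a hypothetical helper; `num_workers=0` is chosen here only for portability (more workers usually speed up image loading), and pinned memory is enabled only when a GPU is present.

```python
import torch
from torch.utils.data import DataLoader

def make_loaders(train_ds, val_ds, batch_size=128, num_workers=0):
    pin = torch.cuda.is_available()  # pinned memory only helps host-to-GPU copies
    # Training batches are reshuffled every epoch.
    train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                          num_workers=num_workers, pin_memory=pin)
    # Validation only needs forward passes, so a larger batch (256) fits in memory.
    val_dl = DataLoader(val_ds, batch_size * 2,
                        num_workers=num_workers, pin_memory=pin)
    return train_dl, val_dl
```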

Now let us take a look at a batch of images from the training loader using a helper function to make sure everything is going well.

Okay, so far so good. The training batch has a variety of images of different fruits and vegetables.

Defining the model

Now we come to the main part: creating our model. We will create a CNN-based model to tackle this classification problem, as CNNs usually work very well on image datasets. First, let us define a general base class that can be reused for almost any kind of image classification problem.

We have defined the ImageClassificationBase class by extending the nn.Module class from PyTorch. The training_step method takes a batch, passes it through the model, and returns the loss. We use the cross-entropy loss function here, which is the standard choice for multi-class classification problems. The validation_step method takes a batch of images and returns both the loss and the accuracy. The other two methods simply aggregate and report the metrics at the end of each epoch.
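A sketch of such a base class is shown below. Only training_step and validation_step are named in the text; the `accuracy` helper and the names `validation_epoch_end` and `epoch_end` for the two metric-tracking methods are assumptions, as is the printed format.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    preds = outputs.argmax(dim=1)
    return (preds == labels).float().mean()

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # forward pass: raw class scores
        return F.cross_entropy(out, labels)  # loss to backpropagate

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        return {"val_loss": loss.detach(), "val_acc": accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        # Simple mean of per-batch metrics (slightly biased if the last batch is smaller)
        epoch_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
        epoch_acc = torch.stack([x["val_acc"] for x in outputs]).mean()
        return {"val_loss": epoch_loss.item(), "val_acc": epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print(f"Epoch [{epoch}], train_loss: {result['train_loss']:.4f}, "
              f"val_loss: {result['val_loss']:.4f}, val_acc: {result['val_acc']:.4f}")
```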

Now let us define our model architecture. We will use 6 convolutional layers followed by 3 linear layers. We can define the model as follows.

We will also use a ReLU activation function after each convolutional layer and a max-pooling layer after every two convolutional layers.
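The architecture described above can be sketched as follows. The layer counts follow the text (6 convolutions with ReLU, a max-pool after every two, then 3 linear layers), but the channel widths, kernel sizes and hidden sizes are assumptions, since the original figure is not reproduced here. The class subclasses nn.Module directly so the snippet runs on its own; in the pipeline described in this post it would extend ImageClassificationBase instead.

```python
import torch
import torch.nn as nn

class Fruits360CnnModel(nn.Module):  # would extend ImageClassificationBase in the full pipeline
    def __init__(self, num_classes=131):
        super().__init__()
        self.network = nn.Sequential(
            # Block 1: 3 x 100 x 100 -> 64 x 50 x 50
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            # Block 2: -> 128 x 25 x 25
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            # Block 3: -> 256 x 12 x 12 (pooling 25 rows drops the odd one out)
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            # Three linear layers map 256 * 12 * 12 = 36,864 features to 131 class scores
            nn.Linear(256 * 12 * 12, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, xb):
        return self.network(xb)
```

Because padding=1 keeps the spatial size unchanged through each 3x3 convolution, only the three pooling layers shrink the feature maps (100 → 50 → 25 → 12), which fixes the input size of the first linear layer.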

The complete architecture

Let’s verify that the model produces the expected output on a batch of training data. The 131 outputs for each image can be interpreted as probabilities for the 131 target classes (after applying softmax), and the class with the highest probability is chosen as the label predicted by the model for the input image.

As expected, the output for each image is a vector of 131 values, one score per class. These raw scores (logits) are not probabilities yet and can be negative; the cross-entropy loss function applies a log-softmax internally, which turns them into (log-)probabilities.
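The relationship between raw outputs, softmax probabilities and the cross-entropy loss can be checked on random logits, which stand in for real model outputs in this sketch.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 131)           # stand-in for model outputs: 4 images x 131 raw scores
labels = torch.tensor([0, 5, 42, 130])

probs = F.softmax(logits, dim=1)       # non-negative, each row sums to 1
preds = probs.argmax(dim=1)            # same as logits.argmax(dim=1): softmax preserves order

# cross_entropy is the mean negative log-probability of the true class,
# computed from log_softmax, so it expects raw logits rather than probabilities.
manual = -F.log_softmax(logits, dim=1)[torch.arange(4), labels].mean()
print(torch.allclose(F.cross_entropy(logits, labels), manual))  # True
```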

Using a GPU for training

We will train our model on a GPU for faster training. Let us define a couple of functions and classes to help move our model and data onto the GPU.

We can now wrap our training and validation data loaders using DeviceDataLoader for automatically transferring batches of data to the GPU (if available), and use to_device to move our model to the GPU (if available).
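Helpers of this kind are commonly written as follows. The names get_default_device, to_device and DeviceDataLoader come from the text, but their bodies are a sketch. The wrapper moves each batch to the device as it is yielded, rather than copying the whole dataset up front.

```python
import torch

def get_default_device():
    """Pick the GPU if one is available, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def to_device(data, device):
    """Move a tensor, a model, or a list/tuple of tensors to `device`."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader:
    """Wrap a DataLoader so every batch it yields is already on `device`."""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)
```

Typical use would be `device = get_default_device()`, then `train_dl = DeviceDataLoader(train_dl, device)` and `model = to_device(model, device)`.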

Training the model

Now we are all set to start training our model. Let us create a fit function to define our training loop. A good training loop can make the difference between a model that performs well and one that performs poorly.

Our fit function will train the model for a specified number of epochs. For each batch, it passes the data through the model, computes the loss, computes the gradients via backpropagation, and lets the optimizer adjust the weights; at the end of each epoch, it evaluates the model on the validation set.
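A sketch of such a training loop, together with an `evaluate` helper, is shown below. It assumes the model provides training_step, validation_step, validation_epoch_end and epoch_end methods like those of the base class; the default optimizer and the exact structure are assumptions rather than the original code.

```python
import torch

@torch.no_grad()
def evaluate(model, val_loader):
    """Run the validation loop without tracking gradients."""
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)   # forward pass + loss
            train_losses.append(loss.detach())
            loss.backward()                     # backpropagation computes the gradients
            optimizer.step()                    # optimizer updates the weights (once per batch)
            optimizer.zero_grad()               # clear gradients before the next batch
        result = evaluate(model, val_loader)    # validation metrics at the end of the epoch
        result["train_loss"] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history
```

With the hyperparameters listed in this post, training would be started with `history = fit(num_epochs, lr, model, train_dl, val_dl, opt_func)`.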

Before we begin training, let’s instantiate the model once again and see how it performs on the validation set with the initial set of parameters.

The initial accuracy is around 1%, which is what one might expect from a randomly initialized model (since it has a 1 in 131 chance of getting a label right by guessing randomly).
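The expected starting point can be worked out directly: random guessing gives 1/131 accuracy, and a model whose outputs are close to uniform has a cross-entropy loss close to ln(131). This is only a rough sanity check, since randomly initialized weights do not produce exactly uniform outputs.

```python
import math
import torch
import torch.nn.functional as F

num_classes = 131
chance_acc = 1 / num_classes           # about 0.0076, i.e. roughly 1%
uniform_loss = math.log(num_classes)   # about 4.875: cross-entropy of a uniform guess

# All-zero logits mean "every class equally likely", so the loss equals ln(131)
logits = torch.zeros(8, num_classes)
labels = torch.randint(0, num_classes, (8,))
print(round(chance_acc, 4), round(uniform_loss, 3))        # 0.0076 4.875
print(round(F.cross_entropy(logits, labels).item(), 3))    # 4.875
```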

We’ll use the following hyperparameters (learning rate, no. of epochs, batch_size etc.) to train our model.

num_epochs = 3
opt_func = torch.optim.Adam
lr = 0.001
batch_size = 128

The optimizer we will use is Adam. The Adam optimization algorithm is an extension of stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing. It keeps running averages of each parameter's gradients and squared gradients and uses them to adapt the step size for each parameter. Learn more about the Adam optimization algorithm here.
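To make the Adam update concrete, the sketch below implements one step by hand (with PyTorch's default betas and eps) and checks it against torch.optim.Adam on a toy quadratic loss. The toy loss and the helper name `adam_step` are illustrative assumptions.

```python
import torch

def adam_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of tensor p given gradient g at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * g            # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * g * g        # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (v_hat.sqrt() + eps)  # per-parameter adaptive step
    return p, m, v

# Compare against torch.optim.Adam on the toy loss sum(w^2), using the article's lr = 0.001
w_ref = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
opt = torch.optim.Adam([w_ref], lr=1e-3)
w, m, v = w_ref.detach().clone(), torch.zeros(3), torch.zeros(3)
for t in range(1, 6):
    opt.zero_grad()
    (w_ref ** 2).sum().backward()
    opt.step()
    w, m, v = adam_step(w, 2 * w, m, v, t)     # gradient of sum(w^2) is 2w
print(torch.allclose(w, w_ref.detach(), atol=1e-6))  # True
```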

Okay, now we are ready to begin training. Let us train our model for 3 epochs and see how well it does.

Great! After just 3 epochs, our model has achieved a validation accuracy of 98%. It also does not seem to overfit, as the validation loss decreases along with the training loss.

But since we chose a very small validation set, we might not get the same level of accuracy on the test dataset. Before we move on to testing our model on the test data, let us plot how our model improved over the 3 epochs.

As we can see, our model improved drastically right after the first epoch. This suggests that the architecture we defined has more than enough capacity for this classification problem.
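A plotting helper for the history returned by fit might look like this. It assumes each history entry is a dict with 'train_loss', 'val_loss' and 'val_acc' keys; `history_series` and `plot_history` are hypothetical names.

```python
def history_series(history, key):
    """Pull one metric (e.g. 'val_acc') out of the list of per-epoch result dicts."""
    return [epoch_result[key] for epoch_result in history]

def plot_history(history):
    import matplotlib.pyplot as plt  # lazy import: only needed for display
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(history_series(history, "train_loss"), "-bx", label="train")
    ax_loss.plot(history_series(history, "val_loss"), "-rx", label="validation")
    ax_loss.set(xlabel="epoch", ylabel="loss", title="Loss vs. epochs")
    ax_loss.legend()
    ax_acc.plot(history_series(history, "val_acc"), "-x")
    ax_acc.set(xlabel="epoch", ylabel="accuracy", title="Validation accuracy vs. epochs")
    plt.show()
```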

Testing images from the test dataset

Testing individual samples

While we have been tracking the overall accuracy of a model so far, it’s also a good idea to look at the model’s results on some sample images. Let’s test out our model with some images from the predefined test dataset. We begin by creating a test dataset using the ImageFolder class.

test_dataset = ImageFolder(data_dir+'/Test', transform=ToTensor())

Let’s define a helper function predict_image, which returns the predicted label for a single image tensor.

def predict_image(img, model):
    # Convert to a batch of 1 and move it to the model's device
    xb = to_device(img.unsqueeze(0), device)
    # Get predictions from the model (no gradient tracking needed for inference)
    with torch.no_grad():
        yb = model(xb)
    # Pick the index with the highest score
    _, preds = torch.max(yb, dim=1)
    # Retrieve the class label
    return dataset.classes[preds[0].item()]

Now let us predict some images using our model.

Good, we got 2 right and 1 wrong. The wrong prediction labeled a corn husk image as corn, an understandable confusion that more training data could likely help resolve.

Testing on the entire test data

Identifying where our model performs poorly can help us improve the model, by collecting more training data, increasing/decreasing the complexity of the model, and changing the hyperparameters.

As a final step, let’s also look at the overall loss and accuracy of the model on the test set. We expect these values to be similar to those for the validation set. If not, we might need a better validation set that has similar data and distribution as the test set.
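To see where the model does poorly, one option is to collect predictions over the whole test loader and compute per-class accuracy. This is a sketch with hypothetical helper names; classes that never appear in the labels are reported as 0.

```python
import torch

@torch.no_grad()
def collect_predictions(model, loader):
    """Return predicted and true labels for every batch, gathered on the CPU."""
    model.eval()
    all_preds, all_labels = [], []
    for images, labels in loader:
        all_preds.append(model(images).argmax(dim=1).cpu())
        all_labels.append(labels.cpu())
    return torch.cat(all_preds), torch.cat(all_labels)

def per_class_accuracy(preds, labels, num_classes):
    """Accuracy for each class index; classes absent from `labels` get 0."""
    correct = torch.bincount(labels[preds == labels], minlength=num_classes)
    total = torch.bincount(labels, minlength=num_classes)
    return correct.float() / total.clamp(min=1).float()
```

For a hypothetical `test_loader`, `preds, labels = collect_predictions(model, test_loader)` followed by `(preds == labels).float().mean()` gives the overall test accuracy, and `per_class_accuracy(preds, labels, 131).argsort()[:5]` lists the five weakest classes.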

As we can see, we got only 91% accuracy on the test data, compared with 98% on the validation data. This might be because the validation set does not reflect the distribution of the test images well. Adding more varied data to the training set may help improve the test accuracy.

Saving and loading the model

Since training takes a while and we have achieved a reasonable accuracy, it is a good idea to save the weights of the model to disk, so that we can reuse the model later and avoid retraining from scratch. Here’s how you can save the model.

torch.save(model.state_dict(), 'fruits360-cnn.pth')

The .state_dict method returns an OrderedDict containing all the weights and bias matrices mapped to the right attributes of the model. We then save the weights and biases to a file called ‘fruits360-cnn.pth’.

To load the model weights, we can redefine the model with the same structure, and use the .load_state_dict method.

model2 = to_device(Fruits360CnnModel(), device)
model2.load_state_dict(torch.load('fruits360-cnn.pth', map_location=device))
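The save/load round trip can be checked in isolation with a small stand-in model. `make_model` and the temporary file name are assumptions; `map_location` lets weights saved on a GPU be loaded on a CPU-only machine.

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_model():
    # Small stand-in for Fruits360CnnModel; the round trip works the same for any nn.Module
    return nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 3))

model = make_model()
path = os.path.join(tempfile.gettempdir(), "roundtrip-demo.pth")  # hypothetical file name
torch.save(model.state_dict(), path)

model2 = make_model()                      # same architecture, fresh random weights
model2.load_state_dict(torch.load(path, map_location="cpu"))

x = torch.randn(4, 10)
print(torch.equal(model(x), model2(x)))    # True: identical weights give identical outputs
```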

Just as a sanity check, let’s verify that this model has the same loss and accuracy on the test set as before.

So we’re all set. The reloaded model gives the same accuracy as our original model.

Conclusion

We have successfully created and trained a CNN-based deep learning model to classify images of fruits and vegetables. We have also seen the Adam optimization algorithm, which differs from classical stochastic gradient descent but gave us great results. The test accuracy might be improved further by using a validation set whose distribution is closer to that of the test data.

The model can be further challenged and improved by introducing images of fruits and vegetables that are harder to tell apart into the training and test datasets. You can find the full code for this project here if you want to check it out. You can also use the same model architecture to tackle other similar classification problems.