Fruit and Vegetable Detection and Feature Extraction using Instance Segmentation - Part 2

Prakruti Chandak
Published in Codalyze · 4 min read · Jun 5, 2019

About the Series:

The goal of the project is to build a system that is able to identify fruits and vegetables. Along with the identification, it should also be able to extract the features of a particular category/class. We are assuming that we have a conveyor belt with fruits and vegetables on it. The input can be an image or a video, but in this case we'll be using images. The series has the following parts:

PART I — Choosing the model and the dataset
PART II — Retraining the model according to the dataset

Task description:

To expand our dataset by adding more classes relevant to our project.

The MS COCO dataset has 80 object classes (81 with the background class), of which only five are useful to us: carrot, broccoli, orange, apple, and banana. We retrained the model by adding pineapple to our dataset.

Note:
In order to implement the task, you will have to set up the Mask R-CNN model first. You should be able to detect objects from the MS COCO dataset using the pre-trained weights file mask_rcnn_coco.h5.

There are several posts that you can follow; I have linked them in the references section.

Steps for retraining:

  1. Creating the dataset:
    I tried searching for an already available dataset that has both mask and bounding-box annotations. It was difficult to find one, and even when I did, it wasn't of good quality, so I decided to create a new dataset.
    For this purpose, I searched Flickr and Google Images and downloaded the images using a Chrome extension.
  2. Annotation:
    The images have to be annotated (or labelled); for this purpose we can use the VGG Image Annotator (VIA). For every image, we should create one mask per object in it. Once the masks are created you can move to the next image; the masks are stored in the browser, and after the task is done you can export the data in JSON format. The JSON will contain the polygon points and other information required by the training script. You can also import the JSON file back, in case you wish to increase the dataset size later. A sketch of reading this export is shown after the figure below.
    I used 175 training images and 50 testing images, with 850 steps per epoch.
    Note: Try to use a single type of region shape; using the polyline and circle tools created problems for me.
VGG VIA annotation capture
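For reference, here is a minimal sketch of reading the VIA export and turning each polygon region into a binary mask, which is the same information load_mask() needs during training. The file and folder names are assumptions, and the snippet assumes every region was drawn with the polygon tool:

import json
import numpy as np
import skimage.draw
import skimage.io

# read the VIA export (file name is an assumption)
annotations = json.load(open("dataset/train/via_region_data.json"))
for a in annotations.values():
    image = skimage.io.imread("dataset/train/" + a["filename"])
    height, width = image.shape[:2]
    regions = a["regions"]
    if isinstance(regions, dict):  # VIA 1.x exports a dict, 2.x a list
        regions = list(regions.values())
    for r in regions:
        pts = r["shape_attributes"]  # polygon drawn around one object
        mask = np.zeros((height, width), dtype=np.uint8)
        rr, cc = skimage.draw.polygon(pts["all_points_y"], pts["all_points_x"])
        mask[rr, cc] = 1  # one binary mask per object instance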

3. Creating a config file and loading the dataset
The config file contains all the variables set for training. I tried not to change much, so I inherited the base config class and overrode only a few values. The file looks like:

from mrcnn.config import Config

class FruitsConfig(Config):
    NAME = "fruits"
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1 + 1  # background + pineapple
    # it can be 1 + x (no. of classes you wish to add)
    STEPS_PER_EPOCH = 1000  # set according to the required accuracy
    DETECTION_MIN_CONFIDENCE = 0.8
    BACKBONE = "resnet101"
    VALIDATION_STEPS = 5
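As a quick usage sketch (assuming the FruitsConfig class above), the config can be instantiated and inspected before training:

config = FruitsConfig()
config.display()  # prints all configuration values, including the inherited defaults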

4. Training
Training is the most crucial part of the pipeline. In this step, we provide the dataset to the machine learning algorithm in order to obtain a weights file, which the detection code will later load. A fuller sketch of this step is given after the outline below.

The training file looks like:

load_images()
load_mask()
load_references()
#load the above-mentioned data
dataset.prepare()
model = modellib.MaskRCNN(" … ")
model.train(" … ")
# save the model (.h5 file)
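For a more concrete picture, here is a hedged sketch of how that training script can be put together, following the structure of the Mask R-CNN samples. FruitsDataset, its load_fruits() loader, the paths, and the epoch count are assumptions rather than the exact code used here:

import mrcnn.model as modellib

config = FruitsConfig()

# the custom dataset class (assumed) implements load_mask(), image_reference(), etc.
dataset_train = FruitsDataset()
dataset_train.load_fruits("dataset/", "train")
dataset_train.prepare()

dataset_val = FruitsDataset()
dataset_val.load_fruits("dataset/", "val")
dataset_val.prepare()

model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs/")
# start from the COCO weights, skipping the layers whose shape changes with NUM_CLASSES
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])
# retrain only the head layers; checkpoints are saved as .h5 files under logs/
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=30, layers="heads")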

5. Results
Once we have the weights file, we can run our detection script with it. A fuller inference sketch is given below the result capture.

load_weights()
detect(model, "path/to/img/or/video/")
object detection after retraining
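For completeness, here is a minimal inference sketch along those lines, assuming the retrained weights were saved as mask_rcnn_fruits.h5 (the file name is an assumption) and reusing the FruitsConfig class from step 3:

import skimage.io
import mrcnn.model as modellib

class InferenceConfig(FruitsConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1  # detect one image at a time

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir="logs/")
model.load_weights("mask_rcnn_fruits.h5", by_name=True)  # weights file name is an assumption

image = skimage.io.imread("path/to/img.jpg")
results = model.detect([image], verbose=1)
r = results[0]  # dict with 'rois', 'masks', 'class_ids', and 'scores'
print(r["class_ids"], r["scores"])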

References
