How we trained our first object detection model with Indian road data

Giscle
4 min read · Dec 21, 2017

If you remember, a few months back we tested the TensorFlow Object Detection API on our data. We tried both daytime and nighttime footage, but because we used a pre-trained model, the results were not surprising: the API was unable to detect local vehicles such as motorbikes, autos and trucks. Still, the experiment gave us a sense of what we would need to build an object detection system that works with 99.99% accuracy on Indian roads.

Since we were already comfortable with the TensorFlow Object Detection API, we decided to stick with TensorFlow to train our own model.

Data:

Thanks to our OfferCam users, we have a good amount of data in the form of 30 fps video. To minimize near-duplicate images, we kept every 10th frame, i.e. 3 frames from each second.

We used a 7:23-minute video segment, which gave us 1,318 images. We started with a small dataset so that training would take less time and give us an initial validation.
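A minimal sketch of the sampling step, assuming OpenCV (`opencv-python`) for decoding video; the function names and the `frame_000010.jpg` naming scheme are hypothetical, not taken from our pipeline. Keeping every 10th frame of a 30 fps video gives 3 frames per second, so a 7:23 (443-second) clip should yield roughly 443 × 3 = 1,329 images. That is close to the 1,318 images above; the exact number depends on the clip's real frame count.

```python
import os


def sampled_frame_indices(total_frames: int, step: int = 10) -> list[int]:
    """Indices of the frames kept when saving every `step`-th frame."""
    return list(range(0, total_frames, step))


def expected_image_count(duration_s: float, fps: float = 30.0, step: int = 10) -> int:
    """Approximate number of sampled images for a clip of `duration_s` seconds."""
    return len(sampled_frame_indices(int(duration_s * fps), step))


def extract_frames(video_path: str, out_dir: str, step: int = 10) -> int:
    """Save every `step`-th frame of `video_path` as a JPEG; return how many were saved."""
    import cv2  # needs opencv-python; imported here so the helpers above run without it

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the video (or a read error)
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved


# 7:23 is 443 seconds; at 30 fps, every 10th frame means 3 images per second.
print(expected_image_count(7 * 60 + 23))  # 1329
```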

Data Annotation:

We annotated the data with the LabelMe tool using seven major object categories: 1. Car, 2. Bus, 3. Truck, 4. Person, 5. Auto, 6. Motorbike, 7. Animal.
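The TensorFlow Object Detection API reads class names and integer IDs from a label map in protobuf text format, with IDs starting at 1 because 0 is reserved for the background class. A small sketch that renders one for our seven categories; the helper names and the output file name `label_map.pbtxt` are hypothetical, and the names must match the class strings in the annotation CSV exactly.

```python
# Class names in the order listed above; IDs start at 1 because the
# TensorFlow Object Detection API reserves 0 for the background class.
CATEGORIES = ["Car", "Bus", "Truck", "Person", "Auto", "Motorbike", "Animal"]


def class_to_id(name: str) -> int:
    """Map a class name (as written in the CSV) to its label-map ID."""
    return CATEGORIES.index(name) + 1


def label_map_pbtxt(categories: list[str]) -> str:
    """Render a label map in protobuf text format, one `item` per class."""
    items = [
        f"item {{\n  id: {i}\n  name: '{name}'\n}}\n"
        for i, name in enumerate(categories, start=1)
    ]
    return "\n".join(items)


# Save this output as label_map.pbtxt (hypothetical name) and point the
# training config's label_map_path at it.
print(label_map_pbtxt(CATEGORIES))
```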

Data annotation using LabelMe

After annotation, we had just over 8,000 labels across the seven categories. Because several team members annotated data separately, we made some mistakes, such as spelling errors, misidentified objects and inconsistent labels. Using pandas, we identified these mistakes and then corrected them.
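A sketch of the kind of pandas check we mean, assuming the labels sit in a DataFrame column called `class`; the `TYPO_FIXES` entries and the sample rows are hypothetical examples, not our actual mistakes. `value_counts()` surfaces spelling variants, and any label that still doesn't match one of the seven categories becomes NaN for manual review. Misidentified objects cannot be caught this way; they still need someone to look at the image.

```python
import pandas as pd

VALID_CLASSES = ["Car", "Bus", "Truck", "Person", "Auto", "Motorbike", "Animal"]

# Hypothetical typo table; in practice it is filled in after inspecting
# value_counts() on the raw labels.
TYPO_FIXES = {"moterbike": "motorbike", "motor bike": "motorbike", "buss": "bus"}


def clean_labels(df: pd.DataFrame, column: str = "class") -> pd.DataFrame:
    """Return a copy of df with labels normalised to the VALID_CLASSES spelling.

    Labels that still don't match a valid class become NaN so they can be
    reviewed by hand instead of being silently kept.
    """
    canonical = {name.lower(): name for name in VALID_CLASSES}
    # Trim whitespace, lowercase, then apply the known typo fixes.
    key = df[column].str.strip().str.lower().replace(TYPO_FIXES)
    out = df.copy()
    out[column] = key.map(canonical)  # unmatched keys become NaN
    return out


raw = pd.DataFrame({
    "filename": ["f1.jpg", "f1.jpg", "f2.jpg", "f2.jpg", "f3.jpg"],
    "class": ["Car", " car", "Moterbike", "buss", "tractor"],
})
print(raw["class"].value_counts())  # spot spelling variants
cleaned = clean_labels(raw)
print(cleaned[cleaned["class"].isna()])  # rows that still need manual review
```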

LabelMe annotation labels with mistakes

Data Storage:

Since our team members are spread across the globe, we could not share a local GPU for training, so we chose Google Cloud for GPU training. We created a storage bucket and uploaded all the data with its annotations so that everyone could access it from anywhere.
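A minimal sketch of the upload using the `google-cloud-storage` Python client, assuming application default credentials are configured; the bucket name and prefix are hypothetical placeholders. The same copy can also be done from a shell with `gsutil -m cp -r`.

```python
from pathlib import Path


def upload_plan(local_dir: str, prefix: str) -> list[tuple[str, str]]:
    """Pair every file under local_dir with its destination object name."""
    root = Path(local_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the local directory layout, using '/' in object names.
            blob_name = f"{prefix}/{path.relative_to(root).as_posix()}"
            plan.append((str(path), blob_name))
    return plan


def upload_dir(local_dir: str, bucket_name: str, prefix: str) -> None:
    """Upload local_dir to gs://<bucket_name>/<prefix>/ (needs google-cloud-storage)."""
    from google.cloud import storage  # imported here so upload_plan works without it

    bucket = storage.Client().bucket(bucket_name)
    for local_path, blob_name in upload_plan(local_dir, prefix):
        bucket.blob(blob_name).upload_from_filename(local_path)


# Example (hypothetical bucket and prefix):
# upload_dir("data", "our-training-bucket", "india-roads")
```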

Instance Preparation:

Then we prepared our Google Cloud instance, documenting the process in our step-by-step guide to instance preparation. We copied the data from the Google Cloud Storage bucket and installed the TensorFlow Object Detection API. Next, we split the images into a training set and a test set, converted the LabelMe XML files into a training CSV and a test CSV, and finally converted the CSV files into TFRecords. At that point we were ready to train our model.
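A sketch of the XML-to-CSV and split steps, assuming the MIT LabelMe XML layout (`<object>` elements containing `<name>`, `<deleted>` and a `<polygon>` of `<pt>` points) and deriving each bounding box from the polygon's extents. The column names, paths, 80/20 split and function names are assumptions, and the CSV-to-TFRecord step is not shown. Two details matter: the loop emits one row per object rather than per image, and the split is made by image so boxes from the same frame never land in both sets.

```python
import csv
import random
import xml.etree.ElementTree as ET
from pathlib import Path

FIELDS = ["filename", "class", "xmin", "ymin", "xmax", "ymax"]


def labelme_xml_to_rows(xml_data) -> list[dict]:
    """Return one row per annotated object in a LabelMe XML document (str or bytes)."""
    root = ET.fromstring(xml_data)
    filename = root.findtext("filename", "").strip()
    rows = []
    for obj in root.iter("object"):  # every object, not just the first one
        if obj.findtext("deleted", "0").strip() == "1":
            continue  # skip objects flagged as deleted
        points = [(float(pt.findtext("x")), float(pt.findtext("y"))) for pt in obj.iter("pt")]
        if not points:
            continue
        xs, ys = zip(*points)
        rows.append({
            "filename": filename,
            "class": obj.findtext("name", "").strip(),
            "xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys),
        })
    return rows


def split_by_image(rows: list[dict], test_fraction: float = 0.2, seed: int = 42):
    """Split rows so that all boxes of one image end up in the same set."""
    images = sorted({r["filename"] for r in rows})
    random.Random(seed).shuffle(images)
    test_images = set(images[: int(len(images) * test_fraction)])
    train = [r for r in rows if r["filename"] not in test_images]
    test = [r for r in rows if r["filename"] in test_images]
    return train, test


def write_csv(rows: list[dict], path: str) -> None:
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def convert_dir(xml_dir: str, train_csv: str, test_csv: str) -> tuple[int, int]:
    """Convert every *.xml file in xml_dir and write the train/test CSVs."""
    rows = []
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        rows.extend(labelme_xml_to_rows(xml_path.read_bytes()))
    train, test = split_by_image(rows)
    write_csv(train, train_csv)
    write_csv(test, test_csv)
    return len(train), len(test)
```

Because consecutive sampled frames are only a third of a second apart, splitting by contiguous time ranges of the video would be a stricter test than a random split by image.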

Model training:

With the instance ready, we started training. Because we had only 1,318 images, we used transfer learning, with ssd_mobilenet as our pre-trained model. After the first training run, however, we realized that we had made an error when converting the XML to CSV: only one annotation per image had been written to the CSV. We fixed the issue and restarted training with much more data.
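A cheap guard against this kind of conversion bug, sketched with pandas: compare the number of CSV rows with the number of labels counted directly from the annotation files, and flag a CSV in which no image has more than one box. The function name and the `filename` column are assumptions.

```python
import pandas as pd


def check_conversion(csv_df: pd.DataFrame, expected_boxes: int) -> list[str]:
    """Return warnings if the converted CSV seems to have lost annotations.

    expected_boxes is the number of labels counted directly from the XML files.
    """
    warnings = []
    if len(csv_df) != expected_boxes:
        warnings.append(f"CSV has {len(csv_df)} boxes but the annotations contain {expected_boxes}")
    boxes_per_image = csv_df.groupby("filename").size()
    if len(boxes_per_image) > 1 and boxes_per_image.max() == 1:
        warnings.append("every image has exactly one box; were additional objects dropped?")
    return warnings


buggy = pd.DataFrame({"filename": ["a.jpg", "b.jpg"], "class": ["Car", "Bus"]})
fixed = pd.DataFrame({"filename": ["a.jpg", "a.jpg", "b.jpg"], "class": ["Car", "Auto", "Bus"]})
print(check_conversion(buggy, expected_boxes=3))  # two warnings
print(check_conversion(fixed, expected_boxes=3))  # []
```

With just over 8,000 labels across 1,318 images, a CSV holding only one row per image (at most 1,318 rows) would trip both checks.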

After 8 hours of GPU training on Google Cloud, the loss was changing only slightly, so we stopped training. As always, we were excited to check the results. They were better than expected: the model could now detect motorbikes, autos and other local vehicles. There was still plenty of room for improvement, however, because the model often misidentified trucks and buses as cars and sometimes failed to detect objects at all.
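Our stopping decision was a judgment call made by watching the loss curve. One simple way to make "only small changes in the loss" explicit is to compare the mean loss over the most recent window of steps with the window before it; the function name, the window size and the 1% threshold below are hypothetical values, not the ones we used.

```python
def has_plateaued(losses: list[float], window: int = 100, min_rel_improvement: float = 0.01) -> bool:
    """True when the mean loss of the last `window` steps is less than
    `min_rel_improvement` (a fraction) below the mean of the window before it."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous <= 0:
        return True  # nothing left to improve relative to zero
    return (previous - recent) / previous < min_rel_improvement


print(has_plateaued([10 - 0.04 * i for i in range(200)]))  # still improving: False
print(has_plateaued([2.0] * 200))  # flat: True
```

Averaging over windows smooths out the step-to-step noise of the training loss, so a single noisy step does not trigger the stop.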

Loss after 8 hours on the GPU

We were not disappointed, because with only about 1,300 images the accuracy was acceptable. The more training data we had for a class, the more accurately the model identified it. Check out our video below.

Since that video is the one we used for training, we also ran the model on a second video (unseen data) to make sure our result was not overfitting. The video below shows that the model does generalize to new data. The model detected persons, motorbikes, trucks, autos and other objects, but it did not detect the same objects consistently, and it sometimes detected horizontal poles as cars :). Of course, with such a small dataset, we were happy with the result.
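To put numbers on observations like missed objects and poles detected as cars, detections on held-out frames can be matched to ground-truth boxes by intersection over union (IoU). The sketch below uses greedy per-class matching at IoU ≥ 0.5 and counts true positives, false positives and misses per class. It is a simplified check, not the COCO or Pascal VOC mAP evaluation that ships with the TensorFlow Object Detection API, and the input format (a dict from filename to `(class, box)` lists, detections sorted by descending score) is an assumption. A truck labelled as a car shows up as a car false positive plus a truck miss.

```python
from collections import defaultdict

Box = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def per_class_counts(ground_truth: dict, detections: dict, threshold: float = 0.5) -> dict:
    """Greedy matching per image; returns {class: {"tp", "fp", "fn"}}."""
    stats = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for filename in set(ground_truth) | set(detections):
        gts = ground_truth.get(filename, [])
        matched = [False] * len(gts)
        for cls, box in detections.get(filename, []):
            best, best_iou = None, threshold
            for i, (gt_cls, gt_box) in enumerate(gts):
                if matched[i] or gt_cls != cls:
                    continue
                overlap = iou(box, gt_box)
                if overlap >= best_iou:
                    best, best_iou = i, overlap
            if best is None:
                stats[cls]["fp"] += 1  # e.g. a pole detected as a car
            else:
                matched[best] = True
                stats[cls]["tp"] += 1
        for (gt_cls, _), hit in zip(gts, matched):
            if not hit:
                stats[gt_cls]["fn"] += 1  # object the model missed
    return dict(stats)


def recall(s: dict) -> float:
    return s["tp"] / (s["tp"] + s["fn"]) if s["tp"] + s["fn"] else 0.0


gt = {"f1.jpg": [("Car", (0, 0, 10, 10)), ("Truck", (20, 20, 40, 40))]}
det = {"f1.jpg": [("Car", (1, 1, 10, 10)), ("Car", (50, 0, 60, 5))]}
for cls, s in sorted(per_class_counts(gt, det).items()):
    print(cls, s, f"recall={recall(s):.2f}")
```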

Going forward, we will focus on growing our dataset and fine-tuning our model to raise its object detection accuracy to 99.99%.

You can request the dataset here.

If you have any suggestions for us, please let us know in the comments. And if you want to join our team, share your profile at career@giscle.com

This work was done by Devin Shanahan, Sonu Chauhan and Mukesh Jha, who make up the team AutoMagic.

Giscle is a computer vision platform offering three core vision services (detection, recognition and analysis) as easy-to-integrate APIs and SDKs.