How we used gamma correction to increase our training data

Giscle · Dec 29, 2017

You may remember our first object detection training on Indian roads using local data. The results were not bad, considering we had used only 1,318 images for training. As we mentioned then, we planned to train the model with a larger data set. Since we all love data annotation :P, we added only 784 new images, bringing us to roughly 2,100 images. However, we suspected that growing the data set from ~1,300 to ~2,100 images would not bring any significant improvement. Data augmentation was an option, but we were wary of it because we wanted to keep ourselves away from any additional annotation work.

You may also remember our lane detection write-up, where we used the gamma function. While the road lane detection team was presenting how and why they used gamma correction for lane detection, the object detection team thought of using the same idea to increase the size of the training data set.

In general, gamma correction manipulates the brightness and intensity of an image. You can see its effect below, where γ = 1 corresponds to the original image (a short code sketch of the operation follows the examples).

Brightness shift of image for gamma value 0.1
Brightness shift of image for gamma value 0.5
Brightness shift of image for gamma value 2.0
Brightness shift of image for gamma value 3.0
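
For readers who want to try this themselves, here is a minimal sketch of gamma correction using OpenCV's lookup-table approach. This is our illustration rather than the exact script the teams used, and the file names are hypothetical:

```python
import cv2
import numpy as np

def adjust_gamma(image, gamma=1.0):
    # Build a 256-entry lookup table mapping each pixel value v to
    # 255 * (v / 255) ** (1 / gamma); gamma < 1 darkens, gamma > 1 brightens.
    inv_gamma = 1.0 / gamma
    table = np.array(
        [((i / 255.0) ** inv_gamma) * 255 for i in range(256)]
    ).astype("uint8")
    return cv2.LUT(image, table)

# Hypothetical usage: create darker and brighter variants of one frame.
image = cv2.imread("road_frame.jpg")
cv2.imwrite("road_frame_g05.jpg", adjust_gamma(image, gamma=0.5))  # darker
cv2.imwrite("road_frame_g20.jpg", adjust_gamma(image, gamma=2.0))  # brighter
```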

We chose two different pairs of gamma values, 0.5 (darker) with 3.0 (brighter) and 0.5 with 2.0, just to compare the overall improvement in output :).

From 2,100 images, we now had 6,300, and the most important part was that there was no need for any additional annotation (thank god!!). Compared to the 1,300 images of the first training, 6,300 was almost five times greater, but we were not going to take any chances: we also mirrored all 6,300 images.

Mirrored Images

Now we were able to perform some simple mathematical operations to modify the existing XML and avoid annotating the new images. The bounding boxes are defined by the xmin, ymin, xmax, and ymax values of the rectangle. The ymin and ymax values remain unchanged when mirrored, but the xmin and xmax values need to be recalculated: the new xmin is the image width minus the original xmax, and the new xmax is the width minus the original xmin. A sketch of this transformation follows the figure below.

New bounding box on mirrored image after applied calculations
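
Here is a minimal sketch of that mirroring step, assuming Pascal VOC-style XML annotations; the function name and file paths are our own illustration, not the team's actual script:

```python
import cv2
import xml.etree.ElementTree as ET

def mirror_example(image_path, xml_path, out_image_path, out_xml_path):
    # Flip the image around its vertical axis.
    image = cv2.imread(image_path)
    cv2.imwrite(out_image_path, cv2.flip(image, 1))

    # Update the bounding boxes: y values are untouched, x values
    # are reflected across the image width.
    tree = ET.parse(xml_path)
    width = int(tree.getroot().find("size/width").text)
    for box in tree.getroot().iter("bndbox"):
        xmin = int(box.find("xmin").text)
        xmax = int(box.find("xmax").text)
        box.find("xmin").text = str(width - xmax)  # new xmin from old xmax
        box.find("xmax").text = str(width - xmin)  # new xmax from old xmin
    tree.write(out_xml_path)
```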

Now we had 12,600 images, many containing multiple annotations. In the image above, three objects (a motorbike, a car, and a truck) were annotated. Since each original image yields six variants (the original plus two gamma-corrected versions, each of which is also mirrored), those 3 annotations in our original csv became the 18 annotation entries seen below.

Annotations generated from one original image

The code for this can be found on our GitHub. Now we were ready to train the model (a detailed account of how we trained our model). We started training again and trained the model for 6 hours. It seems we need to tune some parts of our model, because we were unable to get the loss below 4 in either of our training runs. Some reasons for this are discussed below, after the results.

Total Loss after 6 hours of training.

As always, we were very excited to see the results, and they were many times better than the last time we trained the model. The model now detects objects much more often and with greater accuracy. It seems to identify most distant objects as cars; we think this is because "car" is most often the correct prediction for distant objects. This, however, is one likely cause of our loss plateauing around 4. Our next steps will look at improving accuracy on distant objects, in the hope of reducing the loss.

Model_Object_Detection

Trouble: We have not yet mentioned one issue we faced while training the model. Some images did not contain any objects and were therefore included without an accompanying annotation file. We wanted to keep those images, thinking that in the future we might need to annotate items in them. This caused an issue with our pre-existing scripts, but by inserting a try/except statement we were able to skip over the images without annotations.
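
The fix looked roughly like the following sketch; the directory names and the record-building step are hypothetical placeholders:

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical loop over the data set: images whose annotation file
# is missing are skipped instead of crashing the pipeline.
for image_name in os.listdir("images"):
    xml_name = os.path.splitext(image_name)[0] + ".xml"
    try:
        tree = ET.parse(os.path.join("annotations", xml_name))
    except FileNotFoundError:
        # No annotation file for this image: keep the file on disk,
        # but leave it out of the training records for now.
        continue
    # ... convert `tree` into training records here ...
```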

The trained model is doing better, and it now detects most objects with good accuracy. There is still plenty of room for improvement: adding new object categories such as traffic lights, police booths, and speed and direction signs, among others. And yes, the target is to achieve 99.99% accuracy.

If you have any suggestions, please let us know in the comments. And if you would like to join our team, please share your profile at career@giscle.com

This work was done by Devin Shanahan, Sonu Chauhan, and Mukesh Jha (Team Auto Magic), and by Roshan Adhikari, Umar Farooq, and Suresh Amrani (Team Neuro Knights).
