Experience with iWildCam 2019 Kaggle Competition

Saripudin
Konvergen.AI
Aug 16, 2019

For getting started with data science and machine learning in real applications, a Kaggle competition is a good first step for beginners and an interesting experience. That is the step we took when we started working at Konvergen.AI as Artificial Intelligence Engineers. Kaggle is a platform on which companies and researchers post data, and data scientists compete to produce the best models for predicting and describing that data.

iWildCam 2019 Competition

Introduction

The competition we entered is the iWildCam 2019 Competition, which is aimed at classifying animals in images from camera traps. Camera traps (or wild cams) enable the automatic collection of large quantities of image data, and biologists all over the world use them to monitor the biodiversity and population density of animal species. The training set contains 196,157 images from 138 different locations in Southern California, while the test set contains 153,730 images from 100 locations in Idaho. Apart from the images that contain no animals, the training data covers 13 animal classes, while the testing data contains 22. The challenge in this competition is therefore to add data from other sources so that the set of animal classes is complete. In this post, we explain the steps we used to reach 6th place (top 2%) on the leaderboard.

Dataset

The datasets came from three different sources: the California Camera Traps (CCT) for the main training dataset; the iNaturalist 2017 and 2018 competitions, combined into iNat_Idaho, to supplement the training dataset; and Idaho Fish and Game (IDFG) for the testing dataset. The iNat_Idaho dataset contains 25,667 images. The raw images of the CCT and IDFG datasets were more than 2,000 pixels wide and more than 1,000 pixels high, while the dimensions of the iNat_Idaho images were in the range of hundreds of pixels. In addition to the original images, the iWildCam 2019 competition committee provided the teams with smaller versions of the CCT and IDFG images, resized to 1,024 pixels wide.

The committee also provided bounding box data for most of the training images (CCT and iNat_Idaho) and all of the test images, along with the source code of the object detector they used [1].

Preprocessing

Before being used for training, the main training dataset was explored for its class labels. It turns out that the CCT dataset covers only 14 of the 23 classes, including the empty class. To supply the missing classes, we supplemented the CCT dataset with the iNat_Idaho dataset. After combining the data, we plotted the class distribution again and found that the largest proportion of the dataset belonged to the empty class.

Category Distribution

This class imbalance was handled with the stratify argument of the train_test_split function from sklearn's model_selection module, which keeps the class proportions identical in the training and validation splits.
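
Below is a minimal sketch of this split, assuming the annotations are loaded into a pandas data frame; the file name, column name, and test_size here are illustrative choices, not the exact values we used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# one row per image with its class label (hypothetical file and column names)
df = pd.read_csv("train_annotations.csv")

# stratify keeps the class proportions identical in both splits,
# so the rare classes are not lost from the validation set
train_df, val_df = train_test_split(
    df,
    test_size=0.1,                  # illustrative, not our exact value
    stratify=df["category_id"],
    random_state=42,
)
```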

Next, we cropped the images that have bounding box data. We used the code written by AbuduWeili [2] to apply the bounding boxes to the images and crop them, leaving only the animal in each image. We encountered some instructive problems at this stage. The first problem was understanding the code and applying it to our use case. In particular, the directory names in their use case and ours differed, so we needed to map their directory names to our own. Another problem was the difference in naming the datasets: AbuduWeili's group used the names CCT and iNat to distinguish the main and supplemental training datasets, while we used is_supp = False for the training dataset and is_supp = True for the supplemental iNat_Idaho dataset.

Once we had figured out the mapping of these names to our use case, we encountered a second problem: although we ran the modified code successfully without any error messages, the resulting images showed that the bounding boxes missed the animals entirely, and many of the cropped images had undefined file sizes. It turns out that the bounding box data were obtained by running the detection algorithm over the original-size images. We solved this problem by obtaining the original sizes from the JSON annotations for the training and supplemental images. For the test images, we got the original sizes from the test_file.csv in the Walleclipse GitHub repository. The width and height columns of that file were swapped, so we had to read the image width from the height column and vice versa.

After adding the original height and width data to our data frames, the bounding boxes surrounded the animals in the training and test datasets, but they were still not applied to the supplemental training dataset. The cause was that although the image ids for the supplemental dataset were purely numeric, the ids associated with the bounding box data were strings rather than integers, so we converted the id column of the supplemental dataset from integers to strings. After this step, all images with bounding box data associated with their ids were correctly bounded and cropped, although some boxes went over the image boundary when the animals were not in the center of the image.

After studying the image-cropping code some more, we found that AbuduWeili's team applied padding around the animals inside the bounding box, so that a small area around each animal was included in the crop. When this padding was removed, the resulting bounding boxes bound the animals tightly, and the total size of the cropped images became far smaller than that of the downloaded images: 6 GB compared to the 50 GB downloaded from the Kaggle site and the iWildCam GitHub. This final set was used for training and testing. We found that the testing accuracy and f1 score increased when the original images were cropped so that only the animals of interest were present. Through this two-stage cropping, we learned that a tight bounding box is necessary for good accuracy and f1 score; if padding is included in the bounding box and the crop, the accuracy suffers in a non-negligible way.
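
The sketch below illustrates the two fixes described above: rescaling a bounding box detected on the original-size image onto the resized 1,024-pixel-wide copy, and cropping with optional padding. It assumes the box is given in pixel coordinates of the original image; the function name and the pad fraction are ours, not AbuduWeili's exact code.

```python
from PIL import Image

def crop_animal(img_path, box, orig_w, orig_h, pad=0.0):
    """Crop an animal out of a resized image, given a bounding box
    that was detected on the original-size image.

    box is (x_min, y_min, x_max, y_max) in pixels of the ORIGINAL image;
    orig_w and orig_h come from the JSON annotations (or from test_file.csv,
    with its swapped width/height columns already corrected).
    """
    img = Image.open(img_path)
    new_w, new_h = img.size  # e.g. 1024 pixels wide after the committee's resize

    # rescale the box from original to resized coordinates
    sx, sy = new_w / orig_w, new_h / orig_h
    x1, y1, x2, y2 = box[0] * sx, box[1] * sy, box[2] * sx, box[3] * sy

    # optional padding around the animal; pad=0.0 gives the tight crop
    # that produced the best accuracy and f1 score for us
    dx, dy = pad * (x2 - x1), pad * (y2 - y1)
    x1, y1 = max(0.0, x1 - dx), max(0.0, y1 - dy)        # clamp boxes that
    x2, y2 = min(new_w, x2 + dx), min(new_h, y2 + dy)    # cross the boundary

    return img.crop((int(x1), int(y1), int(x2), int(y2)))
```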

Our conclusions from this part of preprocessing are:

  1. In image classification with missing data for several classes, we must first supplement the data to include the missing classes before doing any training.
  2. To test processing code, use at least one sample from each data source, because there might be differences across sources, and the fact that the code works for one source does not mean it will work for all of them.
  3. In projects involving bounding boxes and cropping, it is best not to pad the bounding box, because padding can affect accuracy negatively.
  4. If given a set of bounding box coordinates, find out what the coordinates refer to and the original size of the image to which the bounding boxes were applied.
  5. To help in understanding someone else's code, find out what output the code produces, and try to map the similarities between your use case and theirs.

Animal Box Detection with Padding

Animal Box Detection without Padding

After we had obtained the cropped dataset, the next preprocessing step was image enhancement. The original camera-trap dataset was a challenge for us: the images suffered from problems such as poor lighting, blurred areas, animals that were too small, animals covered by plants, animals cut off at the image edge, weather-related noise, the same locations under different conditions, et cetera. We assumed that these problems were one reason our team had scored lower than the baseline (in this competition the baseline is 0.115). Because of that, we started to try some enhancements with OpenCV.

First, we tried CLAHE (Contrast Limited Adaptive Histogram Equalization), which tends to reduce noise in relatively homogeneous regions of an image [3]. Then we applied white balance (WB), the process of reproducing neutral colors; the specific techniques we used were Simple White Balance and Gray-World White Balance [4]. The Simple White Balance algorithm works by independently stretching each of the input image channels to a specified range, and for increased robustness it ignores the top and bottom p% of pixel values [5]. The Gray-World White Balance algorithm scales the pixel values based on the gray-world assumption, which states that the average of all channels should result in a gray image; it adds a modification that thresholds pixels by their saturation value and only uses pixels below the provided threshold when computing the averages [6].
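
A minimal sketch of these three enhancements with OpenCV follows. CLAHE is part of the core library, while SimpleWB and GrayworldWB live in the xphoto module of opencv-contrib-python; the clipLimit, tile size, and saturation threshold below are common defaults, not the exact parameters we tuned.

```python
import cv2

img = cv2.imread("camera_trap_image.jpg")  # hypothetical file name

# CLAHE: equalize contrast on the lightness channel of the LAB image
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
img_clahe = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

# Simple White Balance: stretch each channel independently,
# ignoring the top and bottom p% of pixel values
img_simple = cv2.xphoto.createSimpleWB().balanceWhite(img)

# Gray-World White Balance: scale the channels so their average is gray,
# using only pixels below the saturation threshold
gray_wb = cv2.xphoto.createGrayworldWB()
gray_wb.setSaturationThreshold(0.95)
img_gray = gray_wb.balanceWhite(img)
```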

Image Enhancement Result

Training

Based on our best private and public scores (0.267 and 0.224), we resized the newest cropped images (tightly cropped, without padding) to 128 x 128, then augmented the training dataset with random horizontal flips and random erasing. Following Walleclipse's documented code, we used mixup regularization. Theoretically, mixup can improve the generalization of state-of-the-art neural network architectures, reduce the memorization of corrupt labels, increase robustness to adversarial examples, and stabilize the training of generative adversarial networks. Mixup extends the training distribution by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated targets; in a nutshell, it constructs virtual training examples. We trained a DenseNet-121 with an extra fully connected layer, the Adam optimizer (lr=0.001), cross-entropy loss, and the ReduceLROnPlateau learning rate scheduler (factor=0.5, patience=2). We trained for 10 epochs and used the weights from the sixth epoch because it had the highest validation f1 score. As a side note, when we used tqdm during training, our PC crashed many times, so avoid tqdm during training if your machine still uses an HDD and has little memory.
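
For concreteness, this is a minimal PyTorch sketch of the training setup just described. The 256-unit hidden layer, the mixup alpha of 0.2, and stepping the scheduler on the validation loss are our own illustrative choices; the rest follows the configuration above.

```python
import numpy as np
import torch
import torch.nn as nn
from torchvision import models, transforms

# augmentation as described: random horizontal flip and random erasing
train_tf = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.RandomErasing(),
])

# DenseNet-121 with an extra fully connected layer
# (the hidden width of 256 is illustrative, not our exact value)
model = models.densenet121(pretrained=True)
model.classifier = nn.Sequential(
    nn.Linear(model.classifier.in_features, 256),
    nn.ReLU(),
    nn.Linear(256, 23),  # 23 classes including the empty class
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=2)

def mixup_loss(images, labels, alpha=0.2):
    """Mixup: blend a shuffled copy of the batch into itself and
    interpolate the loss between the two label sets."""
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(images.size(0))
    out = model(lam * images + (1 - lam) * images[idx])
    return lam * criterion(out, labels) + (1 - lam) * criterion(out, labels[idx])

# inside the epoch loop, after validation:
#   scheduler.step(val_loss)  # halves the lr after 2 epochs without improvement
```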

Testing

We ran inference on the tightly cropped test images with the same image size and preprocessing as in the training session. This placed us 6th (top 2%) on the private leaderboard, although our score does not appear there because we made a late submission. A minimal sketch of the inference step follows, and below it the result of our submission.
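
This sketch reuses the model defined in the training sketch above; the checkpoint and image file names are hypothetical.

```python
import torch
from torchvision import transforms
from PIL import Image

# same image size as in training; no flip or erasing at test time
test_tf = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

# load the weights saved after the sixth epoch (hypothetical file name)
model.load_state_dict(torch.load("densenet121_epoch6.pth"))
model.eval()

with torch.no_grad():
    img = test_tf(Image.open("cropped_test_image.jpg")).unsqueeze(0)
    pred = model(img).argmax(dim=1).item()  # predicted class id
```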

Submission result

Conclusion

One of the challenges in data science is the data preparation stage. Before entering the modeling and training stages, the data must be clean, complete, and orderly; if the prepared data is wrong, the errors will carry through to every later stage, which makes preparation a crucial step. In this competition, data preparation included finding additional data to complete the set of animal classes, detecting and cropping the animals in the images, and applying several image enhancement processes to improve image quality. From these stages, we obtained better classification results for categorizing the animals in camera trap images.

Acknowledgment

We would like to express our gratitude to Konvergen.AI, who gave us the golden opportunity to work on this great deep learning project from a Kaggle competition. We hope it will be a good start for us as we enter the world of machine learning. Here we introduce the team:

Mega Fransiska, 2nd-year student in the Mathematics Department, University of Indonesia. Interested in mathematical computation and data science, with a focus on deep learning. Now developing these interests as an intern at Konvergen.AI as an AI Scientist.

Daniel K. Suhendro, graduated from Flinders University in 2018 with a Ph.D. in Chemistry. Started learning about data science and deep learning in 2018. Now working as an intern at Konvergen.AI as an AI Scientist.

Saripudin, a recent graduate of Bandung Institute of Technology with a specialization in Control Engineering and Intelligent Systems. Interested in research and technology, including data analysis and modelling, machine learning, software engineering, robotics, and embedded systems.

References

  1. iWildCam 2019 Kaggle Competition: https://www.kaggle.com/c/iwildcam-2019-fgvc6/overview and https://github.com/visipedia/iwildcam_comp, accessed on July 8th, 2019.
  2. AbuduWeili's report and code repo: https://github.com/Walleclipse/iWildCam_2019_FGVC6, accessed on July 10th, 2019.
  3. Zhao, Yao, Xiangwei Kong, and David Tauman. 2017. "Image and Graphics", https://books.google.co.id/books?id=R9FEDwAAQBAJ, accessed on August 6th, 2019.
  4. Abraham, Chandler. 2017. "What is the goal of white balance?", https://medium.com/hipster-color-science/what-is-the-goal-of-white-balance-b0b1f2b18951, accessed on August 10th, 2019.
  5. OpenCV documentation. 2019. "cv::xphoto::SimpleWB Class Reference", https://docs.opencv.org/master/d1/d8b/classcv_1_1xphoto_1_1SimpleWB.html#details, accessed on August 10th, 2019.
  6. OpenCV documentation. 2019. "cv::xphoto::GrayworldWB Class Reference", https://docs.opencv.org/3.4.4/d7/d71/classcv_1_1xphoto_1_1GrayworldWB.html#details, accessed on August 10th, 2019.
