Unique Learnings from the Omdena Challenge: Identifying Trees from Satellite Images to Prevent Fires
Through community-driven collaboration, we not only found the best-fit model for identifying trees but also quickly learned which approaches did not work.
Omdena is a fast-growing global collaboration platform built to solve real-world social problems by bringing together AI enthusiasts from across the world. Since the beginning of May, Omdena has launched a set of social-good challenges that any AI enthusiast can apply for.
In the challenge “Identifying Trees on Satellite Images”, we collaborated with developers and data scientists from Spacept, a Swedish startup that builds machine learning models on top of satellite imagery.
Their objective is to identify trees in satellite images to prevent power outages in towns and cities, and to help prevent catastrophic fires ignited by dead trees and lightning storms. This endeavor posed several challenges for the Spacept team, which led them to partner with Omdena and launch this project as one of Omdena’s global challenges.
How we found the best-fit model
In this challenge, the participants were split into five groups, each given a specific task. For instance, some groups worked on labeling the satellite images, and another explored and implemented a semantic segmentation architecture known as U-Net. Our group was proudly responsible for building a different segmentation model, known as Mask Region-based CNN, or Mask R-CNN.
What is the Mask Region-based CNN model?
In essence, the Mask R-CNN model is derived from the Faster R-CNN detector. Faster R-CNN performs object detection by predicting bounding boxes; Mask R-CNN keeps that same detection pipeline but adds an extra branch that predicts a segmentation mask for each Region of Interest (RoI). The result combines object detection via bounding boxes with pixel-level classification of each detected instance. Thus, the fundamental part of this model is the mask branch, which is what allows it to obtain pixel-accurate results.
Configuration of the model
We labeled our images using Labelbox, generating annotations as JSON and COCO (Common Objects in Context) formatted files. This gave us roughly 240 labeled images.
It is quite expensive to capture satellite images at high resolution; hence, we cropped our images and their corresponding masks down to 500x500 pixels.
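The cropping step can be sketched as simple non-overlapping tiling with NumPy; the scene dimensions below are mock values for illustration. The same crop coordinates would be applied to each image's mask so that pixels and labels stay aligned.

```python
import numpy as np

def tile_image(image: np.ndarray, size: int = 500):
    """Yield non-overlapping size x size crops, dropping partial edge tiles."""
    height, width = image.shape[:2]
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            yield image[top:top + size, left:left + size]

# Example: a mock 1500x2000 RGB scene yields 3 x 4 = 12 full 500x500 tiles.
scene = np.zeros((1500, 2000, 3), dtype=np.uint8)
tiles = list(tile_image(scene))
print(len(tiles))  # 12
```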
The original annotation comprised two classes: trees and bushes/others. To simplify training and evaluation, we decided to keep only the tree labels and treat everything else as background.
We split our dataset into 60% training, 20% validation, and 20% testing.
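A 60/20/20 split over 240 items can be sketched as below; the fixed seed is an illustrative choice to make the split reproducible.

```python
import random

def split_dataset(items, seed=42):
    """Shuffle and split into 60% train / 20% validation / 20% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(0.6 * n)
    n_val = int(0.2 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(240))
print(len(train), len(val), len(test))  # 144 48 48
```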
After acquiring the set of 240 labeled images with their respective annotations, we set the following goals:
- Apply a number of image augmentation techniques to the training, testing, and validation sets.
- Apply four image filters to determine whether they could improve the model's results.
The training was initially set for 50 epochs.
For image augmentation, we applied the following techniques: horizontal flip, vertical flip, Gaussian blur, and rotations of 45, 65, and 205 degrees.
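These augmentations can be sketched with NumPy and SciPy; the blur sigma and rotation padding mode here are illustrative assumptions, not our exact settings. Geometric transforms (flips and rotations) must also be applied to the corresponding masks, while the Gaussian blur applies to the image only.

```python
import numpy as np
from scipy import ndimage

def augment(image: np.ndarray):
    """Return the augmented variants of one single-channel image tile."""
    return {
        "flip_lr": np.fliplr(image),
        "flip_ud": np.flipud(image),
        "gaussian_blur": ndimage.gaussian_filter(image, sigma=1.0),
        "rotate_45": ndimage.rotate(image, 45, reshape=False, mode="reflect"),
        "rotate_65": ndimage.rotate(image, 65, reshape=False, mode="reflect"),
        "rotate_205": ndimage.rotate(image, 205, reshape=False, mode="reflect"),
    }

tile = np.random.rand(500, 500).astype(np.float32)
variants = augment(tile)
print(sorted(variants))
```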
We eventually trained for 76 epochs but noticed a subtle inflection point in the validation loss at around epoch 55, suggesting some overfitting. The results were fairly decent when the image contained smaller instances of trees.
However, the model didn’t perform so well for larger groups of trees as seen below.
Prediction for a larger group of trees
We then applied the image filters CLAHE, dehaze, gamma, and kernel to our training, test, and validation sets, and verified whether the results showed any improvement to the model.
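As an example of this filtering step, here is a minimal NumPy sketch of gamma correction, one of the four filters; the gamma value is an illustrative assumption. CLAHE, dehazing, and kernel filters are typically applied with OpenCV (e.g. `cv2.createCLAHE`), omitted here to keep the sketch dependency-light.

```python
import numpy as np

def gamma_correct(image: np.ndarray, gamma: float = 0.8) -> np.ndarray:
    """Apply gamma correction to a uint8 image via a precomputed lookup table."""
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return table[image]

# A gamma below 1.0 brightens dark regions, e.g. shadowed tree canopies.
dark = np.full((500, 500, 3), 60, dtype=np.uint8)
brightened = gamma_correct(dark, gamma=0.8)
print(int(brightened[0, 0, 0]))
```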
After applying a different filter to each set, we executed our model. We found that the predictions of trees on the validation set were not accurate, as seen below:
What we learned
The Mask R-CNN implementation proved to be a very powerful technique for object detection. This is not only because of its accuracy in identifying different objects in an image, regardless of changes in size, position, or rotation, but also because it is a simple model to configure and implement. Throughout this process, we found that the model performed well at identifying trees. Later, however, we discovered that purely semantic models such as U-Net performed even better at this task; the downside of Mask R-CNN here is that it tries to identify each object separately as an individual entity.
The case is different with semantic segmentation architectures, which mask the objects as a whole rather than instance by instance. Nonetheless, with better-labeled images at higher resolution, Mask R-CNN could provide results similar to the semantic model.
Thanks Labelbox and Omdena
We would like to thank the Labelbox team for providing us the full suite of their tools while we worked on the challenge. We are also indebted to the Omdena team for giving us the opportunity to collaborate and for connecting us with other data scientists and software developers.
That was a truly rewarding learning experience.