In the SpaceNet Challenge, the metric for ranking entries is based on the Jaccard Index, also called the Intersection-over-Union (IoU). This post describes the metric in detail, along with some of its benefits and limitations. For an introduction to IoU, Adrian Rosebrock has an excellent post on the topic.
Object Detection and Image Segmentation
Object detection is an application of computer vision algorithms to locate and identify objects of interest within digital imagery/video. Typically the region that is required to locate a target object is a bounding box: the vertices (in pixel coordinates) of a rectangular region containing the target object. Object detection also requires accurate classification of the bounding box into the associated category.
Image segmentation is an application of computer vision algorithms that classifies each pixel/voxel in a digital image/video into a category. Image segmentation results in more accurate distinction of different categories within an image, but less distinction of different objects of the same category.
The labels in the SpaceNet challenge are not bounding boxes, but rather polygons in a vector layer (GeoJSON). The flexibility of a polygonal label allows for more accurate localization than a bounding box while retaining the ability to distinguish different objects of the same category.
Distance between two Regions
To train and evaluate computer vision algorithms, one needs a measure of the distance between two regions. There are several candidates for this measure, each viable for certain applications:
- Euclidean distance between centroids. For large, isolated regions Euclidean distance may be appropriate, but it ignores the size and shape of the regions.
- Hausdorff distance. The Hausdorff distance may overvalue the boundary of a region compared to the interior. Additionally, the Hausdorff distance is expensive to compute.
- Intersection-over-Union. The IoU is a normalized (scale-invariant) measure that focuses on the areas of the regions. IoU shines at distinguishing regions that overlap but says nothing about non-overlapping regions: any two disjoint regions score 0, regardless of how far apart they are.
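The trade-offs among the three candidates can be seen on a toy example. The sketch below is illustrative only: it assumes regions are axis-aligned rectangles given as (xmin, ymin, xmax, ymax), which is a simplification of the polygon labels, and all function names are hypothetical.

```python
import math

# Assumption: each region is a filled axis-aligned rectangle
# (xmin, ymin, xmax, ymax). Real SpaceNet labels are polygons.

def centroid_distance(a, b):
    """Euclidean distance between the rectangle centers."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)

def point_to_rect(px, py, r):
    """Distance from a point to a filled rectangle (0 if inside)."""
    dx = max(r[0] - px, 0.0, px - r[2])
    dy = max(r[1] - py, 0.0, py - r[3])
    return math.hypot(dx, dy)

def hausdorff_distance(a, b):
    """Hausdorff distance between two filled rectangles.

    For convex regions the farthest point from the other region is
    always a corner, so checking the four corners is enough.
    """
    def directed(src, dst):
        corners = [(src[0], src[1]), (src[2], src[1]),
                   (src[2], src[3]), (src[0], src[3])]
        return max(point_to_rect(x, y, dst) for x, y in corners)
    return max(directed(a, b), directed(b, a))

def iou(a, b):
    """Intersection area divided by union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

outer = (0, 0, 10, 10)
inner = (2, 2, 8, 8)        # concentric with `outer`, but smaller
near = (20, 0, 30, 10)      # disjoint from `outer`, close by
far = (100, 0, 110, 10)     # disjoint from `outer`, far away

for name, other in [("inner", inner), ("near", near), ("far", far)]:
    print(name,
          round(centroid_distance(outer, other), 3),
          round(hausdorff_distance(outer, other), 3),
          round(iou(outer, other), 3))
```

The concentric pair has a centroid distance of zero even though the regions differ, while IoU is zero for both disjoint pairs even though one is much farther away than the other.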
The LSVRC competitions associated with ImageNet use IoU as a metric largely because its scale invariance works well with diverse object sizes. The SpaceNet competition leverages the familiarity of IoU to attract participation from the machine learning community. As SpaceNet evolves, IoU can scale with the expected diversity of object sizes.
Some IoU Details
The IoU is a measure of how close two regions are to each other on a scale between 0 and 1: a value of 0 means the regions do not overlap, and a value of 1 means the regions are exactly the same. Explicitly,
IoU(A, B) = area(A intersection B) / area(A union B)
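As a rough illustration of the formula on polygon labels, the sketch below computes IoU for two convex polygons using Sutherland-Hodgman clipping and the shoelace area formula. It assumes both polygons are convex with vertices in counter-clockwise order; real building footprints are often non-convex, so a production pipeline would more likely rely on a geometry library. The function names are hypothetical and are not taken from the SpaceNet tooling.

```python
def polygon_area(pts):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    total = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return total / 2.0

def clip_convex(subject, clip):
    """Sutherland-Hodgman: clip `subject` against a convex CCW `clip`."""
    def inside(p, a, b):
        # True if p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of the line through p, q with the line through a, b.
        x1, y1 = p
        x2, y2 = q
        x3, y3 = a
        x4, y4 = b
        denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        if not current:
            break
        prev = current[-1]
        for cur in current:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output

def polygon_iou(a, b):
    """IoU(A, B) = area(A intersection B) / area(A union B)."""
    clipped = clip_convex(a, b)
    inter = abs(polygon_area(clipped)) if len(clipped) >= 3 else 0.0
    union = abs(polygon_area(a)) + abs(polygon_area(b)) - inter
    return inter / union if union > 0 else 0.0

square_a = [(0, 0), (4, 0), (4, 4), (0, 4)]
square_b = [(2, 2), (6, 2), (6, 6), (2, 6)]
print(polygon_iou(square_a, square_b))
```

Here the two squares share a 2 x 2 corner, so the intersection area is 4 and the union area is 16 + 16 - 4 = 28, giving an IoU of 1/7.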
IoU can be converted into a true mathematical metric (the Jaccard distance, 1 - IoU), but it is often preferable to use IoU directly rather than the related metric. Since IoU is scale-invariant, it can be computed in world coordinates or in pixel coordinates, with the caveat that the conversion may need to accommodate fractional pixel values.
Scale invariance is generally desirable, but automated detection of small objects may still yield low IoU scores for various reasons, including inaccurately labeled training data, pixelization in the imagery, and sensitivity to occlusions.
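A small numerical sketch makes both points concrete. It assumes axis-aligned square regions given as (xmin, ymin, xmax, ymax) in pixel units and models a labeling or pixelization error as a one-pixel horizontal shift; both are simplifying assumptions, and the helper names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def scale(box, factor):
    """Multiply every coordinate by the same factor (a change of units)."""
    return tuple(v * factor for v in box)

def iou_after_shift(width, shift=1.0):
    """IoU between a width x width square and a copy shifted along x."""
    truth = (0.0, 0.0, width, width)
    proposal = (shift, 0.0, width + shift, width)
    return box_iou(truth, proposal)

# Scale invariance: rescaling both regions leaves IoU unchanged.
a = (0.0, 0.0, 4.0, 4.0)
b = (1.0, 0.0, 5.0, 4.0)
print(box_iou(a, b), box_iou(scale(a, 10), scale(b, 10)))

# A fixed one-pixel error costs small objects far more than large ones.
for width in (2, 4, 10, 40):
    print(width, round(iou_after_shift(width), 3))
```

For a one-pixel shift along one axis the IoU works out to (w - 1) / (w + 1) for a square of width w, so a 2-pixel object drops to about 0.33 while a 40-pixel object stays above 0.95.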
Implications of GIS Imagery
Satellite imagery comes in a variety of resolutions depending on the satellite and sensor. Using the geospatial coordinate system to label the objects of interest allows for a resolution-independent description of the location of an object of interest. We often call the GIS (short for Geographic Information System) coordinate system “world coordinates” when comparing to image-specific “pixel coordinates”. Conversion between the two coordinate systems is not difficult but could be a barrier to entry for working with GIS imagery.
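To give a sense of what the conversion involves, here is a minimal sketch that assumes the image carries a GDAL-style six-coefficient affine geotransform. The post does not say how SpaceNet performs the conversion, so this is only one common convention, and the geotransform values below are hypothetical.

```python
def pixel_to_world(col, row, gt):
    """Apply a GDAL-style affine geotransform to pixel coordinates."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

def world_to_pixel(x, y, gt):
    """Invert the affine geotransform; results may be fractional pixels."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return col, row

# Hypothetical north-up image: origin at (500000, 4000000) in projected
# meters, 0.5 m pixels, rows increasing southward (negative y step).
geotransform = (500000.0, 0.5, 0.0, 4000000.0, 0.0, -0.5)

world = pixel_to_world(10, 20, geotransform)
pixel = world_to_pixel(500002.6, 3999995.3, geotransform)
print(world, pixel)
```

Note that `world_to_pixel` returns floating-point column and row values rather than rounding them, which is the kind of fractional pixel coordinate an IoU computation in pixel space has to tolerate.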
One significant advantage that GIS imagery has over other imagery is the known scale. When working in world coordinates, object detection algorithms can be optimized to use the known scale and reduce the search space.
Thresholds and Detections
The performance of an object detection algorithm should depend on how many objects it detects (true positives), how many objects it fails to detect (false negatives), and how many non-objects it detects (false positives). IoU alone is insufficient to define a detection. SpaceNet sets an IoU threshold of 0.5: a proposal scoring above the threshold is considered a detection, and one scoring below it is not.
Even though IoU is scale-invariant, resolution limits in an image make the 0.5 threshold challenging for small objects. The first instance of SpaceNet works with building footprint labels, for which the chosen threshold gives ample room to differentiate algorithms.
One additional feature that SpaceNet adopts from LSVRC is that each labeled region can have at most one true positive associated with it. This is implemented by a sequential search for a true positive, sorted by decreasing IoU values. If a true positive is found, the pair (the label and the proposed region) is removed from the sequence and the search continues. The following flowchart removes the ambiguity that would otherwise arise from the order in which the proposals are submitted.
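Since the flowchart is not reproduced here, the sketch below is only one plausible reading of the prose description, not the official SpaceNet scoring code. It assumes axis-aligned boxes instead of polygons, processes proposals in the order given, ranks the remaining labels by decreasing IoU for each proposal, and treats an IoU of exactly 0.5 as a detection. All names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def match_proposals(proposals, labels, threshold=0.5):
    """Return (true_pos, false_pos, false_neg) for one image.

    Each label can be matched by at most one proposal: once a label is
    matched it is removed from the pool of candidates.
    """
    remaining = list(labels)
    true_pos = 0
    false_pos = 0
    for proposal in proposals:
        # Rank the still-unmatched labels by decreasing IoU.
        ranked = sorted(remaining,
                        key=lambda label: box_iou(proposal, label),
                        reverse=True)
        # Assumption: IoU exactly at the threshold counts as a detection.
        if ranked and box_iou(proposal, ranked[0]) >= threshold:
            remaining.remove(ranked[0])
            true_pos += 1
        else:
            false_pos += 1
    false_neg = len(remaining)  # labels that no proposal matched
    return true_pos, false_pos, false_neg

labels = [(0, 0, 10, 10), (20, 20, 30, 30)]
proposals = [(1, 1, 10, 10),     # overlaps the first label well
             (0, 0, 9, 9),       # also overlaps the first label
             (50, 50, 60, 60)]   # overlaps nothing
print(match_proposals(proposals, labels))
```

In this example the second proposal would also clear the threshold against the first label, but that label has already been claimed, so the second proposal counts as a false positive.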
Precision, Recall, and F1
Having defined a detection based on Chart 1, we summarize performance using precision, recall, and F1 score.
Precision is the fraction of proposals that are true positives. Precision does not measure the number of objects that the algorithm has failed to detect. If the number of objects in an image is known, then precision is more valuable.
Recall is the fraction of labeled objects that are detected (true positives) by the algorithm. A low-precision algorithm can still have high recall, usually implying many false positives. Conversely, an algorithm with low recall can still have high precision, usually implying few but accurate guesses.
SpaceNet chooses the harmonic mean of precision and recall, essentially giving equal weight to each; this mean is called the F1 score and is defined explicitly as follows:
F1 = 2 * precision * recall / (precision + recall).
The counts of true positives and false positives are aggregated over all of the (test) images in SpaceNet. This is in contrast to other evaluations, where precision, recall, or F1 is computed per image. There are many alternative ways of averaging to obtain an F1 score, but this choice resolves some confusing cases where images have no objects to detect.
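The aggregation can be sketched as follows. The per-image counts are made-up numbers for illustration, the function name is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than something stated in the post.

```python
def aggregate_scores(per_image_counts):
    """Pool (true_pos, false_pos, false_neg) counts over all images,
    then compute precision, recall, and F1 once on the totals."""
    tp = sum(counts[0] for counts in per_image_counts)
    fp = sum(counts[1] for counts in per_image_counts)
    fn = sum(counts[2] for counts in per_image_counts)

    # Assumption: an undefined ratio (zero denominator) is scored as 0.0.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Hypothetical counts for three test images. The second image has no
# labeled objects and no proposals, so its per-image precision and
# recall would both be 0 / 0.
counts = [(8, 2, 1), (0, 0, 0), (1, 1, 3)]
precision, recall, f1 = aggregate_scores(counts)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

With the pooled totals (9 true positives, 3 false positives, 4 false negatives), precision is 9/12, recall is 9/13, and F1 is 18/25. The empty image contributes nothing to the totals instead of forcing an arbitrary per-image score.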
Examples of the IoU computation on SpaceNet imagery are available on another DownlinQ blog. IoU values are plotted alongside region proposals on SpaceNet imagery using QGIS.