Quantifying the Effects of Resolution on Image Classification Accuracy

Adam Van Etten
The DownLinQ

--

In the previous post, we briefly discussed the HOG feature descriptor and results from boat heading classification. Recall that classifiers based upon HOG features are less powerful than deep neural networks, though potentially much faster, and they have the added benefit of not requiring GPUs. In this post we investigate the effect that image resolution has on classification accuracy. We do this by artificially degrading high resolution DigitalGlobe (DG) satellite imagery, as well as utilizing lower resolution 3-meter ground sample distance (GSD) Planet imagery.

1. Resolution Study Data

The impact of image resolution on classification accuracy can be explored by reducing the resolution of the original training images (see the datasets post for more details). We re-project image cutouts from the original DigitalGlobe 0.34 and 0.5 meter GSD to [0.3, 0.5, 1, 2, 3, 4, 5, 10] meter GSD. This re-projection helps improve algorithm robustness for small objects and allows investigation of the effects of image resolution. For validation we re-project the DigitalGlobe validation data (see Section 4 of the datasets post) to [0.5, 1, 2, 3, 4, 5, 10] meter GSD, and also include Planet (PL) data presumed to be at native 3m GSD. Validation cutouts have a continuous range of headings [0.0, 360.0) and are input into the heading classifier with the goal of returning the nearest 5-degree bin (e.g., a validation image with a heading of 46.9 degrees should be classified as having a heading of 45 degrees).
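The two operations above can be sketched in a few lines of numpy. This is a minimal illustration, not the original pipeline: the actual re-projection kernel is unspecified in the post, so nearest-neighbor resampling is assumed here, and both function names are hypothetical.

```python
import numpy as np

def degrade_to_gsd(img, native_gsd, target_gsd):
    """Degrade a cutout to a coarser GSD by nearest-neighbor resampling.

    Hypothetical helper: the post does not specify the resampling kernel,
    so nearest-neighbor is assumed for simplicity.
    """
    scale = native_gsd / target_gsd  # < 1 when degrading resolution
    h, w = img.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    return img[np.ix_(rows, cols)]

def heading_bin(heading_deg, bin_width=5.0):
    """Quantize a continuous heading in [0, 360) to the nearest bin center."""
    return (round(heading_deg / bin_width) * bin_width) % 360.0

# A 0.5m GSD cutout degraded to 2m GSD shrinks by 4x per side.
cutout = np.zeros((200, 200))
print(degrade_to_gsd(cutout, 0.5, 2.0).shape)  # (50, 50)
print(heading_bin(46.9))  # 45.0
```

Note that the modulo in `heading_bin` wraps headings near 360 degrees back to the 0-degree bin, matching the half-open [0.0, 360.0) convention.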

Figure 1. DigitalGlobe Cutouts at 0.5, 1, 2, 4, 10m GSD. Top: 32.7m boat, Middle: 87.2m boat, Bottom: 22.0m boat. As one would expect, the 87m ship is far more discernible at 10m GSD than the other rows due to the larger object size (Imagery Courtesy of DigitalGlobe).
Figure 2. Planet cutouts (3m GSD) of boats of length [304, 205, 83, 50] meters (left to right) (© 2016 Planet Labs Inc. All Rights Reserved).

2. Resolution Study Performance

Boat heading accuracy depends strongly on the object length in pixels (i.e., the ratio of object size to pixel size). To explore this issue we bin validation images by length.

Figure 3. Counts of boats in each size bin for DigitalGlobe (left) and Planet (right). Bins are: 0–10m, 10–20m, 20–40m, 40+ meters. The maximum DigitalGlobe boat length is 97 meters, while the maximum Planet boat length is 349 meters. Minimums are 3m for DigitalGlobe and 14m for Planet.

The highly skewed length count histograms of our validation sample lead to misleading results if one plots boat heading accuracy as a function of resolution (GSD) summed over all boat lengths; this global estimate essentially just mimics the results for the largest bins in Figure 3: 10–20m for DigitalGlobe and 40+ meters for Planet. We therefore instead display results broken out into the four boat length bins. Another item of note is that the classifier is trained primarily on 10–20m boats (see Figure 3 of the datasets post), so one might expect greater accuracy in this bin.
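The binning itself is straightforward; a short sketch using the Figure 3 bin edges (the boat lengths below are illustrative, not the actual validation sample):

```python
import numpy as np

# Length bin edges from Figure 3: 0-10m, 10-20m, 20-40m, 40+ meters.
bin_edges = [10.0, 20.0, 40.0]
labels = ["0-10m", "10-20m", "20-40m", "40+m"]

# Illustrative boat lengths in meters (not the real validation data).
lengths_m = np.array([3.0, 9.5, 14.0, 22.0, 87.2, 304.0])

# np.digitize maps each length to a bin index 0..3.
bin_idx = np.digitize(lengths_m, bin_edges)
counts = np.bincount(bin_idx, minlength=len(labels))
for lab, c in zip(labels, counts):
    print(lab, c)
```

Accuracy statistics are then computed per bin index rather than over the pooled sample, which avoids the skew described above.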

For each boat length bin, the multi-resolution image corpus is input into the HOG+PCA+LogReg classifiers trained in the previous post, and for each resolution we compute the accuracy, the false negative rate, and the accuracy allowing for a 180-degree flip, to account for the visual symmetry of many boats (see Figure 4). As in the previous post, we use the 30 different classifiers as well as 1000 bootstrap resamples to quantify errors (± one standard deviation).
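These metrics can be sketched as follows. This is an assumed reconstruction of the evaluation logic, not the original code: the 5-degree tolerance and the 180-degree flip are from the text, while the function names and the bootstrap details are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def angular_err(pred, truth):
    """Smallest absolute angular difference in degrees."""
    d = np.abs(pred - truth) % 360.0
    return np.minimum(d, 360.0 - d)

def heading_metrics(pred, truth, tol=5.0):
    """Accuracy within tol degrees, and accuracy also allowing a 180-degree
    flip (bow/stern ambiguity for visually symmetric boats)."""
    err = angular_err(pred, truth)
    acc = np.mean(err <= tol)
    err_flip = angular_err(pred + 180.0, truth)
    acc_180 = np.mean((err <= tol) | (err_flip <= tol))
    return acc, acc_180

def bootstrap_std(pred, truth, n_boot=1000):
    """Spread of the within-tolerance accuracy over bootstrap resamples."""
    n = len(pred)
    accs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        accs.append(heading_metrics(pred[idx], truth[idx])[0])
    return float(np.std(accs))

truth = np.array([45.0, 90.0, 10.0, 300.0])
pred = np.array([44.0, 271.0, 12.0, 120.0])
print(heading_metrics(pred, truth))  # second and fourth predictions are 180-degree flips
```

In this toy example two of the four predictions are within 5 degrees of truth, and the other two are correct up to a 180-degree flip, so the strict accuracy is 0.5 while the flip-tolerant accuracy is 1.0.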

Figure 4. Examples of correct classification (second and fourth columns) and 180 degree mis-classification (first and third columns) (Imagery Courtesy of DigitalGlobe).
Figure 5. Resolution study accuracy results. Inverted triangles denote results using Planet data, while dots and lines show results using DigitalGlobe data. Error bars are computed via bootstrap resampling; the larger error bands for longer boats are due to the lower boat counts in those bins. The blue line denotes classifications within 5 degrees of ground truth, and the green line the fraction within 5 degrees of ground truth or of ground truth rotated 180 degrees. Accuracy appears to asymptote to very low values at a length-to-resolution ratio of ~3, implying that the heading of an object spanning 3 pixels or fewer cannot be distinguished. In general, accuracy improves as boat size increases; for the largest boats, results are erratic given the lower statistics and, in part, the greater degree of 180-degree rotation symmetry. Note in particular the large gap between the blue and green bands for the largest boats, a consequence of that symmetry. The effect is particularly obvious with Planet images, for which correct labeling of boat heading is difficult (see Figure 2). Overall, the results from Planet data (inverted triangles) match well with re-projected DigitalGlobe data; however, the error bars on the Planet data for smaller boats are large due to the small sample size.
Figure 6. Classification predictions for the 40+ meter length bin, at 5m resolution. As Figure 5 indicates, accuracy levels are still quite high even at 5m GSD (Imagery Courtesy of DigitalGlobe).

Depending on the specific application, determining whether a boat is present may be more important than its exact heading, so we also test the false positive rate at various resolutions by inputting a large number of null images into the classifier. See Figures 4 and 5 of the previous post for examples of false positive and false negative classifications. The false positive and false negative rates for the resolution study are shown below in Figure 7.
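For concreteness, the two rates can be computed as below. This is a generic sketch with hypothetical labels (1 = boat, 0 = null/background image), not the post's evaluation code.

```python
import numpy as np

def fp_fn_rates(pred_is_boat, truth_is_boat):
    """False positive rate: fraction of null images classified as boats.
    False negative rate: fraction of boat images classified as background."""
    pred = np.asarray(pred_is_boat, bool)
    truth = np.asarray(truth_is_boat, bool)
    fpr = np.mean(pred[~truth])
    fnr = np.mean(~pred[truth])
    return fpr, fnr

# Toy labels: 3 boat chips and 5 null chips.
truth = np.array([1, 1, 1, 0, 0, 0, 0, 0])
pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])
fpr, fnr = fp_fn_rates(pred, truth)
print(fpr, fnr)  # 0.2 0.3333...
```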

Figure 7. Resolution study false positive (red) and false negative (orange) results. Triangles denote results using Planet data, while dots and lines show results using DigitalGlobe data. Error bars are computed via bootstrap resampling. False positive rates are independent of boat size and increase only slightly as resolution worsens, remaining below 5% for DigitalGlobe data. The Planet false positive rates are somewhat lower, largely because the majority of the Planet background images are of open water and easily distinguished from boats. False negative rates are somewhat erratic, increasing markedly as imagery is degraded; this is unsurprising, as a very blurred boat looks similar to background. For length-to-resolution ratios of 3 or less the false negative rate is unreliable due to classifier confusion on very small images (see Figure 8), so we truncate the false negative plots in the 0–10m and 10–20m length bins. The false negative spike at high resolution for the largest boats likely reflects the low number of 40+ meter boats used in training and confusion between large boats and linear structures (such as buildings) in background images (see Figure 9). Overall, Planet data produce higher false negative rates than the 3m blurred DigitalGlobe data.
Figure 8. Results for the 0–10m length bin, at 4m resolution. At this resolution, boats are a mere 2–3 pixels in size (plots above are smoothed), and so predictions are highly inaccurate. In short, any results for a length-to-resolution ratio of less than three are not to be believed (Imagery Courtesy of DigitalGlobe).
Figure 9. Results for the 40+ meter length bin, at 0.5m resolution. Heading predictions are in general quite good, though an unexpectedly high false negative rate arises due in part to the low number of large boats used in training. At this resolution, large boats sometimes appear like the urban scenes used in background images. This issue could likely be remedied via a larger training corpus of 40+ meter length boats (Imagery Courtesy of DigitalGlobe).

3. Accuracy as a Function of Training Dataset Size

Collecting an adequate training dataset is often the primary obstacle in computer vision classification tasks. In this section we investigate how classifier accuracy depends on the training dataset size, using boats of all lengths. Of note is how quickly the classifier converges: a mere 100 samples yields accuracy very close to the peak.
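The shape of such a learning curve can be illustrated with a toy experiment. Everything below is synthetic and assumed: Gaussian features stand in for HOG+PCA vectors, and a nearest-centroid rule stands in for the logistic regression classifier, purely to show the rapid convergence with training set size.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n_per_class):
    """Synthetic stand-in for HOG+PCA features: two Gaussian classes in 10-D."""
    X0 = rng.normal(-1.0, 1.0, (n_per_class, 10))
    X1 = rng.normal(+1.0, 1.0, (n_per_class, 10))
    return np.vstack([X0, X1]), np.array([0] * n_per_class + [1] * n_per_class)

X_test, y_test = make_data(500)

def nearest_centroid_acc(n_train):
    """Train a nearest-centroid classifier (a crude stand-in for the
    HOG+PCA+LogReg pipeline) on n_train samples per class; score on the
    fixed test set."""
    X, y = make_data(n_train)
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return np.mean((d1 < d0) == (y_test == 1))

# Accuracy plateaus quickly as the training set grows.
for n in [5, 25, 100, 400]:
    print(n, round(nearest_centroid_acc(n), 3))
```

On well-separated classes like these, the curve flattens within a few dozen samples; the real experiment in Figure 10 shows an analogous plateau near 100 training images.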

Figure 10. HOG Classifier accuracy on DigitalGlobe imagery as a function of training set size, over all boat lengths. Each line represents the accuracy at a particular resolution. Error bars are computed as in Section 2. A training size of 800 incorporates the entire corpus of rotated boats. For each resolution, the classifier converges rapidly to nearly peak accuracy.

4. Conclusions

In this post we quantified boat heading accuracies and confidence levels as a function of image resolution, finding a steep degradation for small objects and a much more gradual decline for objects over 40m in length. We were also able to quantify the classifier performance as a function of training set size, finding that the HOG + PCA + logistic regression classifier performs remarkably well with as few as 100 distinct training images. Boat heading accuracies inferred from Planet data are generally similar to the accuracies inferred from degraded high-resolution DigitalGlobe images at comparable ground sample distance. Both datasets also boast very low false positive rates, although the false negative rates inferred from Planet data are generally worse than those from degraded DigitalGlobe data.

Performance curves like the ones we show here should help guide end users to determine the imaging GSD necessary to classify objects of interest at a desired accuracy level.

In this day and age it may seem naive to embrace any computer vision technique not based upon deep learning, but as we have shown here, some classical techniques still provide compelling results in certain problem areas.

* Footnote: Section 1 of this post was updated on 11 October 2016 to further clarify the specifics of validation data.
