Quantifying the Effects of Resolution on Image Classification Accuracy
In the previous post, we briefly discussed the HOG feature descriptor and results from boat heading classification. Recall that classifiers based upon HOG features are less powerful than deep neural networks, though potentially much faster, and they have the added benefit of not requiring GPUs. In this post we investigate the effect that image resolution has on classification accuracy. We do this by artificially degrading high resolution DigitalGlobe (DG) satellite imagery, as well as utilizing lower resolution 3-meter ground sample distance (GSD) Planet imagery.
1. Resolution Study Data
The impact of image resolution on classification accuracy can be explored by reducing the resolution of the original training images (see the datasets post for details). We re-project image cutouts from the original DigitalGlobe 0.34 and 0.5 meter GSD to [0.3, 0.5, 1, 2, 3, 4, 5, 10] meter GSD. This re-projection improves algorithm robustness for small objects and allows us to investigate the effects of image resolution. For validation we re-project the DigitalGlobe validation data (see Section 4 of the datasets post) to [0.5, 1, 2, 3, 4, 5, 10] meter GSD, and also include Planet (PL) data presumed to be at its native 3 m GSD. Validation cutouts have a continuous range of headings [0.0, 360.0) and are input into the heading classifier with the goal of returning the nearest 5-degree bin (e.g., a validation image with a heading of 46.9 degrees should be classified as having a heading of 45 degrees).
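The two operations above can be sketched in a few lines. This is a minimal illustration, not the actual re-projection pipeline: `scipy.ndimage.zoom` with bilinear interpolation stands in for the real geospatial re-projection, and `degrade_to_gsd` and `heading_bin` are hypothetical helper names.

```python
import numpy as np
from scipy.ndimage import zoom

def degrade_to_gsd(img, native_gsd, target_gsd):
    """Downsample an image chip from its native GSD to a coarser target GSD.
    The zoom factor is the ratio of pixel sizes; order=1 is bilinear."""
    factor = native_gsd / target_gsd
    return zoom(img, (factor, factor), order=1)

def heading_bin(heading_deg, bin_width=5.0):
    """Snap a continuous heading in [0, 360) to the nearest bin center."""
    return (np.round(heading_deg / bin_width) * bin_width) % 360.0

chip = np.random.rand(256, 256)          # a 256x256 cutout at 0.5 m GSD
coarse = degrade_to_gsd(chip, 0.5, 3.0)  # degraded to 3 m GSD (~43x43 pixels)
binned = heading_bin(46.9)               # -> 45.0, the nearest 5-degree bin
```

Note that the modulo in `heading_bin` wraps headings near 360 degrees back to the 0-degree bin, keeping the label set at exactly 72 classes.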
2. Resolution Study Performance
Boat heading accuracy depends strongly on the apparent size of the object in pixels (object length divided by GSD). To explore this dependence, we bin validation images by boat length.
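Binning by length is straightforward with `numpy.digitize`; the bin edges below are illustrative placeholders, since the post defines its four bins in Figure 3 rather than here.

```python
import numpy as np

# Hypothetical edges for four length bins (meters): <10, 10-20, 20-40, 40+.
bin_edges = np.array([10.0, 20.0, 40.0])
lengths_m = np.array([7.5, 14.0, 33.0, 88.0])

# digitize returns, for each boat, the index of the length bin it falls in.
bin_idx = np.digitize(lengths_m, bin_edges)   # -> [0, 1, 2, 3]
```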
The highly skewed length distribution of our validation sample leads to misleading results if one plots boat heading accuracy as a function of resolution (GSD) summed over all boat lengths: this global estimate essentially mimics the results for the most populous bins in Figure 3 (10–20 m for DigitalGlobe and 40+ m for Planet). We therefore display results broken out into the four boat length bins instead. Note also that the classifier is trained primarily on 10–20 m boats (see Figure 3 of the datasets post), so one might expect greater accuracy in that bin.
For each boat length bin, the multi-resolution image corpus is fed into the HOG+PCA+LogReg classifiers trained in the previous post, and for each resolution we compute the accuracy, the false negative rate, and the accuracy within ±180 degrees to account for the visual symmetry of many boats (see Figure 4). As in the previous post, we use the 30 different classifiers along with 1000 bootstrap resamples to quantify errors (± one standard deviation).
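The error quantification can be sketched as follows. This is a simplified stand-in: it bootstraps a single set of predictions rather than the post's ensemble of 30 classifiers, and the function names are hypothetical. The circular difference handles wrap-around at 360 degrees, and passing `mod=180` scores a prediction as correct even when it is flipped by 180 degrees.

```python
import numpy as np

rng = np.random.default_rng(0)

def heading_accuracy(pred, true, tol=2.5, mod=360.0):
    """Fraction of predictions within tol degrees of truth (circular)."""
    diff = np.abs((pred - true + mod / 2) % mod - mod / 2)
    return np.mean(diff <= tol)

def bootstrap_acc(pred, true, n_boot=1000):
    """Bootstrap-resample the validation set to estimate mean accuracy and
    its standard deviation."""
    n = len(pred)
    accs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample with replacement
        accs[i] = heading_accuracy(pred[idx], true[idx])
    return accs.mean(), accs.std()

true = rng.uniform(0, 360, 500)
pred = (np.round(true / 5) * 5) % 360          # an error-free 5-degree binner
acc = heading_accuracy(pred, true)              # plain accuracy
acc180 = heading_accuracy(pred, true, mod=180)  # symmetry-tolerant accuracy
mean_acc, std_acc = bootstrap_acc(pred, true)   # bootstrap mean +/- std
```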
Depending on the specific application, determining whether a boat is present at all may matter more than its exact heading, so we also test the false positive rate at various resolutions by feeding a large number of null images into the classifier. See the previous post, Figures 4 and 5, for examples of false positive and false negative classifications. The false positive and false negative rates for the resolution study are shown below in Figure 7.
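The false positive rate itself is simple to compute once the classifier's outputs on null images are in hand. The interface below is an assumption for illustration (a prediction of `None` meaning "no boat"); the real pipeline is described in the previous post.

```python
import numpy as np

def false_positive_rate(predictions_on_null):
    """Fraction of null (boat-free) images for which the classifier
    nevertheless returned a heading, i.e. claimed a boat was present."""
    flags = np.array([p is not None for p in predictions_on_null])
    return flags.mean()

# Hypothetical outputs on ten null chips: one spurious heading returned.
null_preds = [None, None, 45.0, None, None, None, None, None, None, None]
fpr = false_positive_rate(null_preds)   # -> 0.1
```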
3. Accuracy as a Function of Training Dataset Size
Collecting an adequate training dataset is often the primary obstacle in computer vision classification tasks. In this section we investigate how classifier accuracy depends on the training dataset size, using boats of all lengths. Of note is how quickly the classifier converges: a mere 100 samples yields accuracy very close to the peak.
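A learning-curve experiment of this kind can be sketched with scikit-learn. The features below are synthetic Gaussian stand-ins for the real HOG descriptors, so the absolute numbers are meaningless; the point is the procedure of training the PCA + logistic regression pipeline on nested subsets and scoring on a fixed held-out set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)

# Synthetic stand-in features: 64-dim Gaussian vectors with one
# informative (inflated-variance) direction determining the label.
X = rng.normal(size=(2000, 64))
X[:, 0] *= 3.0
y = (X[:, 0] > 0).astype(int)

X_test, y_test = X[1600:], y[1600:]   # fixed held-out evaluation set

accs = {}
for n in [25, 100, 400, 1600]:
    clf = make_pipeline(PCA(n_components=16),
                        LogisticRegression(max_iter=500))
    clf.fit(X[:n], y[:n])             # train on the first n samples
    accs[n] = clf.score(X_test, y_test)
```

Plotting `accs` against `n` gives the learning curve; on an easy problem like this synthetic one, accuracy saturates after only a few dozen samples, mirroring the rapid convergence observed for the boat heading classifier.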
4. Conclusions
In this post we quantified boat heading accuracies and confidence levels as a function of image resolution, finding a steep degradation for small objects and a much more gradual decline for objects over 40 m in length. We were also able to quantify classifier performance as a function of training set size, finding that the HOG + PCA + logistic regression classifier performs remarkably well with as few as 100 distinct training images. Boat heading accuracies inferred from Planet data are generally similar to the accuracies inferred from degraded high-resolution DigitalGlobe images at comparable ground sample distance. Both datasets also boast very low false positive rates, although the false negative rates inferred from Planet data are generally worse than those from degraded DigitalGlobe data.
Performance curves like the ones we show here should help guide end users to determine the imaging GSD necessary to classify objects of interest at a desired accuracy level.
In this day and age it may seem naive to embrace any computer vision technique not based upon deep learning, but as we have shown here, some classical techniques still provide compelling results in certain problem areas.
* Footnote: Section 1 of this post was updated on 11 October 2016 to further clarify the specifics of validation data.