Histogram of Oriented Gradients (HOG) Boat Heading Classification

Adam Van Etten
The DownLinQ
6 min read · Aug 12, 2016


In the previous post, we detailed the datasets used for the research results we aim to discuss here and in future posts. In this post we will cover one method for image classification, namely histograms of oriented gradients (HOG) combined with dimensionality reduction as an input to supervised machine learning algorithms. We also discuss methods for estimating confidence levels, which is a crucial step in determining the utility and applicability of our algorithms.

1. Histogram of Oriented Gradients (HOG)

HOG feature descriptors and their extensions remain one of the few options for object detection and localization that can remotely compete with the recent successes of deep neural networks (DNNs). For satellite imagery, the nearly constant zenith view angle along with the smaller object size (in pixels) compared to cellphone or personal camera images simplifies many computer vision tasks. For example, consider building a machine learning system to recognize automobiles. Machine learning using images from traffic cameras requires creating a model to recognize an automobile from a continuous interval of angles (front, side, etc.); neural networks excel at building such models via learned features. Machine learning on satellite images, however, only requires recognizing the overhead silhouette of a car, assuming a constant nadir stare angle. Furthermore, objects of interest in satellite images are often only a few pixels in size, and many features typically used to identify an object may be highly blurred, leaving only object outlines as differentiable. HOG descriptors capture such outline information, and are simpler, less powerful, and faster (~20x) alternatives to neural networks. In addition, HOG features can be extracted via the CPUs of a laptop or computing cluster, and need not rely on high-performance graphics processing units (GPUs) that may not be available to all users.

In brief, a HOG descriptor is computed by calculating image gradients that capture contour and silhouette information of grayscale images. Gradient information is pooled into a 1-D histogram of orientations, thereby transforming a 2-D image into a much smaller 1-D vector that forms the input for machine learning algorithms such as random forests, support vector machines, or logistic regression classifiers.
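As a concrete illustration, the sketch below extracts a HOG vector from an image cutout with scikit-image. The function name and the parameter choices (orientation bins, cell and block sizes) are illustrative assumptions, not necessarily the settings used in this work.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog

def hog_vector(image_rgb):
    """Convert an image cutout to a 1-D HOG feature vector.

    Parameter values below are illustrative defaults, not the exact
    settings used in this post.
    """
    gray = rgb2gray(image_rgb)              # HOG operates on grayscale
    return hog(gray,
               orientations=9,              # gradient orientation bins
               pixels_per_cell=(8, 8),      # local pooling region
               cells_per_block=(2, 2),      # block normalization
               feature_vector=True)         # flatten to 1-D

# Example: a random 64x64 RGB cutout yields a fixed-length descriptor
cutout = np.random.rand(64, 64, 3)
print(hog_vector(cutout).shape)
```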

Figure 1. Sample satellite image cutouts of boat(s) (left), and image of HOG descriptor (right) (Imagery Courtesy of DigitalGlobe).
Figure 2. Sample satellite image cutouts of background regions (left), and image of HOG descriptor (right) (Imagery Courtesy of DigitalGlobe).

2. Training Data

We use the training data described in the previous post, with 0.34m and 0.5m ground sample distance (GSD) DigitalGlobe boat training cutouts augmented by reducing the resolution of the original training images via a Gaussian blur to an equivalent GSD of 1.0m. This re-projection changes the final accuracy numbers by only 1–5%, although it does improve the robustness for small objects in the validation phase.
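One plausible way to perform this resolution degradation is sketched below; the helper name and the blur-width heuristic (sigma proportional to the GSD ratio) are assumptions for illustration, not the exact recipe used here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_to_gsd(image, native_gsd, target_gsd=1.0):
    """Blur a cutout so its effective resolution approximates a coarser GSD.

    The sigma heuristic (half the target-to-native GSD ratio, in native
    pixels) is an illustrative assumption.
    """
    if target_gsd <= native_gsd:
        return image
    sigma = 0.5 * target_gsd / native_gsd   # blur radius in native pixels
    return gaussian_filter(image, sigma=sigma)

# Example: degrade a 0.34m GSD cutout toward an effective 1.0m GSD
cutout_034m = np.random.rand(128, 128)
blurred = degrade_to_gsd(cutout_034m, native_gsd=0.34, target_gsd=1.0)
```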

For each of the 73 image sets (72 based on 5-degree rotations about the unit circle, plus background) we compute the HOG descriptor of each image. A robust background estimate is crucial, so we augment the negative training data by rotations and mirroring, increasing the dataset size from 6,480 to 25,920 images. This yields a total of 116,640 positive HOGs and 25,920 negative HOGs. We apply principal component analysis (PCA) to this data corpus and retain only the first 200 principal components. This step greatly reduces the dataset size and correspondingly increases training and testing speed, while decreasing accuracy by less than 1%. We split the data into a training set (75%) and test set (25%), and use the training data to train a logistic regression (LogReg) classifier for 73 categories (72 boat headings + null). Other classifiers such as support vector machines (SVMs) could be used; for this use case we find no accuracy advantage with SVMs, and execution is much slower. Compared to the hours or days needed to train neural networks, training this model is extremely rapid: extracting all positive and negative HOGs takes 241 seconds and training the classifier takes 58 seconds on one CPU, for a total of ~6 minutes.
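A minimal sketch of the PCA + logistic regression step with scikit-learn follows; the random placeholder arrays, array shapes, and solver settings are assumptions standing in for the real HOG corpus.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X: stacked HOG vectors (positives + negatives); y: heading bin 0-71, or 72 for null.
# Random data stands in for the real corpus here.
X = np.random.rand(5000, 1764)
y = np.random.randint(0, 73, size=5000)

# Reduce each HOG vector to 200 principal components
pca = PCA(n_components=200)
X_reduced = pca.fit_transform(X)

# 75/25 train/test split, then a multi-class logistic regression
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000)     # otherwise default hyperparameters
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```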

3. Classification Results

Figure 3. Classification results on the 25% test set. We define a correct classification as being within +/- 5 degrees of ground truth. With this metric, our accuracy is 87%. The false positive rate (non-null result in a null image) is 0.1% and the false negative rate (null result predicted in a non-null image) is 0.1% (Imagery Courtesy of DigitalGlobe).
Figure 4. Classification results demonstrating correct classification (left), false negative (middle), and 180 degree misclassification (right) (Imagery Courtesy of DigitalGlobe).
Figure 5. Classification results showing examples of false positives classified as aligned at 240 degrees (left) and 180 degrees (right) (Imagery Courtesy of DigitalGlobe).

4. Confidence Estimates

Reporting best-fit parameters is of little practical use without attendant error bars. Knowledge of the expected range of accuracies for a given workflow is of far greater utility in functional applications than what is typically reported: the highest accuracy achieved from a series of controlled experiments and manual hyperparameter tuning. In an attempt to address both systematic and statistical errors, we craft multiple classifiers to estimate classifier robustness, and utilize bootstrap resampling to estimate validation dataset confidence levels.

5. Test Set Confidence Estimates

The speed advantage of HOG+PCA+LogReg classifier training over neural network approaches allows us to estimate the classifier accuracy error by building multiple classifiers. Accordingly, we create a corpus of 30 different randomly selected train/test splits of our HOG vectors; we then train a different logistic regression classifier on each unique split of the training data. This approach provides an indication of how sensitive the classifier is to the input data. An extension of this approach would include a hyperparameter grid search over all classifier parameters; in this example we need not apply a grid search since we resist the urge to manually tune and instead use the default hyperparameters for all classifiers. The multiple classifiers allow us to build up a distribution of the quantities of interest, and thereby use that distribution to estimate error levels. In Table 1 we report the mean and standard deviation (STD) of each measure on the 25% portion of our input data that we reserve for testing.
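A minimal sketch of this multiple-classifier procedure: train one logistic regression per random train/test split and report the mean and standard deviation of the resulting scores. The function below is illustrative and tracks only accuracy for brevity, whereas Table 1 reports several measures.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def split_accuracy_distribution(X, y, n_splits=30, test_size=0.25):
    """Train one classifier per random train/test split, collect accuracies."""
    scores = []
    for seed in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        clf = LogisticRegression(max_iter=1000)   # default hyperparameters
        clf.fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return np.mean(scores), np.std(scores)
```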

Table 1. Errors for 30 logistic regression classifiers trained on different train/test splits. The very low standard deviation for all measures demonstrates that the HOG+PCA+LogReg classifier is robust to the precise train/test split.

6. Validation Set Confidence Estimates

In order to validate the classifier, we apply it to 516 boat cutouts from a separate DigitalGlobe (DG) validation image (see the datasets post, Section 4). We also extract 516 null cutouts to augment the positive data and test the false positive rate. Utilizing a separate image is important because our algorithms may overtrain on the exact parameters of the test images, such as lighting, contrast, sun angle, cloud levels, and pollution levels. Validating the algorithm on a separate image will indicate if it is robust to such systematic errors. For validation errors we apply the 30 different HOG+PCA+LogReg classifiers described above, and also use bootstrap resampling of the validation dataset. We use 1,000 bootstrap resamples, which combined with the multiple classifiers provides a total of 30,000 samples of 1,032 images each; evaluating this dataset takes 217 seconds, or a mere 8 microseconds per image.
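The bootstrap step can be sketched as follows: for each trained classifier, resample the validation images with replacement and accumulate a distribution of accuracies. The function name is illustrative, and for brevity it scores a prediction as correct only on an exact heading-bin match, whereas the metric above allows +/- 5 degrees.

```python
import numpy as np

def bootstrap_accuracy(y_true, y_pred_list, n_boot=1000, rng_seed=0):
    """Bootstrap-resample the validation set for each trained classifier.

    y_pred_list holds one prediction array per classifier (e.g. the 30
    classifiers above); resampling image indices with replacement builds
    a distribution of accuracies across classifiers and resamples.
    """
    rng = np.random.default_rng(rng_seed)
    n = len(y_true)
    accs = []
    for y_pred in y_pred_list:
        correct = (y_pred == y_true)            # exact-match correctness
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)    # resample with replacement
            accs.append(correct[idx].mean())
    accs = np.asarray(accs)
    return accs.mean(), accs.std()
```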

Table 2. Accuracy levels and bootstrap errors for an independent validation imagery corpus. These levels are within 10% of the levels reported on the test set.

7. Conclusions

HOG-based classifiers lack the power of neural network classifiers, and so may break down in crowded or complex scenes. There are a number of advantages to HOG-based classifiers, however. HOG features can be extracted via the CPUs of a laptop or computing cluster, so GPUs are not necessary. HOG-based classifiers are also extremely fast to train and evaluate, which enables confidence level estimation via training multiple classifiers combined with bootstrap resampling.

In this post we detailed efforts to classify boat headings into 5-degree bins. Using a HOG+PCA+LogReg classifier, we achieved a 79.1 +/- 1.9% accuracy rate, and very low false positive and false negative rates. In subsequent posts we will further explore how these values depend on image resolution and training set size.
