Giving SIMRDWN a Spin, Part II

Applying pre-trained models to localize cars in large overhead images

Rapidly locating small objects in overhead imagery remains a challenging task, though one of great interest to myriad applications (e.g. [1], [2], [3]). In a previous post we demonstrated how to train deep learning object detection models to find cars in overhead imagery. In this post we will show how to apply these trained models to large test images, and visualize results. We also discuss some of the new features of the recently released SIMRDWN v2, such as incorporation of the latest TensorFlow models and YOLO v3.

1. Model Training

For the purposes of this post, we assume that the reader has already trained a model on the COWC dataset. For a detailed description of training a bespoke model on overhead imagery, see our previous post. This post looks at only a single object class (cars), but the code can easily be adapted to include an arbitrary number of object classes (e.g. [1]). In brief, once data is prepared with the /simrdwn/core/prep_data_cowc.py script, training can be invoked with the following commands:

# SSD COWC training
python /simrdwn/core/simrdwn.py \
--framework ssd \
--mode train \
--outname inception_v2_cowc \
--label_map_path /simrdwn/data/class_labels_car.pbtxt \
--tf_cfg_train_file /simrdwn/configs/ssd_inception_v2_simrdwn.config \
--train_tf_record /local_data/simrdwn/training_datasets/cowc/cowc_train.tfrecord \
--train_input_width 544 --train_input_height 544 \
--max_batches 30000 \
--batch_size 64
# Faster R-CNN COWC training
python /simrdwn/core/simrdwn.py \
--framework faster_rcnn \
--mode train \
--outname resnet101_cowc \
--label_map_path /simrdwn/data/class_labels_car.pbtxt \
--tf_cfg_train_file /simrdwn/configs/faster_rcnn_resnet101_simrdwn.config \
--train_tf_record /local_data/simrdwn/training_datasets/cowc/cowc_train.tfrecord \
--train_input_width 544 --train_input_height 544 \
--max_batches 30000 \
--batch_size 64
# YOLT2 COWC training
python /simrdwn/core/simrdwn.py \
--framework yolt2 \
--mode train \
--outname cowc \
--yolt_cfg_file yolt.cfg \
--weight_dir /simrdwn/yolt2/input_weights \
--train_input_width 544 --train_input_height 544 \
--weight_file yolo.weights \
--yolt_train_images_list_file cowc_yolt_train_list.txt \
--label_map_path /simrdwn/data/class_labels_car.pbtxt \
--max_batches 30000 \
--batch_size 64 \
--subdivisions 16
# YOLT3 COWC training
python /simrdwn/core/simrdwn.py \
--framework yolt3 \
--mode train \
--outname cowc \
--yolt_cfg_file yolov3_544.cfg \
--train_input_width 544 --train_input_height 544 \
--boxes_per_grid 9 \
--weight_dir /simrdwn/yolt3/input_weights \
--weight_file yolov3.weights \
--yolt_train_images_list_file cowc_yolt_train_list.txt \
--label_map_path /simrdwn/data/class_labels_car.pbtxt \
--max_batches 30000 \
--batch_size 64 \
--subdivisions 16

Training will take one to two days per model on an NVIDIA Titan Xp GPU. The .pbtxt files list the class labels for the model and should be 1-indexed. For example, class_labels_car.pbtxt looks like:

item {
id: 1
name: 'car'
}
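
The 1-indexing matters because index 0 is reserved for the background class in the TensorFlow Object Detection API. For a multi-class model, the file simply gains additional items; a minimal helper sketch for generating one (write_label_map and the class list are our own illustration, not part of SIMRDWN):

# Generate a 1-indexed label map (.pbtxt) for an arbitrary set of classes.
# This helper is illustrative only; SIMRDWN ships class_labels_car.pbtxt.
def write_label_map(class_names, out_path):
    with open(out_path, 'w') as f:
        for i, name in enumerate(class_names, start=1):  # 1-indexed ids
            f.write("item {\n  id: %d\n  name: '%s'\n}\n" % (i, name))

write_label_map(['car', 'truck', 'bus'], 'class_labels_vehicles.pbtxt')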

2. Testing

We test our pre-trained models on the Utah portion of the COWC dataset. Recall that the data can be found here. We’ll put the Utah data in its own directory: /local_data/simrdwn/test_images/cowc/Utah_AGRC. This test set comprises 9 images at 15 cm resolution covering 19.9 square kilometers. SIMRDWN can run inference on the fly on arbitrary image sizes, but one can also preprocess test imagery if the same imagery will be analyzed multiple times. This is achieved via the following command:

# Preprocess test imagery (optional)
python /simrdwn/core/simrdwn.py \
--framework '' \
--mode test \
--outname data_slices_utah_cowc \
--testims_dir /local_data/simrdwn/test_images/cowc/Utah_AGRC \
--slice_sizes_str 544 \
--slice_overlap 0.1 \
--test_slice_sep __ \
--test_prep_only 1

The command above slices the test images into 544 x 544 pixel chunks, builds a testing list for YOLT, and creates a .tfrecord for TensorFlow.
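
Under the hood, the tiling follows simple sliding-window arithmetic: the stride is the slice size reduced by the overlap fraction, with a final window pinned to the image edge. A minimal sketch (our own illustration, not SIMRDWN's actual slicing code):

# Sliding-window tiling implied by --slice_sizes_str 544 --slice_overlap 0.1.
def slice_origins(im_dim, slice_size=544, overlap=0.1):
    """Return the upper-left window coordinates along one image axis."""
    stride = int(slice_size * (1.0 - overlap))  # 489 px for 10% overlap
    origins = list(range(0, max(im_dim - slice_size, 0) + 1, stride))
    if origins[-1] + slice_size < im_dim:
        origins.append(im_dim - slice_size)  # final window flush with edge
    return origins

print(slice_origins(2000))  # [0, 489, 978, 1456]

With the test imagery sliced, let's now run inference.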

# Faster R-CNN
python /simrdwn/core/simrdwn.py \
--framework faster_rcnn \
--mode test \
--outname cowc_15cm \
--train_model_path train_faster_rcnn_resnet101_cowc_2019_04_17_02-54-58 \
--testims_dir /local_data/simrdwn/test_images/cowc/Utah_AGRC \
--test_presliced_tfrecord_part results/test__data_slices_utah_cowc_2019_04_17_01-17-09 \
--label_map_path /simrdwn/data/class_labels_car.pbtxt \
--tf_cfg_train_file faster_rcnn_resnet101_simrdwn.config \
--use_tfrecords 1 \
--overwrite_inference_graph 1 \
--edge_buffer_test 1 \
--n_test_output_plots 4 \
--plot_thresh_str 0.2 \
--batch_size 4 \
--min_retain_prob 0.1 \
--show_labels 1 \
--alpha_scaling 1

There are a number of options in the function call; many are simply paths, while others control the output plots.

  • outname = base name of output folder in results/ directory
  • label_map_path = location of class labels
  • train_model_path = location of trained model
  • use_tfrecords = switch to use .tfrecords (versus YOLT)
  • edge_buffer_test = a buffer to use when stitching together images (a value of 1 is usually sufficient)
  • plot_thresh_str = confidence threshold for the output plots
  • show_labels = switch to display class labels atop boxes (e.g. ‘car’)
  • alpha_scaling = switch to scale box opacity with confidence (see the plotting sketch after this list)
  • n_test_output_plots = number of images to plot on output
  • min_retain_prob = minimum confidence to retain in inference outputs
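
To make the show_labels and alpha_scaling options concrete, the toy matplotlib sketch below draws boxes whose opacity tracks confidence; the boxes and styling are our own illustration, and SIMRDWN's plotting code differs in detail.

# Toy illustration of confidence-scaled box opacity (alpha_scaling)
# and class labels (show_labels); detections are made-up values.
import matplotlib.pyplot as plt
import matplotlib.patches as patches

detections = [(20, 30, 40, 20, 0.95),   # (xmin, ymin, width, height, conf)
              (80, 60, 40, 20, 0.40)]

fig, ax = plt.subplots()
ax.set_xlim(0, 160)
ax.set_ylim(0, 120)
for x, y, w, h, conf in detections:
    if conf < 0.2:  # analogue of plot_thresh_str
        continue
    ax.add_patch(patches.Rectangle((x, y), w, h, fill=False,
                                   edgecolor='red', alpha=conf))
    ax.text(x, y, 'car: %.2f' % conf, fontsize=8)  # analogue of show_labels
plt.show()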

This code will output predicted boxes and plots of 4 of the test images into the /simrdwn/results/test_faster_rcnn_cowc_15cm_[date] directory, where [date] is the timestamp of the execution command. Running Faster R-CNN on our test set takes 393 seconds.
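
Since inference runs on 544-pixel slices, the per-slice detections must be shifted back into full-image pixel coordinates and de-duplicated where adjacent slices overlap (this is where edge_buffer_test comes into play). A simplified sketch of that merge step, using our own box format and thresholds rather than SIMRDWN's actual implementation:

# Simplified merge of sliced detections into global image coordinates,
# followed by non-max suppression to remove duplicates from the 10%
# overlap regions. Box format (xmin, ymin, xmax, ymax, conf) is our own.
def to_global(box, x_off, y_off):
    xmin, ymin, xmax, ymax, conf = box
    return (xmin + x_off, ymin + y_off, xmax + x_off, ymax + y_off, conf)

def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, iou_thresh=0.5):
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    keep = []
    for b in boxes:
        if all(iou(b, k) < iou_thresh for k in keep):
            keep.append(b)
    return keep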

Figure 1. Zoom of the output of the Faster R-CNN model as applied to the Utah test set. Note that there are a number of false positives as well as false negatives (largely from black cars).

# SSD
python /simrdwn/core/simrdwn.py \
--framework ssd \
--mode test \
--outname cowc_15cm \
--train_model_path train_ssd_inception_v2_cowc_orig_cfg_2019_04_18_14-47-49 \
--testims_dir /local_data/simrdwn/test_images/cowc/Utah_AGRC \
--test_presliced_tfrecord_part results/test__data_slices_utah_cowc_2019_04_17_01-17-09 \
--label_map_path /simrdwn/data/class_labels_car.pbtxt \
--tf_cfg_train_file ssd_inception_v2_simrdwn_orig.config \
--use_tfrecords 1 \
--overwrite_inference_graph 1 \
--edge_buffer_test 1 \
--n_test_output_plots 4 \
--plot_thresh_str 0.2 \
--batch_size 4 \
--min_retain_prob 0.1 \
--show_labels 1 \
--alpha_scaling 1

SSD inference is markedly faster at 143 seconds for the entire test set, and appears to have better performance as well (see Figure 2).

Figure 2. Zoom of SSD inference. The majority of cars are captured, though densely clustered cars are still a challenge.

We could of course keep testing TensorFlow models, as all models within the TensorFlow Object Detection API are accessible to SIMRDWN (there are 49 sample config files as of mid-April 2019). Instead, we will now explore YOLT results. The latest version of SIMRDWN includes both YOLT2 (based on YOLO v2) and YOLT3 (based on YOLO v3). YOLT confidences tend to be lower than those of the TensorFlow models, so we reduce the plot threshold (plot_thresh_str) to 0.15.
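
Note that since raw detections are retained down to min_retain_prob, the plot threshold can be tuned after the fact without re-running inference. A hypothetical sketch, assuming the detections have been exported to a CSV (the file and column names are our own illustration):

# Re-threshold saved detections without re-running inference.
# 'predictions.csv' and the 'confidence' column are hypothetical names.
import pandas as pd

df = pd.read_csv('predictions.csv')       # everything >= min_retain_prob
yolt_plot = df[df['confidence'] >= 0.15]  # YOLT plot threshold
tf_plot = df[df['confidence'] >= 0.20]    # TensorFlow plot threshold

Inference can be run with: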

# YOLT2
python /simrdwn/core/simrdwn.py \
--framework yolt2 \
--mode test \
--outname cowc_15cm \
--label_map_path /simrdwn/data/class_labels_car.pbtxt \
--train_model_path train_yolt2_dense_cowc_2019_04_16_12-06-38 \
--testims_dir /local_data/simrdwn/test_images/cowc/Utah_AGRC \
--test_presliced_list test__data_slices_utah_cowc_2019_04_17_01-17-09/test_splitims_input_files.txt \
--weight_file ave_dense_544_final.weights \
--yolt_cfg_file ave_dense_544.cfg \
--n_test_output_plots 4 \
--edge_buffer_test 1 \
--plot_thresh_str 0.15 \
--batch_size 4 \
--min_retain_prob 0.03 \
--show_labels 1 \
--alpha_scaling 1

# YOLT3
python /simrdwn/core/simrdwn.py \
--framework yolt3 \
--mode test \
--outname cowc_15cm \
--label_map_path /simrdwn/data/class_labels_car.pbtxt \
--train_model_path train_yolt3_dense_cowc_2019_04_12_19-37-06 \
--testims_dir /local_data/simrdwn/test_images/cowc/Utah_AGRC \
--test_presliced_list test__data_slices_utah_cowc_2019_04_17_01-17-09/test_splitims_input_files.txt \
--weight_file yolov3_544_final.weights \
--yolt_cfg_file yolov3_544.cfg \
--boxes_per_grid 9 \
--n_test_output_plots 4 \
--edge_buffer_test 1 \
--plot_thresh_str 0.15 \
--batch_size 4 \
--min_retain_prob 0.03 \
--show_labels 1 \
--alpha_scaling 1

  • yolt_cfg_file = configuration file used to train YOLT2/3
  • boxes_per_grid = defaults to 5 for YOLT2, and must be set to 9 for YOLT3 (see the arithmetic sketch below)
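
The boxes_per_grid values mirror the anchor-box counts of the underlying detectors: YOLO v2 predicts 5 anchor boxes per grid cell, while YOLO v3 predicts 3 anchors at each of 3 scales (9 anchors total). Some back-of-the-envelope arithmetic for the 544-pixel input, following the strides in the YOLO papers (our own illustration, not SIMRDWN code):

# Output size of a single-scale YOLO v2-style head at 544 px input.
input_size, stride = 544, 32
grid = input_size // stride         # 17 x 17 grid cells
boxes_per_grid, num_classes = 5, 1  # YOLT2 defaults for the car model
preds_per_box = 5 + num_classes     # x, y, w, h, objectness + class scores
total_preds = grid * grid * boxes_per_grid * preds_per_box
print(grid, total_preds)            # 17, 8670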

Detection time is comparable, at 112 seconds for YOLT2, and 108 seconds for YOLT3. Figure 3 displays outputs of both YOLT2 and YOLT3.

Figure 3. Zoom of YOLT2 (top) and YOLT3 (bottom) test outputs.

Inspection of Figures 1–3 indicates that black cars are difficult for all models, an issue that could likely be alleviated with data augmentation and lengthier training. The YOLT models are significantly faster than SSD or Faster R-CNN, with slightly better performance as well (see our arXiv paper for rigorous metrics). The Appendix displays the same region for all models to allow a visual comparison.
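
Given the 19.9 square kilometer test area, these runtimes translate into rough coverage rates:

# Rough inference rates from the runtimes quoted above (19.9 km^2 total).
area_km2 = 19.9
for name, seconds in [('Faster R-CNN', 393), ('SSD', 143),
                      ('YOLT2', 112), ('YOLT3', 108)]:
    print('%-12s %.1f km^2/min' % (name, 60 * area_km2 / seconds))

In other words, YOLT3 scans roughly 11 square kilometers per minute, versus about 3 for Faster R-CNN.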

3. Conclusions

In this post, we demonstrated how to run inference on overhead images of arbitrary size with the SIMRDWN framework. We showed that the YOLT2/3 models are significantly faster than Faster R-CNN and somewhat faster than SSD. In subsequent posts we will expand beyond visual outputs and show how to compute evaluation metrics for these models, as well as for models trained on a diverse set of objects.

Appendix: Model Comparisons

The following images compare performance from Faster R-CNN, SSD, YOLT2, and YOLT3 over the same test region in Utah.

The models shown are, in order: Faster R-CNN, SSD, YOLT2, and YOLT3.