Spatial resolution & object detection accuracy
At rexy.ai we work a lot on object detection in satellite and UAV images. Our platform is built to cover the entire workflow of a typical “AI” project in geospatial: annotate GeoTiffs, train computer vision models, and integrate the model or its results with a business application. But the first step is really to identify the right data source for the problem.
While working with different clients we often get the same questions: which image provider should they use? What spatial resolution do they need? Obviously better data leads to more accurate results. But at what cost?
Multi-spectral images at 1 m/pixel resolution might cost about $2 per sq. km, while 30 cm/pixel can easily reach $15-20 per sq. km. The accuracy-cost tradeoff is not always obvious, and we encourage our clients to experiment with different sources before making a final commitment. Thanks to our platform it is a trivial exercise.
Enough theory! We will experiment with the RarePlanes dataset, which consists of Maxar WorldView-3 images and should have 31 cm/pixel resolution. RarePlanes labels cover 7 types of airplanes, and there are 187 GeoTiffs in the train dataset and 66 GeoTiffs in the test dataset.
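To put those prices in perspective, here is a minimal back-of-the-envelope sketch. The 500 sq. km area of interest and the function names are hypothetical and chosen only for illustration; the per-km prices are the rough figures quoted above, not a provider's price list.

```python
def imagery_cost(area_km2: float, price_per_km2: float) -> float:
    """Total imagery cost in USD for an area of interest."""
    return area_km2 * price_per_km2


def pixels_per_km2(pixel_size_m: float) -> float:
    """Number of pixels covering one square kilometre at a given pixel size."""
    return (1000.0 / pixel_size_m) ** 2


AREA_KM2 = 500.0  # hypothetical area of interest

# Rough prices from the text (USD per sq. km), keyed by pixel size in metres.
PRICES = {1.0: (2.0, 2.0), 0.3: (15.0, 20.0)}

for pixel_size_m, (low, high) in PRICES.items():
    cost_low = imagery_cost(AREA_KM2, low)
    cost_high = imagery_cost(AREA_KM2, high)
    megapixels = pixels_per_km2(pixel_size_m) / 1e6
    print(
        f"{pixel_size_m:.1f} m/pixel: ${cost_low:,.0f}-${cost_high:,.0f} "
        f"for {AREA_KM2:.0f} sq. km, {megapixels:.1f} megapixels per sq. km"
    )
```

Under these assumptions the 30 cm imagery costs 7.5 to 10 times more and delivers roughly 11 times more pixels per sq. km. Whether those extra pixels translate into better detections is exactly what the experiment has to show.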
First of all, let’s explore the resolution of the images:
We expected the pixel size across images to be close to 30 cm/pixel (as per the WorldView-3 spec). But that’s not the case: some images have a pixel size of 15 cm!
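If you want to run the same check outside the platform, the pixel size can be derived from each GeoTiff's affine geotransform. The sketch below assumes a GDAL-style six-element geotransform (the tuple order returned by GDAL's `GetGeoTransform()`) and a projected CRS with units in meters; the file names and transform values are made up for illustration.

```python
import math
from collections import Counter


def pixel_size(geotransform):
    """Return (pixel_width, pixel_height) in CRS units.

    Assumes GDAL ordering:
    (x_origin, col_dx, row_dx, y_origin, col_dy, row_dy).
    hypot() keeps the result correct for rotated rasters too.
    """
    _, col_dx, row_dx, _, col_dy, row_dy = geotransform
    return math.hypot(col_dx, col_dy), math.hypot(row_dx, row_dy)


# Hypothetical geotransforms, as they might be read from three GeoTiffs.
transforms = {
    "scene_a.tif": (500000.0, 0.30, 0.0, 4100000.0, 0.0, -0.30),
    "scene_b.tif": (512000.0, 0.15, 0.0, 4200000.0, 0.0, -0.15),
    "scene_c.tif": (530000.0, 0.30, 0.0, 4150000.0, 0.0, -0.30),
}

# Histogram of pixel widths, rounded to whole centimetres.
histogram = Counter()
for name, gt in transforms.items():
    width_m, _ = pixel_size(gt)
    histogram[round(width_m * 100)] += 1

for size_cm, count in sorted(histogram.items()):
    print(f"{size_cm} cm/pixel: {count} image(s)")
```

If the images are stored in a geographic CRS (degrees), the values need to be converted to meters first, which depends on latitude.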
Pixel size & spatial resolution
Most people use “pixel size” and “spatial resolution” interchangeably, but the two can be quite different (in our case, we believe, due to resampling). We encourage the reader to check this post to learn about the [significant] nuances.
Nevertheless, users dealing with GeoTiff images typically have access only to the pixel size. So we will explore how pixel size affects detection accuracy.
Experiment set up
We will create 3 datasets with pixel size = 30, 50 and 100 cm:
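As a rough illustration of what this resampling involves, the sketch below computes the output raster dimensions for each target pixel size. It is not the platform's implementation; the function name and the 8000 x 6000 source image are hypothetical.

```python
def resampled_shape(width, height, src_pixel_m, dst_pixel_m):
    """Output (width, height) in pixels after resampling to dst_pixel_m.

    The ground footprint stays the same, so pixel counts scale by
    src_pixel_m / dst_pixel_m along each axis.
    """
    scale = src_pixel_m / dst_pixel_m
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical source image: 8000 x 6000 pixels at 15 cm/pixel.
SRC_WIDTH, SRC_HEIGHT, SRC_PIXEL_M = 8000, 6000, 0.15

for target_m in (0.3, 0.5, 1.0):
    out_w, out_h = resampled_shape(SRC_WIDTH, SRC_HEIGHT, SRC_PIXEL_M, target_m)
    print(f"{target_m:.1f} m/pixel -> {out_w} x {out_h} pixels")
```

Going from 30 cm to 1 m keeps only about 9% of the pixels, so a plane that spans 50 pixels in the first dataset spans about 15 in the last. Labels stored in pixel coordinates have to be scaled by the same factor; georeferenced labels can stay as they are.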
At rexy.ai we’ve developed a highly accurate object detection framework which we call “YOLOO”. In this post we won’t dive deep into its internals. In a nutshell, it is built on YOLO ideas and tailored for remote sensing tasks:
- Robustly detects small objects along with their orientation (hence the extra “O” at the end, which stands for “oriented”)
- Generalizes well from a small number of training samples
- Fast to train, fast to deploy, fast to run
Once the datasets are created, it is trivial to train and validate models:
So we train three models on the 0.3, 0.5 and 1.0 m/pixel datasets with the same hyperparameters. It takes only about 15 minutes to reach convergence (training longer would squeeze out a bit more):
Interestingly, the models trained on 30 cm/pixel and 50 cm/pixel don’t show a big difference (Average Precision ~0.82 for both), while the model trained on 1 m/pixel demonstrates a noticeable drop in accuracy (Average Precision ~0.65).
A more detailed look gives another insight: accuracy doesn’t drop much for Large/Medium airplanes (0.89 -> 0.81), but it does for Small planes (0.83 -> 0.5):
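To make the comparison concrete, the short sketch below turns the reported Average Precision values into relative drops. It assumes each pair compares the same size class before and after the coarsening of pixel size; the function name is ours.

```python
def relative_drop(before: float, after: float) -> float:
    """Fraction of the original score that was lost."""
    return (before - after) / before


# Average Precision values reported above, per airplane size class.
results = {
    "Large/Medium": (0.89, 0.81),
    "Small": (0.83, 0.50),
}

for size_class, (before, after) in results.items():
    drop = relative_drop(before, after)
    print(f"{size_class}: {before:.2f} -> {after:.2f} ({drop:.0%} relative drop)")
```

That works out to roughly a 9% relative drop for Large/Medium airplanes versus roughly 40% for Small ones.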
The cost-accuracy tradeoff is not always obvious (or linear!). We suggest not relying on intuition, but running experiments to prove or disprove your hypothesis.
In remote sensing it is sometimes enough to use images from Satellogic, but sometimes you really need Maxar or Pleiades Neo, and the cost of the images might vary 10x. With the help of our self-service platform it’s easy to explore the tradeoffs: the whole experiment took about 20 minutes and zero lines of code.
Still confused? Feel free to drop us a message at email@example.com.
PS: TensorBoard logs are available here.