Basics of Deep Learning in Earth Observation

In this article I’m going to briefly summarise some things I’ve learnt recently about the ever-so-hyped field of Machine Learning and how it can be, and already is being, applied to Earth Observation.

What I’ve found is that there are three main categories of remote sensing computer vision problems. From the easiest to hardest they are:

  1. Image Classification
  2. Object Detection
  3. Semantic Segmentation

The aim of Image Classification is to predict the contents of an image as a whole. If you cut a raster into small tiles, a classifier can then be trained to label individual tiles as, say, rich or poor, as in DigitalGlobe’s Penny (below).

Object Detection involves drawing a bounding box around each object of interest in an image. It’s useful for detecting, counting and locating vessels, cars and similar objects. You can also derive a rough area estimate from the size of the bounding box. There are sometimes free datasets to make use of, like Cars Overhead with Context.
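To make the rough area calculation concrete, here is a minimal sketch. It assumes the bounding box is given in pixel coordinates and that you know the ground sample distance of the imagery; the function name, the box values and the 0.5 m resolution are all made up for illustration.

```python
def bbox_area_m2(bbox, gsd_m):
    """Approximate ground area covered by a pixel-space bounding box.

    bbox  -- (xmin, ymin, xmax, ymax) in pixel coordinates
    gsd_m -- ground sample distance in metres per pixel (square pixels assumed)
    """
    xmin, ymin, xmax, ymax = bbox
    width_m = (xmax - xmin) * gsd_m
    height_m = (ymax - ymin) * gsd_m
    return width_m * height_m


# Hypothetical detection: a 40 x 12 pixel box on 0.5 m imagery.
area = bbox_area_m2((100, 200, 140, 212), gsd_m=0.5)
print(area)  # 120.0
```

Treat the result as an upper bound: an axis-aligned box around a vessel lying diagonally covers a lot more ground than the vessel itself.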

Left: predictions of income in New York, an example of an Image Classifier. Right: a boat detector. Credit: GBDX

Semantic Segmentation is much like Image Classification, but each individual pixel is classified instead of the whole image. It’s used to identify building footprints, roads and rivers, and it’s the basis of the SpaceNet competitions.
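The difference in output shape is easy to see in a toy sketch. This is not a real model: the scores below are hand-written stand-ins for what a network might produce, and the function and class names are hypothetical.

```python
def classify_tile(scores):
    """Image classification: pick one class index for the whole tile.

    scores -- list of per-class scores, e.g. [0.3, 0.7]
    """
    return max(range(len(scores)), key=lambda c: scores[c])


def segment_tile(pixel_scores):
    """Semantic segmentation: pick one class index per pixel.

    pixel_scores[row][col] is a list of per-class scores for that pixel.
    """
    return [[classify_tile(s) for s in row] for row in pixel_scores]


CLASSES = ["background", "building"]  # hypothetical class list

# Made-up scores standing in for model output on a 2 x 2 pixel tile.
tile_scores = [0.3, 0.7]
pixel_scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.1, 0.9]],
]

label = classify_tile(tile_scores)   # a single label for the tile
mask = segment_tile(pixel_scores)    # a label for every pixel
print(CLASSES[label])  # building
print(mask)            # [[0, 1], [0, 1]]
```

A classifier returns one answer per tile, while a segmentation model returns a mask the same size as the input, which is what lets you trace footprints and roads.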

Output from Azavea’s model for semantic segmentation.


With a little thought you can often turn what looks like an Object Detection problem into a Classification one, which can improve speed and accuracy and reduce the amount of time spent labelling.

Image chips cut along legal property boundaries in the UK. Models based on these definitely have some business applications!

For example, if you want to find which properties in a city have solar panels installed, then instead of scanning over tens of thousands of tiles, you could create chips along property outlines using something like the INSPIRE dataset and train an image classifier on those chips. Each small chip (property) is labelled as containing or not containing solar panels.
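As a rough sketch of the chipping step, the snippet below cuts the bounding rectangle of a property outline out of an image. It assumes the outline has already been converted from map coordinates to integer pixel coordinates, and it uses a plain list of rows as a stand-in for a raster band; the function names and the sample outline are hypothetical.

```python
def polygon_bbox(polygon):
    """Return (xmin, ymin, xmax, ymax) for a list of (col, row) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def cut_chip(image, polygon):
    """Cut the bounding rectangle of a polygon out of an image.

    image   -- list of rows, each a list of pixel values
    polygon -- list of (col, row) integer pixel coordinates
    """
    xmin, ymin, xmax, ymax = polygon_bbox(polygon)
    # Both ends of the range are kept, hence the +1 on the slices.
    return [row[xmin:xmax + 1] for row in image[ymin:ymax + 1]]


# Toy 6 x 5 "image" whose pixel values encode their own row and column.
image = [[r * 10 + c for c in range(6)] for r in range(5)]

# Hypothetical property outline in pixel coordinates.
property_outline = [(1, 1), (4, 1), (4, 3), (1, 3)]

chip = cut_chip(image, property_outline)
print(chip)  # [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]
```

Each chip then gets a single label (panels or no panels). A fuller version might also mask out pixels that fall outside the outline, so that a neighbour's roof does not leak into the chip.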

Practical Info

Each of the above problems has its own associated models. Most of the best models around are convolutional neural networks, and some useful model names are VGGNet, ResNet, Inception and U-Net.

To test some of these models out I used TensorFlow for image classification and object detection on a couple of different problems with reasonable success. After only a few hundred labels you can achieve F1 scores (the harmonic mean of precision and recall, not quite the same thing as accuracy) of around 70–80 percent, and I would hope that with a few thousand labels this would increase to the headline 90 percent or more we are used to in other applications. However, there are some limitations when it comes to the size and variation of satellite imagery.
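For anyone unsure what an F1 score measures, here is a small sketch that computes it from scratch for a binary problem. The labels and predictions are invented for illustration and are not results from my experiments.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up ground truth and predictions for ten chips (1 = has solar panels).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(precision, recall, f1)  # 0.75 0.75 0.75
print(accuracy)               # 0.8
```

F1 ignores true negatives, which makes it a more honest measure than plain accuracy when the thing you are looking for is rare, as it usually is in satellite imagery.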

I’ve been using QGIS for labelling and general manipulation, and FME for tiling, chipping, clipping, etc. The combination of both tools is great for rapid prototyping, but writing your own Python GDAL scripts would be preferable for production and for rolling out a model over a larger area.
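As a starting point for such a script, the tiling logic itself needs no GIS library at all. The sketch below only works out the pixel windows; the function name, raster size and tile size are assumptions for illustration. The windows come out in (xoff, yoff, xsize, ysize) order, which I believe matches the arguments GDAL's `ReadAsArray` expects, but check that against the GDAL documentation before relying on it.

```python
def tile_windows(width, height, tile_size, overlap=0):
    """Yield (xoff, yoff, xsize, ysize) pixel windows covering a raster.

    Tiles on the right and bottom edges are clipped to the raster extent.
    """
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be >= 0 and smaller than tile_size")
    step = tile_size - overlap
    for yoff in range(0, height, step):
        for xoff in range(0, width, step):
            xsize = min(tile_size, width - xoff)
            ysize = min(tile_size, height - yoff)
            yield xoff, yoff, xsize, ysize


# Hypothetical 1000 x 600 pixel raster cut into 256 pixel tiles.
windows = list(tile_windows(width=1000, height=600, tile_size=256))
print(len(windows))  # 12
print(windows[0])    # (0, 0, 256, 256)
print(windows[-1])   # (768, 512, 232, 88)
```

Each window can then be read from the source raster and written out as its own tile, or fed straight to a model, without a desktop tool in the loop.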

DigitalGlobe have posted a bunch of different repos with EO-specific Python scripts for ML, as have SpaceNet.

I hope you found this useful or at least interesting - I plan to do a bit of a longer article next time.