GeoAI and its four horsemen

John R. Ballesteros
The Modern Scientist
5 min read · Jan 30, 2023


What are the main algorithms for overhead imagery analysis?

Geospatial artificial intelligence, or GeoAI, is an emerging scientific discipline that combines methods from spatial data science and machine and deep learning to extract knowledge from spatial big data (Janowicz et al., 2019). It is an active area of research with applications in many fields, such as disaster management, urban planning, logistics, retail, solar, and many others (Ballesteros et al., 2021). At the same time, the rapidly increasing availability and quality of satellite and, lately, drone imagery, its ease of use, and the affordable price of consumer and professional drones are making these technologies converge.

One might think that there are tons of algorithms that GeoAI applies to overhead imagery analysis, but the truth is that they can be summarized in only four, all of which are variations of a general function called an artificial neural network (NN). A neural network is a layered architecture formed by combinations of artificial neurons; the study of NNs is called deep learning. NNs work as a mapping function, Y = F(X), between the input X and the output values Y.
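The mapping Y = F(X) can be made concrete with a tiny two-layer network in NumPy. This is only an illustrative sketch: the weights below are random placeholders, whereas a real network learns them from data.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Rectified linear unit, a common artificial-neuron activation."""
    return np.maximum(0, x)

# A two-layer network: Y = F(X) = W2 @ relu(W1 @ X + b1) + b2
# Shapes: 3 inputs -> 4 hidden neurons -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def F(x):
    return W2 @ relu(W1 @ x + b1) + b2

y = F(np.array([1.0, 2.0, 3.0]))
print(y.shape)  # (2,)
```

Training consists of adjusting W1, b1, W2, b2 so that F maps each input X to its expected output Y.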

This text briefly explains those algorithms and their applications.

1. Image classification

Image classification (single- or multi-object) signals the presence or absence of a specific group of objects in an input image, without calculating the objects' positions within the image. Convolutional neural networks (CNNs) are the default architecture used to solve the image classification problem. These networks perform a scalar product between a moving kernel and the pixel values of the image; this scalar product, also called a convolution, allows objects to be classified regardless of their position, angle, or size. Many kernels, or filters, are used, and the weights, or parameters, of the network are automatically calculated and re-calculated by comparing expected and obtained values and back-propagating the error through the layers. Early layers detect general geometric features of objects, such as diagonal and vertical lines; later layers combine these into more complex, aggregated forms; finally, the output of the network is a set of neurons, one per object class. Figure 1 shows an example of this architecture [1].

Figure 1. See kernels of 6x6, 3x3 and 4x4 in the upper medium part of the image. [1]
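The moving-kernel scalar product described above can be sketched in a few lines of NumPy. The vertical-edge kernel here is a hand-picked illustration; in a CNN, kernel values are learned during training.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel over the image and
    take the scalar (dot) product at every position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A vertical-edge kernel responds strongly where pixel values
# change from left to right, as they do in this toy image.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

response = conv2d(image, kernel)
print(response.shape)  # (2, 2)
```

Stacking many such filtered responses, layer after layer, is what lets a CNN build up from simple edges to whole objects.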

There is an extended version of image classification in which the network outputs a text naming the objects present in the image instead of a probability. This is called image captioning.

2. Object detection or recognition

Similar to image classification, detection models find objects within the image, but they also frame each object's position with a rectangular bounding box, along with the probability that the detected object belongs to a given class.

Figure 2. Human detection dataset. Image by the author
Figure 3. Traffic signs detection dataset. Image by the author
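Detection models are evaluated by comparing their rectangular boxes against ground-truth boxes using intersection-over-union (IoU), which a minimal sketch can compute directly. The box coordinates below are made-up values for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))
print(round(overlap, 4))  # 0.1429 (25 intersecting pixels / 175 union pixels)
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold, commonly 0.5.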

3. Image semantic segmentation

Semantic segmentation is the classification of every pixel of an image: pixels are separated into color-coded classes. Deep learning models performing this task use color masks as labels for the objects of interest, or binary (black-and-white) masks in the case of single-class segmentation. Since this is a pixel-level classification, it demands more computing power than the previous algorithms. Scientists have compared different neural network architectures in search of the best performing one; the U-Net [2], initially developed for medical images, has topped semantic segmentation benchmarks in recent years.

Figure 4. The U-Net. Modified from [2]
Figure 5. Multiclass semantic segmentation dataset, Drone Deploy Benchmark (www.dronedeploy.com)
Figure 6. Binary mask, Massachusetts Roads dataset, Mnih et al., 2013.
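The same IoU idea, applied per pixel rather than per box, is a standard metric for segmentation masks. A minimal NumPy sketch, using tiny 2×3 masks invented for the example:

```python
import numpy as np

def mask_iou(pred, label, num_classes):
    """Per-class IoU between two integer class-index masks."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, label == c).sum()
        union = np.logical_or(pred == c, label == c).sum()
        ious.append(inter / union if union else float("nan"))
    return ious

# Toy prediction vs. ground truth: 0 = background, 1 = road
pred  = np.array([[0, 0, 1],
                  [0, 1, 1]])
label = np.array([[0, 1, 1],
                  [0, 1, 1]])

ious = mask_iou(pred, label, num_classes=2)
print(ious)  # background IoU = 2/3, road IoU = 3/4
```

Averaging the per-class values gives the mean IoU (mIoU) commonly reported for models such as the U-Net.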

4. Image generation

Image generation is the process of obtaining an image from a random space, that is, producing synthetic images that plausibly resemble the distribution of an existing group of images. Its main applications are synthetic data creation for model training, computer graphics, design, and many others (Isola et al., 2017). Image generation is typically performed by generative models, which come in two kinds: explicitly programmed, or learned from data. The latter often uses two CNNs: one that generates an output image from the input, pixel by pixel (a kind of semantic segmentation), and another that judges whether the first network's output is real or generated. The two networks work adversarially, which is why they are called generative adversarial networks (GANs). Image generation has a range of applications, such as image translation, super-resolution, style transfer, and image transfiguration.

Figure 7. Image-to-map translation. Modified from Isola et al., 2017.
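The adversarial setup can be illustrated with the binary cross-entropy losses the two networks minimise. The discriminator scores below are made-up numbers, not the output of a real model:

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy, the loss both GAN players minimise."""
    eps = 1e-12  # avoid log(0)
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

# Discriminator scores: probability that each image is real
d_real = np.array([0.9, 0.8])   # scores on real images
d_fake = np.array([0.2, 0.3])   # scores on generated images

# The discriminator wants real -> 1 and fake -> 0
d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))
# The generator wants the discriminator to score its fakes as real (-> 1)
g_loss = bce(d_fake, np.ones_like(d_fake))
```

Training alternates between the two updates: each network's improvement raises the other's loss, which is the adversarial dynamic described above.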

Applications, fields, and types of imagery

Figure 8 summarizes the statistical proportions of the four algorithms, their fields of application, and the sensor or type of imagery used [3].

Figure 8. Image modified from [3]

Conclusions

There are four main algorithms used by GeoAI to analyze overhead imagery: image classification, object detection, semantic segmentation, and image generation.

All four revisited algorithms and their architectures are based on CNNs, artificial neural networks applied to pixels using convolutions, which are moving windowed scalar products.

Most papers in GeoAI address object detection (~54%), followed by segmentation (~41%), image classification (~5%), and finally image generation (less than 1%).

The main fields of application are environment (46%), urban (27%), agriculture (26%), and others (1%).

Most images used for analysis are RGB (52%), followed by multispectral (24%), hyperspectral (18%), and lidar (6%).

References

  1. Krizhevsky, A., Sutskever, I., Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems (NIPS 2012): 1097–1105, 2012.
  2. Ronneberger, O., Fischer, P., Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, LNCS, Vol. 9351: 234–241, 2015.
  3. Abdollahi, A., et al. Deep Learning Approaches Applied to Remote Sensing Datasets for Road Extraction: A State-of-the-Art Review. 2020.



Ph.D. in Informatics, Associate Professor at UNC, Medellín, Colombia. GeoInformatics; consultant in GIS and GeoAI; CEO of Navione, a drone company; Gisco Maps.