Few Shot Geospatial Deep Learning — Part 1

Karthik Dutt · Published in GeoAI · 6 min read · Aug 16, 2022

Deep learning models have always been data-hungry: they need a lot of training data to learn from, and that data is often generated manually. Although a large amount of geospatial imagery is available for training, the imagery also needs to be labelled. The effort it takes to create these labels, by digitizing geographical features or classifying land cover manually, often discourages individuals and organizations from adopting geospatial deep learning. This is where the ability to train geospatial deep learning models in low-data regimes, also known as few-shot learning, comes in handy.

Geospatial deep learning models for imagery use Convolutional Neural Networks, and these networks analyze every image at the level of individual pixels to learn to either classify it into a particular class or detect objects in it. Although state-of-the-art models today have gotten better than humans at carrying out these tasks, it is important to recall that humans don’t need to see thousands of images of cats and dogs to learn to classify a new image as either a cat or a dog. A child can look at just a couple of images of dogs and cats and recognize a new image as that of a cat or a dog by noticing the similarity between the new image and the images shown earlier.

This concept is used by ‘few-shot learners’. These learners do not learn the characteristics of the object itself but learn to recognize the similarity between two objects. This concept of ‘learning to learn’ is called meta learning.

Once a model has learnt to identify similarities between two images, it can apply this knowledge to classify or detect new objects without having to look at thousands of examples of each.

To summarize, few-shot learners are able to learn about new classes or objects given only a few examples of each.

On the left, we see the results from an object detection model that used 20,000 labelled image chips during its training phase, and on the right, the predictions from a few-shot detection model trained on just 100 labelled chips. There is hardly any difference between the performance of the two models!

Few-shot learning — Deep dive

Some of the key terms we will use in few-shot learning are the Base training dataset, the Support dataset and the Query image. Let us have a look at each of these terms.

Let us say we are interested in classifying satellite images as those of a desert, a beach or a forest, and to get a good model in place, we have just 1 labelled image of each class from which the model should learn. This set of training images is called the Support dataset.

Once we have a trained model, we want to get the prediction on a new image, shown below. This image is called the ‘Query’ image.

Few-shot learning is based on the concept of meta learning, where the model first ‘learns to learn’, i.e. to understand the similarity between two images. To perform this task, we can train the model on a small subset of the ImageNet dataset with 100 classes and 600 images per class. This dataset is called the “Base training dataset”. Note that the new classes we want the model to classify are not part of the base training dataset.
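To make these terms concrete, here is a minimal sketch of how the three datasets relate to each other; the file names and class names are purely illustrative, not real data.

```python
# Illustrative structures for the three datasets (all names hypothetical).
support_set = {               # 1 labelled image per novel class
    "desert": ["desert_01.tif"],
    "beach":  ["beach_01.tif"],
    "forest": ["forest_01.tif"],
}
query_image = "unlabelled_scene.tif"   # the new image we want classified

# The base training dataset is large and fully labelled, but contains
# none of the novel classes above (e.g. a 100-class ImageNet subset
# with 600 images per class).
base_classes = [f"class_{i}" for i in range(100)]
```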

Having understood the key terms, let us now look at the steps involved in few-shot learning.

In the first step, we pretrain a model using our abundantly available, labelled base training dataset, which does not contain our novel classes. This step is similar to traditional supervised learning, where we train the model to classify images into the classes seen in the training dataset. In few-shot learning, however, the objective of this pretraining step is not to classify images into the categories seen during training, but to learn to identify the similarity between two images. At the end of the pretraining phase, we will have a base model that has learnt to quantify the similarity between two images, even ones that do not belong to the classes in the base dataset.
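As a rough illustration of what ‘learning to identify similarity’ looks like in code, here is a minimal PyTorch sketch of episodic pretraining in the style of ProtoNet. The tiny backbone, the episode sizes and the random tensors standing in for real image batches are all assumptions for the sake of the example, not the exact setup used in the models above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(                  # f_theta: a tiny CNN encoder
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
opt = torch.optim.Adam(backbone.parameters(), lr=1e-3)

for episode in range(1000):
    # Sample a 5-way 1-shot episode from the base dataset (random
    # tensors stand in for real image batches here).
    support = torch.randn(5, 3, 84, 84)    # one image per sampled class
    query = torch.randn(25, 3, 84, 84)     # 5 query images per class
    query_labels = torch.arange(5).repeat_interleave(5)

    prototypes = backbone(support)                        # (5, 64)
    distances = torch.cdist(backbone(query), prototypes)  # Euclidean
    loss = F.cross_entropy(-distances, query_labels)      # nearer = likelier
    opt.zero_grad(); loss.backward(); opt.step()
```

Each episode mimics a small few-shot task, so the backbone is rewarded for producing features in which images of the same class sit close together, regardless of which classes were sampled.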

The next step is to fine-tune the base model using the limited labelled images from the support dataset.

Image by Wei-Yu Chen et al.

The training stage trains a feature extractor fθ and a classifier C(·|Wb) with the base training dataset. In the fine-tuning stage, we fix the network parameters θ in the feature extractor fθ and train a new classifier C(·|Wn) with the given labelled examples in the novel classes.
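Translated into code, the fine-tuning stage might look like the minimal sketch below: the backbone parameters θ are frozen and only a new 3-class classifier head is trained on the handful of support images. The backbone definition and tensors are illustrative; a real pipeline would load the pretrained weights rather than start fresh.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(              # f_theta; pretrained weights would
    nn.Conv2d(3, 64, 3, padding=1),    # be loaded here in a real pipeline
    nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in backbone.parameters():        # fix the network parameters theta
    p.requires_grad = False

classifier = nn.Linear(64, 3)          # new C(·|Wn): desert/beach/forest
opt = torch.optim.Adam(classifier.parameters(), lr=1e-2)

support_images = torch.randn(3, 3, 84, 84)   # 1 image per novel class
support_labels = torch.tensor([0, 1, 2])

for step in range(100):
    with torch.no_grad():                    # frozen feature extractor
        feats = backbone(support_images)
    loss = F.cross_entropy(classifier(feats), support_labels)
    opt.zero_grad(); loss.backward(); opt.step()
```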

The performance of a few-shot learner depends on a couple of factors:

1. The number of classes in our support dataset: the fewer the classes, the better.

2. The number of samples of each class in our support dataset: the more samples, the better.

If our support dataset is made up of images of three classes (say, desert, forest and beach) and has just 1 image belonging to each class, we term the model fine-tuned on this dataset a 3-way 1-shot learning model.

If the support dataset had two images of each class, the generated model would be the result of 3-way 2-shot learning.
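The N-way K-shot terminology maps directly onto how a support set is sampled; here is a minimal sketch with hypothetical file names.

```python
import random

def sample_support(pool, n_way, k_shot):
    """Pick n_way classes and k_shot labelled examples of each."""
    classes = random.sample(sorted(pool), n_way)
    return {c: random.sample(pool[c], k_shot) for c in classes}

pool = {
    "desert": ["d1.tif", "d2.tif", "d3.tif"],
    "beach":  ["b1.tif", "b2.tif", "b3.tif"],
    "forest": ["f1.tif", "f2.tif", "f3.tif"],
}
print(sample_support(pool, n_way=3, k_shot=2))   # a 3-way 2-shot episode
```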

Some state-of-the-art few-shot classifiers that use the meta learning concept are:

1. MatchingNet

2. ProtoNet

3. RelationNet

For both MatchingNet and ProtoNet, the prediction on examples in a query set is based on comparing the distance between the query feature vector and the support feature vectors of each class. MatchingNet computes the cosine distance between the query feature and each support feature and averages it per class, while ProtoNet compares the Euclidean distance between the query feature and the class mean of the support features.
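The difference between the two scoring rules is easiest to see in code. Below is a minimal sketch operating on precomputed feature vectors; the random tensors stand in for real backbone outputs.

```python
import torch
import torch.nn.functional as F

support = torch.randn(3, 5, 64)   # 3 classes, 5 support features each
query = torch.randn(64)           # one query feature vector

# MatchingNet-style: average cosine similarity to each class's supports
cosine = F.cosine_similarity(query[None, None, :], support, dim=-1)  # (3, 5)
matching_scores = cosine.mean(dim=1)            # higher = more similar

# ProtoNet-style: Euclidean distance to each class mean (the "prototype")
prototypes = support.mean(dim=1)                # (3, 64)
proto_scores = -torch.cdist(query[None], prototypes)[0]  # nearer = higher

print(matching_scores.argmax(), proto_scores.argmax())
```

In both cases, the query is assigned to the class with the highest score.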

RelationNet shares a similar idea, but it replaces the fixed distance with a learnable relation module.
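A minimal sketch of that idea: instead of a fixed distance function, a small trainable network scores how related a query feature is to a class’s support feature. The layer sizes here are arbitrary assumptions.

```python
import torch
import torch.nn as nn

relation_module = nn.Sequential(      # learns a similarity score in [0, 1]
    nn.Linear(64 * 2, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

query_feat = torch.randn(64)
class_feat = torch.randn(64)          # e.g. a class's support feature
pair = torch.cat([query_feat, class_feat])     # concatenate, then score
relation_score = relation_module(pair)         # trained with the backbone
```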

Some state-of-the-art few-shot detectors that use the meta learning concept are:

1. MetaDet

2. Meta-YOLO

3. Meta-RCNN

There are more advanced few-shot detectors that use self-supervised backbones. We will discuss self-supervised learning, and the few-shot detectors that use it, in the next blog in this series.
