Accelerating your geospatial deep learning pipeline with fine-tuning
Using CosmiQ’s Solaris library to fine-tune models pre-trained on SpaceNet overhead imagery
Summary: Using Solaris, you can fine-tune deep learning models pre-trained on overhead imagery in as little as five minutes and achieve performance comparable to that of past SpaceNet Challenge prize-winners.
The 5th round of the popular SpaceNet Challenge series has begun! In this round, participants are challenged to develop models that extract routable road networks, with travel time predictions, from overhead imagery. Past iterations of the challenge have been dominated by deep learning models painstakingly trained for days, often on multiple GPUs. The winner of the most recent SpaceNet Challenge ensembled 28 independently trained models to produce their solution, which entailed ~650 GPU-hours of training on commercial-grade GPUs, making it hard for participants with limited computing resources to be competitive. We at CosmiQ Works have frequently asked ourselves: how can we make the SpaceNet Challenges more accessible to individuals who lack these resources?
Part of the problem is the paucity of pre-trained models for overhead imagery analysis. Many models exist for “natural scene imagery” (think ImageNet), but few open source pre-trained models are publicly available for segmentation of overhead imagery. This gap existed for a reason: few Python libraries can ingest overhead imagery and perform deep learning analysis in a standardized fashion, so pre-trained weights were hard to use and therefore provided limited value. As a result, every SpaceNet Challenge participant has trained their models from scratch (or from ImageNet weights), which requires a lot of GPU time.
Enter Solaris, a deep learning library from CosmiQ Works. If you’re unfamiliar with Solaris, see our announcement post and the documentation page. One of the key things that Solaris provides is a suite of pre-trained models from the SpaceNet Off-Nadir Building Footprint Extraction Challenge prize winners. We asked ourselves: Can models be fine-tuned on imagery of cities that they have never seen before? Is this process more efficient than training models from scratch or ImageNet weights, as was done previously? The answer to both of these questions, as you’ll see below, is a resounding yes. The rest of this post will explore fine-tuning with Solaris in more detail. If you want to go through the fine-tuning process yourself, see the tutorial available on GitHub.
For this trial run, we took the Solaris version of XD_XD’s 5th-place model from SpaceNet 4 (for the aficionados: a U-Net with a VGG16 encoder), originally trained to identify buildings in imagery of Atlanta, and fine-tuned it to perform the same task on imagery of Khartoum, Sudan. We first asked how well this model performed on imagery of Khartoum without any fine-tuning:
As you can see, the model trained to find buildings in imagery of Atlanta couldn’t perform the same task in imagery of Khartoum. This is common in deep learning models for computer vision: they struggle on imagery that is very different from anything they have seen before, a failure to “generalize”.
We next asked whether we could fine-tune the model (re-train it at a lower learning rate on new data for just a few epochs) and improve building footprint extraction quality in imagery of Khartoum. We did just that: we reduced the learning rate to one-tenth of the value used in the original training and trained for three epochs on the SpaceNet 2 Khartoum dataset. Note that we didn’t freeze any layers’ weights before beginning.* This fine-tuning took 5 minutes on an AWS p3.2xlarge instance, corresponding to $0.26 of AWS compute expenses.
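The recipe above can be sketched in plain PyTorch, with a toy convolutional net and random tensors standing in for the pre-trained U-Net and the SpaceNet Khartoum imagery. The model architecture, tensor shapes, and BASE_LR value below are illustrative assumptions, not Solaris code:

```python
# Sketch of the fine-tuning recipe described above, using plain PyTorch.
# A toy convolutional net and random tensors stand in for the pre-trained
# U-Net and the SpaceNet Khartoum data; BASE_LR is an assumed value.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a segmentation model whose weights are already pre-trained.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
)

BASE_LR = 1e-4               # learning rate assumed for the original training
FINETUNE_LR = BASE_LR / 10   # 10-fold reduction, as described in the post

# No layers are frozen: every parameter remains trainable.
optimizer = torch.optim.Adam(model.parameters(), lr=FINETUNE_LR)
loss_fn = nn.BCEWithLogitsLoss()

# Toy batch: random "imagery" and binary building-footprint masks.
images = torch.rand(4, 3, 64, 64)
masks = torch.randint(0, 2, (4, 1, 64, 64)).float()

for epoch in range(3):       # three epochs, as in the post
    optimizer.zero_grad()
    loss = loss_fn(model(images), masks)
    loss.backward()
    optimizer.step()
```

In the real workflow, the pre-trained weights come from Solaris and the data loader serves SpaceNet Khartoum tiles; only the learning-rate reduction, the short epoch count, and the absence of frozen layers carry over from this sketch.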
Next, we asked how well this fine-tuned model performed compared to the original model, as well as compared to the top models from the SpaceNet 2 Challenge. The results were truly striking.
Three epochs of fine-tuning — 5 minutes on an AWS p3.2xlarge instance — yielded performance comparable to the SpaceNet Challenge Round 2 prize-winning models, which were trained for up to 650 GPU-hours on Titan Xp GPUs! This is more than a 2000-fold reduction in training cost. These results highlight the value of model fine-tuning, as well as the importance of providing a standardized method to use models pre-trained for relevant tasks.
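As a rough sanity check on these figures (the hourly rate below is our assumption for illustration; the post itself quotes only the $0.26 total):

```python
# Back-of-envelope check of the cost comparison above. The $3.06/hour
# p3.2xlarge on-demand rate is an assumption, not stated in the post.
P3_2XLARGE_RATE = 3.06                             # USD per hour (assumed)
finetune_hours = 5 / 60                            # five minutes of fine-tuning
finetune_cost = P3_2XLARGE_RATE * finetune_hours   # ~ $0.26, matching the post
fold_reduction = 650 / finetune_hours              # 650 GPU-hours vs. 5 minutes
```

In GPU-hour terms the reduction works out to 7800-fold, so the 2000-fold figure quoted above is conservative.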
As always, SpaceNet welcomes participants who wish to train models from scratch in SpaceNet 5: Automated Road Network Extraction, Routing, and Travel Time Estimation from Satellite Imagery. However, we hope that the availability of pre-trained models as well as the fine-tuning tutorial will enable participants with fewer computing resources to be competitive in the challenge. This approach may also enable experts to focus more on aspects of the challenge separate from model training, such as converting their segmentation outputs to routable road network graphs.
Good luck in SpaceNet 5!
*Some deep learning practitioners may disagree with our use of the term fine-tuning here, as it often describes a process in which weights are frozen in earlier layers of a model and re-initialized in later layers before training on new data. We take the point, but we’ve seen fine-tuning used to describe a variety of re-training processes, and that stricter definition is most common for whole-image classification tasks, which is not what we’re doing here.