Announcing Solaris: an open source Python library for analyzing overhead imagery with machine learning

Nick Weir · Published in The DownLinQ · Jul 23, 2019

Performing machine learning (ML) and analyzing geospatial data are both hard problems that demand substantial domain expertise. Historically, that has meant you needed to be an expert in both fields to perform even the most basic analyses, making advances in AI for overhead imagery difficult to achieve. We at CosmiQ Works asked ourselves: is there anything we can do to lower this barrier to entry and make it easier to apply machine learning methods to overhead imagery data?

Enter Solaris, a new Python library for ML analysis of geospatial data from CosmiQ Works. Solaris builds upon SpaceNet’s previous tool suite, SpaceNetUtilities, along with several other CosmiQ projects like BASISS to provide an end-to-end pipeline for geospatial AI. Solaris provides well-documented Python APIs and simple command line tools to complete every step of a geospatial ML pipeline with ease:

  1. Tile raw imagery and vector labels into pieces compatible with ML
  2. Convert vector labels to ML-compatible pixel masks
  3. Train state-of-the-art deep learning models with three lines of Python code (sketched just after this list)
  4. Segment objects of interest with machine learning models (including the SpaceNet winners’ models, with pre-trained weights and configs provided!)
  5. Georegister predictions and convert them to standardized geospatial data formats
  6. Score model performance against hand-labeled ground truth using the SpaceNet datasets

…and more!
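
As a taste of step 3 above, here's roughly what model training looks like with the Python API. This is a minimal sketch that assumes you already have a Solaris YAML configuration file (the path below is a placeholder):

import solaris as sol

config = sol.utils.config.parse('/path/to/config/file.yml')  # dataset, model, and training settings
trainer = sol.nets.train.Trainer(config)                     # assembles the model, losses, and data loaders
trainer.train()                                              # runs the training loop defined in the config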

Would you prefer a basic command line interface so you can run a pre-trained model without learning Python? Solaris has that, with tutorials to help you get started: Create pixel masks from vector labels, train a deep learning algorithm, or predict where buildings are using a pre-trained SpaceNet Challenge-winning model, all without writing a single line of Python code.

Are you a Python developer who doesn’t want to write the tedious pre- and post-processing functions to feed your geospatial data into a deep learning model? The thoroughly documented Solaris Python API as well as Python API usage tutorials are ready to help. Furthermore, the codebase is entirely open source and we welcome any feature requests, bug reports, and tutorial suggestions from users.

Here’s a peek at just a few of the exciting features of Solaris:

Pre-trained SpaceNet Challenge-winning models

CosmiQ Works and SpaceNet LLC have long provided competitors’ model code after the completion of each SpaceNet Challenge; however, every competitor has their own coding environment, analysis pipeline, and codebase that must be understood and set up before their models can be tried out. We’ve removed this barrier by re-training the SpaceNet prize-winning models using Solaris and hosting the competitors’ model weights for free on AWS S3 (thanks to the generous support of AWS!). The Solaris documentation provides tutorials on how to use those models, either from the command line or through the Python API, as well as documentation of the pre-trained models with links to model weights and pre-made configuration files. You can use the pre-trained models for prediction as-is, fine-tune them on new satellite images, or re-train them from scratch. We currently provide four different prize-winning models from the SpaceNet 4: Off-Nadir Building Footprint Extraction Challenge, with many more coming soon:

  • A ResNet34-UNet, used by selim_sef;
  • A Densenet121-UNet, used by selim_sef;
  • A Densenet161-UNet, used by selim_sef;
  • A VGG16-UNet, used by XD_XD.

This allows you to quickly compare building footprints generated by different models, as we do in the example below:

Comparison between predictions generated using four different models on the same image from the SpaceNet MVOI Atlanta dataset. Original image courtesy of Maxar Technologies. Manually created labels are blue with red outlines, and model predictions are green with red outlines.

It’s worth noting that each of those prediction sets was generated with only five lines of Python code:

import solaris as sol
config = sol.utils.config.parse('/path/to/config/file.yml')
inferer = sol.nets.infer.Inferer(config)
inference_data = sol.nets.infer.get_infer_df(config)
inferer(inference_data)

See the pre-trained model documentation to get the relevant configuration files.

You can also develop your own model architectures, or plug your own pre-trained models into the Solaris deep learning pipeline. Solaris can handle models written in either PyTorch or TensorFlow.
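
Here's a minimal, hypothetical sketch of what that can look like with a small custom PyTorch model; the custom_model_dict keys shown here are illustrative, so check the Trainer documentation for the exact interface:

import solaris as sol
from torch import nn

class TinySegNet(nn.Module):
    """Stand-in for your own segmentation architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 1))

    def forward(self, x):
        return self.net(x)

config = sol.utils.config.parse('/path/to/config/file.yml')
trainer = sol.nets.train.Trainer(
    config,
    custom_model_dict={'model_name': 'tiny_seg_net',  # illustrative keys; see the docs
                       'weight_path': None,           # or a path to pre-trained weights
                       'arch': TinySegNet})
trainer.train()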

Create training masks from geo-registered labels

We’ve provided documented, standardized code to automatically convert your georegistered labels into pixel masks for deep learning. Not only can you create single-channel masks labeling building footprints or road networks, but you can also create multi-channel masks with object outlines and contact points between closely juxtaposed objects:

A pixel mask for model training created by Solaris. Red denotes building footprints, green denotes building outlines, and blue denotes areas between two nearby footprints.
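
Here's a minimal sketch of the corresponding Python API calls. The file paths are placeholders, and the full set of keyword arguments is covered in the tutorials linked below:

import solaris as sol

# single-channel mask of building footprints, georegistered to a reference image
fp_mask = sol.vector.mask.footprint_mask(
    df='building_footprints.geojson',    # vector labels
    reference_im='overhead_image.tif',   # raster defining the output grid and CRS
    out_file='footprint_mask.tif')

# three-channel mask: footprints, building boundaries, and contact points
fbc_mask = sol.vector.mask.df_to_px_mask(
    df='building_footprints.geojson',
    reference_im='overhead_image.tif',
    channels=['footprint', 'boundary', 'contact'],
    boundary_width=5, contact_spacing=10, meters=True,
    out_file='footprint_boundary_contact_mask.tif')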

Check out our Python API tutorial or mask creation CLI tutorial for more.

State-of-the-art loss functions, combination losses, optimizers, and more

CosmiQ Works and the SpaceNet competitors regularly use specialized loss functions to train deep learning models that can overcome some of the unique challenges associated with overhead imagery. Implementations of these losses can be hard to find, particularly when multiple loss functions need to be combined. State-of-the-art optimizers are also necessary in some cases. We’ve provided code implementing losses like Focal Loss, Lovasz Hinge Loss, and optimizers like AdamW to help users train cutting-edge algorithms.
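
To illustrate why combining losses takes some care, here's a generic PyTorch sketch of a weighted focal plus soft Dice combination. This is not Solaris's implementation (in Solaris, losses and their weights are specified in the training config); it's just a minimal example of the pattern:

import torch
from torch import nn

class FocalDiceLoss(nn.Module):
    """Weighted sum of focal loss and soft Dice loss for binary segmentation.
    Illustrative only; not taken from the Solaris codebase."""
    def __init__(self, focal_weight=1.0, dice_weight=1.0, gamma=2.0, smooth=1.0):
        super().__init__()
        self.focal_weight = focal_weight
        self.dice_weight = dice_weight
        self.gamma = gamma
        self.smooth = smooth
        self.bce = nn.BCEWithLogitsLoss(reduction='none')

    def forward(self, logits, target):
        # focal term: per-pixel BCE, down-weighted for easy pixels
        bce = self.bce(logits, target)
        p_t = torch.exp(-bce)  # probability assigned to the true class
        focal = ((1 - p_t) ** self.gamma * bce).mean()
        # soft Dice term computed on predicted probabilities
        probs = torch.sigmoid(logits)
        intersection = (probs * target).sum()
        dice = 1 - (2 * intersection + self.smooth) / (probs.sum() + target.sum() + self.smooth)
        return self.focal_weight * focal + self.dice_weight * dice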

Evaluate models using SpaceNet data and metrics

We often hear that it’s hard to compare deep learning models for geospatial applications because they’re trained and scored on different datasets and/or use metrics that don’t accurately reflect model performance on the desired task. To address this for building footprint detection, Solaris provides functions to evaluate model performance using the SpaceNet metric, which assesses the quality of entire building footprints, not just pixel masks. Paired with the open source SpaceNet datasets and model training/inference using Solaris, these metrics allow direct, apples-to-apples comparison between models trained and run on exactly the same data with exactly the same metrics and Python code. Soon, we’ll be adding road network quality scoring using the APLS metric to aid participants in SpaceNet 5.
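
Here's a minimal sketch of footprint scoring with the Python API; the file names are placeholders, and the Solaris documentation covers the full set of options:

import solaris as sol

# ground truth and proposal footprints as georegistered vector files
evaluator = sol.eval.base.Evaluator('ground_truth.geojson')
evaluator.load_proposal('predicted_footprints.geojson',
                        proposalCSV=False, conf_field_list=[])

# IoU-based matching of whole footprints, following the SpaceNet metric
results = evaluator.eval_iou(calculate_class_scores=False)
print(results)  # true/false positives, false negatives, precision, recall, F1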

…and more!

These are only a few of the things that you can do with Solaris. Explore the Solaris documentation and the Solaris GitHub repository for more. We’ll also be adding more functionality to Solaris, so keep an eye out for updates as we continue to grow the project. Finally, a sneak preview — Jake Shermeyer will be following up shortly with a tutorial on how to use Solaris and SpaceNet models to identify cars in the COWC dataset:

Cars identified in the COWC dataset. Red boxes marking individual cars are extracted from segmentation masks. Keep an eye out for the upcoming post for code and a detailed explainer!

Follow this blog and our Twitter for tutorials, step-by-step guides, and more about our other projects!


Nick Weir is a Data Scientist at CosmiQ Works and the SpaceNet 4 Challenge Director at SpaceNet LLC, advancing computer vision and ML analysis of geospatial imagery.