Raster Vision: A Geospatial Deep Learning Framework
Authors: Adeel Hassan and Lewis Fishgold
The problem
A satellite image is more than its pixels — it is also its location. Typically encoded as a GeoTIFF, such an image will also have georeferencing metadata — such as coordinates, a coordinate system, and a projection transform — that defines a mapping from pixel-based coordinates (i.e., row and column indices) to positions on the Earth’s surface (i.e., latitude and longitude). The same holds true for any annotations we might create for such an image — these might take the form of GeoJSON files with vector annotations (e.g., polygons), or be GeoTIFFs themselves. With the right tools, we can extract from these files a correctly transformed raster image and a corresponding label that we can happily feed into our computer vision models; but ultimately, to be useful in the real world, any insights gained from these models must also be mapped back to geographical locations. What use is detecting a wildfire if we don’t know where it is?
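To make this mapping concrete, here is a minimal, self-contained sketch (in plain Python, with hypothetical transform coefficients) of how an affine geotransform converts pixel indices to map coordinates. Libraries like GDAL and Rasterio implement this, plus the reprojection machinery, for real files.

```python
# Minimal sketch of how an affine geotransform maps pixel indices to map
# coordinates. The six coefficients follow the GDAL ordering:
# (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height).
# pixel_height is negative because row indices increase downward while
# northings increase upward.
def pixel_to_map(col, row, gt):
    x_origin, px_w, rot_x, y_origin, rot_y, px_h = gt
    x = x_origin + col * px_w + row * rot_x
    y = y_origin + col * rot_y + row * px_h
    return x, y

# Hypothetical transform: 10 m pixels, origin at a UTM coordinate, no rotation.
gt = (600000.0, 10.0, 0.0, 5700000.0, 0.0, -10.0)

print(pixel_to_map(0, 0, gt))      # (600000.0, 5700000.0) — top-left corner
print(pixel_to_map(100, 50, gt))   # (601000.0, 5699500.0) — 100 cols right, 50 rows down
```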
The differences between standard computer vision datasets and remote sensing datasets do not end here. Another complication is that these images tend to be too large to feed directly into a neural network and must first be broken up into smaller “chips”. You can see even more differences in the table below.
Handling these differences is not a trivial matter and often acts as a barrier to entry for both computer vision researchers wishing to make an impact in the field of remote sensing, and conversely, domain experts who are new to deep learning. But what if all of that was taken care of behind the scenes and everything just worked?
Enter Raster Vision — an open-source computer vision framework developed by Azavea.
Azavea is a geospatial software design and development company based in Philadelphia. As a certified B Corporation, our mission is to apply geospatial technology for positive civic, social, and environmental impact and to advance the state-of-the-art through research.
Raster Vision to the rescue
Raster Vision knows how to handle geospatial data and will do it for you. It will rasterize and vectorize, download and upload, analyze and normalize, chip and clip, concat and extract, and, generally speaking, do whatever it takes to ensure that the data arrives in the right shape at the right spot. Under the hood, Raster Vision makes extensive use of GDAL, Rasterio, Shapely, and, of course, NumPy to accomplish this. The figures below show some of Raster Vision’s extraordinary data processing powers.
Raster Vision can also train deep learning models. It offers fully implemented training pipelines for the computer vision tasks of chip classification, object detection, and semantic segmentation right out of the box. The models, loss functions, and optimizers are based on PyTorch and TorchVision and are highly configurable. Originally, Raster Vision used Tensorflow, but we switched to PyTorch because it made it easier to implement and debug custom models and loss functions. It also simplified the codebase by providing a single standard library covering the three different computer vision tasks.
Zooming out, we can summarize Raster Vision as a framework that enables developers to quickly and repeatably configure pipelines that go through the core components of a machine learning workflow: analyzing and pre-processing training data, training models, creating predictions, evaluating models, and bundling the model files and configuration for easy deployment. The entire Raster Vision pipeline looks like so:
(More details, including installation instructions, can be found in the official documentation.)
So how do we harness all this power?
Raster Vision in action
Getting started with a basic semantic segmentation example
Let’s apply Raster Vision to a semantic segmentation problem. We will start with a minimal example and then explore some more advanced features. We will use the ISPRS Potsdam Semantic Segmentation dataset, which contains six classes: car, building, low vegetation, tree, impervious, and clutter. The labels are distributed as RGB GeoTIFF files with a different color for each class, which can be seen below. The full dataset comprises 38 scenes, but here, for simplicity, we will only use two.
Before running a pipeline, we need to configure it by writing a Python file with a get_config function that returns a PipelineConfig object. Below is a bare-bones config that uses a single scene for training and another one for validation. For a fuller example, see the isprs_potsdam.py example in the repo.
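The sketch below illustrates the shape of such a config, based on the v0.13 API; the URIs and hyperparameters are hypothetical, and exact class names and keyword arguments may differ between versions, so refer to isprs_potsdam.py in the repo for a complete, tested config.

```python
# Rough sketch of a minimal get_config() for semantic segmentation (v0.13 API).
from rastervision.core.rv_pipeline import SemanticSegmentationConfig
from rastervision.core.data import (
    ClassConfig, DatasetConfig, RasterioSourceConfig, SceneConfig,
    SemanticSegmentationLabelSourceConfig)
from rastervision.pytorch_backend import PyTorchSemanticSegmentationConfig
from rastervision.pytorch_learner import (
    Backbone, SolverConfig, SemanticSegmentationModelConfig)


def make_scene(scene_id: str, image_uri: str, label_uri: str) -> SceneConfig:
    # A scene ties together a raster source (the image) and a label source.
    return SceneConfig(
        id=scene_id,
        raster_source=RasterioSourceConfig(uris=[image_uri]),
        label_source=SemanticSegmentationLabelSourceConfig(
            raster_source=RasterioSourceConfig(uris=[label_uri])))


def get_config(runner, root_uri: str) -> SemanticSegmentationConfig:
    class_config = ClassConfig(
        names=['car', 'building', 'low_vegetation', 'tree', 'impervious',
               'clutter'])
    dataset = DatasetConfig(
        class_config=class_config,
        train_scenes=[make_scene('2_10', 'train.tif', 'train_labels.tif')],
        validation_scenes=[make_scene('6_12', 'val.tif', 'val_labels.tif')])
    backend = PyTorchSemanticSegmentationConfig(
        model=SemanticSegmentationModelConfig(backbone=Backbone.resnet50),
        solver=SolverConfig(lr=1e-4, num_epochs=5, batch_sz=8))
    return SemanticSegmentationConfig(
        root_uri=root_uri,
        dataset=dataset,
        backend=backend,
        train_chip_sz=300,
        predict_chip_sz=300)
```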
After this, we can use the rastervision run CLI command to run the pipeline.
rastervision run local "./example.py" -a root_uri "./output/"
The output will be written to the ./output directory and will include training logs, debug visualizations, model weights, predictions for each validation scene, evaluation metrics, and a model bundle for future deployment. After Raster Vision is done running, the full directory tree will look like so:
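As an illustration, the output of a semantic segmentation run is organized roughly as follows (a sketch; exact contents vary by version and task):

```
output/
├── pipeline-config.json   # the serialized PipelineConfig
├── train/                 # training logs, checkpoints, and debug visualizations
├── predict/               # predictions for each validation scene
├── eval/                  # evaluation metrics
└── bundle/                # model bundle for deployment
```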
We can now examine this basic model’s predictions for the validation scene (predict/6_12/labels.tif) to see how well it does. If we load the predictions in GIS software such as QGIS, we will see that they are geographically “located” at the same place as the input.
Predictions as vectors and probability maps
In addition to the RGB output, we can obtain vector output (polygons) as well as a full probability map for each of the classes by passing some additional arguments to the label_store. The smooth_as_uint8 option quantizes the floating-point probability values to 256 levels and saves them as bytes to save space.
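A sketch of such a label store, based on the v0.13 API (argument names may differ between versions):

```python
# Label store configured for vector and probability-map output (v0.13 API).
from rastervision.core.data import (
    SemanticSegmentationLabelStoreConfig, PolygonVectorOutputConfig)

label_store = SemanticSegmentationLabelStoreConfig(
    # emit predictions for class 1 as polygons in a GeoJSON file
    vector_output=[PolygonVectorOutputConfig(class_id=1)],
    # also write per-class probability rasters...
    smooth_output=True,
    # ...quantized to 256 levels (uint8) to save space
    smooth_as_uint8=True)
```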
Working with multispectral images
Satellites often have advanced sensors that pick up a wide range of the electromagnetic spectrum, resulting in images with more bands than the usual red, green, and blue. Raster Vision makes it trivial to use as many of these bands as you like. What’s more, if you’re using a model pre-trained on RGB images, Raster Vision can modify the first convolutional layer to accept additional (or fewer) channels while retaining the existing pre-trained weights.
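Band selection is controlled through the raster source's channel_order field; a sketch based on the v0.13 API, with a hypothetical 8-band image:

```python
# Selecting (and reordering) bands of a multispectral image (v0.13 API).
from rastervision.core.data import RasterioSourceConfig

raster_source = RasterioSourceConfig(
    uris=['multispectral.tif'],       # hypothetical 8-band image
    channel_order=[0, 1, 2, 3, 4])    # use only the first five bands
```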
Two ways of reading chips (sliding window and random sampling)
We have already seen one way of sampling chips from a large raster in the example above — using a sliding window with a stride equal to the window size. This gets us something like what is shown in the image below.
But these 100 chips are only a small fraction of all possible chips that can be extracted from this scene. To get more chips, we can allow overlaps in neighboring chips by reducing the stride of the sliding window. The following image shows how we can quadruple the number of chips by halving the stride.
An alternative option is to sample chips randomly from anywhere within the raster — as many as we want. This feature also allows us to sample windows of different sizes — this can potentially help the model develop a level of robustness to scale. The snippet below shows how we can tell Raster Vision to sample 200 square windows with sizes ranging from 200 to 400 pixels.
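A sketch of that random-sampling configuration, based on the v0.13 API (field names may differ between versions):

```python
# Random window sampling with variable window sizes (v0.13 API).
from rastervision.pytorch_learner import (
    GeoDataWindowConfig, GeoDataWindowMethod)

window_opts = GeoDataWindowConfig(
    method=GeoDataWindowMethod.random,
    size=300,              # chips are resized to this size for the model
    size_lims=(200, 400),  # sample square windows of 200-400 pixels
    max_windows=200)       # sample 200 chips per scene
```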
Handling areas of interest
Fully annotating a several thousand by several thousand pixel scene is costly and time consuming. What if we wanted to learn from a partially labeled scene? Or, perhaps, we have divided up a single scene into a training region and a validation region. How do we restrict sampled chips to one region?
Raster Vision allows us to enforce this constraint by specifying an Area of Interest (AOI) in the form of one or more polygons. These can be provided as GeoJSON files to aoi_uris as shown below.
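A sketch of a scene restricted to an AOI, based on the v0.13 API (the URIs are hypothetical):

```python
# Restricting chip sampling to an AOI polygon (v0.13 API).
from rastervision.core.data import SceneConfig, RasterioSourceConfig

scene = SceneConfig(
    id='2_10',
    raster_source=RasterioSourceConfig(uris=['scene.tif']),
    # chips are sampled only from inside the polygons in this file
    aoi_uris=['train_region.geojson'])
```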
Adding data augmentation
Data augmentation is an essential element of neural network training and Albumentations is one of the most popular data augmentation libraries around. Raster Vision allows you to specify arbitrarily complex Albumentations transforms (as long as they are serializable) and use them for data augmentation:
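For example, a simple augmentation pipeline might look like the sketch below, based on the v0.13 API, where transforms are passed to the data config in serialized (dict) form:

```python
# An Albumentations augmentation pipeline for Raster Vision (v0.13 API).
import albumentations as A

aug_transform = A.Compose([
    A.Flip(),
    A.ShiftScaleRotate(),
    A.RGBShift(),
    A.Blur(),
])

# Then, inside get_config(), pass the serialized transform to the data
# config, e.g.:
# data = SemanticSegmentationGeoDataConfig(
#     ..., aug_transform=A.to_dict(aug_transform))
```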
Using custom models and loss functions
By default, Raster Vision provides support for some basic TorchVision models such as ResNets (-18/-50/-101) for chip classification and DeepLabV3 for semantic segmentation. But this can be unnecessarily restrictive if you want to make architecture customizations specific to your task or just want to try out the flavor-of-the-month Transformer. This is why Raster Vision also provides the freedom to import and use whatever model you want, as long as it interfaces correctly with the training and inference code. It allows importing arbitrary loss functions as well.
This functionality is made possible by the excellent Torch Hub module. In fact, as part of the work on this feature, we ended up contributing to the Torch Hub source code: it is now capable of loading model definitions from local directories instead of just GitHub repositories.
The following snippet shows how we can modify our semantic segmentation example to use a Panoptic FPN as our model and Focal Loss as our loss function.
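A sketch of what this looks like, based on the v0.13 API. The repo names below are existing third-party implementations, but the entrypoint names and kwargs are assumptions illustrating the mechanism; check each repo's hubconf.py for the actual entrypoints.

```python
# Swapping in an external model and loss function via Torch Hub (v0.13 API).
from rastervision.pytorch_learner import (
    ExternalModuleConfig, SemanticSegmentationModelConfig, SolverConfig)

model = SemanticSegmentationModelConfig(
    external_def=ExternalModuleConfig(
        github_repo='AdeelH/pytorch-fpn',
        name='pytorch-fpn',
        entrypoint='make_fpn_resnet',
        entrypoint_kwargs={'name': 'resnet18', 'num_classes': 6}))

solver = SolverConfig(
    lr=1e-4, num_epochs=10, batch_sz=8,
    external_loss_def=ExternalModuleConfig(
        github_repo='AdeelH/pytorch-multi-class-focal-loss',
        name='focal_loss',
        entrypoint='focal_loss'))
```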
Running on AWS Batch
Efficiently training models using Raster Vision requires the use of a high-end GPU and multiple CPU cores. Since many users do not have this kind of hardware in-house, Raster Vision comes with support for running pipelines in the cloud using AWS Batch.
Specifying batch instead of local in the run command causes Raster Vision to submit a DAG (directed acyclic graph) representation of the pipeline to Batch, which then runs the pipeline on EC2 instances. This DAG contains a node for each Docker command to run and an edge for each command that consumes the output of another command. In addition, the DAG specifies whether each command should run on a CPU or GPU instance, and how commands should be parallelized across instances. For example, the predict command can split the work across several nodes that run in parallel. AWS then automatically starts the instances that are needed, executes the commands, retries in case of failure, and shuts down instances that are no longer needed. When running on Batch, all output is stored on S3, and input is retrieved from S3 or over HTTP. Manually setting up Batch for use with Raster Vision can be a bit complicated, so we provide a CloudFormation template to automate the process.
rastervision run batch "example.py" --splits 2 \
-a root_uri "s3://my-bucket/output"
Going beyond geospatial data
Although Raster Vision was built as a tool to learn from geospatial data, its potential applications extend much farther.
Take the field of digital histopathology. An area that deals with very large raster images, it has, in recent years, received much attention from deep learning and computer vision researchers. In this context, the rasters are mega-pixel or giga-pixel resolution scans of microscope slides known as whole slide images (WSIs) and the annotations are usually polygons which can be specified in GeoJSON files.
Not only can Raster Vision be used to train deep learning models for histopathology, it has already been used for this successfully.
Conclusion and future plans
Raster Vision is an open source framework that bridges the divide between the world of GIS and deep learning-based computer vision. It provides a configurable computer vision pipeline that works on chip classification, semantic segmentation, and object detection, and seamlessly handles the idiosyncrasies of working with big geospatial datasets. The project began over four years ago when we competed in the ISPRS Potsdam Semantic Segmentation challenge, and has evolved to accommodate many of our client projects. Beyond Azavea, it has been used by graduate students, GIS consultants, non-profits, and governments around the world. In the past year, we have made the framework more flexible by adding support for multiband imagery, and custom models, loss functions, and data augmentations.
The immediate focus for Raster Vision is to bring it into a form that is most compatible with typical machine learning workflows, so that users can more easily make use of its unique capabilities. As part of this, we want to refactor Raster Vision into separate libraries, so that users can make use of individual parts (such as GeoDatasets). We would also like to add better support for SpatioTemporal Asset Catalog (STAC) datasets; STAC is an increasingly popular specification for cataloging geospatial data. On the computer vision side, we want to add support for instance segmentation and multi-GPU training. In the long term, we would also like to establish a formal governance structure for the project.
Contributing to Raster Vision
There are many ways to contribute to Raster Vision. Users can ask (and answer!) questions using GitHub issues or by posting in our Gitter channel. Issues for bug reports and feature requests, and small pull requests for bug fixes and documentation improvements, are always welcome. For larger pull requests, we encourage users to discuss the idea in an issue before getting too deep into writing code; we are happy to give advice on how to implement things. We are also interested in developing longer-term relationships with other organizations that use Raster Vision and can help us develop a roadmap and maintain the project; email us if you are interested.
We hope that you’ll give Raster Vision a try! Further material can be found in the following places:
- Documentation
- Quickstart
- Examples
- GitHub Repo
- Gitter Channel
- Azavea Blog Posts on Deep Learning + Remote Sensing
- Cloud Detection in Satellite Imagery
- Transfer Learning from RGB to Multi-band Imagery
- Using Noisy Labels to Train Deep Learning Models on Satellite Imagery
Acknowledgement: Thanks to Rob Emanuele, James McClain, and all of the other past and present contributors to Raster Vision.