Getting Started With SpaceNet Data
The first SpaceNet challenge is complete, but the data remains available for download and analysis on AWS. This dataset contains a massive amount of labeled data in GeoJSON files, a format that may be unfamiliar to many in the computer vision field. This post aims to lower the barrier to entry for exploring SpaceNet data by demonstrating methods to transform the raw SpaceNet GeoJSON labels into formats more conducive to machine learning, namely NumPy arrays and image masks, and to visualize the results. Further motivating the study of SpaceNet data is the release of a new SpaceNet point of interest dataset. We include Python code for the interested reader, and refer the reader to the SpaceNet Challenge repository for more utilities.
- December 2017 update: updated code is also available here.
1. Data Access
After creating an AWS account, download the data at the SpaceNet AWS portal. Detailed descriptions of data formats and download instructions can be found here. In short, the command to download processed 200m x 200m image tiles with associated building footprints is:
aws s3api get-object --bucket spacenet-dataset \
--key AOI_1_Rio/processedData/processedBuildingLabels.tar.gz \
--request-payer requester processedBuildingLabels.tar.gz
For this post, we will focus on the TopCoder challenge dataset. Upon downloading and expanding the tarballs, the TopCoder training directory structure should appear as follows:
In this post we will focus on the high-resolution 3-band imagery as well as the vector data.
2. Data Inspection
Image cutouts for the pan-sharpened 3-band imagery are 438–439 pixels in width and 406–407 pixels in height. The 8-band images have not been pan-sharpened and so have 1/4 the linear resolution of the 3-band imagery, at roughly 110 x 102 pixels. For each unique image ID we find a corresponding entry in the vectordata/geoJson directory containing the building footprints for that image.
2. Ground Truth Transform
Computer vision algorithms tend to operate in pixel space, where locations are reported on the matrix of pixel positions rather than latitude and longitude. After the initial data download, or extraction, the second step in the extract-transform-load (ETL) process is to transform the latitude-longitude coordinates in the GeoJSON label files to pixel coordinates. We describe three methods of transforming the GeoJSON label files into pixel coordinates in various formats.
2.1 Building Outline Coordinates
The GeoJSON file lists building polygon vertices in latitude and longitude. Transforming these vertices into pixel coordinates requires knowledge of the image extent and of the precise geometric coordinate transform. This information (along with much more) can be extracted with the GDAL code suite. A number of sophisticated functions using GDAL and other geospatial libraries are available in the SpaceNet utilities repository on GitHub. The code below takes the GeoJSON label file and corresponding image and returns two coordinate arrays, one in geospatial coordinates (latitude and longitude) and one in pixel coordinates.
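As a minimal, GDAL-free sketch of the same idea: if we assume a north-up image and a GDAL-style six-element geotransform (the tuple that GDAL's `GetGeoTransform()` returns for a raster), the conversion from longitude and latitude to pixel column and row is a simple affine inversion. The function names `geo_to_pixel` and `geojson_to_coords`, the sample geotransform, and the sample polygon are illustrative assumptions, not part of the SpaceNet utilities.

```python
import json
import numpy as np


def geo_to_pixel(lon, lat, geotransform):
    """Convert lon/lat to (col, row) for a north-up image.

    geotransform follows the GDAL convention:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height)
    where pixel_height is negative for north-up images.
    """
    origin_x, pixel_w, rot_x, origin_y, rot_y, pixel_h = geotransform
    if rot_x != 0 or rot_y != 0:
        raise ValueError("this sketch only handles north-up (unrotated) images")
    col = (lon - origin_x) / pixel_w
    row = (lat - origin_y) / pixel_h
    return col, row


def geojson_to_coords(geojson_dict, geotransform):
    """Return two lists of (N, 2) arrays: geospatial and pixel vertices."""
    geo_polys, pix_polys = [], []
    for feature in geojson_dict.get("features", []):
        geom = feature.get("geometry") or {}
        if geom.get("type") != "Polygon":
            continue  # this sketch skips MultiPolygon and other geometry types
        # Exterior ring only; drop an optional third (elevation) value.
        ring = np.asarray([pt[:2] for pt in geom["coordinates"][0]], dtype=float)
        cols, rows = geo_to_pixel(ring[:, 0], ring[:, 1], geotransform)
        geo_polys.append(ring)
        pix_polys.append(np.column_stack([cols, rows]))
    return geo_polys, pix_polys


# Hypothetical geotransform and label, for illustration only.
sample_geotransform = (-43.0, 1e-5, 0.0, -22.0, 0.0, -1e-5)
sample_geojson = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {},
   "geometry": {"type": "Polygon", "coordinates": [[
     [-43.0, -22.0, 0], [-42.999, -22.0, 0], [-42.999, -22.001, 0],
     [-43.0, -22.001, 0], [-43.0, -22.0, 0]]]}}
]}
""")

geo_polys, pix_polys = geojson_to_coords(sample_geojson, sample_geotransform)
print(pix_polys[0])
```

With real data, the geotransform would be read from the image itself rather than typed in by hand, and rotated or reprojected imagery would need the full affine inverse instead of the two divisions used here.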
We can inspect our transform by overlaying the ground truth polygons on the input image using matplotlib.
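A rough sketch of such an overlay is shown here, assuming the polygons are already in pixel coordinates as (column, row) pairs. The helper name `overlay_polygons`, the blank stand-in image, and the two sample polygons are placeholders for illustration; in practice the image would be the 3-band cutout and the polygons would come from the coordinate transform.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon


def overlay_polygons(ax, image, pixel_polygons, edgecolor="yellow"):
    """Draw building outlines (pixel coordinates) on top of an image."""
    ax.imshow(image)
    for verts in pixel_polygons:
        # Unfilled outline so the underlying imagery stays visible.
        ax.add_patch(Polygon(verts, closed=True, fill=False,
                             edgecolor=edgecolor, linewidth=1.5))
    ax.set_axis_off()
    return ax


# Placeholder inputs: a blank image of typical cutout size and two boxes.
image = np.zeros((406, 438, 3), dtype=np.uint8)
polys = [
    np.array([[50, 60], [120, 60], [120, 110], [50, 110]]),
    np.array([[200, 220], [260, 220], [260, 300], [200, 300]]),
]

fig, ax = plt.subplots(figsize=(6, 6))
overlay_polygons(ax, image, polys)
# Call plt.show() or fig.savefig("truth_overlay.png") to view the result.
```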
2.2 Building Mask
Another option for building labels is a simple building mask, where we create an image with background regions set to 0 and areas of interest (buildings) set to 1. Image masks are popular for training neural network segmentation algorithms (e.g., DeconvNet). One critical failure of masks that we will demonstrate below, however, is their inability to differentiate adjacent objects.
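To make the idea concrete, here is a minimal NumPy-only sketch that rasterizes pixel-coordinate polygons into a binary mask using an even-odd ray-casting test on pixel centers. This is an illustrative stand-in and not necessarily how create_building_mask.py works; the names `polygon_mask` and `building_mask` and the two toy buildings are assumptions. The example also shows the weakness noted above: two buildings that share a wall merge into a single blob.

```python
import numpy as np


def polygon_mask(vertices, height, width):
    """Boolean mask of pixels whose centers fall inside one polygon.

    vertices: sequence of (col, row) pairs in pixel coordinates.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    px, py = cols + 0.5, rows + 0.5  # pixel centers
    inside = np.zeros((height, width), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if y0 == y1:
            continue  # horizontal edges never cross a horizontal ray
        # Does a ray cast to the right from each pixel center cross this edge?
        crosses = (y0 > py) != (y1 > py)
        x_at_ray = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (px < x_at_ray)
    return inside


def building_mask(pixel_polygons, height, width):
    """Union of all polygons: background is 0, buildings are 1."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for verts in pixel_polygons:
        mask[polygon_mask(verts, height, width)] = 1
    return mask


# Two toy buildings sharing the wall at column 4.
building_a = [(1, 1), (4, 1), (4, 4), (1, 4)]
building_b = [(4, 1), (7, 1), (7, 4), (4, 4)]
mask = building_mask([building_a, building_b], height=6, width=8)
print(mask)
```

In the printed mask the two buildings appear as one contiguous 3 x 6 block of ones, with nothing to indicate where one footprint ends and the next begins.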
Similar to plot_truth_coords.py above, the output of create_building_mask.py can be visualized with matplotlib. For brevity we refrain from posting the code in this post, though the interested reader can visit: plot_building_mask.py.
2.3 Signed Distance Transform
A final method for labeling ground truth is to adopt the signed distance transform of Yuan 2016. This transform was applied to SpaceNet data both here, and here. This distance transform encodes each point in the image with the distance in meters from a building boundary. Hence, in the output distance map, negative values lie outside buildings, zero values denote building boundaries, and positive values lie inside building contours. The code below yields the transform.
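As a minimal sketch of this idea, assuming we start from a binary building mask, a signed distance map can be built from two Euclidean distance transforms using SciPy's `distance_transform_edt`: one measured inside the buildings and one measured outside. The function name `signed_distance`, the `pixel_size` and `truncate` parameters, and the toy mask are illustrative assumptions; this is not necessarily the exact procedure of Yuan 2016.

```python
import numpy as np
from scipy import ndimage


def signed_distance(mask, pixel_size=1.0, truncate=None):
    """Signed distance map: positive inside buildings, negative outside.

    mask:       2D array, nonzero where buildings are present.
    pixel_size: ground size of one pixel (e.g. in meters), so that
                distances are reported in ground units rather than pixels.
    truncate:   optional cap on the absolute distance.
    """
    mask = np.asarray(mask).astype(bool)
    # Distance from each building pixel to the nearest background pixel.
    inside = ndimage.distance_transform_edt(mask, sampling=pixel_size)
    # Distance from each background pixel to the nearest building pixel.
    outside = ndimage.distance_transform_edt(~mask, sampling=pixel_size)
    signed = inside - outside
    if truncate is not None:
        signed = np.clip(signed, -truncate, truncate)
    return signed


# Toy example: a 3 x 3 building centered in a 5 x 5 tile.
toy_mask = np.zeros((5, 5), dtype=np.uint8)
toy_mask[1:4, 1:4] = 1
sd = signed_distance(toy_mask)
print(np.round(sd, 2))
```

Note that in this discretized version no pixel is exactly zero: the building boundary sits between the pixels valued +1 and those valued -1. Passing the image's ground sample distance as `pixel_size` converts the map from pixels to meters.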
2.4 Combined Visualization
We can now visualize all three ground truth options simultaneously, as shown below.
A script to recreate all of the transformations and visualizations created in this post is located here, and yields an output directory akin to Figure 7 below.
3. Conclusions
The GIS (geographic information systems) experts who format satellite imagery data speak a slightly different language than most computer vision experts. Hoping to encourage exploration of SpaceNet data, this post explores some useful data transformations for SpaceNet building labels, with attendant code and visualizations.
With any luck the massive amount of labeled SpaceNet data will stir the imagination of an increasing cadre of computer vision experts, and thereby help redefine the nature of satellite imagery analytics.
*Footnote: Many thanks to @david.lindenbaum for providing the SpaceNet Challenge utilities repository, upon which much of the code included here is based.