Getting Started With SpaceNet Data

The first SpaceNet challenge is complete, but the data remains available for download and analysis on AWS. This dataset contains a massive amount of labeled data in GeoJSON files, a format that may be unfamiliar to many in the computer vision field. This post aims to lower the barrier to entry for exploring SpaceNet data by demonstrating how to transform the raw SpaceNet GeoJSON labels into formats more conducive to machine learning, namely NumPy arrays and image masks, and how to visualize the results. Further motivating the study of SpaceNet data is the release of a new SpaceNet point of interest dataset. We include Python code for the interested reader, and refer the reader to the SpaceNet Challenge repository for more utilities.

  • December 2017 update: updated code is also available here.

1. Data Access

After creating an AWS account, download the data at the SpaceNet AWS portal. Detailed descriptions of data formats and download instructions can be found here. In short, the command to download processed 200m x 200m image tiles with associated building footprints is:

aws s3api get-object --bucket spacenet-dataset \
--key AOI_1_Rio/processedData/processedBuildingLabels.tar.gz \
--request-payer requester processedBuildingLabels.tar.gz

For this post, we will focus on the TopCoder challenge dataset. Upon downloading and expanding the tarballs, the TopCoder training directory structure should appear as follows:

Figure 1. SpaceNet TopCoder data directory

In this post we will focus on the high-resolution 3-band imagery as well as the vector data.

2. Data Inspection

Image cutouts for the pan-sharpened 3-band imagery are 438–439 pixels in width and 406–407 pixels in height. The 8-band images have not been pan-sharpened and so have roughly 1/4 the resolution of the 3-band imagery, at 110 x 102 pixels. For each unique image ID we find a corresponding entry in the vectordata/geoJson directory with building footprints.

Figure 2. Random image from the SpaceNet training dataset (3band_013022223130_Public_img124.tif).
Figure 3. First entry of the GeoJSON label file associated with Figure 2. Here we show the first building label associated with the image; note that coordinates are stored as a WKT polygon or multipolygon with coordinates stored as [longitude, latitude, elevation]. The elevation field is always zero for this dataset.
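
As a rough illustration of the label layout described above, the sketch below reads polygon vertices with Python's standard json module. The sample feature collection and its coordinate values are hypothetical stand-ins written for this example, and the helper name exterior_rings is our own; it is not part of the SpaceNet utilities.

```python
import json

# Hypothetical, hand-written feature collection that mimics the layout described
# above; the coordinate values are illustrative, not taken from the dataset.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[
          [-43.5400, -22.8800, 0.0],
          [-43.5398, -22.8800, 0.0],
          [-43.5398, -22.8802, 0.0],
          [-43.5400, -22.8802, 0.0],
          [-43.5400, -22.8800, 0.0]
        ]]
      }
    }
  ]
}
"""


def exterior_rings(geojson_dict):
    """Return one list of (lon, lat) tuples per building exterior ring."""
    rings = []
    for feature in geojson_dict["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            polygons = [geom["coordinates"]]
        elif geom["type"] == "MultiPolygon":
            polygons = geom["coordinates"]
        else:
            continue  # skip non-polygon geometries
        for polygon in polygons:
            exterior = polygon[0]  # first ring is the exterior; later rings are holes
            # Keep (lon, lat) and drop the trailing elevation value.
            rings.append([(pt[0], pt[1]) for pt in exterior])
    return rings


rings = exterior_rings(json.loads(SAMPLE_GEOJSON))
print(len(rings), "building(s); first vertex:", rings[0][0])
```

Handling Polygon and MultiPolygon in the same loop keeps the output uniform: every building contributes one or more plain vertex lists, whichever geometry type the label file uses.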

2. Ground Truth Transform

Computer vision algorithms tend to operate in pixel space, where locations are reported on the matrix of pixel positions rather than latitude and longitude. After the initial data download, or extraction, the second step in the extract-transform-load (ETL) process is to transform the latitude-longitude coordinates in the GeoJSON label files to pixel coordinates. We describe three methods of transforming the GeoJSON label files into pixel coordinates in various formats.

2.1 Building Outline Coordinates

The GeoJSON file lists building polygon vertices in latitude and longitude. Transforming these vertices into pixel coordinates requires knowledge of the image extent and of the precise geometric coordinate transform between the two systems. This information (along with much more) can be extracted with the GDAL code suite. A number of sophisticated functions using GDAL and other geospatial libraries are available in the SpaceNet utilities repository on GitHub. The code below takes the GeoJSON label file and corresponding image and returns two coordinate arrays, one in geospatial coordinates (latitude and longitude) and one in pixel coordinates.

Code snippet 1. Function to transform GeoJSON label files to an array of coordinates (both lat,lon and pixel).
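
To make the idea concrete, here is a minimal sketch of the coordinate conversion itself, written without GDAL so that it runs anywhere. It assumes the image georeferencing is available as a six-element affine geotransform in the ordering GDAL reports from a dataset's GetGeoTransform() method; the numeric geotransform, the sample ring, and the function names are hypothetical, and plain Python lists stand in for NumPy arrays. It is not the SpaceNet utilities implementation.

```python
def geo_to_pixel(lon, lat, geotransform):
    """Invert a GDAL-style affine geotransform to get (col, row) pixel coordinates.

    The forward transform is:
        lon = x0 + col * a + row * b
        lat = y0 + col * d + row * e
    """
    x0, a, b, y0, d, e = geotransform
    det = a * e - b * d
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = lon - x0, lat - y0
    col = (e * dx - b * dy) / det
    row = (a * dy - d * dx) / det
    return col, row


def rings_to_pixels(rings, geotransform):
    """Convert lists of (lon, lat) vertices to lists of (col, row) vertices."""
    return [[geo_to_pixel(lon, lat, geotransform) for lon, lat in ring]
            for ring in rings]


# Hypothetical north-up geotransform: origin at the top-left corner,
# 5e-6 degrees per pixel, no rotation terms.
GEOTRANSFORM = (-43.6, 0.000005, 0.0, -22.8, 0.0, -0.000005)

# Hypothetical building outline in (lon, lat).
geo_rings = [[(-43.599, -22.801), (-43.598, -22.801), (-43.598, -22.802)]]
pixel_rings = rings_to_pixels(geo_rings, GEOTRANSFORM)
print(pixel_rings[0][0])
```

Inverting the full 2 x 2 affine matrix, rather than dividing by the pixel size alone, keeps the function correct for images whose geotransform contains rotation terms.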

We can inspect our transform by overlaying the ground truth polygons on the input image using matplotlib.

Code snippet 2. Function to plot the truth coordinates for an input image.
Figure 3. Output of the plotting function (Code snippet 2) for a sample SpaceNet image. The left panel shows the raw 3-band image with building footprints overlaid in orange with red boundaries. The right panel shows the raw boundaries alone in red.

2.2 Building Mask

Another option for building labels is a simple building mask, an image in which background regions are set to zero and areas of interest (buildings) are set to 1. Image masks are popular for training neural network segmentation algorithms (e.g., DeconvNet). One critical shortcoming of masks that we will demonstrate below, however, is their inability to differentiate adjacent objects.

Code snippet 3. Function to create an image mask using the GeoJSON labels.
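
For readers who want to see the mechanics, the following is a minimal pure-Python sketch of mask creation from polygons that are already in pixel coordinates. It uses a simple even-odd point-in-polygon test at each pixel center instead of a geospatial rasterizer, so it is a stand-in for illustration rather than the method used in this post; the two building outlines and the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def building_mask(pixel_polygons, width, height):
    """Return a height x width grid with 1 inside any polygon and 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for polygon in pixel_polygons:
        for row in range(height):
            for col in range(width):
                # Sample each pixel at its center.
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = 1
    return mask


# Two hypothetical buildings that share a wall at x = 4.
buildings = [
    [(1, 1), (4, 1), (4, 4), (1, 4)],
    [(4, 1), (7, 1), (7, 4), (4, 4)],
]
mask = building_mask(buildings, width=8, height=5)
for mask_row in mask:
    print("".join(str(value) for value in mask_row))
```

The two buildings come out as a single unbroken run of ones in each row, which is exactly the adjacent-object limitation noted above. The per-pixel loop is slow on full-size tiles, so it is suited only to small illustrations like this one.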

As above, the output of the mask function (Code snippet 3) can be visualized with matplotlib. For brevity we omit the plotting code here, though the interested reader can find it in the code linked from this post.

Figure 4. Output of the mask plotting function for two sample SpaceNet images (top and bottom rows). The left column displays the raw image with building polygons overlaid in orange. The middle column shows building outlines. The right column demonstrates the building mask created with Code snippet 3. Note that this approach cannot differentiate the large cluster of adjacent buildings in the center left of the bottom image or the long lines of row houses in the top image. Hence, if one used the mask as algorithm training data, one would erroneously conclude that a single large building exists at these locations rather than multiple smaller adjacent buildings.

2.3 Signed Distance Transform

A final method for labeling ground truth is to adopt the signed distance transform of Yuan 2016. This transform was applied to SpaceNet data both here and here. The distance transform encodes each point in the image with its distance in meters from the nearest building boundary. Hence, in the output distance map, negative values lie outside buildings, zero values denote building boundaries, and positive values lie inside building contours. The code below yields the transform.

Code snippet 4. Create the signed distance transform.
Figure 5. Output of the plotting function, displaying the results of Code snippet 4 for two sample SpaceNet images. The left column displays the raw image with building polygons overlaid in orange. The middle column shows the signed distance transform, with a maximum absolute value of 64 meters. The right column overlays ground truth polygons on the distance transform.
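
The sign convention above can be illustrated with a brute-force, pure-Python sketch that operates on a small binary mask. Several assumptions apply: distances are measured in pixels between pixel centers (converting to meters would require multiplying by the ground sample distance), the clip value of 64 simply mirrors the maximum shown in Figure 5, and the function name and toy mask are hypothetical. On a discrete grid no pixel lands exactly on a boundary, so the smallest magnitudes are +1 and -1 rather than zero.

```python
import math


def signed_distance(mask, clip=64.0):
    """Brute-force signed distance (in pixels) to the nearest pixel of the opposite class.

    Pixels inside a building get positive values, pixels outside get negative
    values, and magnitudes are capped at `clip`.
    """
    height, width = len(mask), len(mask[0])
    inside = [(r, c) for r in range(height) for c in range(width) if mask[r][c]]
    outside = [(r, c) for r in range(height) for c in range(width) if not mask[r][c]]
    result = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            targets = outside if mask[r][c] else inside
            if targets:
                dist = min(math.hypot(r - tr, c - tc) for tr, tc in targets)
            else:
                dist = clip  # the image contains only one class
            sign = 1.0 if mask[r][c] else -1.0
            result[r][c] = sign * min(dist, clip)
    return result


# Toy 5 x 5 mask with a single 3 x 3 building in the middle.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
sdt = signed_distance(toy_mask)
for sdt_row in sdt:
    print(" ".join(f"{value:6.2f}" for value in sdt_row))
```

The nested search scales poorly with image size; for full tiles a Euclidean distance transform routine such as scipy.ndimage.distance_transform_edt, applied once to the mask and once to its inverse, would be a more practical choice.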

2.4 Combined Visualization

We can now visualize all three ground truth options simultaneously, as shown below.

Figure 6. Ground truth displayed with all three transforms, from here.

A script to recreate all of the transformations and visualizations created in this post is located here, and yields an output directory akin to Figure 7 below.

Figure 7. Output directory of the script, showing one of the building masks.

3. Conclusions

The GIS (geographic information systems) experts who format satellite imagery data speak a slightly different language than most computer vision experts. Hoping to encourage exploration of SpaceNet data, this post presents some useful data transformations for SpaceNet building labels, with attendant code and visualizations.

With any luck the massive amount of labeled SpaceNet data will stir the imagination of an increasing cadre of computer vision experts, and thereby help redefine the nature of satellite imagery analytics.

*Footnote: Many thanks to @david.lindenbaum for providing the SpaceNet Challenge utilities repository, upon which much of the code included here is based.

Written by Adam Van Etten.