SpaceNet Labels To Pascal VOC SBD Benchmark Release Labels
This blogpost serves as an introduction to the Pascal VOC SBD benchmark release MATLAB labels (linked here), as well as one approach to converting the SpaceNet geoJson vector labels into the Pascal VOC SBD benchmark release MATLAB format. My hope is that this blogpost and GitHub repository (linked at the bottom of the post) facilitate the application of algorithms developed for the Pascal VOC SBD image competition to the 1st SpaceNet competition linked here.
Warning: This conversion is for the first SpaceNet competition. For the second SpaceNet competition, see here.
Downloading the Data
Exploring the Pascal VOC SBD Labels
Let us first recall some general information regarding the Pascal VOC SBD benchmark release.
The SBD benchmark release is bundled together in a directory called ‘dataset’. There are three subdirectories:
The first subdirectory, img, contains all of the images in jpg format.
Aside: Since the SpaceNet images are in tiff format, I used the ImageMagick mogrify command available here to batch convert all of the tiff files to jpg files.
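As a sketch, the batch conversion described above can be done in one pass. This assumes ImageMagick is installed; the guard makes the loop a harmless no-op when no tiff files are present.

```shell
# Convert every .tif in the current directory to .jpg with ImageMagick.
# "mogrify -format jpg" writes a new .jpg file next to each .tif and
# leaves the original file untouched.
for f in *.tif; do
  [ -e "$f" ] || continue   # unmatched glob: nothing to convert
  mogrify -format jpg "$f"
done
```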
The second subdirectory, cls, contains category-specific segmentation and boundary information for each image.
Aside: The Pascal VOC competition labels objects from 20 different categories: ‘aeroplane’, ‘bicycle’, ‘bird’, ‘boat’, ‘bottle’, ‘bus’, ‘car’, ‘cat’, ‘chair’, ‘cow’, ‘diningtable’, ‘dog’, ‘horse’, ‘motorbike’, ‘person’, ‘pottedplant’, ‘sheep’, ‘sofa’, ‘train’, and ‘tvmonitor’.
There is one .mat file for each image containing class information, and each .mat file contains a MATLAB struct called GTcls with 3 fields:
- GTcls.Segmentation is a single two-dimensional matrix containing the class segmentation information. Matrix entries that correspond to pixels that belong to category k have value k, and matrix entries corresponding to pixels that do not belong to any category have the value 0.
- GTcls.Boundaries is a MATLAB cell array. GTcls.Boundaries{k} is a MATLAB sparse matrix that contains the boundaries of the k-th category.
- GTcls.CategoriesPresent is a row vector of all the categories that are present in the jpg image.
The third subdirectory, inst, contains instance-specific segmentations and boundaries. There is one .mat file for each image containing instance information, and each .mat file contains a MATLAB struct called GTinst with 3 fields:
- GTinst.Segmentation is a single two-dimensional matrix containing instance segmentation information. Matrix entries that correspond to pixels belonging to the i-th instance have value i, and entries belonging to no instance have value 0.
- GTinst.Boundaries is a cell array. GTinst.Boundaries{i} is a MATLAB sparse matrix that contains the boundary of the i-th instance.
- GTinst.Categories is a column vector with as many entries as there are instances; the i-th entry is the category label of the i-th instance.
There are in addition two text files, train.txt and val.txt, containing the names (without the extension) of the training images and validation images respectively.
Aside: I do not own or have access to a MATLAB license. I did my data exploration in Octave, which can load .mat files, and all of my data manipulation in Python, using scipy.io.loadmat and scipy.io.savemat to move .mat files into and out of numpy arrays. Thus, one does not need MATLAB to engage with these labels.
Here is an example of loading an instance .mat file and retrieving the three fields of the GTinst struct in Python.
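The original post's snippet is not reproduced here; the sketch below first builds a toy GTinst struct with scipy so the loading code has something self-contained to read (a real label would be loaded straight from the inst directory).

```python
import numpy as np
from scipy.io import loadmat, savemat
from scipy.sparse import csc_matrix

# Build a tiny stand-in for an SBD instance file so the example is
# self-contained; the contents are illustrative, not real SpaceNet data.
seg = np.zeros((4, 4), dtype=np.uint8)
seg[1:3, 1:3] = 1                      # a single instance, numbered 1
cell = np.empty((1, 1), dtype=object)  # MATLAB cell arrays map to
cell[0, 0] = csc_matrix(seg)           # numpy object arrays in scipy
savemat('example_inst.mat', {'GTinst': {
    'Segmentation': seg,
    'Boundaries': cell,
    'Categories': np.array([[1]], dtype=np.uint8),
}})

# Load the file back and retrieve the three GTinst fields.
gtinst = loadmat('example_inst.mat')['GTinst'][0, 0]
segmentation = gtinst['Segmentation']  # dense uint8 matrix
boundaries = gtinst['Boundaries']      # (1, 1) object array of sparse
categories = gtinst['Categories']      # column vector of labels
print(segmentation.dtype)              # uint8
```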
Remark: Observe that the entries of matrices (sparse or not) are of type unsigned 8-bit integers.
SpaceNet geoJson Labels
Let us now take a look at an example SpaceNet geoJson file, as this (together with its corresponding image) will be what we wish to transform into the MATLAB data structures presented above.
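The original post embeds a real label file; the snippet below is a schematic stand-in with made-up coordinates, showing the shape of a single building Feature:

```json
{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [[
      [-43.6812, -22.9431],
      [-43.6808, -22.9431],
      [-43.6808, -22.9435],
      [-43.6812, -22.9435],
      [-43.6812, -22.9431]
    ]]
  },
  "properties": {}
}
```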
The building label is stored as a Polygon with five pairs of latitudes and longitudes giving the vertices of the label (the first vertex is repeated at the end to close the ring). This is fundamentally different from the labels used in the Pascal VOC competition. In particular, the SpaceNet geoJson labels record the vertices of a building label in terms of latitude and longitude, while the Pascal VOC labels record object labels in a variety of pixel-level matrices.
Given a latitude and longitude building vertex, one can try to convert this vertex into its corresponding matrix location in a pixel-level representation of a satellite image. But what if the longitude and latitude map to somewhere in between two pixels in the satellite image? Which pixel should represent the vertex of the building? Such problems always arise when converting continuous data (like latitudes and longitudes) to discrete data (like finite matrices representing pixel locations).
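To make the discretization issue concrete, here is a small sketch. The function name and the numbers are mine, and the geotransform is assumed to follow GDAL's north-up ordering; a real conversion would read it from the tif.

```python
def lonlat_to_pixel(lon, lat, geotransform):
    """Map a (lon, lat) vertex to fractional (row, col) pixel coordinates.

    `geotransform` is assumed to follow GDAL's north-up convention:
    (origin_lon, pixel_width, 0, origin_lat, 0, -pixel_height).
    """
    origin_lon, pixel_w, _, origin_lat, _, pixel_h = geotransform
    col = (lon - origin_lon) / pixel_w
    row = (lat - origin_lat) / pixel_h
    return row, col

# Illustrative numbers chosen so the vertex lands between two pixels.
gt = (0.0, 0.5, 0.0, 0.0, 0.0, -0.5)
row, col = lonlat_to_pixel(1.25, -1.25, gt)
print(row, col)   # 2.5 2.5: should pixel 2 or pixel 3 hold this vertex?
```

The fractional result is exactly the ambiguity described above: some rounding rule must be chosen.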
Thus, there is no canonical way to transform a SpaceNet geoJson label to its corresponding Pascal VOC .mat labels. That is, this transformation of labels will depend on choices.
To solve this problem (that is, to make one such choice), we will use a distance transform that takes a raster source (a tif) and a vector source (a geoJson file) and returns a matrix whose value at each pixel is positive inside a building segmentation label, negative outside, and zero on the label's boundary.
The distance transform has appeared in other CosmiQ projects including the SpaceNet utilities repository available here, Adam Van Etten’s blog on getting started with SpaceNet available here, and Patrick Hagerty’s blog on object detection in SpaceNet available here.
The code below shows the distance transform as well as how to build class and instance segmentation functions given the distance transform.
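The post's original code is not reproduced here. The sketch below is one simple realization using scipy.ndimage, with the rasterization of the geoJson polygons into a boolean mask omitted; note that in this formulation boundary pixels get value 1 rather than exactly 0, so it only approximates the convention described above, and the function names are mine.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, label

def signed_distance(mask):
    """Signed distance: positive inside building footprints, negative outside.

    `mask` is a boolean array rasterized from the geoJson polygons;
    the rasterization step itself (e.g. with gdal) is omitted here.
    """
    return distance_transform_edt(mask) - distance_transform_edt(~mask)

def class_segmentation(dist):
    # Buildings are the only SpaceNet category, so every building pixel
    # gets class label 1 (the shape of GTcls.Segmentation).
    return (dist > 0).astype(np.uint8)

def instance_segmentation(dist):
    # Number each connected footprint 1, 2, ... (the shape of
    # GTinst.Segmentation).
    instances, _ = label(dist > 0)
    return instances.astype(np.uint8)

# Two toy "buildings" on a 6x6 grid.
mask = np.zeros((6, 6), dtype=bool)
mask[1:3, 1:3] = True
mask[4:6, 4:6] = True
dist = signed_distance(mask)
print(np.unique(instance_segmentation(dist)))   # [0 1 2]
```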
We also include a GitHub repository containing the file ‘spacenet_labels_dir_to_voc_labels_dir.py’
One can run the command:
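(The exact invocation is not reproduced from the original post; the argument names and directory names below are assumptions, so check the repository's README for the authoritative usage.)

```shell
# Hypothetical invocation: the four paths are assumed to be the raster
# directory, vector directory, and the two (initially empty) output
# directories for class and instance labels.
python spacenet_labels_dir_to_voc_labels_dir.py 3band/ geojson/ cls/ inst/
```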
in a directory containing a folder of raster tif files, a folder of vector geoJson files, an empty class labels directory, and another empty instance labels directory. This will process all of the labels at once.
This GitHub repository is available at: https://github.com/lncohn/spacent_to_pascal
In summary, this conversion allows any algorithm developed for the Pascal VOC SBD benchmark release competition to be applied to the SpaceNet data competition.
Acknowledgements: The author thanks Patrick Hagerty, David Lindenbaum, and Adam Van Etten for helpful discussions regarding SpaceNet and the distance transform.