By Jake Shermeyer and Daniel Hogan
Preface: SpaceNet LLC is a nonprofit organization dedicated to accelerating open source, artificial intelligence applied research for geospatial applications, specifically foundational mapping (i.e., building footprint & road network detection). SpaceNet is run in collaboration by co-founder and managing partner, CosmiQ Works, co-founder and co-chair, Maxar Technologies, and our partners including Amazon Web Services (AWS), Capella Space, Topcoder, IEEE GRSS, the National Geospatial-Intelligence Agency and Planet.
Today the SpaceNet partners are pleased to announce the release of the EXPANDED version of the SpaceNet 6 dataset over the port of Rotterdam, the Netherlands. The expanded version is unique and features a combination of Capella Space Synthetic Aperture Radar (SAR) and Maxar WorldView 2 imagery.
The SAR data in particular is one of a kind and the first openly licensed dataset to feature quad-polarized imagery at 0.25m spatial resolution over such a vast extent. We distribute 202 SAR image strips in two formats: one with minimal pre-processing (Single Look Complex) and a second set of new six-band georeferenced products that include 4 channels of intensity and 2 channels derived from a Pauli decomposition. The decomposition channels show different types of scattering behavior. Complementary to our SAR data, we also release our untiled Maxar WorldView 2 image spanning ~92 km² at 0.5m spatial resolution. We believe this dataset will further research and advance remote sensing analytics beyond the optical spectrum and into the SAR modality.
You can explore and download the data now; all you need is an AWS account and the AWS CLI installed and configured. Once you’ve done that, you can query the full dataset using the command below and download your components of interest.
aws s3 ls s3://spacenet-dataset/AOIs/AOI_11_Rotterdam/
The SAR Dataset
All of the SAR data comes from Capella Space’s X-band quad-pol sensor mounted on an aircraft. We ultimately release 202 unique image strips at two different processing levels: georeferenced magnitude and polarimetry (MAG-POL) data and Single Look Complex (SLC) data. Many of these strips overlap to create a dense stack of SAR data with multiple revisits spanning a three-day period in August 2019. The image strips cover a large portion of Rotterdam and 120 km² of total area, with each strip spanning approximately 0.7 km by 10 km.
The georeferenced MAG-POL data has a base resolution of 0.25m and contains 4 channels of SAR magnitude (intensity) (1: HH, 2: HV, 3: VH, 4: VV) as well as 2 channels (5: alpha² & 6: beta²) created from a Pauli polarimetric decomposition process. Decomposition methods like the one we applied in the figure below can show different types of backscatter and can be useful for discerning between different types of objects or land cover on the ground. We chose to work with the Pauli decomposition because it provides simple but valuable descriptive statistics about the surface. The alpha² channel shows single or odd-bounce scattering while the beta² channel shows double or even-bounce scattering. The third term of a Pauli decomposition, gamma², shows volume scattering and is proportional to the HV band. All MAG-POL data was created directly from the SLC data using a pipeline from our SAR preprocessing library contained in Solaris. The full pipeline can be found here and modified to suit any end user's needs.
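The three Pauli terms described above can be sketched for a single pixel as follows. This is a minimal illustration of the textbook Pauli decomposition, not the Solaris pipeline itself: the function name `pauli_powers` is hypothetical, and the 1/√2 normalization and the averaging of HV and VH (which assumes reciprocity) are assumptions that may not match the scaling of the released MAG-POL products.

```python
import math


def pauli_powers(hh, hv, vh, vv):
    """Return (alpha2, beta2, gamma2) for one quad-pol pixel.

    Inputs are complex scattering coefficients, one sample per
    polarization. The 1/sqrt(2) normalization is the textbook convention
    and is an assumption here; the released products may be scaled
    differently.
    """
    # Assumption: reciprocity (HV ~ VH), so average the cross-pol channels.
    cross = (hv + vh) / 2

    alpha = (hh + vv) / math.sqrt(2)  # single / odd-bounce scattering
    beta = (hh - vv) / math.sqrt(2)   # double / even-bounce scattering
    gamma = math.sqrt(2) * cross      # volume scattering, proportional to HV

    return abs(alpha) ** 2, abs(beta) ** 2, abs(gamma) ** 2


# Surface-like pixel: co-pol returns in phase, no cross-pol energy,
# so the power lands in alpha2.
print(pauli_powers(1 + 0j, 0j, 0j, 1 + 0j))

# Dihedral-like pixel: co-pol returns out of phase,
# so the power lands in beta2.
print(pauli_powers(1 + 0j, 0j, 0j, -1 + 0j))
```

With this normalization, and when HV equals VH, the three terms sum to the total power across the four polarization channels, which is a handy sanity check when adapting the idea to full image arrays.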
The second set of data, the Single Look Complex (SLC) data and metadata, has minimal pre-processing and is non-georeferenced. SLC data retains the phase and complex information inherent to SAR sensors. Consequently, end users may process these data however they wish and have maximum control over their usage and application. The data has a spatial resolution of 0.5m x 0.25m (Range x Azimuth).
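To make "phase and complex information" concrete, here is a minimal sketch of how one complex SLC sample splits into magnitude, intensity, and phase. The function name `describe_slc_sample` is hypothetical, and expressing intensity in decibels is a common display convention rather than something this release prescribes.

```python
import cmath
import math


def describe_slc_sample(sample):
    """Split one complex SLC sample into (magnitude, intensity_db, phase)."""
    magnitude = abs(sample)
    phase = cmath.phase(sample)  # radians, in the range [-pi, pi]

    # Intensity is the squared magnitude; guard against log10(0)
    # for empty (zero-valued) pixels.
    if magnitude > 0:
        intensity_db = 10 * math.log10(magnitude ** 2)
    else:
        intensity_db = float("-inf")

    return magnitude, intensity_db, phase


magnitude, intensity_db, phase = describe_slc_sample(3 + 4j)
print(magnitude, intensity_db, phase)
```

A magnitude-only product discards the `phase` value, which is why the SLC data is the starting point for any processing that depends on it.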
The Optical Dataset
In addition to the SAR data, we release the full untiled Maxar WorldView 2 image. We distribute 4 image products: the panchromatic band, pan-sharpened RGB and RGBNIR data (0.5m), and RGBNIR data (2.0m). As in the challenge, we hold back the optical data over the final testing area but distribute the optical data for the training and validation areas. Consequently, the optical data can be used for pre-processing the SAR data in some fashion, such as colorization, domain adaptation, or image translation, but cannot be used to directly map buildings in the testing area. We structure the dataset this way to mimic real-world scenarios where historical optical data may be available, but concurrent optical collection with SAR is often not possible due to inconsistent orbits of the sensors or cloud cover that renders the optical data unusable.
We distribute the full untiled building footprint annotations which include both the training dataset as well as validation data used for the SpaceNet 6 TopCoder leaderboard. Ultimately, end-users are encouraged to split the data as they see fit. We retain a final testing hold-back portion that we intend to use for a persistent scoring server. The building footprint labels are derived from the 3D Basisregistratie Adressen en Gebouwen (3DBAG) dataset. Each building footprint also contains a 3D component and height estimate derived from an aerial Light Detection and Ranging (LIDAR) collect. Further information about the annotation preprocessing can be found in the original data release blog or CVPR EarthVision academic paper.
Remember you can download and explore the full dataset for free on S3:
aws s3 ls s3://spacenet-dataset/AOIs/AOI_11_Rotterdam/
Today we release the final piece of the SpaceNet 6 dataset. We hope that these data will continue to incentivize new machine learning approaches specific to high-resolution SAR data as well as spur new data fusion or domain adaptation methods. Remember to check out our other datasets and the SpaceNet 7 challenge hosted on TopCoder.