The SpaceNet 5 Baseline — Part 3: Extracting Road Speed Vectors from Satellite Imagery
Preface: SpaceNet LLC is a nonprofit organization dedicated to accelerating open source, artificial intelligence applied research for geospatial applications, specifically foundational mapping (i.e. building footprint & road network detection). SpaceNet is run in collaboration with CosmiQ Works, Maxar Technologies, Intel AI, Amazon Web Services (AWS), Capella Space, and Topcoder.
We’re closing in on the final stretch of the SpaceNet 5 Challenge, which aims to extract road networks and route travel times directly from satellite imagery. Yet there’s still plenty of time to get involved, as our previous post showed reasonable road mask predictions after only 10 hours of training. This post builds upon Part 1 (data preparation) and Part 2 (segmentation model training), and is the final installment in our series that seeks to lower the barrier to entry for extracting roads and speeds from remote sensing imagery. In the sections below, we’ll discuss methods to extract a road network from our inferred segmentation mask, and to infer speed limits and travel times for each road segment.
1. Segmentation Model Predictions
This section briefly summarizes Part 2 of this series, where we trained a segmentation model to detect road centerline features using a multi-class training mask where each mask layer corresponded to a unique speed bin. Outputs are akin to Figure 1, which shows a multi-channel prediction mask where differing colors denote various predicted travel speeds.
2. Setting up the JSON
CRESI uses a JSON file to store parameters for training and inference; we will use the sn5_baseline.json file for this example. Though many of the parameters in this file need not be altered, there are a few values that may need to be tweaked for the graph and speed extraction procedures.
For the model trained in Part 2, we used 7 speed bins (channels 0–6) and appended the total road mask as the final channel (channel 7). We therefore use this final channel to extract a mask skeleton.
Prior to skeletonization, we refine our prediction mask and select a mask threshold for filtering purposes, in this case 0.3.
We also filter out unconnected graphs (with less than 20 pixels of extent) and errant spurs less than 10 meters in length.
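To make these settings concrete, here is a minimal sketch of how the three choices above might look once loaded in Python. The key names are assumptions for illustration and may not match the actual keys in sn5_baseline.json, so check the file itself before editing; only the values (7, 0.3, 20, 10) come from the text above.

```python
import json

# Hypothetical excerpt of a config file. The key names are assumed for
# illustration; the values are the ones discussed in this section.
config_text = """
{
    "skeleton_band": 7,
    "skeleton_thresh": 0.3,
    "min_subgraph_length_pix": 20,
    "min_spur_length_m": 10
}
"""

params = json.loads(config_text)

# Channel of the prediction mask used for skeletonization (total road mask).
print("skeleton channel:", params["skeleton_band"])
# Probability threshold applied to the mask before skeletonization.
print("mask threshold:", params["skeleton_thresh"])
# Disconnected subgraphs smaller than this extent (pixels) are dropped.
print("min subgraph extent (px):", params["min_subgraph_length_pix"])
# Dangling spurs shorter than this length (meters) are dropped.
print("min spur length (m):", params["min_spur_length_m"])
```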
The remaining values in the JSON file should already have been set in Part 2, and can be left alone.
3. Skeletonization
The first steps in turning the raw output mask into a road network are accomplished with the following script:
python /path/to/cresi/cresi/04_skeletonize.py jsons/sn5_baseline.json
This command first refines the mask with operations such as opening, closing, smoothing, and thresholding (see Figure 2).
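For intuition, the refinement step can be sketched in a few lines of NumPy. This is a toy illustration rather than the code in 04_skeletonize.py: the 3 × 3 structuring element, the order of operations, and the helper names are all assumptions, and the smoothing step is omitted for brevity.

```python
import numpy as np

def _neighborhood(mask, pad_value):
    """Stack the nine 3x3-shifted copies of a boolean mask."""
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def dilate(mask):
    # A pixel is set if any pixel in its 3x3 neighborhood is set.
    return _neighborhood(mask, False).any(axis=0)

def erode(mask):
    # A pixel survives only if its whole 3x3 neighborhood is set.
    # Padding with True keeps roads that touch the chip border intact.
    return _neighborhood(mask, True).all(axis=0)

def refine_mask(road_prob, thresh=0.3):
    """Threshold a road probability map, then close gaps and remove specks."""
    binary = road_prob > thresh
    closed = erode(dilate(binary))   # closing: fills small holes
    opened = dilate(erode(closed))   # opening: removes isolated specks
    return opened

# Toy 8-channel prediction; channel 7 holds the total road mask.
pred = np.zeros((8, 12, 12))
pred[7, 6:9, :] = 0.9   # a 3-pixel-wide road crossing the chip
pred[7, 7, 5] = 0.1     # a low-confidence hole inside the road
pred[7, 2, 2] = 0.8     # an isolated false-positive pixel

refined = refine_mask(pred[7], thresh=0.3)
print(refined.astype(int))
```

In this toy case the hole in the road is filled and the isolated pixel is removed, which is the behavior we want before handing the mask to the skeletonization step.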
This refined mask is subsequently turned into a skeleton. See Figure 3 for a schematic representation, while Figure 4 illustrates the skeleton for our Moscow test chip.
4. Graph Creation
After creating a skeleton, the 04_skeletonize.py script builds a graph from this skeleton. This is achieved via the sknw package, and yields a NetworkX graph structure (see Figure 5).
5. Speed Extraction
We estimate travel time for a given road edge by leveraging the speed information encapsulated in the prediction mask. At multiple midpoints along each edge we extract a small 8 × 8 pixel patch from the prediction mask. The speed of the patch is estimated by filtering out low probability values (likely background), and averaging the remaining pixels (see Figure 6). For example, if the majority of the high confidence pixels in the prediction mask patch belong to channel 3 (corresponding to 31–40 mph), we would assign the speed at that patch to be 35 mph. We estimate the speed limit of the entire edge by taking the mean of the speeds at each segment midpoint. Travel time is then simply the edge length divided by this mean speed. See our arXiv paper for more details.
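The procedure above can be sketched as follows. This is one simplified reading of the description, not the code in 06_infer_speed.py: we assume 10 mph bins whose midpoints follow the channel 3 example (channel k maps to 10k + 5 mph), we pick the channel with the most high-confidence pixels, and the 0.3 probability cutoff and function names are hypothetical.

```python
import numpy as np

MPH_TO_MPS = 0.44704  # meters per second in one mile per hour

# Assumed bin midpoints for channels 0-6, extrapolated from the
# "channel 3 -> 35 mph" example in the text.
BIN_SPEEDS_MPH = [5, 15, 25, 35, 45, 55, 65]

def patch_speed_mph(patch, min_prob=0.3):
    """Estimate speed for one (n_bins, h, w) patch of the prediction mask."""
    confident = patch > min_prob                      # drop likely background
    counts = confident.reshape(patch.shape[0], -1).sum(axis=1)
    if counts.sum() == 0:
        return None                                   # no road signal here
    return BIN_SPEEDS_MPH[int(np.argmax(counts))]

def edge_travel_time_s(length_m, midpoint_speeds_mph):
    """Travel time = edge length / mean of the midpoint speeds."""
    speeds = [s for s in midpoint_speeds_mph if s is not None]
    if not speeds:
        raise ValueError("no usable speed estimates for this edge")
    mean_mph = sum(speeds) / len(speeds)
    return length_m / (mean_mph * MPH_TO_MPS)

# Toy 8 x 8 patch: most confident pixels fall in channel 3 (31-40 mph).
patch = np.zeros((7, 8, 8))
patch[3, :, 3:5] = 0.8
patch[2, :, 5] = 0.6

speed = patch_speed_mph(patch)
travel_time = edge_travel_time_s(100.0, [speed, 35, 25])
print(speed, round(travel_time, 2))
```

Note the unit conversion: edge lengths are in meters while speeds are in miles per hour, so the mean speed must be converted to meters per second before dividing.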
The 04_skeletonize.py script outputs a WKT file, which is a holdover from the SpaceNet 3 competition. To execute our speed inference process, we will ingest that WKT file back into a graph, while also cleaning out short spurious edges that don’t connect with any other edge and removing disconnected subgraphs.
python /path/to/cresi/cresi/05_wkt_to_G.py jsons/sn5_baseline.json
python /path/to/cresi/cresi/06_infer_speed.py jsons/sn5_baseline.json
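The graph cleaning performed when the WKT file is ingested can be illustrated with a dependency-free sketch. This is a toy version written for this post, not the logic in 05_wkt_to_G.py: edges are plain (node, node, length) tuples, a single length unit is used for both filters, and the function names are hypothetical.

```python
from collections import defaultdict

def connected_components(edges):
    """Return a list of node sets, one per connected component."""
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:                      # depth-first traversal
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components

def drop_short_spurs(edges, min_spur_length):
    """Remove short edges that dangle from a degree-1 node."""
    degree = defaultdict(int)
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    return [(u, v, length) for u, v, length in edges
            if not (length < min_spur_length
                    and min(degree[u], degree[v]) == 1)]

def drop_small_subgraphs(edges, min_total_length):
    """Remove connected components whose summed edge length is too small."""
    kept = []
    for comp in connected_components(edges):
        comp_edges = [e for e in edges if e[0] in comp]
        if sum(length for _, _, length in comp_edges) >= min_total_length:
            kept.extend(comp_edges)
    return kept

edges = [
    ("a", "b", 50.0),
    ("b", "c", 40.0),
    ("b", "d", 4.0),    # short dangling spur
    ("x", "y", 15.0),   # small disconnected subgraph
]
cleaned = drop_small_subgraphs(drop_short_spurs(edges, 10.0), 20.0)
print(cleaned)
```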
The resulting graph is saved as both a pickle object and a shapefile; this geo-registered shapefile object can be overlaid on the original image, with roads colored by speed limit (see Figure 7).
We see in Figure 7 that the model does a relatively good job of predicting road geometries. The APLS_time score for this Moscow test chip is 0.54, only slightly below the APLS_length score of 0.59. Since APLS_time penalizes errors in both geometry and travel time, this small gap indicates that the model is also doing a good job of inferring road speeds.
The SpaceNet 5 Moscow observation is taken at 22 degrees off-nadir (nadir is looking straight down), which results in “building tilt” where tall buildings obscure roadways (in this case roads to the east). Historically, such occlusions have been quite problematic for road extraction methods, yet our model does a surprisingly good job of predicting obscured roadways. This is illustrated in Figure 8, where we draw boxes around the regions where the model is (correctly) predicting roads behind buildings.
6. SpaceNet Submission
We can now create a submission for the SpaceNet 5 challenge by simply executing the following command:
python /path/to/cresi/cresi/07a_create_submission_wkt.py jsons/sn5_baseline.json
This command creates a solution.csv file in the correct format for challenge submission: for each road in the image we list the pixel geometry, length in meters, and computed travel time in seconds.
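As a rough picture of what such a file contains, the sketch below writes a single row with Python’s csv module. The column names, image identifier, and geometry are assumptions made for illustration; consult the challenge documentation or the output of 07a_create_submission_wkt.py for the authoritative format.

```python
import csv
import io

MPH_TO_MPS = 0.44704  # meters per second in one mile per hour

# Assumed column names: one row per road edge, with the pixel geometry,
# length in meters, and travel time in seconds.
fieldnames = ["ImageId", "WKT_Pix", "length_m", "travel_time_s"]

length_m = 15.0
speed_mph = 35
rows = [{
    "ImageId": "example_chip_0",                 # hypothetical chip name
    "WKT_Pix": "LINESTRING (10 20, 40 60)",      # hypothetical pixel geometry
    "length_m": length_m,
    "travel_time_s": round(length_m / (speed_mph * MPH_TO_MPS), 3),
}]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Because the WKT string contains a comma, the csv module wraps it in quotes automatically, which keeps the geometry in a single column.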
This solution yields a solid score of APLS_time = 0.50 (APLS_length = 0.55) over the entirety of the SpaceNet 5 test corpus, despite the fact that we trained for only 10 hours on a Titan X GPU.
7. Conclusions
In this post we demonstrated how to extract road networks with speed limits from deep learning segmentation models. Even with a truncated training time, we note a respectable APLS_time score of 0.50 over the SpaceNet 5 test cities (Moscow, Russia; Mumbai, India; San Juan, Puerto Rico). Furthermore, we note that our model is able to extrapolate roadways surprisingly well, even in areas obscured by building tilt.
Our segmentation model was trained for a mere 10 hours on modest hardware (Titan X GPU). The V100 GPUs available on AWS are over twice as fast as our GPU, so training a model from scratch to the performance reported in this post would cost only $15 on AWS. Recall that AWS GPU credits are available to SpaceNet Challenge participants, to the tune of 10 hours on a V100, so exceeding the score posted here should be very achievable.
Extracting road speeds and travel times from satellite imagery is a difficult task, yet one that applies to a great many problems in the humanitarian and disaster response domains. This series demonstrated that such a task is indeed possible. Stay tuned for further updates in the coming months as we dive into the outputs and lessons learned from the SpaceNet 5 Challenge.