A previous blog post introduced you to our new open-source framework Solaris, which aims to bridge the gap between difficult geospatial problems and new computer vision techniques. In this post we showcase how Solaris can be used for car segmentation (and detection/localization by proxy) using one of the models featured in SpaceNet 4. Although SpaceNet is focused on foundational infrastructure mapping challenges, the lessons learned and code developed can now easily be transferred to take on new problems (like vehicle detection) all thanks to Solaris.
If you wish to follow along with this blog or run it yourself, a full Jupyter notebook for the analysis is provided here. We assume that you have installed Solaris and the dependencies listed in the imports section of the notebook.
As in a few previous blogs [1, 2, 3], we work with the Cars Overhead With Context (COWC) dataset, which spans six cities at a spatial resolution of 15 cm. The difference this time is that we use Solaris and a segmentation network, rather than an object detector, to localize vehicles. After downloading the dataset, we do some light preprocessing: reorganizing the data, tiling the images and masks, converting both to GeoTIFFs, and creating a few CSVs that document our file structure. A sketch of the tiling step is shown below.
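The notebook handles this preprocessing with its own utilities; the snippet below is only a minimal illustration, assuming rasterio is installed and using placeholder paths, of how a large GeoTIFF can be cut into fixed-size, georeferenced tiles.

import os
import rasterio
from rasterio.windows import Window

def tile_geotiff(src_path, out_dir, tile_size=512):
    """Cut a large GeoTIFF into fixed-size, georeferenced tiles (illustrative only)."""
    os.makedirs(out_dir, exist_ok=True)
    with rasterio.open(src_path) as src:
        meta = src.meta.copy()
        for row in range(0, src.height, tile_size):
            for col in range(0, src.width, tile_size):
                # Clamp edge tiles so we never read past the raster bounds
                width = min(tile_size, src.width - col)
                height = min(tile_size, src.height - row)
                window = Window(col, row, width, height)
                meta.update(width=width, height=height,
                            transform=src.window_transform(window))
                out_path = os.path.join(out_dir, f'tile_{row}_{col}.tif')
                with rasterio.open(out_path, 'w', **meta) as dst:
                    dst.write(src.read(window=window))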
COWC masks mark each car with a single pixel of value 255, which is a very small target for a neural network to learn from. To give the network a larger, more visible object to detect, we enlarge each labeled pixel with a simple square dilation operation of size 9.
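As a rough sketch (the mask layout and variable names here are assumptions, not the notebook's exact code), the dilation can be done with SciPy's binary dilation and a 9 x 9 structuring element:

import numpy as np
from scipy import ndimage

def dilate_mask(mask, size=9):
    """Grow single-pixel car labels (value 255) into size x size squares."""
    structure = np.ones((size, size), dtype=bool)
    dilated = ndimage.binary_dilation(mask > 0, structure=structure)
    return (dilated * 255).astype(np.uint8)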
We then split our data into two sets: five cities for training (randomly holding out 20% for validation) and one city for testing (Salt Lake City, Utah). Holding out an entire city helps show how transferable the model is to a new location. We also z-score our imagery, standardizing it (zero mean, unit variance) across all of our locations, which can improve performance and model transferability.
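A minimal sketch of the z-scoring step, assuming the imagery is stacked as a (tiles, height, width, channels) array and that the statistics computed on the training set are reused for validation and test (train_imagery and val_imagery are placeholder arrays):

import numpy as np

def zscore(imagery, mean=None, std=None):
    """Standardize imagery to zero mean and unit variance per channel."""
    if mean is None:
        mean = imagery.mean(axis=(0, 1, 2))
    if std is None:
        std = imagery.std(axis=(0, 1, 2))
    return (imagery - mean) / (std + 1e-8), mean, std

# Compute statistics on the training tiles, then apply them to the other splits
train_norm, mean, std = zscore(train_imagery)
val_norm, _, _ = zscore(val_imagery, mean, std)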
# Train a neural net with 3 lines of code.
config = sol.utils.config.parse('/path/to/yml/vehicleDetection.yml')  # load experiment settings from the YAML file
trainer = sol.nets.train.Trainer(config)  # build the model and data pipeline from the config
trainer.train()  # run the training loop
We work with SpaceNet 4 participant XD_XD's U-Net model with a VGG16 encoder, paired with two loss functions: Jaccard (good for identifying small objects and imbalanced classes) and binary cross-entropy with logits. Training takes about 7 hours on two NVIDIA Titan Xp GPUs.
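The losses and their weights are specified in the YAML config rather than hand-coded, but purely as an illustration (not Solaris' internal implementation), a combined BCE-with-logits plus soft Jaccard loss looks roughly like this in PyTorch:

import torch
import torch.nn as nn

class BCEJaccardLoss(nn.Module):
    """Illustrative combination of BCE-with-logits and a soft Jaccard (IoU) loss."""
    def __init__(self, bce_weight=1.0, jaccard_weight=1.0, eps=1e-7):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.bce_weight = bce_weight
        self.jaccard_weight = jaccard_weight
        self.eps = eps

    def forward(self, logits, targets):
        bce_term = self.bce(logits, targets)
        probs = torch.sigmoid(logits)
        intersection = (probs * targets).sum()
        union = probs.sum() + targets.sum() - intersection
        jaccard_term = 1.0 - (intersection + self.eps) / (union + self.eps)
        return self.bce_weight * bce_term + self.jaccard_weight * jaccard_term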
Inference is just as simple as training, requiring only a few tweaks to the YAML file, and runs at roughly 1,000 images per minute for 512 x 512 pixel tiles.

inf_df = sol.nets.infer.get_infer_df(config)  # build a dataframe of images to run inference on
inferer = sol.nets.infer.Inferer(config)  # construct the inference object from the config
inferer(inf_df)  # run inference and save the predictions

Following inference, we binarize the continuous model outputs into a car/not-car map.
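A simple threshold is enough for that step; the threshold value below is a placeholder, and the right value depends on how your config scales the output rasters:

import numpy as np

def binarize(continuous_pred, threshold=0.5):
    """Convert a continuous prediction map into a binary car / not-car mask."""
    return (continuous_pred > threshold).astype(np.uint8) * 255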
We can also convert our binarized masks into more palatable geospatial vectors using Solaris' mask_to_poly_geojson function. Finally, using our vectorized predictions and the ground truth, we can score our results: with the F1 metric (also known as the SpaceNet metric) at an IoU threshold of 0.25 (acceptable for small objects), we score a solid 0.92.
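A sketch of the vectorization call, with placeholder paths and a placeholder binary_mask array; the argument names reflect our reading of the Solaris documentation, so check the docs for your installed version:

import solaris as sol

pred_gdf = sol.vector.mask.mask_to_poly_geojson(
    pred_arr=binary_mask,  # binarized car/not-car array for one tile
    reference_im='/path/to/source_tile.tif',  # GeoTIFF supplying the geotransform and CRS
    bg_threshold=0,  # pixel values above this become polygons
    min_area=20)  # drop tiny spurious detections
pred_gdf.to_file('car_predictions.geojson', driver='GeoJSON')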
We can then pull the centroid from each polygon using GeoPandas' centroid attribute and save the predictions as points. With a final touch of GIS symbology magic to convert the points into bounding boxes, we can show off our results.
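For example (the file names here are placeholders carried over from the sketch above):

import geopandas as gpd

preds = gpd.read_file('car_predictions.geojson')  # vectorized predictions from the previous step
points = preds.copy()
points['geometry'] = points.geometry.centroid  # one point per predicted car polygon
points.to_file('car_prediction_points.geojson', driver='GeoJSON')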