Quantifying Ports with Planet Labs Data

Timbr.io
Published in Planet Stories
Apr 26, 2016

Like kids in a candy shop, we’ve been excited to dig into all the data supplied by Planet Labs’ Explorer Program. The promise of daily and even multi-day revisit rates for satellite imagery really opens the door to potential analytics. We believe that as Planet Labs continues to increase its flock size, the upside of high revisit rates will really come to fruition. With that in mind, we wanted to flex the Timbr.io platform to see what it could do with Planet Labs data.

Chris Helm jumped all over it, and with Pramukta’s image processing skills they did some pretty amazing work. At least in my biased estimation. First, Chris created a Planet Labs metadata source that polls their API for imagery over a given bounding box or lon/lat.

Code for the Planet Labs source available (here)

(Throughout this post the code for each step is linked in the caption for the corresponding image. We’re building on awesome libraries like Rasterio, scikit-learn, and OpenCV.)

To test out the new source we wanted a location with lots of activity, so we zeroed in on the San Diego Naval Yard. Our goal was to create a set of reusable methods that could be combined to extract ships from the yard and then provide the change in counts by dock over time.

First we needed to create a pipeline to discover good image candidates for analysis. Using the Planet Labs source we could set up a query for our bounding box of interest that would grab currently available imagery as well as poll that source for new imagery as it becomes available. To make our analysis easier we’d like the pipeline to preprocess the data down to just what we want. For instance, we can grab the thumbnails and do a bit of metadata and imagery analysis to identify cloud-free imagery.
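The metadata-filtering half of that pipeline might look like the sketch below. The `cloud_cover` and `acquired` field names mirror Planet’s scene metadata, but they and the 5% threshold are assumptions here, not the actual pipeline code:

```python
# Sketch: whittle Planet scene metadata down to usable candidates.
# Field names ("cloud_cover", "acquired") are assumptions modeled on
# Planet's metadata conventions; adjust to the real API response.

def usable_scenes(scenes, max_cloud=0.05):
    """Keep scenes at or below a cloud-cover fraction, newest first."""
    clear = [s for s in scenes if s.get("cloud_cover", 1.0) <= max_cloud]
    return sorted(clear, key=lambda s: s["acquired"], reverse=True)

scenes = [
    {"id": "a", "cloud_cover": 0.02, "acquired": "2016-03-01"},
    {"id": "b", "cloud_cover": 0.40, "acquired": "2016-03-05"},
    {"id": "c", "cloud_cover": 0.00, "acquired": "2016-03-09"},
]
print([s["id"] for s in usable_scenes(scenes)])  # ['c', 'a']
```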

Once the pipeline collected a set of metadata we created a snapshot of the data, and published it to our Jupyter notebook. This made our data available for discovery and analysis through the extension we created for Jupyter.

Metadata available (here)

From our pipeline collection of metadata we can then grab the imagery we want to build models with by using a transform in the Timbr.io repository.

Code available (here)

In the search results we can see several data transforms that could help us analyze our data. We may want to scroll through our imagery collection or clip it to a specific area of interest — like this collection of docks in San Diego.

Code available (here)

The first analytical method we will dive into is separating land from the water in the image so we can extract the docks and ships. We can separate the coastline from the image array by thresholding the image to create a binary land/water mask. Then we can apply a distance transform to both land and water, and extract a buffer where the two transforms meet. The output of this method gives us a clean buffer that we can use to extract docks and ships from multiple images. We can then take the union of the buffer from each of our images to get a clean mask of the persistent features in our area of interest (the docks).
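The threshold-plus-distance-transform idea can be sketched in a few lines. This uses `scipy.ndimage` rather than the calls in the linked code, and the toy image, threshold, and buffer width are all fabricated for illustration:

```python
import numpy as np
from scipy import ndimage

def coastline_buffer(img, threshold, width=2):
    """Threshold to a binary land/water mask, then keep the pixels
    within `width` of the land/water boundary."""
    land = img > threshold
    # Each pixel's distance to the opposite region; small values
    # mean the pixel sits close to the coastline.
    dist = (ndimage.distance_transform_edt(land) +
            ndimage.distance_transform_edt(~land))
    return dist <= width

# Toy scene: left half water (dark), right half land (bright).
img = np.zeros((8, 8))
img[:, 4:] = 255
buf = coastline_buffer(img, threshold=128, width=1)
# buf is True only in the two columns hugging the coastline (3 and 4)
```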

Code available (here)

Now that we have an appropriate buffer mask we can start cropping the docks and boats from each image. This will give us a set of discrete units against which we can do our more sophisticated analysis. In order to isolate docks as unique features, separate from land, we can derive a topological skeleton of the buffer mask. By counting adjacent pixels and decimating regions with high pixel counts we can break the skeleton into a series of unconnected segments. Then we can take each segment and isolate the docks using the segments’ orientation and solidity.
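A minimal sketch of that skeleton-splitting idea, using a hand-drawn skeleton instead of one derived from imagery. The neighbour-counting kernel and the height-versus-width orientation test are simplifications of the linked code, not the code itself:

```python
import numpy as np
from scipy import ndimage

# Hand-drawn "skeleton": a horizontal shoreline spine with two dock
# spurs hanging off it (1 = skeleton pixel).
skel = np.zeros((7, 9), dtype=int)
skel[1, :] = 1          # shoreline spine
skel[2:6, 2] = 1        # dock spur 1
skel[2:6, 6] = 1        # dock spur 2

# Count 4-connected neighbours for every skeleton pixel.
kernel = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
neighbours = ndimage.convolve(skel, kernel, mode="constant") * skel

# Decimate junction pixels (3+ neighbours) to break the skeleton
# apart, then label the remaining connected segments.
segments = skel.copy()
segments[neighbours >= 3] = 0
labels, n = ndimage.label(segments)  # 5 disconnected segments

# Crude orientation test: segments taller than they are wide run
# perpendicular to the shoreline, so we call them dock candidates.
docks = []
for i in range(1, n + 1):
    ys, xs = np.nonzero(labels == i)
    if ys.ptp() + 1 > xs.ptp() + 1:
        docks.append(i)
print(len(docks))  # 2 dock-like segments
```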

Code available (here)

As you can see in the cropped images, one of the trade-offs of a high revisit rate is the lower pixel resolution of the image. At 3–5m per pixel we need to be a bit more creative with our methods to effectively extract ships and create counts. To do so we are going to create a dock/land mask that we can use to remove everything permanent from each image (land and docks). Next we can leverage a series of masking techniques that are unioned into a final “region_mask” that can be used to extract ships from docks, land, and water. The result is a clean segmentation of the docks from the ships and water.
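The masking step boils down to “threshold, subtract the persistent mask, label what’s left.” A sketch with a fabricated scene; the real region_mask is a union of several masks, which we collapse into a single array here:

```python
import numpy as np
from scipy import ndimage

def extract_ships(img, persistent_mask, threshold):
    """Remove everything permanent (land + docks), then label the
    remaining bright blobs as ship candidates."""
    candidates = (img > threshold) & ~persistent_mask
    return ndimage.label(candidates)

# Toy scene: a dock column (persistent) with two ships moored beside it.
img = np.zeros((10, 10))
img[:, 4] = 255            # dock
img[2:5, 5] = 255          # ship 1
img[6:9, 5] = 255          # ship 2
persistent = np.zeros((10, 10), dtype=bool)
persistent[:, 4] = True

labels, n = extract_ships(img, persistent, threshold=128)
print(n)  # 2 ship candidates
```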

Code available (here) — dependencies with previous steps

Now that we have a clean mask of each dock we can get down to the business of counting ships across an array of Planet Labs images. From our snapshot we have imagery ranging from 2013–2016 with a total of five time intervals. For each dock we can now create a count of ships across all the collected time periods. The result is a stacked bar chart showing the churn by dock.
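The per-dock counting step reduces to labelling each dock’s ship mask at each time step. The masks below are tiny hand-made stand-ins for the real extraction output, and the dock/date names are purely illustrative:

```python
import numpy as np
from scipy import ndimage

def ship_count(mask):
    """Number of connected blobs in a binary ship mask."""
    _, n = ndimage.label(mask)
    return n

two_ships = np.zeros((6, 6), dtype=bool)
two_ships[1, 1], two_ships[4, 4] = True, True
one_ship = np.zeros((6, 6), dtype=bool)
one_ship[2, 2] = True
empty = np.zeros((6, 6), dtype=bool)

# ship_masks[date][dock] would come from the extraction step above.
ship_masks = {
    "2013": {"dock0": two_ships, "dock1": one_ship},
    "2016": {"dock0": one_ship, "dock1": empty},
}
counts = {date: {dock: ship_count(m) for dock, m in docks.items()}
          for date, docks in ship_masks.items()}
print(counts)
# {'2013': {'dock0': 2, 'dock1': 1}, '2016': {'dock0': 1, 'dock1': 0}}
```

A table like `counts` is all the stacked bar chart needs: one bar per dock, one stack segment per time period.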

Code available (here)

Further, we can take a look at the accuracy of our feature extraction algorithm by looking at the specific results for a dock. For region three we can see the method worked pretty well, missing just one ship in image four.

Also there is the opportunity to explore the distribution of ship sizes across the Naval Yard. Do some docks serve bigger ships exclusively? To answer this question we calculated the size of the contour areas for each extract and plotted them by dock.
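Ship “size” here is just the area of each extracted blob. The sketch below uses `scipy.ndimage` blob areas as a stand-in for the OpenCV contour areas in the linked code, with a fabricated mask:

```python
import numpy as np
from scipy import ndimage

def ship_areas(mask):
    """Pixel area of each connected blob in a binary ship mask."""
    labels, n = ndimage.label(mask)
    return ndimage.sum(mask, labels, index=range(1, n + 1))

mask = np.zeros((8, 8), dtype=bool)
mask[1:3, 1:4] = True       # a 2x3 ship, area 6
mask[5:7, 5:7] = True       # a 2x2 ship, area 4
print(sorted(int(a) for a in ship_areas(mask)))  # [4, 6]
```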

Code for this plot (here)

The x-axis is the dock number and the y-axis is the size of the ships at that dock. While no docks look to serve big ships exclusively, we do see that the larger ships agglomerate around docks zero, four, and six. This plot also raises the question of what the overall distribution of ship sizes is across all the data we collected. The result can be seen below.

Code for this plot (here)

The x-axis is the size of the ships extracted and the y-axis is the frequency of that size range occurring in the data. Not surprisingly, we see that the majority of ships are small and a small minority are large across our time series. If you’d like to check out the whole analysis we have a Jupyter notebook here.

To operationalize this work we can have our pipeline poll for new images in our area of interest and update this analysis programmatically with the latest appropriate image from Planet Labs. This allows the analysis to be a proactive service instead of a post-mortem. We think this really shifts the potential for how analysis can be leveraged for business applications.

For our next post in this series we’ll dig into the reusability of the method and associated transforms. Specifically, we’ll take our now nicely packaged algorithms and apply them to another port, and quantify how much tuning is needed to rinse and repeat the analysis across the globe. To sign up for early beta access to the platform hit us up here.
