Unearthing Value in Planet Data: Bridging the Gap Between Geospatial Data and Machine Learning

Jan 15, 2020 · 5 min read
Above is a map of roads and buildings in the Eastern Hemisphere. This is one of the most complete and up-to-date maps of these features ever created. It reveals details not available in popular mapping tools, in both industrialized cities and rural settlements. Built from a diversely sampled training set, the model produces results across a wide variety of terrains, densities, and land cover types. ©2019 Planet Labs, Inc. All rights reserved.

At Planet, our data is geospatial: georeferenced images and labels. There are many tools for managing geospatial data in a queryable and scalable way, including Cloud-Optimized GeoTIFFs, GeoJSON, PostGIS, and STAC catalogs. At the same time, we want to leverage the best the machine learning world has to offer, whether it's TensorFlow, Keras, PyTorch, or the latest paper from the International Conference on Machine Learning (ICML). However, very few of these frameworks natively understand geospatial data, let alone at scale.

For us, it is not enough to find objects in images. The real value comes from finding objects and determining how land is used in the world at a known point in time.

In order to realize this value, we solved the technical challenge of bridging the gap between geospatial data and machine learning frameworks. By bridging this gap, we're able to deliver a product with a range of applications, create the most up-to-date and complete map of all roads, and more.

Today, we’re giving you a peek into the design decisions we’ve made and lessons we’ve learned as we built these workflows.

Different coordinate spaces can be used to represent the same label, but in a different context. Pixel coordinates are only relevant within an image, whereas geographic coordinates can be plotted on a map along with the image to give geospatial context to the label. Pixel coordinates are usually just arrays of numbers that can be loaded as NumPy arrays, whereas geo coordinates are usually stored as GeoJSON and can be read by typical GIS applications like ArcGIS or QGIS. ©2019 Planet Labs, Inc. All rights reserved.
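As a minimal sketch of the relationship between the two coordinate spaces (assuming a simple north-up geotransform like the affine transform a GeoTIFF carries; the helper names and example numbers here are illustrative, not Planet's internal API):

```python
# Pixel <-> geographic coordinate conversion for a north-up raster,
# using the parameters of the affine geotransform a GeoTIFF carries.
# (origin_x, origin_y) is the image's top-left corner; pixel_size is
# in degrees or meters depending on the CRS.

def pixel_to_geo(row, col, origin_x, origin_y, pixel_size):
    """Map a (row, col) pixel index to the geographic coordinate
    of that pixel's top-left corner."""
    x = origin_x + col * pixel_size
    y = origin_y - row * pixel_size  # y decreases as rows go down
    return x, y

def geo_to_pixel(x, y, origin_x, origin_y, pixel_size):
    """Map a geographic coordinate back to a (row, col) pixel index."""
    col = int((x - origin_x) / pixel_size)
    row = int((origin_y - y) / pixel_size)
    return row, col

# Round trip: pixel (120, 45) in a 3 m-resolution image anchored at
# (500000, 4200000) in a projected CRS.
x, y = pixel_to_geo(120, 45, 500000, 4200000, 3.0)
assert geo_to_pixel(x, y, 500000, 4200000, 3.0) == (120, 45)
```

Libraries like rasterio expose this same mapping through a dataset's `.transform`, but the arithmetic above is essentially all that "georeferenced" means for a simple north-up image.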

Our Workflow

We leverage standard best practices as much as possible, utilizing typical geospatial tools to handle our input imagery and output metrics, and typical machine learning frameworks to train and execute models. A robust set of geo-to-pixel and pixel-to-geo transformations efficiently bridge the two halves of our workflow without sacrificing any functionality on either side. ©2019 Planet Labs, Inc. All rights reserved.

We can group all the steps in our machine learning pipeline into five high-level stages, from handling input imagery and labels through model training, prediction, and metrics.

Lessons We’ve Learned About Using Geospatial Training Data

1. Always use geo-first datastores.

We never save derived pixel representations, e.g., image chips or pixel-space bounding boxes. By thinking about datasets geospatially, we can decouple the imagery from the labels.
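A sketch of what "geo-first" means in practice: labels live as geographic bounding boxes (e.g., in GeoJSON), and the pixel-space chip window is derived on the fly for whatever image product the label intersects. The geotransform parameters below are illustrative:

```python
# Derive a pixel-space crop window from a geo-referenced label at
# read time, instead of persisting image chips. The same label can
# then be applied to any image product that covers it.

def geo_bbox_to_window(bbox, origin_x, origin_y, pixel_size):
    """Convert a (min_x, min_y, max_x, max_y) geographic bounding box
    into a (row_start, row_stop, col_start, col_stop) pixel window
    for a north-up raster with the given geotransform."""
    min_x, min_y, max_x, max_y = bbox
    col_start = int((min_x - origin_x) / pixel_size)
    col_stop = int((max_x - origin_x) / pixel_size)
    row_start = int((origin_y - max_y) / pixel_size)  # top edge
    row_stop = int((origin_y - min_y) / pixel_size)   # bottom edge
    return row_start, row_stop, col_start, col_stop

# A 30 m x 30 m footprint bbox read from a 3 m-resolution image:
window = geo_bbox_to_window((500030, 4199940, 500060, 4199970),
                            500000, 4200000, 3.0)
assert window == (10, 20, 10, 20)
```

Because the window is computed from the geotransform at load time, swapping in a different product requires no re-chipping of the dataset.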

At Planet we have a wide variety of imagery products. We can label on the 8-bit visual product and then train models on a 16-bit calibrated analytic surface-reflectance product. We can also experiment with custom image types to determine which spectral bands provide the most salient information for a particular application, a question that becomes increasingly relevant as we add more spectral bands to our Doves.

Standard RapidEye RGB image near Santa Cruz vs. false color with Near-Infrared (NIR). Different spectral bands encode additional information that is critical for some applications (ex. NIR is vital for agriculture applications). ©2019 Planet Labs, Inc. All rights reserved.

2. Consider each stage in isolation.

Each stage of our pipeline has a different target audience with different needs. In order to efficiently utilize human labelers with specific domain knowledge, we needed to create a labeling experience comprehensible to users without significant machine learning expertise. At the same time, our engineers need visibility into and control over the internals of model training.

Technical problems manifest differently in each stage. For example, we've found that loading imagery onto a map for people to view requires a completely different approach than loading imagery into models for training. For efficient maps, we turn to the geospatial world for WFS3-compliant APIs and webtiles. For efficient loading of pixels into models, we combine Cloud-Optimized GeoTIFFs with the GPU-optimized data-loading pipeline APIs of deep learning frameworks and the horizontal scaling of Kubernetes.
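A sketch of the data-loading side (the tiling scheme below is illustrative, not Planet's actual implementation): a large raster is partitioned into fixed-size read windows, and each worker streams only its own shard of windows, which is exactly the random-access pattern Cloud-Optimized GeoTIFFs make cheap over HTTP:

```python
# Partition a large raster into fixed-size read windows so that many
# workers (e.g. Kubernetes pods) can stream non-overlapping chips
# from a Cloud-Optimized GeoTIFF in parallel.

def tile_windows(height, width, tile_size):
    """Yield (row_off, col_off, h, w) windows covering the raster,
    clipping the last row/column of tiles at the image edge."""
    for row_off in range(0, height, tile_size):
        for col_off in range(0, width, tile_size):
            h = min(tile_size, height - row_off)
            w = min(tile_size, width - col_off)
            yield row_off, col_off, h, w

def windows_for_worker(windows, worker_index, num_workers):
    """Round-robin shard of the window list for one worker."""
    return [w for i, w in enumerate(windows)
            if i % num_workers == worker_index]

windows = list(tile_windows(1000, 1500, 512))
assert len(windows) == 6                       # 2 rows x 3 cols of tiles
assert windows[-1] == (512, 1024, 488, 476)    # clipped edge tile
shard = windows_for_worker(windows, 0, 3)
assert len(shard) == 2
```

In a real pipeline, each window would go through a windowed-read API such as rasterio's and feed the input queue of a `tf.data` or PyTorch `DataLoader` pipeline.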

3. Evaluate geospatial models spatially.

Building segmentation model outputs become useful when correlated with other geospatial data, in this case a flood risk map for the city of Bimbo in the Central African Republic. ©2019 Planet Labs, Inc. All rights reserved.

Model metrics are an integral part of any ML pipeline. The usual metrics for object detection are precision, recall, and F1 score.

These standard metrics all rely on IoU (intersection over union, also known as the Jaccard index). IoU is a unitless ratio, so it is equivalent in pixel space and geo space, meaning we can quantify model accuracy geospatially. The ability to rate model performance against real-world considerations allows us to compute performance metrics relative to spatiotemporal attributes, such as object size in meters, geographic region, season, or climate. This provides critical information for all of our modeling efforts, from object detection (e.g., vessel detection) to segmentation (e.g., land use classification).
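The unit-invariance of IoU is easy to see for axis-aligned boxes: scaling both boxes from pixels to meters multiplies the intersection and union areas by the same factor, so the ratio is unchanged. A small self-contained check:

```python
# IoU for axis-aligned boxes given as (min_x, min_y, max_x, max_y).
# Because IoU is a ratio of areas, applying the same uniform scaling
# (e.g. pixels -> meters) to both boxes leaves it unchanged.

def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def scale(box, s):
    return tuple(v * s for v in box)

pred, truth = (0, 0, 10, 10), (5, 5, 15, 15)
pixel_iou = iou(pred, truth)                           # 25 / 175 = 1/7
geo_iou = iou(scale(pred, 3.0), scale(truth, 3.0))     # at 3 m/pixel
assert abs(pixel_iou - geo_iou) < 1e-12
```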

Conclusion: The Value Is Geospatial

We’d like to come back to something we mentioned at the beginning:

It is not enough to find objects in images. The real value comes from finding objects in the world at a known point in time.

Using this principle, we built a machine learning pipeline that runs on a massive catalog of satellite imagery and leverages the best of both the geospatial and machine learning worlds, efficiently and at scale. Our efforts to accelerate disaster response times, prepare cities across the globe for climate change, detect illegal deforestation in the Amazon, and create the most up-to-date and complete map of all roads on the planet are just the beginning.

To learn more about the tools we’ve built, attend this month’s Maptime SF meetup at our office in San Francisco on Tuesday January 21, 2020. Register here.

Review our Analytics page for more information, and if you’re interested in joining the Planet Analytics Team, be sure to check out our Careers page for our latest job postings.

This piece was authored by Planet’s Analytics Infrastructure team, including engineers Ash Hoover, Maya Midzik and Agata Kargol, with engineering manager Benjamin Goldenberg.
