Miami, Florida, captured by a SkySat satellite on August 13, 2017. Image ©2017 Planet Labs, Inc. cc-by-sa 4.0

CNG Part 3: Planet’s Cloud Native Geospatial Architecture

Chris Holmes
Planet Stories

--

So far we have talked about Cloud Native Geospatial in the abstract, and introduced a core CNG format — the Cloud Optimized GeoTIFF. At this point the most helpful thing is likely to get a bit more concrete about what an actual Cloud Native Geospatial Architecture looks like. Planet has been building one for several years now, so it’s a great place to start.

First, a bit of history. Planet was started about six years ago in the heart of Silicon Valley, with big ambitions to scale. For the past few years, venture capitalists have looked down on any startup that pays for its own hardware. The cloud may be more expensive while a startup is small, but for one hoping to grow big, needing to buy servers can really slow down growth. So Planet was built from the ground up on the cloud, with data going straight from the satellite to the ground station to cloud storage, and processed in a truly cloud native architecture.

Some have actually been concerned that Planet might outgrow commercial clouds, generating and storing so much data that it would be cheaper to build its own cloud. Every couple of years this question is analyzed deeply, generally by Troy Toman, who built out Rackspace’s public cloud, and it always comes back with a resounding ‘no’. My favorite line of his is ‘the only people who want to build their own cloud are usually people that haven’t done it before’. Though he has the team that could build a world class cloud, they are much more effectively used on the unique Planet problems, like building a storage system for an archive that grows by over 1 million images every day. The economies of scale that Amazon, Google, and Microsoft are able to achieve are quite hard to compete with, so even if the equation works now it is hard to keep up with their continued innovation. So Planet is quite happy to be a user of cloud services and innovate in its application of the latest technology to earth imagery.

Scalable Data Pipeline and Cloud Optimized GeoTIFFs at Planet

Planet built its imagery pipeline primarily for internal use — to take raw images from space and flat-field, calibrate, georectify, orthorectify, and detect clouds, then make the processed data available for web and API consumption. The internal ‘jobs system’ manages all the processing, working directly with files stored on cloud storage. The input data comes from cloud storage, and the outputs are also on cloud storage. Outputs for customers sit behind the Planet Data API, to authorize proper access, but every single bit of data produced by Planet is always Cloud Optimized GeoTIFF in a cloud bucket. So all of the data in Planet’s APIs can be streamed with almost any geo software built on GDAL, accessing the data in a fully cloud native manner.
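For instance, GDAL-based clients read a COG in place by prefixing its URL with `/vsicurl/`, which turns file reads into HTTP range requests. A minimal sketch of that addressing (the bucket URL and scene name are hypothetical):

```python
# Sketch of how a GDAL-based client streams a Cloud Optimized GeoTIFF:
# rather than downloading the whole file, it prefixes the URL with
# /vsicurl/ and fetches only the byte ranges it needs.

def vsicurl_path(url: str) -> str:
    """Wrap an HTTPS URL so GDAL reads it via HTTP range requests."""
    return "/vsicurl/" + url

def range_header(offset: int, length: int) -> dict:
    """HTTP header asking the server for `length` bytes at `offset` --
    the mechanism /vsicurl/ uses under the hood."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}

cog_url = "https://storage.example.com/scenes/20170813_miami.tif"  # hypothetical
gdal_path = vsicurl_path(cog_url)

# A COG keeps its header and tile index near the start of the file, so a
# client can learn the layout from one small request before fetching tiles.
header_request = range_header(0, 16384)

print(gdal_path)
print(header_request)
```

Any GDAL tool can then open `gdal_path` directly, reading only the tiles and overviews the operation touches.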

In the early days customers would often ask if they could get special access to the Planet Platform to run their own algorithms. The answer would always be to run in the same availability zone of the cloud as Planet and they’d have the exact same access that internal developers do. They often wouldn’t like that answer, as they’d want to believe there was some special access, some faster way of reading the data that Planet had internally. But the point of a cloud native architecture is that internal devs use the cloud with the same best practices that are recommended to customers. Most customers do get a more finished product (ortho-rectified instead of ‘basic’ imagery) than Planet uses for internal processes, but both are served in the exact same way. Yes, there are some quirks of working in the cloud, but Planet believes that operating the same way as its customers will create the best experience for customers.

A handful of those customers have really taken advantage of the cloud native geospatial architecture. Santiago & Cintra as well as a couple of agriculture applications were built to run in the same cloud, and they deliver a more finished information product, derived from Planet’s cloud data, to their end users. They are able to deliver up to date results faster, and also save drastically on hosting costs. They stream the data direct into their processing pipelines, and then render their own tiles or even leverage Planet’s tile serving capabilities.

Web Tiles

Those tile serving capabilities are another core pillar of Planet’s architecture. Every single scene and mosaic in Planet’s vast catalog can be rendered instantly in an online map, as a ‘slippy map’ — working just like Google Maps, but with up to date imagery from Planet. In a large number of imagery use cases, users are still primarily looking at the picture and drawing conclusions about the world. In traditional workflows, users needed to download full scenes (often hundreds of megabytes or even gigabytes), and then load them up in expensive GIS or Remote Sensing software just to look at what was captured. With web tiles users can zoom in to full resolution in an online map, using Planet Explorer, before deciding to actually download and do further analysis. Or they may be able to do their complete workflow fully online — making a decision by simply looking at the imagery at full resolution.

Open California data in a fully online workflow of Long Beach port activity assessment using Planet Explorer

Full resolution data in Open California is available to all users of Planet Explorer, and any user who purchases a data subscription also gets that data in full resolution online. While web mapping has been around for quite a while, cloud native geospatial workflows make it the first way to visualize imagery, instead of the last step of publishing to a web map server.

Another key aspect of the web tiles is that they are not limited to just the main Planet GUI (Planet Explorer). The same tiles are available as a service, as XYZ tiles, as well as the WMTS standard from the Open Geospatial Consortium. This enables developers to build apps that leverage the exact same web tiles available in Planet’s web interfaces.
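XYZ addressing is just Web Mercator tile math, so computing which tile covers a point is straightforward. A small sketch, with a hypothetical URL template standing in for a real tile endpoint:

```python
import math

def latlon_to_tile(lat: float, lon: float, zoom: int) -> tuple:
    """Convert WGS84 lat/lon to XYZ ('slippy map') tile indices in
    Web Mercator -- the standard scheme behind XYZ tile services."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad))
             / math.pi) / 2.0 * n)
    return x, y

# Tile covering the Port of Long Beach at zoom 15
x, y = latlon_to_tile(33.754, -118.216, 15)

# Hypothetical URL template -- the real template comes from the tile
# service's documentation.
url = f"https://tiles.example.com/v1/mosaic/15/{x}/{y}.png"
print(x, y, url)
```

The same x/y/zoom arithmetic underlies both ad hoc XYZ templates and the tile matrix sets defined by WMTS.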

Same Long Beach imagery as web tiles, used as base layer for Carto analysis and editing

One cool side effect of using an open standard like WMTS is that the tiles can stream into desktop systems like Esri and QGIS. For visual inspection of data as well as GIS tracing-style applications this is an ideal delivery mechanism. Users don’t need to download a huge mosaic to be able to zoom in anywhere on the globe to high zoom levels — they can just stream from the tiles. And they can also easily grab the latest tiles, as there are newly published mosaics every month. And Planet scenes can also be instantly streamed as web tiles, bypassing the ‘activation’ data preparation step.

API Everything

The other major leg of Planet’s cloud architecture is building everything as an API. The development team has fully embraced microservices, so all the internal functionality to process and serve up imagery is also completely API-first. The team has taken a shine to Go, and the microservices architecture enables them to bring Go in incrementally, either for new services or when refactoring existing ones. Many of the internal interfaces are built with gRPC, which has been working well.

Working in a microservices manner internally enables Planet to release modular APIs externally more easily. It is the same style of evolution that Amazon Web Services has undergone — first build modular internal components, then start to open those up and productize them.

The center of the Data API is the catalog of every single image in Planet’s holdings. The philosophy of Planet is to have an open catalog, so anyone can search Planet’s imagery, through the API, through Planet Explorer, or through integrations built on the API. The API can be queried by any combination of parameters — by geography, by time, or by any of the metadata fields. Downloading data is only available in places where a user has access rights, which makes sense for Planet as a data company, but all data is on the cloud and online, able to be downloaded with minimal pain. Making that catalog available as an API that anybody or any program can access is a key tenet of Cloud Native Geospatial architectures — the canonical copy of the data lives online. Everything else is a copy of the canonical data. But with Cloud Optimized GeoTIFF everyone should stream that data when using it, instead of even bothering to make a copy.
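A query combining geography, time, and metadata is expressed as a nested filter in the search request body. A sketch of such a filter, using the filter-type names from Planet’s Data API v1 (verify against the current API documentation before relying on them; the area of interest is illustrative):

```python
import json

# A combined filter: area of interest AND date range AND low cloud cover.
aoi = {  # small box over Long Beach, purely for illustration
    "type": "Polygon",
    "coordinates": [[[-118.25, 33.73], [-118.17, 33.73],
                     [-118.17, 33.78], [-118.25, 33.78],
                     [-118.25, 33.73]]],
}
search_filter = {
    "type": "AndFilter",
    "config": [
        {"type": "GeometryFilter", "field_name": "geometry", "config": aoi},
        {"type": "DateRangeFilter", "field_name": "acquired",
         "config": {"gte": "2017-08-01T00:00:00Z",
                    "lte": "2017-08-31T00:00:00Z"}},
        {"type": "RangeFilter", "field_name": "cloud_cover",
         "config": {"lte": 0.1}},
    ],
}
# This body would be POSTed, with an API key, to the quick-search
# endpoint of the Data API.
print(json.dumps(search_filter, indent=2))
```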

Products on Demand

Building API-first and on the cloud has enabled Planet to approach a number of traditional geospatial processing workflows differently. Acquiring RapidEye gave a deep insight into many of those traditional workflows, as their production pipeline worked in a more manual way. Perhaps the biggest difference relative to traditional imagery pipelines is generation of imagery products on demand, instead of on acquisition. Planet’s data processing pipeline runs a number of operations when imagery is ingested, but does not turn those into pixel outputs.

Rectification is a good example — a traditional process ends with a ‘3B product’ — an image that is stored on disk. The output of Planet’s rectification is simply the rational polynomial coefficient (RPC) — the compact representation of the ground to image geometry. This is stored as metadata, as are the core pixels that were captured from space. It is only when an end-user requests a Visual or Analytic image that the pixels get transformed into a traditional imagery product. The process is done on demand, requested through the API. This request may come directly from a developer, it could be a user working with Planet Explorer that kicks it off, or it may be third party software that has integrated with the API. Planet’s jobs system manages tens of thousands of virtual machines at once, so it just allocates a few more for the duration of the generation of the images. They are then cached, so that when another user requests the same image they don’t have to wait again for the generation.
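The RPC model itself is just a ratio of polynomials in ground coordinates. A toy first-order version of that idea (real RPCs use 20-term cubic polynomials over normalized coordinates; these coefficients are invented purely for illustration):

```python
def rational(num, den, lat, lon, h):
    """Evaluate a rational polynomial: each coefficient list holds
    (constant, lat, lon, height) terms of a first-order polynomial.
    Real RPCs use 20-term cubic polynomials; this is a toy version
    of the same structure."""
    p = lambda c: c[0] + c[1] * lat + c[2] * lon + c[3] * h
    return p(num) / p(den)

# Toy coefficients (not from any real sensor model)
line_num, line_den = [10.0, 2000.0, 0.0, -0.5], [1.0, 0.0, 0.0, 0.0]
samp_num, samp_den = [20.0, 0.0, 3000.0, 0.5], [1.0, 0.0, 0.0, 0.0]

def ground_to_image(lat, lon, h):
    """Map a ground point (lat, lon, height) to (line, sample)
    pixel coordinates -- the ground-to-image geometry an RPC encodes."""
    return (rational(line_num, line_den, lat, lon, h),
            rational(samp_num, samp_den, lat, lon, h))

line, samp = ground_to_image(0.25, 0.5, 100.0)
print(line, samp)
```

Storing only these coefficients keeps the archive small; resampled pixels are materialized only when a product is actually requested.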

Doing everything on demand and through an API then opens up a number of possibilities to tailor the data more to how the end user wants it. The first of these was to enable ‘clipping’, which lets a user request just the geometry they care about, instead of trying to select the scenes that overlap with their area of interest. This will evolve to enable co-registration of images, application of top-of-atmosphere (TOA) reflectance, atmospheric correction, surface reflectance and eventually full analytic processing of images with operations like band math to create indices and even computer vision-based object detection.
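At its core, clipping translates the requested geometry’s bounding box into a pixel window on the source raster. A minimal sketch for a north-up image with square pixels (all coordinates illustrative):

```python
def bbox_to_window(bbox, origin, pixel_size):
    """Convert a geographic bounding box (minx, miny, maxx, maxy) into a
    pixel window (col_off, row_off, width, height) for a north-up raster
    with upper-left `origin` (x, y) and square `pixel_size`. This is the
    basic arithmetic behind clipping a scene to an area of interest."""
    minx, miny, maxx, maxy = bbox
    ox, oy = origin
    col_off = int((minx - ox) / pixel_size)
    row_off = int((oy - maxy) / pixel_size)  # rows count downward from the top
    width = int(round((maxx - minx) / pixel_size))
    height = int(round((maxy - miny) / pixel_size))
    return col_off, row_off, width, height

# A 3m scene whose upper-left corner is at (500000, 3740000) in a UTM
# projection, clipped to a 300m x 300m area of interest.
window = bbox_to_window((500300, 3739400, 500600, 3739700),
                        (500000, 3740000), 3.0)
print(window)
```

Because a COG is internally tiled, a server can read and return just this window without touching the rest of the scene.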

Deep Learning Model detecting buildings in Planet Imagery

The power of doing more analytics on the fly in the pipeline becomes clear when one looks at how imagery used to be delivered. Previously, a couple of customers in South America required co-registered RapidEye and Landsat images, which involved a huge pre-processing job, as all the existing image products had to be re-processed with the proper co-registration. In the on-the-fly, API-driven world, the production of the co-registered stack of images happens as the user requests it. Images that are never requested do not need to be processed, but all those that are needed are instantiated upon the user’s request.

The Cloud Native Geospatial Ecosystem

Planet’s longer term vision is to expand that processing pipeline, so analytic information and insight is produced on demand, streaming to end-users directly — and often not even showing them the imagery it was derived from. But behind those information APIs there will always be a rich ecosystem of geospatial processes that all occur natively in the cloud. Indeed, many information streams may only be consumed by other software, adding value and supplemental information, to get at the end information product. In this world, the cloud native geospatial architecture becomes a requirement, not a nice-to-have. Passing the massive amounts of data downlinked each day in and out of the cloud would not be able to keep up with the new data flowing in.

The key to this vision of streaming information feeds that abstract out remote sensing and GIS is a true ecosystem of cloud native geospatial collaborators. To help realize this, Planet is building its platform as a set of loosely coupled components that can integrate with others. Instead of a monolithic platform that one is either in or not, Planet’s architecture can be adopted and adapted by others who see the potential of cloud native geospatial to make the power of insights derived from geo data available to everyone.

--

Chris Holmes

Product Architect @ Planet, Board Member @ Open Geospatial Consortium, Technical Fellow @ Radiant.Earth