At Astraea, we believe that the most important dataset on Earth is the Earth itself. The ability to observe and analyze images of our planet is critical to tackling issues like climate change and food insecurity. We also recognize that these challenges are bigger than any one company, which is why we committed ourselves to an “open core” framework built on RasterFrames: Astraea’s geospatial machine learning toolkit supporting global-scale analysis.
RasterFrames® brings together Earth-observation (EO) data access, cloud computing, and DataFrame-based data science…
Over the coming months, in anticipation of its 1.0 release, we will take a deep dive into RasterFrames. We will explore the technology itself and demonstrate its contribution to EO. Along the way, we will outline use cases and, we hope, spark your imagination. In the present age of EO, groundbreaking technology like RasterFrames can help you query the Planet, analyze her data, and scale your solutions.
What is RasterFrames?
RasterFrames is a tool for processing big geospatial raster data: the grids of pixel values that make up EO imagery. At its core, RasterFrames applies the DataFrame structure to EO data, which lets users make spatiotemporal queries, explore trends, build machine learning models, and more. Giving users the ability to leverage cloud computing to process massive datasets has been part of the RasterFrames vision from day one.
In the beginning, there were frustrated engineers
RasterFrames was born from a drive to help data scientists become more efficient. Simeon Fitch, RasterFrames’ chief architect and VP of R&D at Astraea, has built his career on simplifying the lives of those who seek insights from data.
“I started out alleviating the pain of aerospace engineers that were forced to cram complex models into inadequate tooling. As big data emerged in the late 2000s, I gravitated toward the burgeoning use of big data in data science. Huge sets of information were being generated, and the compute power needed to process them was increasingly available. In this field, I began to spot thorns of friction that prevented talented practitioners from unlocking the insights they sought. Data structures were complex and laden with jargon. Deploying models at scale required specific high-performance computing expertise. Compute models diverged from conceptual models. With RasterFrames, we aim to solve those specific problems for EO.”
“The explosion of Earth Observation data in recent years has far outpaced the development of the tools and infrastructure required to access the insights within the data. RasterFrames provides those tools and infrastructure.”
~Simeon Fitch, RasterFrames’ chief architect and VP of R&D at Astraea
Query the Earth
Earth Observation data has been Big Data since before the term existed. Because of this legacy, much of the tooling and technique for finding and acquiring EO data is still focused on downloading big files. But downloading big files is not querying data; it is merely accessing it. Querying data means describing which data you want in a way that integrates with what you want to do with it, all without having to describe how to retrieve it.
As with so many other types of data, EO data storage is transitioning to the cloud. RasterFrames leverages this trend by enabling queries across these cloud stores. It eliminates downloading and moving files by:
- Reading directly from cloud storage
- Grabbing only as much data as needed for the analysis
RasterFrames provides functionality broad enough to integrate data access with analysis, removing the artificial barrier and friction associated with data archives.
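To make “grabbing only as much data as needed” concrete, here is a minimal sketch in plain Python. It assumes a raster stored as fixed-size internal tiles, as in a Cloud Optimized GeoTIFF, on a north-up grid of square pixels, and it computes which tiles overlap an area of interest (AOI). Only those tiles would then need to be fetched, typically with HTTP range requests, rather than the whole file. The function name, its parameters, and the example numbers are hypothetical; this illustrates the idea and is not RasterFrames’ API or internal implementation.

```python
import math
from typing import List, Tuple


def tiles_for_bbox(
    origin_x: float,
    origin_y: float,
    pixel_size: float,
    width: int,
    height: int,
    tile_size: int,
    bbox: Tuple[float, float, float, float],
) -> List[Tuple[int, int]]:
    """Return (tile_col, tile_row) indices of internal tiles intersecting bbox.

    Hypothetical helper. Assumes a north-up raster whose origin is the
    upper-left corner: x grows eastward, y decreases moving down the rows,
    and pixels are square with side pixel_size (map units).
    bbox is (min_x, min_y, max_x, max_y) in the raster's coordinate system.
    """
    min_x, min_y, max_x, max_y = bbox
    # Map coordinates -> pixel column/row ranges, clamped to the raster extent.
    col_start = max(0, math.floor((min_x - origin_x) / pixel_size))
    col_end = min(width, math.ceil((max_x - origin_x) / pixel_size))
    row_start = max(0, math.floor((origin_y - max_y) / pixel_size))
    row_end = min(height, math.ceil((origin_y - min_y) / pixel_size))
    if col_start >= col_end or row_start >= row_end:
        return []  # the AOI does not overlap the raster at all
    # Pixel ranges (end exclusive) -> tile index ranges.
    tile_cols = range(col_start // tile_size, (col_end - 1) // tile_size + 1)
    tile_rows = range(row_start // tile_size, (row_end - 1) // tile_size + 1)
    return [(c, r) for r in tile_rows for c in tile_cols]


# Hypothetical scene: 10980 x 10980 pixels at 10 m, stored as 512 x 512 tiles.
tiles = tiles_for_bbox(
    500000.0, 4300000.0, 10.0, 10980, 10980, 512,
    (505000.0, 4290000.0, 512000.0, 4296000.0),  # a 7 km x 6 km AOI
)
total = math.ceil(10980 / 512) ** 2
print(len(tiles), "of", total, "tiles needed")  # prints: 6 of 484 tiles needed
```

In this made-up example only 6 of the 484 internal tiles intersect the AOI, so roughly 1% of the file would be read.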
Analyze the Earth
Because RasterFrames puts EO data in a DataFrame, data scientists and other analysts can work with it in a familiar way. DataFrames are general purpose, and their basic operations are well known. RasterFrames lets users take full advantage of that functionality in multiple languages, such as Python and SQL. It also adds custom functions for EO data that integrate seamlessly with existing DataFrame functionality, including a broad variety of machine learning techniques.
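The DataFrame model described above can be illustrated with a small, dependency-free analogy. In this sketch, each row holds one tile of red and near-infrared (NIR) pixels plus its metadata; an EO-specific helper computes the normalized difference vegetation index (NDVI), (NIR − Red) / (NIR + Red), on the tile columns, and ordinary relational steps (derive a column, filter, group by) then work on the results. The row type, helper names, field IDs, and pixel values are all hypothetical; in RasterFrames the table would be a Spark DataFrame and the tile functions would be RasterFrames’ own.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

Tile = List[List[float]]  # a small 2-D block of pixel values


@dataclass
class TileRow:
    """Hypothetical row: metadata columns plus one tile per band."""
    field_id: str
    date: str
    red: Tile
    nir: Tile


def normalized_difference(a: Tile, b: Tile) -> Tile:
    """Per-pixel (a - b) / (a + b); pixels where a + b == 0 become 0.0."""
    return [
        [(x - y) / (x + y) if (x + y) != 0 else 0.0 for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def tile_mean(tile: Tile) -> float:
    """Mean of all pixel values in a tile."""
    return mean(v for row in tile for v in row)


# Made-up integer digital numbers for two fields on two dates.
rows = [
    TileRow("A", "2019-06-01", red=[[100, 100], [200, 200]], nir=[[300, 300], [200, 200]]),
    TileRow("A", "2019-07-01", red=[[100, 100], [100, 100]], nir=[[900, 900], [300, 300]]),
    TileRow("B", "2019-06-01", red=[[300, 300], [300, 300]], nir=[[100, 100], [100, 100]]),
]

# "Derive a column": mean NDVI per tile.
with_ndvi = [(r.field_id, r.date, tile_mean(normalized_difference(r.nir, r.red))) for r in rows]

# "Filter": keep tiles whose mean NDVI suggests vegetation (illustrative threshold).
vegetated = [(f, d, v) for f, d, v in with_ndvi if v > 0.3]

# "Group by + aggregate": average NDVI per field.
by_field: Dict[str, List[float]] = defaultdict(list)
for field_id, _, v in with_ndvi:
    by_field[field_id].append(v)
field_means = {f: mean(vs) for f, vs in by_field.items()}

print(vegetated)
print(field_means)
```

The design point is the separation of concerns: the EO-specific logic lives in column-level functions, while filtering and aggregation remain plain table operations.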
Scale the Earth
The size of the datasets describing the Earth is mind-boggling and growing at an exponential rate. With RasterFrames, users can build and tweak their analyses or models locally, then scale them to thousands of servers in the cloud without changing the code. In the EO space, this lets a data scientist train a model over a limited geographic area (e.g., classifying crops in Central Virginia), validate its performance locally, and then deploy it over a much broader area (e.g., classifying crops nationwide). In practice, this hybrid approach saves users both time and money, a point we will explore later in this series.
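The “build locally, scale without changing the code” pattern can be sketched by keeping the analysis in one function and swapping only the executor that runs it. In this plain-Python analogy, a serial loop stands in for a laptop and a thread pool stands in for a cluster. The threshold “model”, the partition data, and the function names are hypothetical, and actual distribution across machines is what Spark provides, not this code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

Partition = List[float]  # hypothetical: NDVI values for one chunk of the area


def classify_partition(pixels: Partition) -> int:
    """Hypothetical 'model': count pixels whose NDVI exceeds a crop threshold."""
    return sum(1 for v in pixels if v > 0.4)


def run(partitions: Iterable[Partition], fn: Callable[[Partition], int], workers: int = 1) -> int:
    """Apply fn to every partition and sum the results.

    workers=1 runs serially (the 'local' case); workers>1 fans the same fn
    out over a thread pool, standing in for a cluster of executors.
    """
    if workers == 1:
        return sum(fn(p) for p in partitions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(fn, partitions))


# Small "local" area: a couple of partitions for developing and validating.
local = [[0.1, 0.5, 0.7], [0.3, 0.45]]
# Larger "nationwide" area: many more partitions, same analysis function.
nationwide = local * 1000

print(run(local, classify_partition))  # prints: 3
print(run(nationwide, classify_partition, workers=8))  # prints: 3000
```

With Apache Spark, on which RasterFrames is built, the analogous switch is usually the session’s master setting, for example `local[*]` on a single machine versus a cluster manager in the cloud, while the DataFrame code stays the same.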
For all the reasons above, we believe that RasterFrames can revolutionize the practice of Earth Observation. In the coming months, we will explore the value of its capabilities by providing a deep dive into the technology and its applications.
In the subsequent posts, we will:
- Examine RasterFrames from the perspective of a data scientist
- Take users on a tour of RasterFrames, exhibiting its comprehensive feature set
- Illuminate the business value of RasterFrames, with a focus on scalable compute
- Dive into the architecture behind the magic, explaining in detail how RasterFrames works