Introducing EarthAI Notebook
A New Environment for Geospatial Analytics
Geospatial intelligence is evolving at a new pace. As machine learning has advanced to the point where insights can be reliably derived from raster data, the limits imposed by human-driven, single-file-at-a-time workflows are being blown away by new methods of extracting insight at an unprecedented scale. New platforms are emerging as the legacy paradigm of desktop-centric Geographic Information Systems (GIS) confronts the new paradigm of cloud-native machine learning. In this post, we will explore the history of analyzing Earth-observing raster data and present our newest contribution to the geospatial community.
The Legacy Paradigm of Geospatial Analytics
Since 1959, when the Central Intelligence Agency and the Air Force launched the Corona missions, analysis of satellite imagery has been the domain of the imagery analyst. In the decades that followed, well-funded government agencies financed large teams of analysts to develop geospatial intelligence by labeling features on images. Using early GIS tools, these teams generated features such as buildings, roads, and bridges that guided military and non-military decision-making. While this practice continues today, cloud computing has finally provided the computational power necessary to enable machines to generate these insights.
Enter Machine Learning.
In the past few years, remote sensing analysis has undergone a renaissance. The convergence of developments in computer vision, cloud computing, and Earth-observation sensors has unlocked new veins of intelligence that had been locked away by the legacy paradigm.
What once required a great deal of analyst time and resources can now be accomplished quickly and cost-efficiently by machines in the cloud. This is not to say that it is easy — merely possible.
The New Geospatial Paradigm
Early applications of machine learning focused on domain-specific use cases served by information products and consulting services. Specialized firms evolved to serve these use cases. Leveraging recent developments in advanced cloud architecture and machine learning combined with specific domain knowledge, they created information products that provided a competitive edge.
These use cases have largely come from industries that could afford to invest significantly for the chance of gaining this edge over the competition. Asset management and agriculture, industries where a small informational advantage on the price of a stock or the yield of a farm makes a large bottom-line impact, have been early adopters of new geospatial technology.
While these early use cases have proven that Earth-observation imagery can provide a valuable source of information for decision-makers, they have not yet solved the challenges presented by the legacy paradigm. In fact, they have introduced others.
Computer vision algorithms are greedy. Object detection algorithms require hundreds to thousands of labeled examples to train, so imagery analysis can quickly overwhelm an individual computer or server.
Furthermore, training neural network models until they are sufficiently accurate consumes a tremendous amount of computational resources. Because raster data are intrinsically large and computer vision algorithms require so much computational power to train, the only practical way to address these challenges is through cloud computing and Graphics Processing Unit (GPU) technologies.
While widely available from numerous companies, cloud computing remains very complex. Savvy engineers spend countless hours configuring and managing computational clouds to achieve the performance required to manage and process imagery data. This specific challenge is likely the most significant barrier to leveraging computer vision. While data scientists seeking to apply machine learning algorithms to traditional tabular data can quickly and easily scale their analytics, the tools required to process imagery at scale have matured far more slowly.
EarthAI Notebook
We founded Astraea three years ago with the original goal of building information products. We quickly realized that, while the market was focused on scaling one use case at a time, higher-order problems remained that prevented the broader community from unlocking insights from imagery at scale. So we made it our mission to democratize access to geospatial data. We believed at the time, and now know, that this data contains the keys to solving some of our planet’s most intractable problems. Further, we know the most efficient way to unlock change is to give power to the end-user.
With that goal in mind, we set out to develop EarthAI: a fully integrated, cloud-native platform that enables experts and non-experts alike to leverage the full power of Earth-observing data. Our platform provides a suite of products focused on removing the complexities of discovering, processing, and analyzing Earth-observation data at scale. In July 2019, we released Earth OnDemand, which enables users to access and explore over 8 PB of free public satellite imagery in an intuitive interface.
Now, we’re releasing the next product on our roadmap: EarthAI Notebook. Notebook is a hosted JupyterLab environment fully loaded with geospatial analysis libraries and API access to the Earth OnDemand catalog.
At the core of EarthAI Notebook is RasterFrames, our open-source library that abstracts away the idiosyncrasies of raster data, enabling data scientists to handle imagery the same way they handle conventional tabular data — in a data frame. Further, RasterFrames extends the capabilities of Apache Spark to raster data to provide scalable compute capability and access to spark.ml machine learning algorithms. While written in Scala for computational performance, RasterFrames has Python bindings to offer an environment that is widely accessible to the data science community and compatible with common tools like scikit-learn and TensorFlow.
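To make the "imagery in a data frame" idea concrete, here is a minimal, library-free sketch. It does not use the RasterFrames API; the function names (`raster_to_rows`, `with_tile_mean`), the tile size, and the toy raster are all hypothetical, chosen only to illustrate how a raster might be cut into tiles that sit in table rows and are then processed like any other column.

```python
from statistics import mean


def raster_to_rows(raster, tile_size):
    """Split a 2-D raster (a list of equal-length rows) into tile records.

    Each record is one "row" of a table: the tile's grid position plus
    the tile's pixel values. This is an illustrative layout only.
    """
    height = len(raster)
    width = len(raster[0])
    rows = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tile = [line[left:left + tile_size]
                    for line in raster[top:top + tile_size]]
            rows.append({
                "tile_row": top // tile_size,
                "tile_col": left // tile_size,
                "tile": tile,
            })
    return rows


def with_tile_mean(rows):
    """Add a derived column, the way one would in a tabular workflow."""
    return [
        dict(rec, tile_mean=mean(v for line in rec["tile"] for v in line))
        for rec in rows
    ]


# A toy 4x4 single-band "image" with pixel values 0..15 (hypothetical data).
raster = [[x + 4 * y for x in range(4)] for y in range(4)]

table = with_tile_mean(raster_to_rows(raster, tile_size=2))
for rec in table:
    print(rec["tile_row"], rec["tile_col"], rec["tile_mean"])
```

Because every tile is an independent row, per-tile operations like the mean above could in principle be distributed across many workers, which is the kind of scaling a Spark-backed data frame is designed for.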
Perhaps the most significant advantage of EarthAI Notebook is the ability to seamlessly scale an analysis. Our push-button provisioning simplifies cloud infrastructure, allowing users to access cloud resources with a single click (GPUs coming soon!). With access to Earth OnDemand, EarthAI Notebook comes “batteries-included” with free satellite imagery from NASA and ESA. Now, data scientists can spend less time searching for data, less time waiting for results, no time on DevOps, and more time on generating insights from pixels.
At Astraea, we believe the answers to some of the world’s most important questions reside in imagery data. The EarthAI platform was built to democratize access to Earth-observing data and enable a much broader segment of users to unlock the answers to these questions. We are removing the barriers of data access, cloud computing, and insufficient analysis tools to empower the curious. We hope that EarthAI Notebook enables a broad community of data scientists, analysts, and developers to begin working with satellite imagery.
To learn more about EarthAI Notebook, visit our website or start a free trial (no credit card required).
Written by Dave Yoken & Jamie Conklin
Astraea.earth