Making Space Data Easier to Use: Overcoming Challenges and Expanding Access

Robert Simmon
11 min read · Sep 19, 2024


It’s currently easier than ever to explore the Earth from space. There are more missions (both government and commercial), collecting data more frequently, with a wider variety of technologies than ever before. The insights to be gleaned from the data are needed more than ever, storage is cheap, computers are fast, and the principles of usability are well understood. Unfortunately, accessing, reading, and deriving insights from this data is still a challenge. It requires experience, specialized knowledge, and customized workflows.

Sea surface temperature on July 30, 2024, from the Global Ocean OSTIA Sea Surface Temperature and Sea Ice Analysis. Cold ocean water (surrounding Antarctica) is near black, cool water (in the Southern Ocean and North Atlantic) is purple, warmer water (near the Equator) is orange, and the hottest water (in the Red Sea and Persian Gulf) is yellow. The map is in the Interrupted Goode Homolosine (Oceanic View) projection. Map by Robert Simmon.

NASA itself says: “(Satellite) data are the most comprehensive and accurate resource for addressing many of the most important global issues, but these data often aren’t used to their full potential.” The remote sensing community — especially big research institutions — can and should do a better job of enabling users to exploit their datasets.

In this essay I’ll describe the obstacles I see currently inhibiting the widespread use of remote sensing data, and outline some solutions that I think would help expand use of these datasets, especially for people outside the established research community.

Data Access Challenges

The Earth observation & satellite remote sensing ecosystem is a complex web of protocols, formats, interfaces, data archives, and cloud services. Understanding how to navigate this network to use even a single dataset can be difficult and time-consuming, a challenge that is magnified if you need to find, read, and analyze data from multiple providers across several scientific fields. Identifying and mitigating these challenges is a key step in realizing the potential of remote sensing data, especially in multidisciplinary research.

Data Access

Simply finding and downloading data is a challenge. Most web searches for satellite data don’t lead to the data itself, but to a data product page, which is typically focused on an overview of the data, versioning, and attribution rather than access. Much of the language on these pages is dense jargon, often further obscured by acronyms. Even pages that are specifically labeled “data access” have multiple options with confusing titles and no clear indication of which source is best for which application. Further, these links (even navigation leading from an individual dataset) usually point to the top level of a search tool, at which point the user, who has already expended time and mental energy narrowing down their search, has to start over.

Interfaces for data access tools tend to be one of two extremes: bare-bones or comprehensive. The bare-bones interfaces, usually a list of directories and files, offer direct access, but are often hampered by cryptic labeling and may lack the functionality to download multiple files. The comprehensive interfaces offer not only data download, but also browse imagery, filtering, searching, subsetting, and often some amount of guidance intended for novice users. These features are welcome, but they often feel clunky and slow compared to similar modern applications like web maps. Topic-based browsing is well-intentioned, but often leads to hundreds or thousands of parameters to choose from. It can be overwhelming.

Once a user has navigated through these options, and selected the measurement they need for the time and place they’re focused on, they’re met with a final challenge: how to download the data? In the case of NASA Earthdata, the options are: click, one by one, through a list of files; undocumented “direct cloud access”; or a download script which requires the user to be comfortable with the command line. The other interfaces I’ve used differ in the details, but ultimately offer a similar experience.
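
For illustration, here’s a rough sketch of what those bulk-download scripts typically boil down to: loop over a list of granule URLs and save each file locally. (The file granule_urls.txt and the bearer-token authentication are placeholders, not the exact mechanics of any particular archive.) Even this “simple” path assumes the user is comfortable installing and running Python.

    # Generic bulk-download loop: read a list of granule URLs and save each file.
    # The URL list and token-based authentication are placeholders, not the
    # exact mechanics of any particular data archive.
    import requests

    urls = open("granule_urls.txt").read().split()
    headers = {"Authorization": "Bearer <your-access-token>"}

    for url in urls:
        filename = url.split("/")[-1]
        with requests.get(url, headers=headers, stream=True, timeout=60) as response:
            response.raise_for_status()
            with open(filename, "wb") as outfile:
                for chunk in response.iter_content(chunk_size=1 << 20):
                    outfile.write(chunk)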

Data Formats

Once you’ve located a dataset, the next challenge is to open it. Yes, Earth science data is easier to read now than it was 10 years ago. Common commercial visualization and analysis tools handle many scientific formats. Open source visualization and analysis tools (GDAL, QGIS, Panoply, scientific Python, R, etc.) are widely available, fairly well documented, and are easier to install and run than they were in the early 2010s. NetCDF, particularly files that follow the Climate and Forecast (CF) conventions, is relatively straightforward. There’s a community of practitioners who write thorough guides, and a wealth of video tutorials on YouTube.
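
When everything works as intended, the happy path really is short. Here’s a minimal sketch, assuming a local CF-compliant NetCDF file named sst.nc containing a variable called analysed_sst (both names are hypothetical stand-ins for a real product):

    # Open a CF-compliant NetCDF file with xarray and make a quick-look plot.
    # The file and variable names are hypothetical examples.
    import xarray as xr

    ds = xr.open_dataset("sst.nc")  # CF metadata (units, coordinates) decoded automatically
    print(ds)                       # inspect variables, dimensions, and attributes

    sst = ds["analysed_sst"]
    sst.isel(time=0).plot()         # quick-look map of the first time step (uses matplotlib)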

And yet.

Even the simplest files require esoteric knowledge to open. Installing the specialized software needed to read these datasets is often challenging in corporate or government environments that limit administrator access. Working on the command line is intimidating for a huge number of potential users. Many files that claim to follow conventions don’t, often in subtle and hard-to-decipher ways. Plenty of datasets are still only available in HDF, packaged in ways that require reformatting before they’re usable. And a few legacy datasets are archived and distributed in cryptic, bespoke formats that can’t be read by any off-the-shelf tools.

It’s possible to overcome these hurdles with training and persistence, but to a new user they can look insurmountable. Even experts need to spend time decoding and transforming new datasets.

Earth Observation Data in the Cloud

While cloud computing promises to ease some aspects of data discoverability, access, and analysis, the learning curve to use these services is steep. In many cases, leveraging these technologies requires the skillset of a full-fledged developer. Each cloud service provider has a unique workflow. Some mix and match elements from existing languages, like JavaScript and Python. Others may require proprietary technologies or limit exports. Even fundamental concepts like file hierarchies differ from the well-established conventions of desktop computing.

These types of obstacles introduce friction for new users, and require dedication and time to overcome. If you work for an institution that partners with a cloud provider, you may not have individual access to cloud computing resources. Further, the availability and business models of these services are not guaranteed. For example, Microsoft’s Planetary Computer Hub was shut down with only a few weeks’ warning.

Cloud computing enables new types of research and visualization, and provides a tremendous opportunity to de-couple exploitation of large datasets from an individual user’s access to computing power. But the current cloud paradigm relies on practitioners who already have expertise. Much of the work I’ve seen on cloud infrastructure for Earth science data is focused on the back end. Front end development is needed to make these resources approachable to new groups of users.

Solutions

I mention these frustrations not to complain, but to help motivate data providers to better meet the needs of data users, especially those outside the traditional remote sensing and web mapping communities (though I think there’s room to make data access and utility better for existing users, too). In addition to user-centered design, I’d like to see streamlined formats, flexible exports, elegant interfaces, curated datasets, visual workflows, and polished web apps.

Focus on Usability

My first (and most important) recommendation is to focus on user needs, rather than the requirements of the scientists creating the data. (I highly recommend Don Norman’s The Design of Everyday Things for an introduction to usability.) The principal investigators are always going to have the knowledge and the resources to access their own data. The role of a data provider is to make data accessible to people without years of experience and a dedicated team. This is especially true of government-funded data. The more data is used, the better the return on the investment. And, perhaps, the larger the constituency for a dataset, the greater the demand for maintaining and extending that dataset. For example, the Thermal Infrared Sensor was only added onto Landsat 8 at the last minute, after the agricultural community demonstrated the value of evapotranspiration measurements and demanded those bands.

When a (potential) user says “I find this difficult” believe them. I can’t tell you how many times I’ve mentioned that I find Hierarchical Data Format (HDF) challenging, and the response was “Well actually, it’s easy”. It’s not easy! (If you’re interested in learning how to read HDF files, I wrote a tutorial on using GDAL to read scientific data formats.) And I’m not the only one who finds it frustrating, just one of the most vocal. A better response is to find out why the user is having trouble, and then think about ways to make it easier for them.
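
The GDAL recipe itself isn’t long; the hard part is knowing it exists and deciphering the subdataset names. Here’s a minimal sketch using GDAL’s Python bindings on a hypothetical MODIS-style HDF file (the file name is made up, and real subdataset strings are considerably more cryptic):

    # List the subdatasets packed inside an HDF container, then convert one
    # to a GeoTIFF that mainstream GIS tools can open.
    # The file name is a hypothetical example.
    from osgeo import gdal

    container = gdal.Open("MOD11A1_sample.hdf")
    for name, description in container.GetSubDatasets():
        print(name, "-", description)

    # Convert the first subdataset to GeoTIFF
    first_subdataset = container.GetSubDatasets()[0][0]
    gdal.Translate("subdataset_1.tif", gdal.Open(first_subdataset))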

Data Discovery

Where possible, online information about datasets such as instrument web sites or product descriptions should link directly to the data. These links should be clearly labeled and minimize the decisions and actions a user needs to begin downloading data. This could be as simple as a pre-filled search query, or an interface tailored to browsing a specific dataset. For example, NASA Ozone Watch provides summary graphs and tables, a matrix of ozone maps, and direct links to daily data (although I wish you could download matching data directly from each map). The Ocean Biology Distributed Active Archive Center has a Level 3 & 4 Browser that pairs browse images with direct downloads of global ocean color data from multiple missions and time scales.

Streamline Data Formats

In the specific case of HDF, NetCDF, and GRIB, I think existing standards should be re-examined for compatibility with commonly used software (e.g., make sure longitude runs from -180˚ to 180˚ rather than 0˚ to 360˚, and grid cells are referenced from the corner, not the center), datasets should be more standardized (especially mapped (Level 3) datasets), and conventions should be enforced across products. Ideally, users should not have to spend time decoding each satellite dataset they encounter.
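
The 0–360 longitude issue, for example, forces every user to rediscover the same small transformation. Here’s a sketch of the usual fix in xarray (the file and coordinate names are assumptions):

    # Shift a global grid from 0-360 longitudes to the -180-180 convention most
    # desktop GIS tools expect, then re-sort so the coordinate stays monotonic.
    # File and coordinate names are hypothetical.
    import xarray as xr

    ds = xr.open_dataset("global_grid.nc")
    ds = ds.assign_coords(lon=((ds["lon"] + 180) % 360) - 180).sortby("lon")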

Customizable Exports

There’s also no reason the formats used for data archiving need to be the formats used for data distribution. Make the storage as complex as it needs to be, but build an interface that accesses that data and exports it in a way that is easy to read. That doesn’t mean providing data in every possible format (there’s no need to export 32-bit floating point arrays as ASCII, for example), just a handful of carefully chosen formats that suit the underlying data. GeoTIFF is my preferred file type for most raster data, but ASCII would be better for something like a time series.
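
As a sketch of what such an export layer might do behind the scenes, here’s one way to turn a single time step of a NetCDF variable into a GeoTIFF with rioxarray (the file, variable, and coordinate names are hypothetical, and a plain global latitude/longitude grid is assumed):

    # Export one time step of a NetCDF variable as a GeoTIFF.
    # Assumes a global latitude/longitude grid; all names are hypothetical.
    import xarray as xr
    import rioxarray  # registers the .rio accessor on xarray objects

    sst = xr.open_dataset("sst.nc")["analysed_sst"].isel(time=0)
    sst = sst.rio.set_spatial_dims(x_dim="lon", y_dim="lat")
    sst = sst.rio.write_crs("EPSG:4326")  # plain geographic coordinates
    sst.rio.to_raster("sst_20240730.tif")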

Humane User Interfaces

Sometimes I think the scientific and developer communities are trapped in the era of terminals and mainframes, and forget Xerox PARC ever existed, or that MS-DOS is no longer the dominant PC operating system. The ability to work on the command line and program (with a very specific tech stack) seems to be a prerequisite for working in remote sensing. Why? There’s half a century of research in human-computer interaction and user experience design that says graphical user interfaces are better for most people! The Earth science community should be actively recruiting user experience designers and incorporating usability into the core of data center design.

Data Curation

For new users, the enormous number of parameters available is as much an obstacle as it is an asset. A curated selection of essential variables would provide an entry ramp for people outside of the Earth sciences, especially if these data were formatted in a consistent way, readable by the most common geospatial software (probably QGIS & ArcGIS). Perhaps even go further and provide summary tables suitable for analysis within spreadsheets or business intelligence software.

Visual Programming

For more sophisticated analysis, a visual programming language optimized for image processing, remote sensing analysis, and mapping could bridge the gap between the power of scientific computing languages and straightforward but limited graphical tools. Learning a scientific computing environment isn’t trivial, while graphical tools can be limited or expensive. For inspiration, look at Pure Data (for musicians), Maya shader networks (for 3D artists), and KNIME (for data scientists).

Leverage Web Apps

The power of cloud computing could also be re-focused from the back end to end users. For a variety of reasons, it’s often easier for someone to download data for local processing than it is to work online. Current platforms aren’t as polished as desktop tools, and if a user is comfortable with their existing workflow, what is the incentive to learn a new one? Web-based tools can eliminate the need to download and store huge volumes of data, but to be widely adopted they will need functionality and interfaces that are at least as good as existing desktop tools.

There’s also an opportunity to design web interfaces for emerging use cases like synthetic aperture radar, point clouds, and hyperspectral data that don’t have established workflows on the desktop. Well-designed, cloud-based tools for analyzing these types of data would help expand them from niche to widespread use cases. Further, once these users adapt to working in the cloud for one application, they would likely be more comfortable with online computing going forward.

Conclusion

I think Landsat is a good example of how easy access to data can grow the entire remote sensing ecosystem. From the mid-1980s through the 1990s Landsat had a near-death experience. High prices for data led to low demand, requiring higher prices, further depressing demand. The situation was so bad that data was often not collected due to the lack of paying customers. This was partially rectified with Landsat 7 data, which cost only $600 per scene rather than $4,400, but the significant increase in use didn’t occur until the data were made completely free in late 2008.

At that point Landsat data downloads exploded, quickly growing from thousands to millions of scenes per year. The number of applications for the data grew alongside the increase in data distribution. Novel uses include Google’s Timelapse, deforestation alerts from Global Forest Watch, and providing a calibration and geolocation reference for commercial satellite companies. According to a 2019 report from the Landsat Advisory Group, “The constituency for the data has grown from a narrow group of government scientists and academicians to a broad base of global operational agencies, NGOs, private companies, and citizen scientists.” Removing a barrier to access grew the Landsat user community from a few hundred to tens of thousands of people. As a result, Landsat data now generates billions of dollars of value every year.

Going from hundreds or thousands of dollars per scene to free is obviously an extreme case. But, compared to many other remote sensing datasets, Landsat isn’t that hard to find or use. Novices can investigate interactively in EarthExplorer, while expert and institutional users can access the full archive through Google Cloud Storage or Amazon Web Services Open Data. Scenes are delivered as GeoTIFFs, which can be read by almost anything, including design-focused tools like Photoshop.

Of course, Landsat data can be interpreted like a photograph, and measures phenomena at a human scale — properties which contribute to its utility and wide appeal. More abstract datasets would likely not have the same reach. But focusing on user needs and removing obstacles to adoption will enable new communities to explore the data, and bring fresh ideas and applications.

Despite there being more remote sensing data available than ever before, the process of accessing and using that data remains a frustrating and time-consuming task for many potential users. Existing data archives tend to focus on the needs of the research and tech communities, who are already well-equipped to use these data. Adopting a usability-first perspective would help new users (everyone from insurers and brokers to cartographers and reporters) benefit from these invaluable resources.

Ultimately, the success of a remote sensing mission is defined by the problems the data help solve. Spreading that data as widely as possible creates opportunities to find new use cases and applications beyond the research community.
