Mouth of the Copper River, using Global Shoreline Datasets and Global River Width from Landsat (GRWL) dataset, part of the GEE Community Datasets List

Community Datasets & Data Commons in Google Earth Engine

Samapriya Roy
Geospatial Processing at Scale
5 min read · Sep 30, 2020

Samapriya Roy and Erin Dawn Trochim

Have you ever spent days uploading a dataset to a cloud-based platform only to find out that it was already available, and you just didn’t know it was there? Or bypassed data entirely because it was too large and complex to work with? How much better could you answer questions if you spent more time synthesizing data and less time moving it?

Elinor Ostrom, a Nobel Prize-winning economist, laid out in Governing the Commons core ideas on how commons across different societies are managed by the people who build them and are part of them. These principles underpin open-source software (such as the Linux Foundation projects) and entirely open encyclopedias such as those supported by the Wikimedia Foundation. This project, or list, grew out of the same ideas.

With the advent of browser-based remote sensing tools like Google Earth Engine, this has become a reality for the tools and scripts that the community contributes. An ever-growing catalog of public datasets has also benefited from the many voices advocating for new additions. The greater need now lies in applying the same principle to community projects large and small.

Shared norms that contribute to a digital commons are valuable for reducing overhead for downstream users.

Projects are listed in places such as the Google Earth Engine Community pages and the Awesome Google Earth Engine lists, but many more are added every single day. Sam maintains a weekly list of catalog-level datasets here.

Really, this is an old idea that most of us have contributed to in many forms. GEE users have been creating, uploading, managing and curating datasets to be shared within the community for some time. We believe this list will be most useful for datasets before they are formally ingested into the Google Earth Engine Catalog, or as an alternate pathway to making public datasets available and accessible. We want to support making data findable and usable. Through experiments like the geeadd search tool, which queries the public catalog, we hope to extend this kind of search to user-generated data commons.

geeadd search tool
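
To make the idea of keyword search over a dataset list concrete, here is a minimal Python sketch. It does not show how geeadd is implemented or invoked; the record fields (`title`, `id`, `tags`), the `search` function, and the placeholder ids are all assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical catalog records. The ids are placeholders, not real asset paths.
CATALOG: List[Dict[str, str]] = [
    {"title": "Global Shoreline Dataset", "id": "example/shorelines",
     "tags": "coast, shoreline, vector"},
    {"title": "Global River Width from Landsat (GRWL)", "id": "example/grwl",
     "tags": "river, width, landsat"},
    {"title": "High Resolution Settlement Layer", "id": "example/hrsl",
     "tags": "population, settlement"},
]


def search(catalog: List[Dict[str, str]], keywords: str) -> List[Dict[str, str]]:
    """Return records whose title or tags contain every keyword (case-insensitive)."""
    terms = [k.lower() for k in keywords.split()]
    results = []
    for record in catalog:
        haystack = f"{record['title']} {record['tags']}".lower()
        if all(term in haystack for term in terms):
            results.append(record)
    return results


if __name__ == "__main__":
    for hit in search(CATALOG, "river landsat"):
        print(hit["id"], "-", hit["title"])
```

Matching on every keyword rather than any keyword keeps results narrow as a list grows; a real tool might rank partial matches instead.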

Sam experimented with ingesting community datasets, starting with Facebook’s High Resolution Settlement Layer; the process is described in an earlier article, High Resolution Settlement Layer. Our work together then led us to ingest the Global Shorelines dataset, which was a massive effort considering the number of vertices. This list grew out of such experiments in adding what was needed for better research.

Curating and Creating the Google Earth Engine Awesome Dataset List

You can find the GitHub repo and current list here. This goes one step beyond archiving data on a personal website or within a data center as mandated by many funding agencies. You should still start by having a DOI associated with the data, either by publishing it as part of a peer-reviewed paper or by using a service like Zenodo. Then, regardless of where you created your final product, you upload your dataset details to our digital commons list, and citations are included.

Even if you aren’t the dataset author, you can still suggest products because you are using them within your project. Creating shared cloud-based assets saves everyone time and effort. Examples of this include basic information on shorelines and rivers. Many projects use simplistic information on shorelines or duplicate efforts by uploading their own version of river centerlines and extent. Using data which has been systematically generated globally has major benefits, especially if it was derived using modern techniques. This improves the quality of our science.

Pooling these datasets together meant first choosing ones that a few of us had either uploaded or preprocessed to some degree, then verifying the license type, the source, the dates and so on. These checks are built into the template we created for other users to submit datasets to be added to the list.

Steps to Make a New Dataset Add Request

  • As stated earlier, we have already created a template for submission. You can request any dataset that you want curated by creating a GitHub issue here. Click on Get Started.
Create the issue and click on Get Started
  • This creates your dataset request as an issue, and as we add more collaborators to review these requests we can keep resolving them and adding more datasets. The idea is that the list is community curated: something for the community and by the community.
Example template used to add your dataset
  • Once the issue is submitted with the template details, we will review it and add it to the growing list of datasets. If you want to update anything, you can either reopen the issue if it has been closed or create a pull request with your changes.
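
Before opening an issue, it can help to check that the details reviewers look for (license, source, dates) are filled in. The sketch below is one way to do that in Python. The `DatasetRequest` class and its field names are hypothetical and do not reproduce the actual GitHub issue template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRequest:
    """Hypothetical request record; field names are illustrative only."""
    title: str
    source_url: str
    license: str
    date_range: str
    citation: str = ""  # optional, for example a DOI-backed citation


def missing_fields(request: DatasetRequest) -> List[str]:
    """List the required fields that are empty or whitespace-only."""
    required = ["title", "source_url", "license", "date_range"]
    return [name for name in required if not getattr(request, name).strip()]


if __name__ == "__main__":
    # Placeholder values, not a real dataset.
    draft = DatasetRequest(
        title="Example dataset",
        source_url="https://example.org/data",
        license="",
        date_range="2000-2020",
    )
    print("Still missing:", missing_fields(draft))
```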

Extra Bits and Cleanup

Now that you have uploaded these datasets, you can also do some cleanup in Google Earth Engine itself. Check out the new description box that has been added for images and image collections. The description can be written in plain text or in Markdown to add things like headings and bullets. It also lets you include citations and license information alongside the description.

Use the description tool in Google Earth Engine to add dataset descriptions (Markdown is supported)
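
If you maintain several assets, you may want to generate descriptions in a consistent shape and paste them into the description box. Here is a minimal Python sketch that assembles a Markdown description with headings, bullets, a citation and a license. The function name, the section layout and the placeholder values are assumptions, not a format required by Google Earth Engine or by the list.

```python
from typing import Iterable


def build_description(title: str, summary: str, citation: str,
                      license_name: str, bullets: Iterable[str] = ()) -> str:
    """Assemble a Markdown description: title, summary, bullets, citation, license."""
    lines = [f"# {title}", "", summary, ""]
    bullet_lines = [f"* {item}" for item in bullets]
    if bullet_lines:
        lines += bullet_lines + [""]
    lines += ["## Citation", "", citation, "", "## License", "", license_name]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values, not a real dataset record.
    print(build_description(
        title="Example dataset",
        summary="Short summary of what the asset contains.",
        citation="Author et al. (year). Title. DOI.",
        license_name="CC-BY-4.0",
        bullets=["Spatial extent: global", "Format: vector"],
    ))
```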

Add the badge to your dataset by using the following in your GEE dataset description:

![contributor](https://img.shields.io/badge/gee--awesome--datasets-data%20commons%20contributor-green)
Contributor badge: add it to your projects

You can also make it link back to the GitHub repo and the awesome datasets list project by using this instead:

[![contributor](https://img.shields.io/badge/gee--awesome--datasets-data%20commons%20contributor-green)](https://github.com/samapriya/awesome-gee-community-datasets)
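
The badge URLs above follow the Shields.io static badge pattern, where a single dash separates label, message and color, so literal dashes in the text are doubled and spaces are percent-encoded. The Python sketch below rebuilds both Markdown snippets from their parts. The helper names are hypothetical, and the escaping covers only dashes, underscores and spaces.

```python
from typing import Optional
from urllib.parse import quote


def escape_segment(text: str) -> str:
    """Escape one badge segment: double dashes and underscores, then percent-encode."""
    doubled = text.replace("-", "--").replace("_", "__")
    return quote(doubled, safe="-_")


def badge_url(label: str, message: str, color: str) -> str:
    """Build a static badge URL of the form .../badge/<label>-<message>-<color>."""
    return ("https://img.shields.io/badge/"
            f"{escape_segment(label)}-{escape_segment(message)}-{color}")


def badge_markdown(alt: str, url: str, link: Optional[str] = None) -> str:
    """Return Markdown for the badge image, wrapped in a link when one is given."""
    image = f"![{alt}]({url})"
    return f"[{image}]({link})" if link else image


if __name__ == "__main__":
    url = badge_url("gee-awesome-datasets", "data commons contributor", "green")
    print(badge_markdown("contributor", url))
    print(badge_markdown(
        "contributor", url,
        link="https://github.com/samapriya/awesome-gee-community-datasets",
    ))
```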

There you have it: your dataset will be reviewed by a community of users, and we hope that this effort will grow. We are excited for it to help data get used, make that data easier to track, and advance the community’s ability to tackle new challenges.
