Public Lands Data Stewardship

Sky Bristol
18 min read · Jun 22, 2022


Late spring wild sunflowers in the San Juans

Following up from yesterday’s story, I decided to spend a little time exploring the linked data situation for a particular test case. My premise here is that a part of public lands stewardship could be data stewardship. We can consider data about our public lands to be a digital representation or manifestation of those lands, and I’ve long felt that data collected from or describing our public lands is equally important as a public trust resource.

From a Cornell University blog on Public Trust Practice (https://blogs.cornell.edu/publictrustpractice/public-trust-concepts/)

Public Trust Doctrine

Very early in my career as a civil servant, I received a lesson in the Public Trust Doctrine from my boss at the time. I was a young student field tech in the U.S. Fish and Wildlife Service. I was given some responsibility for drafting up sections of reports on our field studies, examining the impacts of environmental contaminants on fish and wildlife resources. I wrote some text where I used the phrase "trust resources" but ascribed them to the Bureau I worked within rather than to the U.S. Department of the Interior as a whole. I had a great mentor who took the time to explain how the government was structured, what the public trust was all about, and how we needed to document and report on what we discovered in our work as government scientists. That interaction set me on a particular course for a long and rambling career in doing what I can to uphold that trust.

The Cornell Legal Information Institute defines the public trust doctrine as:

The principle that certain natural and cultural resources are preserved for public use, and that the government owns and must protect and maintain these resources for the public’s use.

In a country and society where one of our highest ideals is that the government is to be "of the people, by the people, and for the people," we are all stewards operating under the public trust doctrine. We certainly hire and pay people like BLM Range Conservation Officers, Park Rangers, Interpretive Specialists, and others to take on full-time dedicated roles in the public trust. But every member of the public can and should play a part, not only as a user and beneficiary of the public trust but also as a contributor. The very least we should strive for is being "net zero users," using resources in the trust without destroying or degrading them. But what are all of the ways we can be responsible stewards and trustees, leveraging the resources in the public trust in a way that creates a net gain for the planet and for our societies?

Public Domain Data Curation

As discussed in my article yesterday, there are quite a number of commercial interests competing for consumers in the data and mapping space. I don’t begrudge any of these interests at all, and I’m perfectly happy as one of those consumers to pay for a good product. Those investments often result in great innovation and powerful capabilities that make our lives better and contribute to human progress.

But there is also an important place for data in the public domain, meaning data that are completely free from any restrictions on use by anyone, including commercial interests. Government institutions in the U.S. are in a somewhat unique position in the world in that our Federal government (and many State governments) are prohibited from holding copyright. This means that nearly every piece of data and information produced at taxpayer expense in the U.S. is donated into the public domain. This supports a tremendous amount of activity across commercial, academic, nonprofit, and other sectors. Open data policies put in place across the last couple of decades have resulted in massive amounts of new and legacy government data being released “into the wild.”

The problem we have today though is that there is so much data and information out in the digital wild, that we’ve created an accessibility and usability problem. Raw data often lack sufficient context, semantic depth, or working applications and are basically useless until someone puts the effort into adding those elements. Artificial intelligences can often assist in that work by rapidly wading through massive amounts of data, but they always come along with uncertainties that have to be reduced.

One of the areas I’ve become very interested in is the digital commons, most notably represented in the “Wikiverse” of platforms, technologies, methods, and community. This also includes specific platforms like OpenStreetMap for geospatial data and information. New contributions pop up over time, some of which morph into the more long-standing digital commons infrastructure.

To a great extent, the Wikimedia Foundation and others have become Extra Governmental Organizations (EGO) in that they perform a very important role in upholding and building upon the public trust doctrine. Like government, they are dedicated to serving the public good. But they are not part of the nation-state apparatus, and in the digital world, they now transcend all national boundaries. The non-profit organizations that provide digital infrastructure through various funding sources are, of course, subject to all of the vagaries and pitfalls of human institutions and can certainly mess up and go awry. However, like government organizations, there are built-in checks and balances when the "of the people" dynamic is continually upheld. If "we the people" are all an active part of maintaining and contributing to the digital commons, we help keep the human proclivities toward malfeasance in check.

Over the years, as I’ve worked on many types of scientific data in my day job with government, I’ve often thought that government institutions might be much better off contributing to the EGO-based digital commons rather than or in addition to always “rolling our own.” There’s still a solid role for the National Archives and the broader slate of Federal, Tribal, State, and Local government data infrastructure in ensuring the long-term viability of our digital assets. But the reality is that the work that has been done by EGO foundations and the army of volunteers they motivate and enable to build an interoperating and linked open digital infrastructure is often leaps and bounds ahead of governmental organizations.

Test Case: Stewarding Trail Data and Information

So, enough of the philosophy and onto more geeky stuff. I decided to work through a particular case of trails information with the following questions in mind:

What if, as part of spending time in and experiencing a part of our public lands as a volunteer steward, I put some effort into curating information about the features of that public land within the global digital commons? How straightforward is this process? Could it be extended to anyone with time and interest? What value would it add?

I focused initially on a specific hiking trail that I wandered down the other day from the Gunnison Gorge NCA into the Gunnison Gorge Wilderness. The Chukar Trail is only about a mile (though data on this varies as I'll talk about in a bit) and is the upper access point into the gorge section of rapids on the Gunnison River. Commercial and private boat trips launch from the access point at the bottom of the trail, with commercial companies mostly using mule trains to haul gear down the 500+ ft descent. It's a pretty easy to moderate trail on foot, and I hiked it on Monday this week. I only picked up a single piece of micro-trash, so it's a pretty pristine trail altogether.

It was an interesting journey poking around online this morning to understand the digital footprint of the Chukar Trail and experiment with curating some of that information for public domain use in the commons. Following an excellent example I mentioned yesterday for the Canyon Rim Trail in the Colorado National Monument, I crafted a new Wikidata entity for the Chukar Trail. I created crosslinks with the existing OSM way for the trail, and I’ll need to follow up with some actual mapping work to make the geospatial data more accurate.

Along the way, I discovered some new resource pools I hadn't known about. BLM advertises another commercial mapping app, Avenza Maps, that was new to me. It's yet another source of mapping data operating in the commercial space that seems to be geared toward business-to-business relationships with map makers. There's an app and a free tier, so I'll check it out further. By running a search in Google Maps, I came across gjhikes.com, a very information-rich site for certain trails in my area that I strangely hadn't yet encountered.

I put some of the information I gathered into the Wikidata entity, but I can see some additional work on this in the future. Here are some observations and thoughts about the digital footprint situation for this trail that I suspect will extend into other cases.

More work will need to be done within the Wikidata schema and conventions if it is to be useful as THE database foundation for driving apps. Properties are sometimes a conundrum in Wikidata. They are often created with a particular context in mind and may not always extend into slightly tangential circumstances. For instance, following Thierry Caro's lead, I used the degree of difficulty property to flag the Chukar Trail as moderate. I referenced this to the AllTrails source, which uses the same word. My own experience would make it mild to moderate, though the GJHikes source lists it as "strenuous," a concept not yet contained in Wikidata that would need to be pulled in from Wikipedia. Ideally, the curation for something like this would include both "moderate" and "strenuous" with different references and perhaps qualifiers for each entry, allowing a consuming application to determine which sources/circumstances best meet its purposes.
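The idea of carrying multiple, differently sourced ratings on one entity can be sketched as plain data structures. This is only an illustration of the statement-plus-reference pattern; the property and item identifiers below are placeholders, not real Wikidata IDs.

```python
# Sketch: multiple difficulty ratings on one trail entity, each carrying its
# own reference so a consuming app can choose which source to trust.
# "P_DIFFICULTY", "Q_MODERATE", and "Q_STRENUOUS" are illustrative
# placeholders, not actual Wikidata property/item IDs.

def difficulty_claim(difficulty_qid, source_url, retrieved):
    """Build one claim with an attached reference block."""
    return {
        "property": "P_DIFFICULTY",   # placeholder property ID
        "value": difficulty_qid,      # item for the rating concept
        "references": [{
            "reference URL": source_url,
            "retrieved": retrieved,
        }],
    }

# Both ratings coexist on the entity rather than one overwriting the other.
claims = [
    difficulty_claim("Q_MODERATE", "https://www.alltrails.com/", "2022-06-22"),
    difficulty_claim("Q_STRENUOUS", "http://www.gjhikes.com/", "2022-06-22"),
]
```

An application consuming this structure could filter on the reference URL to prefer, say, land-manager sources over crowd-sourced ones.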

Some of the other semantics in Wikidata are even more challenging, pointing to one of the big arguments I've heard against trying to leverage the platform for "serious" uses. The Wikiverse is a wild and woolly place where stuff comes in from everywhere without always rigorous oversight and peer review. But then there's also the issue of constraints that the meritocratic system at play within Wikidata has imposed. For instance, one of the really important pieces of information we'd want to include in the schema for trails is the type of activity supported/allowed for that feature. I initially classed the new Wikidata entity as a hiking trail but ended up moving it to the higher class of trail, which is itself a subclass of path. The semantics here get really messy, and differing understandings of terms mean that the folks who apply constraints within the system sometimes make mistakes that inhibit the curation of knowledge. A current constraint means I cannot add activity or human activity statements to the trail entity, whether for hiking and horseback riding, the two major activities authorized for that trail into a wilderness area, or for ATV riding, drone flying, and other unauthorized activities. To accomplish properly organized and well curated information for this use case, we would need to do some additional work like adding subclasses for authorized and unauthorized human activities and fine-tuning the constraints to allow these statements to be captured.

The fundamental Wikidata model of statements with references and qualifiers is incredibly powerful and could allow for much richer information than is often captured in purpose-built platforms like AllTrails. For instance, if we resolve the activity statement problem, we could include fishing as an activity associated with this trail with the qualifier that it occurs at the lower terminus of the trail. Slightly tongue in cheek (but maybe not), we could include urination and defecation as human activities supported by the pit toilets at either terminus of the trail. This kind of fundamental, well-organized and attributed data could provide a powerful foundation for many types of applications.

The geospatial part of the data picture is interesting. The OSM source for the Chukar Trail is not accurate as the trail stops well before intersecting with the Gunnison River, which is the actual lower terminus of this out-and-back trail. AllTrails, COTrex, and other sources have more accurate line data. More exploration is needed to work through the Wikidata part of curating information in relation to the node/way structure of OSM. We have concepts in Wikidata for trailhead and boat ramp that could be used in the Chukar Trail case to describe the upper and lower termini/nodes of the trail, but there’s additional work to be done in the classification and descriptive semantics for these concepts (e.g., “river put in” might need to be a subclass of boat ramp or a separate type of feature).
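Comparing the OSM geometry against other sources starts with pulling the way out of OpenStreetMap. The Overpass API is the standard read-only route for that; the query below is a minimal sketch, and the bounding box coordinates are a rough guess at the Gunnison Gorge area rather than verified values.

```python
# Sketch: build a minimal Overpass QL query to fetch a named trail's way
# (with geometry) from OpenStreetMap for comparison against other sources.
# The bounding box here is an approximate guess, not a verified extent.

def overpass_trail_query(name, bbox):
    """bbox is (south, west, north, east) in decimal degrees."""
    south, west, north, east = bbox
    return (
        "[out:json];"
        f'way["highway"="path"]["name"="{name}"]'
        f"({south},{west},{north},{east});"
        "out geom;"  # include node coordinates for each way
    )

query = overpass_trail_query("Chukar Trail", (38.6, -107.9, 38.7, -107.8))
# The query string would then be POSTed to an Overpass endpoint such as
# https://overpass-api.de/api/interpreter (omitted here to keep this offline).
```

The returned geometry could then be diffed against the AllTrails or COTrex line data to quantify how far short of the river the OSM way stops.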

I found it interesting that the GJHikes source was one of the "Web Results" provided by Google Maps for the Chukar Trailhead, with the trail itself not part of the Google dataset. This might have come about because GJHikes uses custom Google Maps as its map data component, or because they did some decent SEO work. The different representations and derivations of the geospatial feature itself do bring up some interesting things to think about in terms of synthesis and integration through time. Platforms like TrailForks, COTrex, and others that accept user-recorded GPS tracks could create an avenue for iterative data improvements and even interesting signal detection (e.g., consistent routing around deadfalls and other obstacles, authorized or not).

Ethical and Practical Considerations

With platforms like AllTrails, REI's HikingProject (which interestingly does not include my test case trail), and others working in this space, we might be hard pressed to find volunteers willing to put time and effort into public domain data curation like I explored here. I submitted some suggested attribution additions to AllTrails for the Chukar Trail, and it's certainly a whole lot simpler to do things in that context than it is to struggle through the semantics and other issues in Wikidata to capture the same information.

Role of Government

Rich information about the resources on our public lands should be in the public domain, unencumbered by use restrictions and fully accessible to anyone, including commercial and noncommercial innovators who want to build on that foundation. Government organizations should make every effort to put the data and information that they are most directly responsible for out into the commons in a way that it can best be combined with other public domain information contributed from other sources. While I applaud the USGS effort to assemble an integrated public domain trails dataset, I’m frustrated with how the dataset has evolved. The major focus of that effort has been to develop analytical tools on connectivity between jurisdictions and access to parks and public lands. That’s all well and good, but there are some significant problems in the current implementation.

The provenance trace from the released National Digital Trails Dataset back to original source data is sorely lacking, with no source identifiers carried through that would allow for backtracking or alternate derivative pathways. This problem is evidenced in my test case by the USGS dataset having a "Chuckar Trail" in the same location where every other source has the "Chukar Trail." We have to look at the "raw" data and metadata from the USGS data product to try to figure out how this error came to be in the data.

The data record available in the staged product download for the National Map's Transportation Layer (I homed in on just the Colorado subset) gives us an obscure UUID value for "sourcedata" along with a text string indicating "BLM Trails 09/2021." Reading between the lines, it seems that USGS considers the BLM source for trails data on BLM managed lands to be the "predominantly authoritative source" in this case. We also have a date clue in a text string, which might help us narrow in on what we're looking for.

I had to do quite a bit of digging to track down a source, which I still cannot confirm is the actual source material USGS is using. BLM operates what it calls its Geospatial Business Platform Hub via Esri's ArcGIS Online platform. A little digging around through "transportation," which isn't entirely intuitive, finally led me to one of quite a few datasets that have trails information, all organized into various line-of-business packages that likely reflect how BLM distributes responsibilities. I explored the "BLM Natl GTLF Public Managed Trails," which I gather is a derivative/subset of some master dataset for "BLM Ground Transportation Linear Features" (total GIS speak!). After orienteering through the map, because the search is just a basic geolocate deal, I tracked down a source for the errant spelling of my trail name. The actual "cited" source from BLM in the metadata records for the USGS products points to BLM's "Landscape Approach Data Portal," where you can trace through to pretty much the same products with a slightly different representation.

So, here’s where the real problem is. I certainly applaud the USGS effort to make sense of what is obviously a currently messy “authoritative” data problem where organizations like the BLM have their own particular context and drivers for producing and managing data the way they do. We need different syntheses across all of those to produce useful amalgamations that can be repurposed into new contexts. However, in doing that, we really need to record and make available full provenance traces without ambiguity. We need to know how BLM’s “ROUTE_PRMRY_NM” attribute from some specific dataset came to be USGS’ “name” attribute in a downstream integrated derivative. Exactly how and from where and when was that information derived?

Perhaps the provenance trace and processing steps need to become more the point of what USGS and any other government agency producing data products is putting out. The dataset end products may be the primary point of use and interface, but the product itself should be the robustly documented and preferably executable process to get to that point. For a product like a national trails dataset, it might also be wise to consider "authoritativeness" as a factor of proven use and demand over logical source. Is it a massive problem that the BLM and now USGS have a misspelled trail name in their databases? No. Is it still a problem and indicative of what is perhaps a systemic point of failure? I would argue, yes.

If the process for producing an integrated/amalgamated product like national trails were treated as equally or more important than the end result, we might logically think about building in some automated consistency and other quality checks. Quite a few US States beyond Colorado have invested significant resources through their outdoor recreation and tourism pursuits to build things like COTrex with pretty robust underlying datasets. AllTrails and other commercial ventures along with some of the not-for-profit platforms I've mentioned have lots of information about the same features. Some of these sources are "programmable" in some way, lending themselves to building at least high-level processing (e.g., name/proximity checks). It would be fairly straightforward to identify a possible/probable misspelling like Chukar (the bird) to Chuckar (the I don't know what).
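A name check like the one described can be done with nothing more than the standard library. This sketch compares name lists from two sources and flags near-but-not-exact matches; the similarity threshold is an arbitrary choice for illustration, and in practice the check would be paired with a proximity test on the geometries.

```python
# Sketch: flag probable misspellings by comparing trail names across two
# sources with a similarity ratio. The 0.85 threshold is illustrative;
# a real pipeline would also check spatial proximity of the features.

from difflib import SequenceMatcher

def probable_misspellings(names_a, names_b, threshold=0.85):
    """Return (name_a, name_b) pairs that are close but not identical."""
    flags = []
    for a in names_a:
        for b in names_b:
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if a != b and ratio >= threshold:
                flags.append((a, b))
    return flags

# "Chuckar Trail" vs "Chukar Trail" differ by one character and get flagged;
# an unrelated name like "Ute Trail" does not.
flags = probable_misspellings(["Chuckar Trail"], ["Chukar Trail", "Ute Trail"])
```

Run across the USGS product and a reference source like COTrex, this would surface Chuckar-style errors for human review rather than letting them propagate downstream.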

If we take things one step further and think about changing mindset to one of projecting usable data and information into the commons vs. asking the commons to come visit the government, that might also result in more accurate and usable information. What I’m contemplating doing here is writing a bot to establish “Wikidata stubs” for all of the named trails that I can reasonably pull together from a data gathering process across accessible sources. By “reasonably,” I mean to a degree of confidence that I am not introducing garbage into the knowledge bank. In the process of looking through OSM and a handful of other public domain or open license datasets, I can already tell I’ll end up with quite a number of “Chuckar-like” situations that will take further human sleuthing. But what if government agencies started doing some of this as a value-adding contribution into the commons?
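The "reasonably" gating described above, only creating a stub when confidence is high enough, could look something like the sketch below. The source names and the two-source threshold are illustrative; the real bot would also need the Wikidata write machinery, which is omitted here.

```python
# Sketch: gating logic for a stub-creating bot. A trail name confirmed by
# multiple independent sources becomes a stub candidate; singletons go to a
# human-review queue (the "Chuckar-like" situations). The min_sources
# threshold and source names are illustrative assumptions.

def propose_stubs(source_records, min_sources=2):
    """source_records: iterable of (source_name, trail_name) pairs."""
    seen = {}
    for source, trail in source_records:
        seen.setdefault(trail, set()).add(source)
    candidates = sorted(t for t, s in seen.items() if len(s) >= min_sources)
    review = sorted(t for t, s in seen.items() if len(s) < min_sources)
    return candidates, review

records = [
    ("OSM", "Chukar Trail"),
    ("COTrex", "Chukar Trail"),
    ("USGS", "Chuckar Trail"),   # variant spelling: needs human sleuthing
]
candidates, review = propose_stubs(records)
```

Only the candidates list would feed the actual stub-creation step, keeping garbage out of the knowledge bank while preserving the odd cases for sleuthing.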

Could government produce higher quality products by inviting a level of open public review and scrutiny through pushing their data products into a context that the public owns (.org) more so than what their government owns (.gov)?

Role of Volunteer Curators

Despite some of the challenges I discussed earlier with Wikidata's semantic inconsistencies and shortfalls, it is still the most powerful and growing platform to rally around in terms of an open knowledge bank founded in the digital commons (not to be confused with the thing that Elsevier appropriated and branded). Its founding on the linked open data model is extensible and long-lasting. The Wikimedia Foundation's branches into the Wikimedia Commons and other parts of the Wikiverse, along with its mostly collegial relationship with the specialty community in OpenStreetMap, are developing a well-rounded suite of interoperating digital infrastructure to manage and provide all manner of curated data, information, and knowledge content. Adherence to open standards and transparent technology will continue to support new innovations in interfaces, alliances with other platforms, and contextual applications.

My brief foray into curating content around a single trail will be followed by more extensive testing and use case bounding. I’m sure I’ll find some additional challenges beyond those I’ve noted here that will take some work within the Wikidata, OSM, and other communities. I’m quite a ways off from building a bot to lay out some named trail record stubs and start integrating properties, but I can see it on the horizon.

I plan to start with a recipe and set of guidelines/conventions for describing trails and aggregating data from multiple sources. As with a lot of these kinds of problems, there's a fair bit of curatorial plumbing that has to be laid down, and a baseline coverage of data established, before anyone can start working up applications. I'll focus efforts on the three NCAs where I'm connecting with a volunteer community so we can explore local application ideas.

At some point, I anticipate that we’d want one or many applications that bring everything into a point of use where context is defined and narrows the choices to what’s relevant. Very few people are going to geek out enough to go start creating or editing Wikidata entities, OSM ways/nodes, and whatever else we leverage. I would absolutely love it if AllTrails or someone who’s already built out apps and a user community latched onto the idea of at least a part of their data infrastructure moving to the commons, based in Wikidata or some combination of platforms. I would hope that could be a two-way street, with user contributions fed back to the commons. Commercial companies can obviously decide what they can monetize and deal with that content separately, but it would be great to see a commercial venture put a part of their investment into building the commons knowledge bank.

Ethical Guardrails

There are a number of ethical considerations with data on our public lands. Making more and better information about features like trails accessible for public use could be considered antithetical to protection. There are certain sacred places for me personally that I would never share, and there are cultural and ecological resources that need care and safeguarding.

I consider all of our public lands in the U.S. to be co-owned and co-managed with the first peoples of this area. I’m guided by the principles of indigenous data sovereignty, considering the digital representation of resources on our public lands to be properly co-owned and co-managed by Native American Tribes. In my test case of the Chukar Trail and its new Wikidata entity, I’m experimenting with adding statements about the area as the ancestral homeland/territory of the Nuchu (Ute) and Diné (Navajo) people based on the Native Land mapping work. This is another place where entity classification constraints in Wikidata are currently coming up short.

Wikidata entity showing type constraint problems in associating a trail with Native American ancestral homeland

We have cases of trails throughout the country that have history with the Tribes, some of which has been written down in various “interpretive” materials but much of which is not yet linked. Where appropriate, I would love to develop relationships that help to bring more of this history to light to help deepen our connection with the land and its importance to human development through time. I can imagine all kinds of interesting ways this could play out with data within the Wikidata model. We might record dynamics of temporally bounded human uses/activities for a trail or other feature. We could encode treaty rights into digital representations, indicating things like hunting or herb-gathering activities qualified to the Tribes who hold those rights. How could that information, encoded as knowledge, help to develop applications that improve equity and enhance responsible use?

I can’t deny, though, that many of us humans go out to our public lands and simply take with no thought to our impact or giving back more than we receive. The area I’m focused on here in the Gunnison Gorge NCA is very popular for motocross with scars all over the Mancos Shale landscape from bikes high-lining the slopes, kicking up more dust to melt snowpacks earlier and eroding selenium-laden soils to cause downstream impacts. While I don’t share in those activities personally, I can understand the appeal, and it’s all part of BLM’s multiple use mandate. I guess it comes down to this question:

Will making more and higher quality information about features on public lands like trails have a net positive or a net negative impact on the sustainability and long-term ecosystem services from those lands?

My hope is that it will have a net positive effect. I hope the platform enables more people of good will and intent to think critically about, and enter into more of a symbiotic relationship with, our remaining open and "natural" spaces. I hope that a usable linked open data framework with well-rounded content will enable app builders to help educate and engage public lands users in developing creative solutions and practicing uses that lead to sustainability and positive ecosystem health trends. I hope to foster in more people a deep-rooted understanding of the public trust doctrine and a sense of our ability to be trustworthy stewards.


Sky Bristol

I’m a biologist and data scientist working in the Government science sector. Stories and comments from this account are my own.