Extending the Global PID Graph with Non-Persistent IDs

PIDs Are Not Silver Bullets

Luc Boruta
Nov 4, 2019
Simplified representation of URI transmutation in Cobaltmetrics: when presented with an identifier, our API automatically collates citations to all PIDs and URIs known to identify the same resource.

Nothing lasts forever on the web. Persistent identifiers—a.k.a. PIDs—are designed to slow down link rot for both scientific and non-scientific data. Guiding principles for scientific data management like the FAIR Principles or the Metadata 2020 Principles directly or indirectly advocate the use of PIDs.

We wholeheartedly support that, but persistence is purely a matter of service, and PIDs are not silver bullets. There are billions of research objects that will never be assigned a PID, e.g. works published before the advent of DOIs and most works that fall under the grey literature label. And objects that were assigned PIDs are not necessarily cited by those PIDs.

PID Fixation

The web is not FAIR, and this matters for scientometrics. Metrics are a sampling game: selection biases are an issue, and imbalanced datasets reinforce discrimination. Moreover, we think end-users cannot be expected to know whether a given identifier is persistent, or whether a given URL is canonical. Citations are also sometimes hidden behind indirection mechanisms like short links and proxy URLs, and different databases often use different identifiers for the same object.

With that in mind, can metrics based on data collection efforts that only or mostly track PIDs ever be inclusive and fair—the regular kind of fair—no matter how extensive their coverage of the scholarly web appears to be?

Extending the PID Graph

With Cobaltmetrics, we consider that tracking PID citations is not enough: other identifiers and hyperlinks are also valid citations, including but not limited to web pages (e.g. landing pages on publisher websites) and non-canonical identifiers such as short URLs or proxy URLs from services like EZproxy or Sci-Hub. To capture these citations, we index every PID or URI mentioned in the sources we monitor. Then, when users query our citation index, our URI transmutation API automatically collates citations to all PIDs and URIs known to identify the same resource, whether or not the cited resource was assigned a PID, and whether the citing resource used that PID or a non-persistent ID.
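To make the query-time collation step concrete, here is a minimal sketch, not Cobaltmetrics' actual implementation. It assumes a mention index (citing source to mentioned identifiers) and an equivalence lookup that already returns every identifier known to denote one resource; all sources, identifiers, and function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical mention index: each monitored citing source -> identifiers it mentions.
# All sources and identifiers are made-up examples.
MENTIONS = {
    "blog-post-1": ["https://doi.org/10.1234/example"],
    "tweet-2": ["https://publisher.example/articles/42"],
    "wiki-page-3": ["10.1234/example", "https://publisher.example/articles/42"],
    "forum-4": ["https://unrelated.example/page"],
}


def build_index(mentions):
    """Invert source -> identifiers into identifier -> set of citing sources."""
    index = defaultdict(set)
    for source, identifiers in mentions.items():
        for identifier in identifiers:
            index[identifier].add(source)
    return index


def collate_citations(index, equivalent_ids):
    """Union the citing sources of all identifiers that denote the same resource.

    A source citing several equivalent identifiers is counted only once.
    """
    citing = set()
    for identifier in equivalent_ids:
        citing |= index.get(identifier, set())
    return citing


index = build_index(MENTIONS)
# Assumed result of an equivalence lookup for one resource (hypothetical).
same_resource = {
    "10.1234/example",
    "https://doi.org/10.1234/example",
    "https://publisher.example/articles/42",
}
print(sorted(collate_citations(index, same_resource)))
# ['blog-post-1', 'tweet-2', 'wiki-page-3']
```

Querying only the DOI URL would find a single citing source; collating over the equivalence set finds three, and the source that mentions two equivalent identifiers is not double-counted.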

In practical terms, the URI transmutation API combines PID-to-PID mappings, PID-to-URL resolvers, and (my favorites) URL-to-PID unresolvers. The resulting knowledge base is a very large but simple graph with a single relationship between nodes, namely "identifies the same resource as", which is similar to, yet less strictly defined than, owl:sameAs.
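A graph with a single "identifies the same resource as" relationship can be sketched with a union-find structure, where each connected component is one resource. The sketch below is an illustration under stated assumptions, not the Cobaltmetrics codebase: `SameResourceGraph` and `unresolve_doi` are hypothetical names, the PMID, publisher URL, and proxied URL are made up, and the DOI regex is a common heuristic for spotting DOIs in text. Union-find also treats the relation as transitive, which is a simplification for a relation that is deliberately looser than owl:sameAs.

```python
import re

# Heuristic DOI pattern (assumption): "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def unresolve_doi(url):
    """Hypothetical URL-to-PID unresolver: return the DOI embedded in a URL, if any.

    DOIs are case-insensitive, so the match is lowercased as a normal form.
    """
    match = DOI_PATTERN.search(url)
    return match.group(0).lower() if match else None


class SameResourceGraph:
    """Union-find over the relation 'identifies the same resource as'."""

    def __init__(self):
        self.parent = {}

    def _find(self, node):
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[node] != root:
            next_node = self.parent[node]
            self.parent[node] = root
            node = next_node
        return root

    def add_same_as(self, a, b):
        """Record that identifiers a and b denote the same resource."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def equivalents(self, identifier):
        """Return every identifier in the same component as `identifier`."""
        root = self._find(identifier)
        return {node for node in list(self.parent) if self._find(node) == root}


graph = SameResourceGraph()
# PID-to-PID mapping (hypothetical data).
graph.add_same_as("10.1234/example", "pmid:00000000")
# PID-to-URL resolver output (hypothetical data).
graph.add_same_as("10.1234/example", "https://publisher.example/articles/42")
# URL-to-PID unresolver: recover a DOI from a made-up proxied URL.
proxied = "https://publisher-example.proxy.example.edu/doi/10.1234/EXAMPLE"
doi = unresolve_doi(proxied)
if doi:
    graph.add_same_as(doi, proxied)
print(sorted(graph.equivalents("https://publisher.example/articles/42")))
```

All four identifiers end up in one component, so a query for the publisher URL also reaches the DOI, the PMID, and the proxied URL, even though the proxied URL was only linked through the unresolver.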

In that regard, our knowledge graph is a natural extension of other scholarly graphs like Research Graph or FREYA's PID Graph. These graphs are extremely important for the future of research on research, but they focus on heavily curated (and thus, from our point of view, idealized) scholarly metadata. Cobaltmetrics adds an interface that reduces the friction between the PID-centric scholarly web and the web at large, which is regulated only by the HTTP(S) protocol.


Initiatives like FREYA, PIDapalooza, or Research Graph advocate the adoption of PIDs for all scholarly entities and, again, we support that. However, as long as users copy-paste non-persistent IDs from the address bar of their browsers, and until PIDs become the default, our mission is to ensure that entities that were not blessed with PIDs can still be linked with the rest of the scholarly graph. Drop us a line if you want to co-organize the first URLapalooza!

Interested in learning more about Cobaltmetrics? Try it out, check out the public API, join our newsletter, and reach out at contact@thunken.com!

Thunken

Reflections on data, science, and data science.
