Creating Value with Identifiers

Why identifiers are essential to secure cross-border trade, and how we have applied best practices and design principles

Nis Jespersen
Transmute
Mar 21, 2024

Transmute enables global trade digitization. Essential to this mission is facilitating an optimal digital identifier strategy for our customers, in which our Verifiable Data Platform plays an important part. In this white paper we discuss some points of inspiration and the design choices and principles we have made to bring efficient, safe, intuitive — and beautiful — identifiers to the supply chain sector.

Inspiration

In designing our identifiers, we have looked to major internet players for inspiration: HBO, LinkedIn and GS1.

HBO

HBO is a great source of inspiration, consistently following identifier design practices. For example, here is the URL for accessing the movie Dune:
https://play.hbomax.com/player/urn:hbo:feature:GYUjdLgBiJp5otAEAAAAJ

This URL consists of:

  • Scheme (https), indicating that there is a resource resolvable over secure HTTP
  • Host (play.hbomax.com) with HBO’s domain name
  • Path of HBO’s video player (player), available to authenticated HBO users
  • Sub-path with a unique identifier of the movie Dune (urn:hbo:feature:GYUjdLgBiJp5otAEAAAAJ)

We will dissect the structure of the identifier in a moment, but first it’s important to note the same identifier of Dune is used elsewhere to resolve different resources. For example:
https://www.hbomax.com/dk/da/a/param-feature/urn:hbo:feature:GYUjdLgBiJp5otAEAAAAJ

This links to a publicly available “movie poster” page, in this case on a Danish microsite as per the path.

Looking closer at Dune’s identifier, urn:hbo:feature:GYUjdLgBiJp5otAEAAAAJ, this is a URN consisting of:

  • Scheme (urn), indicating a unique but not necessarily resolvable identifier
  • Namespace, NID (hbo), which arguably should be registered with IANA, but isn’t
  • Namespace-specific string, NSS (feature:GYUjdLgBiJp5otAEAAAAJ), which notably consists of a categorization of what is identified (feature:) as well as a unique string
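The two levels of structure described above, a URL wrapping a URN, can be pulled apart in a few lines. This is a minimal sketch using only the Python standard library; the helper name split_urn is our own illustration and is not taken from HBO.

```python
from urllib.parse import urlsplit


def split_urn(urn: str) -> dict:
    """Split a URN into scheme, NID and NSS (simple illustrative parser)."""
    scheme, nid, nss = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return {"scheme": scheme, "nid": nid, "nss": nss}


url = "https://play.hbomax.com/player/urn:hbo:feature:GYUjdLgBiJp5otAEAAAAJ"
parts = urlsplit(url)

# The last path segment carries the identifier.
identifier = parts.path.rsplit("/", 1)[-1]
urn = split_urn(identifier)

# The NSS itself is sub-delimited into a category and a unique string.
category, unique = urn["nss"].split(":", 1)

print(parts.scheme, parts.netloc, urn["nid"], category, unique)
```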

LinkedIn

LinkedIn follows a pattern very similar to HBO’s for LinkedIn posts:
https://www.linkedin.com/feed/update/urn:li:activity:7062908400073887744

Again, the URL consists of:

  • Scheme (https)
  • Host (www.linkedin.com)
  • Path (feed/update)
  • Sub-path with a URN identifier (urn:li:activity:7062908400073887744)

And again, the identifier is a URN constructed as:

  • Scheme (urn)
  • NID (li), unregistered
  • NSS (activity:7062908400073887744), consisting of a categorization of what is identified (activity:) and a unique string

LinkedIn also allows for natural identifiers for identifying people (https://www.linkedin.com/in/nis-jespersen/) and companies (https://www.linkedin.com/company/transmute-industries). There are two interesting points to note here:

  1. The use of natural keys (nis-jespersen and transmute-industries). When this is possible, human-recognizable keys derived from the business domain are a strong feature.
  2. A more questionable part of this design is that LinkedIn is not sticking with URN identifiers. A better and more consistent design could have been for example: https://www.linkedin.com/company/urn:li:company:transmute-industries

GS1

GS1 governs identifiers for elements used in trade such as product types (GTIN), organizations (GLN) and transport identifiers (SSCC). These can be extended. For example, a GTIN with a lot number traditionally looks like this: (01)09520123456788(10)ABC123, where (01) and (10) are the codes for GTIN and lot respectively, each followed by its value, in combination forming a unique compound key for the lot.

GS1’s Digital Link standard defines how GS1 identifiers are represented as URIs. The Digital Link of the above example could be https://id.gs1.org/01/09520123456788/10/ABC123 (this is the so-called canonical form). Importantly, it could also be https://brand.example.com/01/09520123456788/10/ABC123, allowing the brand owner to host resolvable information about the product.

Per the spec, identifying properties are on the path, whereas non-identifying properties are placed in query parameters. Weight, for example, is non-identifying. Take (01)09520123456788(3103)000195, where (01) codifies the GTIN followed by its value, (3103) codifies Net Weight, and 000195 is 195 grams. Its Digital Link can be https://brand.example.com/01/09520123456788?3103=000195.
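The path-versus-query rule can be sketched as follows. This is an illustration only, not a complete Digital Link implementation: the function name digital_link is hypothetical, and the set of identifying codes is an assumption limited to the two codes used in the examples above (01 for GTIN, 10 for lot).

```python
from urllib.parse import urlencode

# Assumption for this sketch: only GTIN (01) and lot (10) are treated as identifying.
IDENTIFYING_CODES = {"01", "10"}


def digital_link(base: str, elements: list[tuple[str, str]]) -> str:
    """Place identifying code/value pairs on the path and the rest in the query string."""
    path, query = [], []
    for code, value in elements:
        (path if code in IDENTIFYING_CODES else query).append((code, value))
    url = base.rstrip("/") + "".join(f"/{code}/{value}" for code, value in path)
    if query:
        url += "?" + urlencode(query)
    return url


lot_link = digital_link(
    "https://brand.example.com", [("01", "09520123456788"), ("10", "ABC123")]
)
weight_link = digital_link(
    "https://brand.example.com", [("01", "09520123456788"), ("3103", "000195")]
)
print(lot_link)
print(weight_link)
```

Values containing characters that are reserved in URLs would additionally need percent-encoding on the path, which this sketch leaves out.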

Peculiarly, GS1 has not opted to leverage the registered “epc” URN namespace in the Digital Link standard. The above GTIN could be represented like this:
https://brand.example.com/urn:epc:id:gtin:0952012.3456788.

Granted, the QR code for this would be bigger, which is an important argument. Apart from that, this seems like a missed opportunity to spread a best practice identifier pattern broadly into GS1’s established user base.

In conclusion of this little tour of inspiration, we’ve found that HBO and LinkedIn have taken practically identical approaches to identifying and localizing their platform resources. GS1 seems to have been on a similar path before changing direction for Digital Link. Next, we will look into some of the drivers which have led HBO and LinkedIn to the same best practice design.

Design Principles

In this section we will focus on properties which make a good identifier. We will also consider some properties which make them bad, if not dangerous.

Why does HBO bother adding urn:hbo:feature: and not just a “naked” identifier such as
https://play.hbomax.com/player/GYUjdLgBiJp5otAEAAAAJ

And, similarly, why do we consider this to be well formed:
https://platform.transmute.industries/presentations/urn:transmute:presentation:bdb8a296-ed17-41a6-b5e6-46n157747bee

But not this:
https://platform.transmute.industries/presentations/bdb8a296-ed17-41a6-b5e6-46n157747bee

Uniform Resource Identifier

First of all, bdb8a296-ed17-41a6-b5e6-46n157747bee is not a URI, whereas urn:transmute:presentation:bdb8a296-ed17-41a6-b5e6-46n157747bee is.

There is a caveat: formally, the URN namespace should be registered with IANA so that uniqueness can be guaranteed. Our stance is that our use of the unregistered namespace (transmute) is sufficiently unique; a stance which we share with the likes of HBO and LinkedIn.

Usability

During development and integration, engineers often juggle identifiers from various platforms at once. Hints about which platform and type of resource an identifier is associated with make it much easier to work with. As you glance over a table of identifiers, being able to pick out the one you are looking for is a time saver. Platforms often prefix identifiers with such context. In our case, for example, transmute:presentation intuitively and clearly points towards the origin and type of resource.

A developer should also intuitively trust that an identifier is unique. While UUIDs are a little long, they are recognizable and a developer instinctively comprehends the identifier’s uniqueness.

Finally, these aspects should come naturally. In working with identifiers, the good experience is the one you don’t experience.

Casing Convention

The natural keys employed by LinkedIn were kebab-cased (nis-jespersen). This is preferred because most operating systems and search engines treat dash-separation as separate words. In contrast, both camelCasing and snake_casing are generally considered one big word. Kebab-casing is also considered more readable than the alternatives.
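As a small illustration of the convention, a display name can be turned into a kebab-cased natural key like this. The helper kebab_case is hypothetical and deliberately simple: it keeps only ASCII letters and digits and does not split camelCase words.

```python
import re


def kebab_case(name: str) -> str:
    """Lowercase the ASCII alphanumeric words of a name and join them with dashes."""
    words = re.findall(r"[A-Za-z0-9]+", name)
    return "-".join(word.lower() for word in words)


print(kebab_case("Nis Jespersen"))
print(kebab_case("Transmute Industries"))
```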

Location Decoupling

The decoupling of the resource and its location(s) allows us, for example, to get Verifiable Credential data with
https://platform.transmute.industries/credentials/urn:transmute:credential:9ni5070c-102f-46cf-9e40-82de4874fdb6

And verify it with
https://platform.transmute.industries/credentials/urn:transmute:credential:9ni5070c-102f-46cf-9e40-82de4874fdb6/verify

This is similar to how urn:hbo:feature:GYUjdLgBiJp5otAEAAAAJ was used to retrieve both the Dune trailer microsite and the full movie player.

Security

Without this decoupling, there is just one identifying URL, for example:
https://platform.transmute.industries/presentations/bdb8a296-ed17-41a6-b5e6-46n157747bee
which means the entire identifier string is now a WHATWG URL, not an IETF URI, in turn allowing for Unicode-encoded internationalized domain names.

It is Transmute’s position that Unicode is great for display strings, but dangerous for identifier strings. We wholeheartedly support the ability to capture people’s names in their local representation. Indeed, the author’s full name includes an “ø”. But we are saddened by the careless adoption of Unicode in internationalized domain names, which has unnecessarily exposed a homoglyph attack surface. For example, an ASCII o is very difficult to tell apart from a Cyrillic о (not the same character!), allowing for microsoft.com and mіcrоsoft.com (not the same domain!) to coexist.

We do not fancy that we can fix the internet. But we vigilantly seek to minimize exposure whenever we can; in this case by adopting ASCII-based URN identifiers.
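The difference between the two domains is invisible to the eye but trivial to detect in code. The following sketch uses only the Python standard library; the function name non_ascii_chars is our own, and the spoofed string is written with explicit escapes so that the Cyrillic characters are unmistakable.

```python
import unicodedata


def non_ascii_chars(identifier: str) -> list[tuple[int, str]]:
    """Return (position, Unicode name) for every non-ASCII character."""
    return [
        (index, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(identifier)
        if ord(char) > 127
    ]


genuine = "microsoft.com"
# Same look, different characters: Cyrillic i (U+0456) and Cyrillic o (U+043E).
spoofed = "m\u0456cr\u043esoft.com"

print(genuine == spoofed)
print(genuine.isascii(), spoofed.isascii())
for position, name in non_ascii_chars(spoofed):
    print(position, name)
```

An identifier scheme restricted to ASCII can reject the spoofed string with a single isascii() check.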

Key Origins

A natural key is derived from business or application data which are unique in their nature and are typically intuitively understood by humans. By contrast, a synthetic or surrogate key is technically generated solely for the purpose of uniqueness, without direct linkage to the real world.

We saw how LinkedIn in some cases uses the person or organization name as natural keys. This is possible for finite resources which carry such natural identifiers. In such cases it is preferable to adopt human recognizable and intuitive keys. Where possible we have similarly adopted natural keys where resources are finite and under Transmute’s control. Prevalently, though, v4 UUID synthetic keys are used, in particular for all customer controlled resources.

Transmute Identifier Design

This section attempts to formalize Transmute’s identifier design, pertaining to our Verifiable Data Platform (VDP).

Resource Identifiers

VDP resources are identified with URNs.

The general pattern for VDP URN identifiers is as follows:

urn:transmute:[resource-type]:[v4-uuid|natural-key]

Where:

  • urn, the URN scheme
  • transmute, the unregistered URN NID
  • [resource-type]:, first part of the URN NSS, indicating the type of resource in singular form
  • [v4-uuid|natural-key], second sub-delimited (unique) part of the URN NSS; a v4 UUID for user-governed resources, a kebab-cased natural key for Transmute-governed resources

API Resources

Resources are accessed on the VDP REST (REpresentational State Transfer) API following this general pattern:

https://platform.transmute.industries/[resource-type]/[resource-identifier]

Where:

  • https, the HTTPS scheme,
  • platform.transmute.industries, the platform host
  • [resource-type], first part of the path, indicating the resource type in plural according to REST naming best practices.
  • [resource-identifier], identifying part of the path, URN as described above.
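Deriving the API location from an identifier can be sketched as below. The function resource_url and its naive pluralization (appending an “s” to the resource type) are assumptions made for illustration; the host and the /verify sub-path are the ones used in the examples earlier in this article.

```python
BASE_URL = "https://platform.transmute.industries"


def resource_url(urn: str, action: str = "") -> str:
    """Map a urn:transmute:<type>:<key> identifier to its collection-style URL."""
    scheme, nid, resource_type, _key = urn.split(":", 3)
    if (scheme, nid) != ("urn", "transmute"):
        raise ValueError(f"not a Transmute URN: {urn!r}")
    collection = resource_type + "s"  # naive pluralization, an assumption of this sketch
    url = f"{BASE_URL}/{collection}/{urn}"
    return f"{url}/{action}" if action else url


credential = "urn:transmute:credential:9ni5070c-102f-46cf-9e40-82de4874fdb6"
print(resource_url(credential))
print(resource_url(credential, "verify"))
```

The same identifier yields both locations, which is the location decoupling discussed earlier.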

Versioning

Note that we are not including API versioning on the path. Of course, we may have to introduce breaking changes on the API at some point (it has not yet been necessary). When this day arrives, we will adopt a header-carried design for version management.

Combining REST and Well-Formed Identifiers

The combination of self-explaining URNs and collection-style REST endpoint URLs entails a degree of redundancy, for example the inclusion of :credential and /credentials in the same request.

LinkedIn faces the same situation on their LinkedIn REST API, where for example “activity” is repeated on the path and in the identifier:

GET 'https://api.linkedin.com/rest/dmaActivities?ids=List(urn%3Ali%3Aactivity%3A7074127827113058304)'
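Note that the colons of the URN are percent-encoded (%3A) when the identifier travels in a query string. A minimal sketch of that encoding with the Python standard library:

```python
from urllib.parse import quote, unquote

urn = "urn:li:activity:7074127827113058304"

# safe="" forces every reserved character, including ":", to be percent-encoded.
encoded = quote(urn, safe="")
decoded = unquote(encoded)

print(encoded)
print(decoded == urn)
```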

Like LinkedIn, Transmute finds this is an acceptable consequence considering the combined benefits gained from strong identifiers and REST.

Content Negotiation

Three representations of the same Verifiable Credential resource, respectively: text/html, application/json, and application/pdf.

Following Cool URIs best practices, different representations of the same resource are available, subject to content negotiation. The best example of this is Verifiable Credentials, which are available as HTML, JSON and PDF.
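In HTTP terms, content negotiation means sending the same URL with a different Accept header. The sketch below only constructs the requests, without sending them; the helper name negotiated_request is our own, the URL reuses the credential example from earlier in this article, and authentication is left out.

```python
from urllib.request import Request

CREDENTIAL_URL = (
    "https://platform.transmute.industries/credentials/"
    "urn:transmute:credential:9ni5070c-102f-46cf-9e40-82de4874fdb6"
)


def negotiated_request(url: str, media_type: str) -> Request:
    """Prepare a GET request asking for one specific representation."""
    return Request(url, headers={"Accept": media_type})


requests_by_type = {
    media_type: negotiated_request(CREDENTIAL_URL, media_type)
    for media_type in ("text/html", "application/json", "application/pdf")
}

for media_type, request in requests_by_type.items():
    # Same resource location every time; only the requested media type differs.
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```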

Verifiable Data Platform Examples

This section lists identifier examples of resources currently supported on VDP.

  • urn:transmute:credential:b5d5f8d6-3a83-4n1s-a4d5-a56cd09a5ac6
  • urn:transmute:presentation:bdb8a296-ed17-41a6-b5e6-46n157747bee
  • urn:transmute:identifier:09en1584-9366-44af-8a0b-1aaf95d83607
  • urn:transmute:contact:ef54af47-4303-4874-bee6-fe54c279b851
  • urn:transmute:template:commercial-invoice-credential
  • urn:transmute:workflow:definition:us-cbp-entry
  • urn:transmute:workflow:instance:f5fb6ce4-b0b1-41b8-89b0-331ni58b7ee0
  • urn:transmute:adapter:neo4j

Note the extra hierarchical layer of workflow:definition and workflow:instance NSS sub-delimiters, reflecting the many-to-many relationship between the workflow definitions and instances.

Example of a workflow referencing a Workflow Instance and a Workflow Definition.

These are examples of credential and template resources on the platform:

Here, the template identifier is passed in the query string for issuing a Commercial Invoice credential from the template-based form:

Some additional examples of resource querying:

Fragments are used to reference verificationMethods of a DID Document as per the DID specification:

  • did:web:platform.transmute.industries:organizations:org_WioDDk68cALd3YoN#z6MksU7WFxyMS3jupvdTUnBn15vfjvh5taZbwixdfrqs1vJj
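The DID URL above can be split into the DID and the fragment that references the verification method. For did:web, the DID also maps to an HTTPS location of the DID Document, following the did:web method specification (colons in the method-specific identifier become path separators and did.json is appended). This is a simplified sketch; the function name did_web_document_url is our own.

```python
from urllib.parse import unquote


def did_web_document_url(did: str) -> str:
    """Map a did:web identifier to the HTTPS URL of its DID Document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did!r}")
    host, *path = did[len(prefix):].split(":")
    host = unquote(host)  # a port, if present, is percent-encoded as %3A
    if path:
        return f"https://{host}/{'/'.join(path)}/did.json"
    return f"https://{host}/.well-known/did.json"


did_url = (
    "did:web:platform.transmute.industries:organizations:org_WioDDk68cALd3YoN"
    "#z6MksU7WFxyMS3jupvdTUnBn15vfjvh5taZbwixdfrqs1vJj"
)

# Everything before "#" identifies the DID subject; the fragment picks a verification method.
did, _, fragment = did_url.partition("#")
print(did)
print(fragment)
print(did_web_document_url(did))
```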

Wrapping Verifiable Credential

Like all Linked Data classes, Verifiable Credentials and Verifiable Presentations have @id properties, and at first glance this may seem like a natural choice for identifying the VDP resource. However, this identifier is entirely in our customer’s control and can be set like any other property during issuance; or the credential could be issued on a different platform and only stored on VDP. Such untrusted user input is ill suited as a database primary key, as we cannot guarantee uniqueness.

Moreover, the @id property is optional per the VC Data Model specification, which further underscores the point: a primary key cannot be optional.

Adopting the VC and VP’s @id as the platform resource identifier would have led to very unintuitive and intrusive platform behavior: rejecting storage requests, injecting identifiers into customers’ data objects, etc. This is clearly unacceptable from a business platform dealing with trade data. We believe this is a generic challenge faced by all but the most purist Linked Data platforms, in which users expect, understand, and appreciate such behaviors and restrictions.

So, instead VDP maintains these classes in abstracted resources which wrap around the basic VC and VPs, separating the VC/VP itself from the platform’s representation of it. This has several benefits:

  1. There can be perfectly acceptable business reasons to have non-unique or anonymous nodes in a data graph, and this lets customers be in complete control of their verifiable data graph.
  2. The wrapper can contain valuable metadata about the resource, for example the sender and receiver of a verifiable presentation.
  3. The abstraction allows for storing multiple representations of the same credential, kept as different formats, controlled as media types.

Example of a presentation containing two credentials with colliding ids, urn:ex:x, stored comfortably in VDP within resources urn:transmute:credential:1 and urn:transmute:credential:2 (URNs simplified for clarity).
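The wrapping idea can be sketched with a small data class. This is an illustration of the principle, not VDP’s data model: the class name CredentialResource and its fields are hypothetical. The platform-generated URN is the primary key, while the customer’s own id is stored untouched inside the wrapped credential.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CredentialResource:
    """Platform-side wrapper: owns the primary key, leaves the credential as-is."""

    credential: dict
    metadata: dict = field(default_factory=dict)
    urn: str = field(
        default_factory=lambda: f"urn:transmute:credential:{uuid.uuid4()}"
    )


# Two credentials with colliding ids, and one with no id at all.
incoming = [
    {"id": "urn:ex:x", "type": ["VerifiableCredential"]},
    {"id": "urn:ex:x", "type": ["VerifiableCredential"]},
    {"type": ["VerifiableCredential"]},
]

store: dict[str, CredentialResource] = {}
for credential in incoming:
    resource = CredentialResource(credential=credential)
    store[resource.urn] = resource

print(len(store))
```

All three credentials are stored without rejecting any of them and without modifying the customer’s data.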

Status List Credentials

Bitstring revocation credentials are an exception to this principle. Per the spec, revocation credentials must be resolvable, so their identifier (referenced by the revocable credential) must be a URL, not a URN. For example:

"id": "https://platform.transmute.industries/credentials/97abfc14-5n1s-45ff-bfde-de30eb772c20"

As a consequence, revocation credentials are bound to VDP in that they cannot be moved either to or from other platforms. This is both acceptable and a logical consequence of the purpose they fulfill.

Execution Challenges

This section is based on interviewing our engineering team, reflecting upon the process of designing and implementing the above identifier design. The truth is developing this part of our platform has led to challenges and frustrations amongst the team. It is our hope that others can learn from our experience and plan for a smoother process towards implementing a great identifier design.

Leadership

Management has struggled to share the vision for Linked Data and combine it with pragmatic execution. Designing Linked Data relationships permeates the stack, and often got in the way of fast execution. When not communicated properly, there were tendencies towards disliking Linked Data before it was even properly understood.

This experience was mirrored in the engineering team as a sense of being too involved in the design process: “the vision gets muddy with too many people involved”.

Indeed, authoring this document is meant as an attempt to once and for all get team alignment on the company’s identifier strategy.

Value

Consistency in the API design has been a recurring challenge. When the vision and principles are not communicated well enough to the team, the normal self-correction mechanisms do not kick in, and so inconsistencies slip through. The deviations add up and become self-perpetuating, leading to bugs, miscommunication, and frustration.

Because consistency is essential, the normal MVP feature approach fits poorly with identifiers. There is clearly an investment to be taken seriously. You should not do a half-baked job of this; you are better off not starting at all.

The engineering team has had periods where tweaking the semantic sugar of identifiers has felt like “work, not value”: something which could be improved forever, but with diminishing impact.

Requirements

Something as fundamental as identifiers takes strong requirements up front. Without them, sporadic bug tickets, for example that a collection should be plural, turn into debates about which framework they are held against and whether they are bugs at all.

In interviewing, one engineer requested a README document to avoid inconsistencies, essentially requesting this document. Once we had gone through it, he immediately pointed out an API endpoint which he believed was inconsistent. The team self-correction worked! It is clear that a document like this one should have been produced much sooner in the process.

Commercialization

There is a duality in the team’s feelings about where we have gotten. We are proud of our identifiers! But we also recognize that they are long and may seem over-engineered and unapproachable, not least in our target supply chain sector market, where it is common practice to quote, for example, “manufacturer numbers”. As much as we love the technical details, we realize they may be a hard selling point.

However, we insist that identifiers are essential to building systems with safety, privacy and value in focus, especially in today’s increasingly connected world. It has always been true that identifier collisions can be costly. The context and requirements for identifiers have changed dramatically since the analogue days. We take it upon ourselves to design for a safe and efficient future, even if — especially if — our customers do not fully appreciate everything which this takes.

Database Identifier Decoration

Rather than storing the full URNs at database level, we still store only the raw keys and add the URN decoration around them. This means removing the decoration before a key goes into the database, and adding it back on the way out. There is a lot to remember when making changes. You also have to think about future scenarios: for example, if Transmute were acquired, that might affect the transmute namespace, and more layers would be added. As one engineer puts it, “adding context adds complexity”.
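The decoration round trip can be sketched as two small functions. The names to_db_key and from_db_key are hypothetical and this is not our production code; it only illustrates the stripping and re-adding of the URN prefix at the database boundary.

```python
def urn_prefix(resource_type: str) -> str:
    """The decoration placed in front of a raw key for a given resource type."""
    return f"urn:transmute:{resource_type}:"


def to_db_key(urn: str, resource_type: str) -> str:
    """Inbound: strip the URN decoration, leaving the raw key to store."""
    prefix = urn_prefix(resource_type)
    if not urn.startswith(prefix):
        raise ValueError(f"expected a {resource_type} URN, got {urn!r}")
    return urn[len(prefix):]


def from_db_key(raw_key: str, resource_type: str) -> str:
    """Outbound: add the URN decoration back around the raw key."""
    return urn_prefix(resource_type) + raw_key


urn = "urn:transmute:contact:ef54af47-4303-4874-bee6-fe54c279b851"
raw_key = to_db_key(urn, "contact")
print(raw_key)
print(from_db_key(raw_key, "contact") == urn)
```

Every read and write path has to apply the matching function with the right resource type, which is where the extra complexity comes from.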

Not storing the URNs is effectively a case of not making full use of our own investment, partly due to history and partly due to the mid-process confusion described above. For example, there have been worries that special code would be needed to deal with identifiers in lower environments (subdomains used only internally), a result of conflating URLs with URNs. The database level is essentially the last, missing step of the process described in this section, and now that the principles are fully understood it is only a matter of time before we get there.

Conclusion

Transmute sells identifiers, and we aim to sell amazing products. So our identifiers need to be amazing.

We have studied how two major internet players, HBO and LinkedIn, both construct URN identifiers which are then used in URLs. We also studied the GS1 Digital Link, which has good reason to take a different approach, prioritizing backwards alignment.

We analyzed the drivers behind these design principles: URI benefits, separation of concerns, security, and key origins. Based on these, we were able to formalize VDP’s identification design.

Finally, we shared some honest reflections on the real-life transformation our team has been through. It is our hope that others who share our ambition will benefit from these learnings.

Consistency is an essential measure of quality. This document should make clear not just what quality identifiers look like, but also motivate why.

We encourage comments and discussion on this article, as well as signing up for free on https://platform.transmute.industries.
