Transmute Tech School: Boost Supply Chain Enterprise by Leveraging Linked Data

How Adding Context Turns Siloed API Messages into Globally Connected Data

Nis Jespersen
Transmute
Nov 9, 2022


In our previous Transmute Tech School article, I described the basics of Linked Data and the integration benefits driving its accelerating adoption in modern, data-driven business. My aim for this post is to dig deeper and sketch how actors across supply chains stand to gain tremendously from wider adoption:

  • Linked Data is the perfect tool for capturing the inherently networked nature of supply chains.
  • Web-based semantics function as a machine-based lingua franca, connecting data models originating from diverse cultures, languages and jurisdictions.

I will go into a little bit of technical detail, but I promise to keep it lightweight and not lose sight of the business benefits the technology enables.

From Messages to Snippets of Connected Data

It is generally understood that APIs are the enabler of digitization and automation. The “API Economy” is upon us, and there are competing platforms for all aspects of logistics and commerce processes. Linked Data builds upon this established infrastructure.

An API is, roughly speaking, a list of operations called by a client to invoke services and manipulate resources on a server. Each operation can have a request object and a reply object, which are essentially digital documents carrying the data relevant to the given operation. Data is normally kept in systems and databases and is mapped to and from these API documents, most often represented in JSON format.

The same data is often used in conjunction with multiple APIs, from a multitude of providers. On one platform, my product might be mapped to offeredProduct and assigned the identifier abc, whereas on another platform it maps to vendor-product as xyz. It is cumbersome for me to map my data to every platform individually. And you would not be able to tell that these documents in fact deal with the same product.
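To make this concrete, here is a sketch of what the two platforms’ documents might look like. Only the field names and identifiers come from the example above; the product details are made up for illustration. On one platform:

```json
{
  "offeredProduct": {
    "id": "abc",
    "name": "Organic Arabica Coffee Beans"
  }
}
```

And on the other:

```json
{
  "vendor-product": {
    "identifier": "xyz",
    "description": "Organic Arabica Coffee Beans"
  }
}
```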

Context is required to understand that these documents describe the same product.

Adding Context to Message Data

Someone receiving these two documents will require a significant amount of context and human intervention to determine a) that these are the same type of product, and b) that this is in fact the same product. This is essentially the problem Linked Data solves, by providing the necessary context.

By use of the JSON-LD standard (JSON for Linking Data), web identifiers can be assigned to every type and object. This way, instead of being defined and identified locally (within the document), data can be defined and identified globally (on the internet).

Continuing from the earlier example, here is how JSON-LD could add web identifiers “on top” of the API documents:

  • offeredProduct and vendor-product are both of the type https://schema.org/Product.
  • Although assigned abc on one platform and xyz on another, the product’s unique web identifier is https://example.com/prod/123 on both; they are in fact the same product.

Linked Data adds a layer of context, pointing data elements to unique web identifiers.
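As a minimal sketch, continuing with the illustrative documents from earlier, the first platform’s document could be overlaid like this. Mapping the wrapper key to https://schema.org/itemOffered is my own assumption; any suitable property would do:

```json
{
  "@context": {
    "offeredProduct": "https://schema.org/itemOffered",
    "name": "https://schema.org/name"
  },
  "offeredProduct": {
    "@id": "https://example.com/prod/123",
    "@type": "https://schema.org/Product",
    "name": "Organic Arabica Coffee Beans"
  }
}
```

The second platform can do the same, mapping vendor-product and description in its own @context. Once processed, both documents yield a node with the same @id and @type: demonstrably the same product.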

Linked Data deals with the data itself rather than with any particular document representation of it. JSON-LD was invented to combine the two, carrying globally defined Linked Data in familiar JSON documents.

Non-Breaking

The question you should now be asking is: “does this mean we need to rebuild our APIs?” The answer is a solid no! JSON-LD works with a small set of keywords such as @id and @type, shown in the example above. JSON consumers simply ignore unknown elements, so these keywords can be added to the exchanged documents and freely used or ignored by API users. Adding JSON-LD is a “non-breaking” change.

Semantic Vocabularies

We have now established how it is technically possible to make data much more expressive by applying URIs to API messages. The next question to address is: which URIs? The technical linkability isn’t worth much if I use https://my-product-definition.com and you use https://your-product-definition.com.

This is where semantic web vocabularies come into play. Earlier I referenced https://schema.org/Product, and https://schema.org is exactly such a vocabulary, providing many of the most used terms on the internet. Proper vocabularies should:

  • Be static, never changing established URIs, because data definitions depend on them.
  • Have URIs resolve to a human-readable explanation of the defined term, as in the example below.
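For instance, https://schema.org/Product resolves to a page documenting the term, backed by a machine-readable definition roughly like this entry, abridged from the vocabulary file schema.org publishes (the exact wording may differ):

```json
{
  "@id": "schema:Product",
  "@type": "rdfs:Class",
  "rdfs:label": "Product",
  "rdfs:comment": "Any offered product or service. For example: a pair of shoes; a concert ticket; the rental of a car; a haircut; or an episode of a TV show streamed online."
}
```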

Other vocabularies target more focused domains. Noteworthy vocabularies specialized for supply chains include the United Nations CEFACT Vocabulary and the W3C Traceability Vocabulary.

Agreeing on terms is an essential part of any standard, akin to agreeing on which language to speak. The fact that the semantic layer can be added without breaking APIs makes this conversation much easier: agreement is more easily reached when neither party risks bothering their customers.

Business Case

Let us now move on to some of the business case considerations which you are, by now, rightfully contemplating. Under which circumstances might Linked Data make sense for industry adoption?

Who Pays, Who Benefits?

I will first discuss how better-defined data logically shifts the effort of interpretation from the data consumer to the data provider.

As a data provider, Linked Data lets me:

  • Communicate unambiguously, expressing my data with explicit, uniquely identified semantic terms.
  • Pre-map data to globally understood terms in a serialization format with strong tool support.

As a data consumer, Linked Data lets me:

  • Receive data in computer-readable form, which can be directly used and stored.
  • Understand the data directly, without human interpretation.

The burden of “understanding” is shifted from the data receiver to the data sender. The clarity of the sender obviously benefits the receiver.

For an API with only a single sender and a single receiver, this clarity can arguably be established more efficiently by traditional system integration where data models are manually mapped to each other. However, such one-to-one APIs are rare, and there isn’t much “API economy” about it. Platform and pipeline business cases are typically based on scaling business partners and/or customers. So with this in mind, let’s consider some cases where the investment in concise and expressive data pays off.

Many Data Providers

Let’s imagine there are 1000 providers of data to a platform. Establishing 1000 data mapping projects would be insanely expensive in consulting costs, not to mention maintaining them. Here, requiring the data providers to deliver expressive, computer-readable data is practically a must.

Case

This is literally how Google Search works. Search indexes are built not from interpretation of page content, but from terms expressed explicitly by the page owners, defining their own content. A Google recipe is indexed and rendered from Linked Data terms such as https://schema.org/recipeCuisine and https://schema.org/cookTime.

How to explicitly tell Google that this page is about an Apple Pie recipe. (https://developers.google.com/search/docs/guides/intro-structured-data)
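In practice, the page embeds a JSON-LD block (in a script tag of type application/ld+json) alongside the human-readable recipe. Here is a hedged sketch of what the apple pie page might declare; recipeCuisine, cookTime and recipeIngredient are real schema.org terms, while the values are made up:

```json
{
  "@context": "https://schema.org/",
  "@type": "Recipe",
  "name": "Apple Pie",
  "recipeCuisine": "American",
  "cookTime": "PT1H",
  "recipeIngredient": [
    "6 apples, peeled and sliced",
    "1 pie crust"
  ]
}
```

Note that cookTime uses the ISO 8601 duration format (PT1H means one hour), as schema.org prescribes.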

Comparable situations exist throughout supply chains, logistics and commerce: customs declarations to a single window, credit checks to a financial institution, shipping instructions to a carrier, purchase orders to a manufacturer or ecommerce platform. These are all examples of large groups of participants expressing data and passing it to platforms. As long as they point at, say, https://schema.org/Order, their differing data representations do not matter.
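As a sketch, however a purchase order is shaped locally, pointing it at schema.org terms keeps it recognizable to the platform. The structure and values below are illustrative, but orderNumber, seller and orderedItem are genuine https://schema.org/Order properties:

```json
{
  "@context": "https://schema.org/",
  "@type": "Order",
  "orderNumber": "PO-2022-0042",
  "seller": {
    "@type": "Organization",
    "name": "Acme Exports"
  },
  "orderedItem": {
    "@id": "https://example.com/prod/123",
    "@type": "Product"
  }
}
```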

Many Data Receivers

Now let’s consider the reverse situation, 1000 receivers of data from a platform. Everyone has to learn and understand what the platform’s data means.

Case

EDI, which is so prevalent in supply chains, is the perfect anti-pattern. An EDI endpoint comes with an “implementation guide”: a document which a human is expected to read in order to make sense of, and map, the codes of the EDI messages. Manual integration effort is assumed, and even perceived as the solution.

Sample page from an EDI Implementation Guide. No human should suffer such pain. (https://www.govinfo.gov/content/pkg/GOVPUB-C13-0d00b8dcbbecadc6e1609edf97e9a0b6/pdf/GOVPUB-C13-0d00b8dcbbecadc6e1609edf97e9a0b6.pdf)

A monopoly might get away with this kind of “read the spec” attitude, but most often you want to make integration as seamless as possible. Granted, modern API specs (“swaggers”) are friendlier than EDI implementation guides, but they are still based on the premise of “here is how you should interpret my language”. By overlaying the platform’s API with Linked Data, any of your customers or business partners with Linked Data capabilities can plug the platform’s data stream directly into their operational environment.

Industry-specific examples of this include manufacturer or ecommerce product listings, carriers’ bills of lading, and trade certifications from inspectors, chambers and other organizations.

Knowledge Graphs

Sprinkling some URIs over JSON documents may seem rather simple — and it is. Yet, it is all that is needed to reveal that data on one API is in fact the same as on another API, despite differing nomenclature.

This opens up a whole new world of data analytics. JSON-LD implements the Resource Description Framework (RDF), a first-order logic data representation, which opens the door to Data Science. Graphs of data, so-called Knowledge Graphs, can be built from the linked data elements. These can be stored in modern graph databases and subjected directly to massively powerful graph algorithms, revealing the connections, outliers, patterns and inconsistencies hidden as implicit relationships within the data, and turning them into explicit, actionable ones.
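To sketch the idea, here are the product from the earlier example and a shipment record from a different API, shown as flattened JSON-LD the way a graph store would hold them. The shipment node is an invented illustration, though schema.org/ParcelDelivery and itemShipped are real terms:

```json
{
  "@context": {
    "schema": "https://schema.org/"
  },
  "@graph": [
    {
      "@id": "https://example.com/prod/123",
      "@type": "schema:Product",
      "schema:name": "Organic Arabica Coffee Beans"
    },
    {
      "@id": "https://example.com/shipment/789",
      "@type": "schema:ParcelDelivery",
      "schema:itemShipped": { "@id": "https://example.com/prod/123" }
    }
  ]
}
```

Because both sources point at the same @id, the graph connects the shipment to the product automatically, with no mapping project required. Graph algorithms can then traverse such links across the entire supply chain.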

The downstream Machine Learning which Linked Data enables deserves dedicated attention, and I promise to follow up soon with a blog post specifically on this topic.

Conclusion

Linked Data is in much broader use than most people realize, but its value has not yet bubbled up to industries which are still transitioning to API-based digitization. JSON-LD extends Linked Data to these APIs, with the promise of greatly simplifying API integrations while paving the way to data graph analytics. In all but the simplest cases, there is good business justification for adopting JSON-LD to lower your own or your customers’ integration complexity. Supply chains in particular stand to benefit, being connected networks which span a heterogeneous, international set of languages, jurisdictions and models.

Transmute holds editor roles on essential open source Linked Data projects, including the United Nations CEFACT Vocabulary, the W3C Traceability Vocabulary, the Decentralized Identifiers specification, and the Verifiable Credentials specification.

About Transmute

Transmute is editor of the W3C Traceability Vocabulary, which models supply chain schemas from relevant vocabularies, specifically for issuance as Verifiable Credentials. Transmute authored the Decentralized Identifiers specification, and is currently editor of the Verifiable Credentials v2.0 specification.

We use Linked Data to produce Verifiable Credentials, establishing cryptographically verifiable data graphs among trusted entities.

Nis Jespersen, Transmute’s Solutions Architect, is an editor of the United Nations CEFACT Vocabulary, which expresses the prevalent supply chain semantic model as referenceable web URIs.

Connect with Nis on LinkedIn and Twitter

About Transmute: Building on the security and freedom that Web3 promised, Transmute provides all the benefits of decentralization to enterprise teams seeking a cost effective, interoperable, planet-forward experience provided by experts in technology and industry.

Transmute was founded in 2017, graduated from TechStars Austin in 2018, and is based in sunny Austin, Texas. Learn more about us at: http://www.transmute.industries

Connect with Transmute on LinkedIn and Twitter
