See the Global Supply Chain with Knowledge Graphs and UN Web Semantics

Published in Transmute, Jan 22, 2023

This article is based on Transmute Solutions Architect Nis Jespersen’s ‘UN/CEFACT Linked Data’ presentation from December 2022.

Leading the UN Web Vocabulary project, I presented at the December 2022 UN/CEFACT Forum. The UN Forum sessions are not recorded, so I thought I would just capture the main points in this article.

The essential part of my presentation was a live demo, building a supply chain knowledge graph from the bottom up. In doing so, I gradually introduced the full tech stack in play:

The essentials of APIs, JSON and JSON Schema
Adding semantics with JSON Linked Data
Building Knowledge Graphs from LD files
The important role of standardized Web Vocabularies

I typically also talk about Verifiable Credentials and Decentralized Identifiers. But not this time — this is all about UN semantics and data graphs.

Introducing The UN/CEFACT Web Vocabulary

The UN/CEFACT Buy-Ship-Pay model is the undisputed semantic model for terms in global trade. It has been around for decades, and is broadly recognized, adopted and implemented.

UN/CEFACT Web Vocabulary brings this model to the web, expressing the existing trade terms as a library of Uniform Resource Identifiers (URIs).

The above example shows the definition of the term Trade Party, behind the URI https://vocabulary.uncefact.org/TradeParty. The URI itself is great for expressing intent unambiguously; resolving it to the above documentation page makes it human-understandable too.

We will come back to the UN/CEFACT Web Vocabulary shortly, and to why it matters. But for proper context, we will start a couple of layers down the tech stack, level-setting on some API essentials.

APIs, JSON and JSON Schema

Since you are reading this, there is a pretty good chance that you have heard of APIs before.

An API (in this context, a web API) is a way for computer systems to communicate over HTTP, the same protocol serving you this article right now. The difference is that an API is tailored to be invoked by computers instead of humans: the graphical elements are stripped away, and the emphasis is on the structure of the data.

JSON Schema, Defining Data

APIs are made up of “endpoints”, each of which can define a data structure for what you send to it (the request) and another data structure for what you can expect to get back (the response). A definition of such a data structure is called a schema. Typically, data gets encoded in the JSON syntax, so you can think of an API endpoint as consisting of a Request JSON Schema and a Response JSON Schema.

The Request and Response JSON Schemas of a sample API.

Each JSON Schema defines, for example, the hierarchical structure and the naming of attributes. Note that a JSON Schema does not contain data, only how data must be structured.

JSON, the Actual Data

JSON files carry actual data. JSON is just a file format, much like .doc or .txt. A JSON file can be validated against a JSON Schema — this is how an API controls what data gets exchanged.
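As a rough illustration of that validation step, the sketch below checks a JSON document against a schema. It is a deliberately tiny, hand-rolled validator that understands only the `type`, `required` and `properties` keywords; a real API would use a complete JSON Schema implementation. The schema `INVOICE_SCHEMA` is hypothetical and only loosely follows the invoice example that follows.

```python
import json

# Hypothetical, heavily simplified schema for the invoice example.
INVOICE_SCHEMA = {
    "type": "object",
    "required": ["id", "type", "applicableTradeSettlement"],
    "properties": {
        "id": {"type": "string"},
        "type": {"type": "string"},
        "applicableTradeSettlement": {
            "type": "object",
            "required": ["invoicerParty"],
            "properties": {
                "invoicerParty": {"type": "object", "required": ["type"]},
            },
        },
    },
}

# Only the JSON Schema types this sketch needs.
JSON_TYPES = {"object": dict, "array": list, "string": str}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means valid.

    Supports only the 'type', 'required' and 'properties' keywords."""
    expected = schema.get("type")
    if expected and not isinstance(instance, JSON_TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


doc = json.loads("""{
  "id": "https://sales.online-shop.global/inv/0000112318",
  "type": "TradeTransaction",
  "applicableTradeSettlement": {"invoicerParty": {"type": "TradeParty"}}
}""")

print(validate(doc, INVOICE_SCHEMA))  # → []
print(validate({"type": "TradeTransaction"}, INVOICE_SCHEMA))
```

The schema only constrains shape; the document `doc` is what carries the data.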

While JSON is designed for data transfers, it is also quite readable by humans:

{
  "id": "https://sales.online-shop.global/inv/0000112318",
  "type": "TradeTransaction",
  "applicableTradeSettlement": {
    "type": "HeaderTradeSettlement",
    "invoiceDocument": {
      "id": "urn:uuid:9c3c66a6-f49f-485e-a7b2-013fb4a0e0a8",
      "type": "Document"
    },
    "invoicerParty": {
      "id": "did:web:online-shop.global",
      "type": "TradeParty",
      "postalAddress": {
        "type": "TradeAddress",
        "attentionOf": "Nis",
        "streetName": "Runebergs Alle",
        "cityName": "Copenhagen",
        "countryName": "Denmark"
      }
    },
    "invoiceeParty": {
      "type": "TradeParty",
      "postalAddress": {
        "type": "TradeAddress",
        "streetName": "Sunshine Ave",
        "cityName": "Austin",
        "countrySubDivisionName": "Texas",
        "countryName": "USA"
      }
    }
  }
}

Unless you have a severe case of code allergy, you should intuitively get the gist of this JSON file: “Trade Transaction”, “Invoicer Party” and “Invoicee Party” suggest it is something about an invoice.

If you know UN/CEFACT you might even recognize the exact terms used. Trade Transaction, for example, means something very specific to those who “speak the UN/CEFACT language”. Such a common language helps the developer on the receiving side interpret the data according to the sender’s intention.

While such a “common language” is much better than nothing, the human interpretation aspect entails certain challenges:

Costly, as developers must make data mappings on “both sides” of the API.
Error prone, as all human involvement is, even when alignment to a common specification has been established.
Does not scale: if 10 customers use your API, each of them needs a developer team doing the same data mapping, 10 times over.
API breaking: a live API cannot simply be changed to align with UN/CEFACT without breaking the API contract (its JSON Schemas), greatly displeasing your customers.
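To make the “costly” and “does not scale” points concrete, here is a sketch of the kind of mapping code each receiving team ends up writing by hand. The in-house field names (`invoiceNo`, `seller`, `city`, `country`) are hypothetical; the target names are taken from the invoice example above.

```python
# Hypothetical in-house invoice payload, using the sender's own field names.
in_house = {
    "invoiceNo": "0000112318",
    "seller": {"city": "Copenhagen", "country": "Denmark"},
}


def map_to_uncefact(payload: dict) -> dict:
    """Hand-written mapping from the hypothetical in-house format to the
    UN/CEFACT-style structure of the invoice example. Every receiving team
    has to write, test and maintain code like this, and revisit it whenever
    either side changes its format."""
    seller = payload["seller"]
    return {
        "id": "https://sales.online-shop.global/inv/" + payload["invoiceNo"],
        "type": "TradeTransaction",
        "applicableTradeSettlement": {
            "type": "HeaderTradeSettlement",
            "invoicerParty": {
                "type": "TradeParty",
                "postalAddress": {
                    "type": "TradeAddress",
                    "cityName": seller["city"],
                    "countryName": seller["country"],
                },
            },
        },
    }


mapped = map_to_uncefact(in_house)
print(mapped["applicableTradeSettlement"]["invoicerParty"]["postalAddress"]["cityName"])  # → Copenhagen
```

With 10 API customers, a function like this exists 10 times over, in 10 different codebases.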

While many organizations accept these shortcomings of working with raw JSON, there is a much smarter way, namely…

JSON Linked Data

Traditionally, integration developers add the context needed for the target computer to work with the data. JSON-LD instead allows the sender to add this context, literally, using a keyword called @context. The context maps the “human friendly” terms used in the JSON to “machine friendly” URIs.

In non-technical terms: when the sender is more explicit, it is less ambiguous for the receiver to understand the message.

Adding context switches from “encode once, interpret anywhere” to “interpret once, understand everywhere”, which is great for scalability economics.

Even better: adding a line with the @context definition to your JSON doesn’t break your existing APIs! A receiver without JSON-LD support simply sees one more attribute, which it can ignore.
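A quick way to convince yourself: a plain JSON consumer that only reads the fields it knows about behaves identically with or without an `@context` line. The sketch uses an inline context with `@vocab` pointing at the UN/CEFACT vocabulary base URI and aliasing `id`/`type` to the JSON-LD keywords; referencing a published context document instead would be an equally valid design choice. The consumer function `read_invoice` is hypothetical.

```python
import json

# The same (trimmed) invoice, once as plain JSON and once with an @context.
plain = '{"id": "https://sales.online-shop.global/inv/0000112318", "type": "TradeTransaction"}'
linked = """{
  "@context": {"@vocab": "https://vocabulary.uncefact.org/", "id": "@id", "type": "@type"},
  "id": "https://sales.online-shop.global/inv/0000112318",
  "type": "TradeTransaction"
}"""


def read_invoice(raw: str) -> tuple:
    """Hypothetical legacy consumer: reads only the fields it knows about."""
    data = json.loads(raw)
    return data["id"], data["type"]


# The extra "@context" key never reaches this consumer's logic.
print(read_invoice(plain) == read_invoice(linked))  # → True
```

The caveat is strictness: a receiver whose schema forbids unknown properties (for example `"additionalProperties": false` in JSON Schema) would reject the extra key, so that is worth checking first.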

For instance, the invoiceDocument attribute in the earlier example is, on its own, nothing but a string. The @context maps it to a computer-friendly URI such as https://vocabulary.uncefact.org/invoiceDocument.
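That mapping can be sketched as a simplified term expansion. This is not a JSON-LD processor: it handles only an inline context with `@vocab` and explicit term aliases, and it skips steps a conforming processor performs (such as wrapping values in arrays and `@value` objects). Aliasing `id` and `type` to `@id` and `@type` is an assumption here, chosen to match the shape of the invoice example.

```python
VOCAB = "https://vocabulary.uncefact.org/"

# Assumed context: unmapped terms fall back to the UN/CEFACT base URI,
# and "id"/"type" are aliases for the JSON-LD keywords.
CONTEXT = {"@vocab": VOCAB, "id": "@id", "type": "@type"}


def expand_term(term: str, context: dict) -> str:
    """Map a short JSON key (or type name) to a keyword or a full URI."""
    if term in context:
        return context[term]
    if term.startswith("@"):
        return term  # already a JSON-LD keyword
    return context["@vocab"] + term


def expand(node, context=CONTEXT):
    """Recursively replace JSON keys with URIs (simplified expansion)."""
    if isinstance(node, list):
        return [expand(item, context) for item in node]
    if not isinstance(node, dict):
        return node  # plain literal values stay as they are
    result = {}
    for key, value in node.items():
        if key == "@context":
            continue
        iri = expand_term(key, context)
        if iri == "@type":
            # Type names are vocabulary terms too.
            if isinstance(value, list):
                value = [expand_term(v, context) for v in value]
            else:
                value = expand_term(value, context)
        elif iri != "@id":
            value = expand(value, context)
        result[iri] = value
    return result


settlement = {
    "type": "HeaderTradeSettlement",
    "invoiceDocument": {
        "id": "urn:uuid:9c3c66a6-f49f-485e-a7b2-013fb4a0e0a8",
        "type": "Document",
    },
}
print(expand(settlement))
```

After expansion, the key `invoiceDocument` has become `https://vocabulary.uncefact.org/invoiceDocument`, and the identifier is left untouched under `@id`.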

The example below was the first live demo of my presentation. It shows how adding just the @context lets the Linked Data processor structure the data automatically.

Adding an @context statement to the JSON (left) makes the data processable by a computer (right).

To recap: with very little effort, we can add precise semantics to our data. We make the data meaningful. Our next step will be to turn that meaning into knowledge.

Knowledge Graphs

The JSON-LD processing we just saw actually picks the JSON apart, turning it into individual basic statements. Each statement is called a triple, because it consists of three things: a subject, a predicate and an object.
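The flattening step can be sketched in a few lines: walk the nested JSON, use each object's `id` as its subject (or a generated blank-node label such as `_:b0` when it has none), and emit one triple per attribute. This is a simplified stand-in for what a JSON-LD processor does when converting a document to RDF; it assumes the `id`/`type` conventions of the invoice example and maps every other key onto the UN/CEFACT base URI.

```python
import itertools

VOCAB = "https://vocabulary.uncefact.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_triples(document: dict) -> list:
    """Flatten nested JSON into (subject, predicate, object) triples."""
    triples = []
    blank_ids = itertools.count()

    def visit(node: dict) -> str:
        subject = node.get("id") or f"_:b{next(blank_ids)}"
        for key, value in node.items():
            if key in ("id", "@context"):
                continue
            predicate = RDF_TYPE if key == "type" else VOCAB + key
            for item in value if isinstance(value, list) else [value]:
                if isinstance(item, dict):
                    obj = visit(item)   # nested object becomes its own node
                elif key == "type":
                    obj = VOCAB + item  # type names are vocabulary terms
                else:
                    obj = item          # plain literal value
                triples.append((subject, predicate, obj))
        return subject

    visit(document)
    return triples


invoice = {
    "id": "https://sales.online-shop.global/inv/0000112318",
    "type": "TradeTransaction",
    "applicableTradeSettlement": {
        "type": "HeaderTradeSettlement",
        "invoicerParty": {
            "id": "did:web:online-shop.global",
            "type": "TradeParty",
            "postalAddress": {"type": "TradeAddress", "cityName": "Copenhagen"},
        },
    },
}
for triple in to_triples(invoice):
    print(triple)
```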

For example: “The consignment’s (subject) consignee (predicate) is a business called Global Online Shop (object)”.

Another triple could be “Global Online Shop’s location is the UN/LOCODE DKCPH”.

We can piece together these two statements: Consignment — Global Online Shop — DKCPH. This way we can infer that the consignment is going to Copenhagen. The JSON-LD file is actually a data graph, which we are traversing for insights.
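Traversing the graph is then a matter of following predicates from one node to the next. The sketch below encodes the two statements as triples and walks the chain from the consignment to its consignee and on to that party's location. The node identifiers and the predicate names `consignee` and `location` are hypothetical shorthands, not actual UN/CEFACT URIs.

```python
# The two statements as (subject, predicate, object) triples.
# Identifiers and predicate names are hypothetical shorthands.
triples = [
    ("consignment-1", "consignee", "global-online-shop"),
    ("global-online-shop", "location", "DKCPH"),
]


def follow(triples, start, path):
    """Follow a chain of predicates from `start`; return all nodes reached."""
    frontier = {start}
    for predicate in path:
        frontier = {o for (s, p, o) in triples if s in frontier and p == predicate}
    return frontier


print(follow(triples, "consignment-1", ["consignee", "location"]))  # → {'DKCPH'}
```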

How a sample Bill of Lading data snippet turns into a data graph with JSON-LD.

The above diagram illustrates the data graph aspects of a Bill of Lading document which could be transferred through any standard API. But because JSON-LD is based on URIs (and not vague strings like “a business called Global Online Shop”), the graph does not have to be limited to just one JSON file. Say we have another API which deals with invoices, and we route inbound messages to a graph database. A graph database can recognize common URIs and easily deal with overlapping graph segments. This way, we can continuously piece together larger and deeper data graphs.

Data graphs from two separate JSON-LD files “snap together”.

The above example is also derived from my live demo. By importing a Commercial Invoice “on top of” the previously imported Bill of Lading, we realize that:

The consignee of one document is the same organization as the invoicer party of the other.
We expanded the common knowledge of this organization, now knowing both its postal address and UN/LOCODE.

We did this with literally no manual data mapping. The knowledge graph just “snaps” into place like magnets.

A knowledge graph constructed from very large amounts of JSON-LD files. Pretty-looking knowledge graph on the right courtesy of https://medium.com/@annalienk/investigation-of-the-flow-of-tweets-d0b1c31d915b.

This means we can dump massive amounts of JSON-LD files into the graph database. Data can come from different origins, APIs and data schemas; as long as it refers to things by shared URIs, it will all still snap together automatically. A key feature of graph databases is their ability to reveal hidden relationships across siloed data.
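Why the graphs “snap together” can be shown with plain sets of triples: when two documents refer to the same organization by the same URI, merging them is nothing more than a set union, and the shared URI becomes the join point. The triples below are hypothetical (including the predicate names `consigneeParty` and `locationId` and the `urn:example:` identifiers), loosely modeled on the Bill of Lading and Commercial Invoice demo.

```python
V = "https://vocabulary.uncefact.org/"
SHOP = "did:web:online-shop.global"  # same identifier in both documents

# Hypothetical triples extracted from a Bill of Lading.
bill_of_lading = {
    ("urn:example:consignment-1", V + "consigneeParty", SHOP),
    (SHOP, V + "locationId", "DKCPH"),
}

# Hypothetical triples extracted from a Commercial Invoice.
commercial_invoice = {
    ("https://sales.online-shop.global/inv/0000112318", V + "invoicerParty", SHOP),
    (SHOP, V + "postalAddress", "urn:example:address-1"),
}

# Merging needs no mapping code: equal URIs denote the same node.
graph = bill_of_lading | commercial_invoice


def describe(graph, node):
    """Everything the merged graph knows about `node`, as (predicate, object)."""
    return sorted((p, o) for (s, p, o) in graph if s == node)


def referenced_by(graph, node):
    """All subjects that point at `node`, across every source document."""
    return sorted(s for (s, p, o) in graph if o == node)


print(describe(graph, SHOP))
print(referenced_by(graph, SHOP))
```

One thing this sketch glosses over: nodes without a URI (blank nodes) are local to their document, so a real graph store must keep them apart when merging rather than joining them by label.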

Extracting Knowledge

Not having to worry about the hassle of fitting data together, we can focus our efforts on data analysis, using standard data queries such as “return all consignments to be delivered in Denmark”.
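Under the hood, such a query is a pattern match over triples. The sketch below is a toy basic-graph-pattern matcher in the spirit of SPARQL or Cypher (but not either of them): terms starting with `?` are variables, and a result is any assignment that satisfies every pattern at once. The data is hypothetical, and “delivered in Denmark” is approximated as “the consignee’s country is Denmark”.

```python
def _unify(term, value, binding):
    """Match one pattern term against one triple value, extending `binding`."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(graph, patterns, binding=None):
    """Yield every variable binding that satisfies all (s, p, o) patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)  # copy, so failed matches leave no trace
        if all(_unify(term, value, candidate) for term, value in zip(first, triple)):
            yield from match(graph, rest, candidate)


# Hypothetical data: two consignments with consignees in different countries.
graph = {
    ("consignment-1", "rdf:type", "Consignment"),
    ("consignment-1", "consignee", "shop-dk"),
    ("shop-dk", "countryName", "Denmark"),
    ("consignment-2", "rdf:type", "Consignment"),
    ("consignment-2", "consignee", "buyer-us"),
    ("buyer-us", "countryName", "USA"),
}
query = [
    ("?c", "rdf:type", "Consignment"),
    ("?c", "consignee", "?party"),
    ("?party", "countryName", "Denmark"),
]
print(sorted(b["?c"] for b in match(graph, query)))  # → ['consignment-1']
```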

In my final demo at the UN/CEFACT Forum, I took it a step further by introducing some basic data science tooling. This means applying graph algorithms and machine learning to the data graph’s explicit relationships in search of its implicit relationships. I used Neo4j’s Graph Data Science library, which offers a rich set of such algorithms.

Specifically, I ran the Betweenness Centrality graph algorithm on the knowledge graph we created earlier, built from a Bill of Lading and a Commercial Invoice. The result is illustrated graphically below:

Betweenness Centrality algorithm applied on the previous example.

This result reveals how often the shortest paths between pairs of other nodes pass through each node. Unsurprisingly, the “Global Online Shop” node scores high; remember, this was the node which connected the two subgraphs.
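For readers curious about what that score measures, here is a compact implementation of Brandes' algorithm for betweenness centrality on an unweighted, undirected graph. It is a textbook sketch, not Neo4j's implementation; tools differ in details such as normalization and whether each pair of endpoints is counted once or twice (this sketch counts it once). The five-node graph is a hypothetical simplification of the Bill of Lading plus Commercial Invoice example.

```python
from collections import deque


def betweenness(adjacency: dict) -> dict:
    """Brandes' algorithm for an unweighted, undirected graph.

    `adjacency` maps each node to a list of its neighbours. Each unordered
    pair of endpoints is counted once (hence the final division by 2)."""
    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        # Breadth-first search from `source`, counting shortest paths (sigma)
        # and remembering each node's predecessors on those paths.
        order, preds = [], {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0)
        sigma[source] = 1
        dist = dict.fromkeys(adjacency, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share
        # of the shortest paths that pass through it.
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return {v: s / 2 for v, s in score.items()}


# Hypothetical, simplified version of the merged Bill of Lading + Invoice graph.
graph = {
    "Bill of Lading": ["Consignment"],
    "Consignment": ["Bill of Lading", "Global Online Shop"],
    "Global Online Shop": ["Consignment", "Commercial Invoice", "DKCPH"],
    "Commercial Invoice": ["Global Online Shop"],
    "DKCPH": ["Global Online Shop"],
}
scores = betweenness(graph)
print(max(scores, key=scores.get))  # → Global Online Shop
```

As in the demo, the node that joins the two subgraphs comes out on top, because every path between them has to pass through it.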

Data scientists typically connect multiple graph algorithms in their search for patterns. Here are links to a couple of examples which I have shared in earlier articles:

Determining untrusted subgraphs of the UN Trust Graph.

Determination of Verifiable Credential data originating from untrusted subgraphs, based on the UN Trust Graph concept:
https://medium.com/transmute-techtalk/the-united-nations-trust-graph-d65af7b0b678

Trade Party Community Detection, determined from running a series of graph algorithms on basic verifiable credential issuance patterns:
https://medium.com/transmute-techtalk/neo4j-graph-data-science-with-verifiable-credential-data-98b806f2ad78

Semantics is Everything

We have now gone through the whole tech stack and at the same time progressed from Data to Meaning to Knowledge.

Without semantic context, raw data is meaningless. Traditional APIs depend on human intuition and labor to make sense of data. But we have seen how simple it is instead to add an explicit, declarative context and let computers do all the hard work.

When meaning is automated, we can be much more flexible with our data sources while shifting our attention to gaining knowledge. With all those previously disparate datasets now connected, we can leverage modern algorithms to extend our knowledge into the implicit relationships, answering questions we would never have thought to ask.

Web vocabularies provide the common definition of meaning, and strong web vocabularies are vital to this new infrastructure. The best web vocabularies are governed by authoritative institutions, which are relevant for the domain they define. This way, they gain gravitational critical mass and become not only formal, but also de-facto standards.

Drawing on UN/CEFACT’s decades of established authority, the UN/CEFACT Web Vocabulary is the undisputed global semantic dictionary for terms in trade.

https://vocabulary.uncefact.org/

Nis Jespersen, Transmute’s Solutions Architect, is editor of the United Nations CEFACT JSON-LD Web Vocabulary project.

About Transmute: Building on the security and freedom that Web3 promised, Transmute provides all the benefits of decentralization to enterprise teams seeking a cost-effective, interoperable, planet-forward experience provided by experts in technology and industry.

Transmute was founded in 2017, graduated from TechStars Austin in 2018, and is based in sunny Austin, Texas. Learn more about us at: http://www.transmute.industries
