On JSON-LD and the Semantics of Identity

Orie Steele
Transmute
Jan 6, 2020

In this post, we’ll explore how JSON-LD is used in a number of contexts including decentralized identity and verifiable credentials. We’ll also cover the basics of what you should know before using JSON-LD and how you can contribute to software and standards that rely on it.


Throughout this post we’ll reference work that is currently in progress, some of which is funded by the Department of Homeland Security’s (DHS) Silicon Valley Innovation Program (SVIP). Read more about our work with DHS and Customs and Border Protection (CBP) here:

What is JSON-LD?

JSON-LD is a lightweight Linked Data format. It is easy for humans to read and write. It is based on the already successful JSON format and provides a way to help JSON data interoperate at Web-scale [0]. You can read more foundational information about JSON-LD in the W3C specification:

https://www.w3.org/TR/json-ld11/
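To make this concrete, here is a minimal sketch of what a JSON-LD document looks like and what a processor does with it. It uses the jsonld npm package (Digital Bazaar's jsonld.js); the document and its values are illustrative, not from any spec.

```typescript
import * as jsonld from "jsonld";

// The @context maps short, human-friendly terms to unambiguous IRIs.
const doc = {
  "@context": {
    name: "http://schema.org/name",
    homepage: { "@id": "http://schema.org/url", "@type": "@id" },
  },
  name: "Jane Doe",
  homepage: "https://janedoe.example.com",
};

async function main() {
  // Expansion replaces each term with the full IRI from the context,
  // so "name" unambiguously becomes http://schema.org/name.
  const expanded = await jsonld.expand(doc);
  console.log(JSON.stringify(expanded, null, 2));
}

main().catch(console.error);
```

The expanded form is what makes the data portable: any other system that understands http://schema.org/name knows exactly what this property means, regardless of what the JSON key was called.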

What is the value of Linked Data?

In computing, linked data (often capitalized as Linked Data) is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database [1].

How does JSON-LD help the internet to become a global database?

A real-world example is helpful here.

Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond [2].

Google uses Schema.org and JSON-LD to power its Knowledge Graph API [3], which in turn helps developers build search interfaces and ensure that industry data is cataloged properly and accessible to search engines.

Part of what makes this possible is getting developers to agree on how they are going to express their structured data.

For example, this structured data uses Schema.org to express the concept of a person selling a car:

https://schema.org/Car#Car-gen-300

What happens if everyone chooses to represent the concept of “Person” and “Car” differently? The search engine cannot tell that a “Car” on one website is the same type as a “Car” on another. By leveraging a shared context (schema.org), websites can express their structured data in a way that allows for interoperability.
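As a rough illustration (the vehicle, price, and seller below are made up; the types and properties come from schema.org), here is the kind of structured data a website might embed, typically in a script tag of type application/ld+json:

```typescript
// Hypothetical listing: a Person selling a Car, described with schema.org.
const listing = {
  "@context": "https://schema.org",
  "@type": "Car",
  name: "2012 Example Hatchback",
  offers: {
    "@type": "Offer",
    price: "7500",
    priceCurrency: "USD",
    seller: {
      "@type": "Person",
      name: "Alice Example",
    },
  },
};
```

Because both this site and a competitor's site anchor "Car" and "Person" to the same schema.org context, a crawler can tell they are talking about the same types.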

When structured data is expressed as JSON-LD or RDF, it can easily be integrated with semantic reasoning systems.

A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine, by providing a richer set of mechanisms to work with. The inference rules are commonly specified by means of an ontology language, and often a description logic language [4].
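Here is a hedged sketch of how that integration typically starts: serialize the JSON-LD into RDF triples (N-Quads) that a triple store or reasoner can consume. Again this uses the jsonld npm package, and the document is hypothetical; note that resolving the remote schema.org context requires network access.

```typescript
import * as jsonld from "jsonld";

const doc = {
  "@context": "https://schema.org", // remote context, fetched at runtime
  "@id": "https://example.com/people/alice",
  "@type": "Person",
  name: "Alice Example",
};

async function main() {
  // Each statement becomes a <subject> <predicate> <object> triple, e.g.
  // <https://example.com/people/alice> <http://schema.org/name> "Alice Example" .
  // (Exact IRIs depend on the context version.)
  const nquads = await jsonld.toRDF(doc, { format: "application/n-quads" });
  console.log(nquads);
}

main().catch(console.error);
```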

At this point, it’s better that I redirect you to these excellent posts on graph representations of knowledge:

Welcome back, I’m sure you read every word of those posts so we can pick up the pace.

We’ve established that semantic descriptions of structured data are the key to making data useful to machines, which in turn makes data useful to humans. If you find search engines useful, then you already agree, and you can be glad that schema.org, Microsoft, and Google adopted open standards like JSON-LD to modernize the most common expressions of knowledge. If you are a developer, your reward is this awesome GitHub repo:

Why do Verifiable Credentials use JSON-LD?

“This example demonstrates extending the Verifiable Credentials Data Model in a permissionless and decentralized way. The mechanism shown also ensures that verifiable credentials created in this way provide a mechanism to prevent namespace conflicts and semantic ambiguity [5].”

Why would semantic ambiguity be dangerous? Is Mercedes the person the same as Mercedes the car? How can a search engine tell? What about the DMV? Semantic ambiguity introduces unnecessary opportunities for error that can break business processes and undermine confidence in system adoption.
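Here is a hedged sketch of that extension mechanism in action. The core VC context is real; the https://example.com/vehicle/v1 context, its terms, and all of the identifiers below are hypothetical:

```typescript
// A credential whose custom context pins down exactly what "Car" means here,
// so it cannot collide with anyone else's use of the word.
const credential = {
  "@context": [
    "https://www.w3.org/2018/credentials/v1", // core VC terms
    "https://example.com/vehicle/v1", // hypothetical extension vocabulary
  ],
  type: ["VerifiableCredential", "VehicleTitleCredential"],
  issuer: "did:example:dmv",
  issuanceDate: "2020-01-06T00:00:00Z",
  credentialSubject: {
    id: "did:example:mercedes", // Mercedes the person...
    owns: {
      type: "Car", // ...who owns a car, unambiguously a vehicle
      vehicleIdentificationNumber: "EXAMPLE-VIN-123",
    },
  },
};
```

Because "Car" resolves to an IRI defined in the extension context, a verifier (or the DMV) can tell the person apart from the vehicle.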

Alternatives to JSON-LD

There are of course ways to express these distinctions other than JSON-LD or graph technologies… you can use JSON Schema! But before we discuss JSON Schema: is it possible to represent semantic information in an open standard with web-scale adoption using a technology other than RDF / JSON-LD?

I think the answer is no… so let’s unpack JSON Schema as an option.

What is JSON Schema?

JSON Schema is a vocabulary that allows you to annotate and validate JSON documents [6].

This means that JSON Schema is limited to providing these features for documents that are JSON, unlike RDF / JSON-LD, which can be used to annotate XML as well as JSON. Indeed, the lack of Linked Data support here means that anyone attempting to recreate Linked Data features on top of JSON Schema won’t have a W3C standard to guide them.
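To be fair to JSON Schema, here is a minimal sketch of what it does extremely well: validating the shape of a JSON document. This uses the Ajv validator; the schema itself is made up for illustration.

```typescript
import Ajv from "ajv";

// A schema that constrains structure: which properties exist and their types.
const personSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "integer", minimum: 0 },
  },
  required: ["name"],
  additionalProperties: false,
};

const ajv = new Ajv();
const validate = ajv.compile(personSchema);

console.log(validate({ name: "Alice", age: 30 })); // true
console.log(validate({ age: "thirty" })); // false: no name, age not an integer
console.log(validate.errors); // details about what failed
```

Notice what the schema cannot say: that "name" here means the same thing as schema.org's name property. Shape, yes; shared semantics, no.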

There are a number of ways one could decide to relate JSON Schema documents to each other. Maybe someday all semantic web annotations will be based on IETF JSON Schema variants. Backwards compatibility with existing XML-based systems might not be necessary for those who don’t wish to build on the semantic graph concepts we covered earlier in this post.

There is one place where JSON Schema has gathered significant traction:

The OpenAPI Specification (OAS) defines a standard, language-agnostic interface to RESTful APIs which allows both humans and computers to discover and understand the capabilities of the service without access to source code, documentation, or through network traffic inspection. When properly defined, a consumer can understand and interact with the remote service with a minimal amount of implementation logic [7].

Using JSON-LD and JSON Schema with Verifiable Credentials

The following work is under development and subject to changes:

https://w3c-ccg.github.io/vc-json-schemas/

It’s possible to define both an @context and a schema for a verifiable credential. Since the verifiable credential data model already describes the use of JSON-LD via @context, this spec describes how to leverage the credentialSchema property. Using this system, developers who want to use JSON Schema to validate user input can do so, while developers who want to express semantic concepts and integrate with linked data sources external to the credential can leverage JSON-LD.
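A hedged sketch of what that combination might look like in a credential (the schema URL, extension context, and EmailCredential type are hypothetical; the credentialSchema property itself is defined by the VC data model):

```typescript
const credential = {
  // JSON-LD: @context carries the semantics of every term used below.
  "@context": [
    "https://www.w3.org/2018/credentials/v1",
    "https://example.com/contexts/email/v1", // hypothetical extension context
  ],
  type: ["VerifiableCredential", "EmailCredential"],
  issuer: "did:example:issuer",
  issuanceDate: "2020-01-06T00:00:00Z",
  credentialSubject: {
    id: "did:example:holder",
    email: "alice@example.com",
  },
  // JSON Schema: credentialSchema points at a schema used to validate shape.
  credentialSchema: {
    id: "https://example.com/schemas/email-credential.json",
    type: "JsonSchemaValidator2018",
  },
};
```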

How is the DID WG using JSON-LD?

At the time of this writing, there is a lot of GitHub traffic about JSON-LD and its use within the DID Core spec:

https://github.com/w3c/did-core/issues/128

At the time of this writing, @context is a required property, so every DID Document that follows the DID Core spec is automatically expressing some semantic information about cryptographic keys, services, or proofs related to a DID subject.
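For example, here is a minimal DID Document sketch in the style of the DID Core drafts at the time (the identifiers, key material, and service are placeholders, and exact property names have shifted as the spec has evolved):

```typescript
const didDocument = {
  "@context": "https://www.w3.org/ns/did/v1", // required by DID Core
  id: "did:example:123456789abcdefghi",
  // Cryptographic material associated with the DID subject.
  publicKey: [
    {
      id: "did:example:123456789abcdefghi#keys-1",
      type: "Ed25519VerificationKey2018",
      controller: "did:example:123456789abcdefghi",
      publicKeyBase58: "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV",
    },
  ],
  // Services associated with the DID subject.
  service: [
    {
      id: "did:example:123456789abcdefghi#agent",
      type: "AgentService", // hypothetical service type
      serviceEndpoint: "https://agent.example.com/8377464",
    },
  ],
};
```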

Many developers do not require the RDF interoperability, semantic web technology, or graph data modeling features of JSON-LD. Instead, they wish to use the DID Core spec as a standard way of expressing cryptographic material and services associated with a DID subject, without any semantics.

You can copy DID Documents from the Universal Resolver into the JSON-LD playground and see if there are any errors processing them…

What does it mean if there is a JSON-LD processing error for a DID Document? Does that mean the DID Document is not spec compliant?

  • It’s true that the document cannot be processed without semantic ambiguity.
  • It’s true that the document cannot be signed using Linked Data Signatures.
  • It’s true that the document does not have a triple representation.
  • It’s true that the document is not valid JSON-LD (under strict interpretation).
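You can reproduce the playground check programmatically. Here is a sketch using the jsonld npm package; it assumes a recent version that supports the safe option, which raises an error on properties the context does not define instead of silently dropping them.

```typescript
import * as jsonld from "jsonld";

async function checkDidDocument(didDocument: object): Promise<void> {
  try {
    // Safe mode fails loudly on terms that would otherwise be dropped.
    await jsonld.expand(didDocument, { safe: true });
    console.log("Document processed cleanly as JSON-LD.");
  } catch (err) {
    // An error here means someone (the DID Method implementer or the DID
    // Controller) produced a document the context cannot fully describe.
    console.error("JSON-LD processing error:", err);
  }
}

// A property no context defines will fail the check.
checkDidDocument({
  "@context": "https://www.w3.org/ns/did/v1",
  id: "did:example:123",
  myCustomProperty: "not defined in the context", // triggers the error
});
```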

Who is at fault when such an error arises?

The answer is the DID Method implementer, or the DID Controller… whoever caused the representation of the DID Document to throw errors when processed as JSON-LD in strict mode.

Does removing the @context solve this issue?

Yes, in the sense that it makes clear the DID Method does not support semantic web technology and does not provide a mechanism for managing structured linked data that is compatible with Schema.org, Google Knowledge Graph, semantic inference engines, or XML-based systems. If these types of interoperability are important to your business, you should pick a DID Method that ensures that DID Documents are valid JSON-LD.

Should everyone be forced to use JSON-LD in order to make a DID Method?

This is the tough question… For now, if you want to leverage semantic web technologies or reasoners, you can just follow the DID Core spec and make sure that your DID Documents don’t actually throw errors.

If you are building a system that needs to integrate with XML, knowledge graphs, or medical, supply chain, and other industry ontologies, you should probably make sure the DID Method you choose uses JSON-LD and does not explode when processed in strict mode :)

Finally, while there is no single perfect method, building an interoperable ecosystem requires standards-compliant applications. JSON-LD and the optional use of JSON Schema are the current leading standards candidates, and in our view the precision and interoperability they offer often outweigh the technical challenges of implementation.

Additional Resources

If you are interested in using GitHub to develop a DID Method or Verifiable Credential that uses JSON-LD you may find this website helpful:

https://context.transmute.org/

Sources:

[0] — https://json-ld.org/
[1] — https://en.wikipedia.org/wiki/Linked_data
[2] — http://schema.org/
[3] — https://developers.google.com/knowledge-graph
[4] — https://en.wikipedia.org/wiki/Semantic_reasoner
[5] — https://www.w3.org/TR/vc-data-model/
[6] — https://json-schema.org/
[7] — https://swagger.io/specification/
