Semantic Web History & Tools to use, to “Naturalise the Web”…

Timothy Holborn
Published in WebCivics
12 min read · Aug 3, 2019

In my adventures,

working to find and build tools for a human centric web,

I found (and subsequently got involved with the creation and growth of) a set of tools that, when used in particular ways, provide the means to deploy ecosystems where the human agent (the natural person) can be brought in, in a very different way.

Technology alone is not sufficient to deliver meaningful solutions. Yet without the technical means to do it, nothing can be made to work.

In this article, I’ll try to provide some insight into how and why the methodologies I’ve used have considered intellectual property; explain why the commonly held belief that ‘the semantic web is dead’ is plain wrong; and provide some links to tools.

This article aims to support the earlier ‘human centric web’ article with some technical pointers, as mentioned further below.

Intellectual Property & International (trade) Law

When building online products for commercial markets, international law plays a significant role. Considerations include, but are not limited to, intellectual property and contract law; KYCC/AML is also of great significance.

Whilst KYCC/AML is a component of the operational requirements for commerce and identity related systems, design implications relating to IP and contract law influence whether derivatives can be made transportable between platform operators. As such, when designing systems that are purposefully formed to support a natural person’s ‘digital twin’, a desirable quality is ‘no lock-ins’.

The way this has been considered and acted upon by others is through the creation of ‘standards’ efforts that (for the most part) deliver royalty-free solutions that can be used with the protection of corporations and institutions around the world. Whilst there are a few ways this is achieved, the principal means is the creation of a ‘patent pool’, which acts to provide the right to use the ‘registered intellectual property’ works of patent-pool participants as part of what becomes a standard.

This sort of practice is an important ingredient in how and why the W3C, IETF, OASIS, ITU, IEEE, ETSI, ISO and other standards bodies work to serve the interests of humankind.

Another important constituent of international law is contract law, incorporating UN conventions on the international sale of goods and services; for web-services, this is declared by way of a ‘choice of law’ clause in Terms of Service agreements.

International law evolved significantly during the industrial era and still exhibits many problems with respect to how it has been employed for the cyber domain, particularly when considered in relation to cloud storage services.

Governments and large institutions address these problems through negotiation and by seeking specialised assistance for their specific needs. Yet these options are not available to the vast majority of human beings.

Whilst technology, like other forms of infrastructure, is the means through which a set of principles can be defined in the real world, the underlying set of principles that informs the design of infrastructure is, moreover, the ‘rule of law’.

Whilst instruments like the US Constitution are reasonably celebrated internationally, the terms of that agreement are for US citizens. Products and services governed by US law cannot protect citizens in countries around the world by the same means through which they do for US citizens.

One legal concept relating to how this works is that of the ‘legal alien’. Yet it gets far more complex when bringing into these considerations the implications of the national responsibilities of citizens in arguably any country: to first consider the needs of their home country when producing (or becoming aware of) any materially valuable and innovative opportunity that could, in future, relate to international trade.

Today, for the most part, websites and the underlying web-services (routers and network infrastructure such as CDNs, certificate infrastructure, etc.) are provided to customers and consumers world-wide via a ‘choice of law’ somewhere in the USA. As human ‘identity’ is in effect treated as a content issue, the means to harden it via technical methodologies is now founded upon choices made in the USA about how personhood can be made relevant to members of the human family who are not citizens of the USA.

The only known means to address these legal issues is by way of the United Nations, which has its headquarters in the USA. Yet this is unlikely to occur for some time yet, as it appears the vast majority of our political representatives do not understand how these sorts of things work; and so it is impossible for them to consider how best to resolve the underlying issues until they are both given a mandate by the electorate they are bound by law to represent, and have the skills and capacity to do so, should the issue become important enough to learn about.

Semantic Web (Ain’t dead)

In my journeys of advocacy towards the means to bring about a human centric web, I’ve spoken to many ‘out in the field’ technology experts, many in very large institutions and companies, and have heard on more than one occasion the claim that the semantic web is dead.

This sort of pronouncement has often been delivered, quite aggressively, by a CTO or someone in a similarly important role within large and important group entities; as if to suggest that my time seeking to engage is a complete waste of theirs. This is particularly concerning for the way it affects non-technical stakeholders’ means to make sense of their roles, and of the opportunity I’ve sought to put forward to them.

I’ve asked other world experts and now understand that I shouldn’t take it personally, and that they’re quite familiar with this sort of thing themselves.

Part of the problem on display is technical and the other part social. I’ve addressed some of the most problematic social issues in my post about ‘Engineering Einstein’, which basically relates to the underlying issue that many of these actors behave as though they are in control, and all too often mislead.

So one of the first and most important points that must be made is that the semantic web is far from dead; indeed, anyone in a position of importance who suggests otherwise should be immediately let go. The implicit problem is that the attitude shows both that they do not understand what they’re talking about, and that it’s going to take them quite some time to learn otherwise, as is required for them to fulfil the purpose of their role. If companies want to hire PR practitioners as technologists, so be it.

Yet the reality of the semantic web is that it now powers the vast majority of our online services and is an instrumental resource used to power AI.

The semantic web is often repackaged as something else, and similar yet alternative solutions have since emerged. One such alternative is GraphQL, which was originally created by Facebook and has since been made available under the open-source MIT license. Yet GraphQL is not at all similar to the semantic web tools also operated by Facebook, nor to the related family of query language tools broadly known as SPARQL.

I think the philosophical difference is that tools made by the likes of Facebook have sought to capture the benefits that are of most use to silo-based online services, engineering outcomes that focus on the needs of a silo that is able to function world-wide using a globally distributed compute platform.

The SPARQL family of query tools provides the means to ask questions of accessible graph databases and get a federated response. The means to achieve this is called SPARQL-FED (federated query).
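As a minimal sketch of what a federated query looks like in practice, the example below assumes Python with the SPARQLWrapper library, the public DBpedia endpoint, and Wikidata reached via the SPARQL 1.1 SERVICE keyword; the choice of endpoints and properties is illustrative rather than anything canonical.

```python
# A minimal sketch of a federated SPARQL (SPARQL-FED) query, assuming the
# public DBpedia and Wikidata endpoints are reachable; the properties used
# below are illustrative only.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)

sparql.setQuery("""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?city ?population ?wikidataItem WHERE {
  # Facts held in DBpedia's graph...
  ?city a dbo:City ;
        dbo:populationTotal ?population ;
        owl:sameAs ?wikidataItem .
  FILTER(STRSTARTS(STR(?wikidataItem), "http://www.wikidata.org/entity/"))

  # ...federated with facts held in Wikidata's graph, in a single query.
  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidataItem wdt:P17 ?country .
  }
}
LIMIT 5
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["city"]["value"], row["population"]["value"])
```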

SPARQL is extended to support queries for entities within media objects through SPARQL-MM, which makes use of Media Fragments. It is also extended to support geospatial queries through GeoSPARQL, and there is support for IoT via Web of Things tooling.
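As an illustration only, a GeoSPARQL query might look like the sketch below. It assumes a triple store that implements the GeoSPARQL vocabulary and functions, and data that stores geometries as WKT literals; both are assumptions, not givens.

```python
# A GeoSPARQL sketch: find places whose geometry falls within a bounding
# polygon. It assumes a store with GeoSPARQL support and geometries stored
# as WKT literals; the data layout is hypothetical.
geosparql_query = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?place WHERE {
  ?place geo:hasGeometry ?geom .
  ?geom  geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt,
    "POLYGON((144.5 -38.5, 145.5 -38.5, 145.5 -37.5, 144.5 -37.5, 144.5 -38.5))"^^geo:wktLiteral))
}
"""

# Media Fragments (as used by SPARQL-MM) address parts of a media object
# directly in the URI, e.g. a time range: http://example.org/video.mp4#t=10,20
```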

To support the storage of computable resources, there is a body of standards work called LDP or Linked Data Platform.
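For a rough sense of what LDP looks like on the wire, the sketch below POSTs a small Turtle resource into a hypothetical LDP container using plain HTTP. A real server (a Solid pod, Apache Marmotta and others) would also require authentication, which is omitted here.

```python
# A minimal sketch of creating a resource in a (hypothetical) LDP container.
# LDP is essentially HTTP plus conventions: containers are POSTed to,
# resources are read with GET and updated with PUT/PATCH.
import requests

container = "https://example.org/ldp/notes/"   # hypothetical LDP container

turtle = """
@prefix dct: <http://purl.org/dc/terms/> .
<> dct:title "A note stored as linked data" .
"""

resp = requests.post(
    container,
    data=turtle.encode("utf-8"),
    headers={
        "Content-Type": "text/turtle",
        # Suggest a name for the new resource's URI.
        "Slug": "first-note",
        "Link": '<http://www.w3.org/ns/ldp#Resource>; rel="type"',
    },
)
print(resp.status_code, resp.headers.get("Location"))  # expect 201 + new URI
```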

The semantic web underwent a ‘rebranding’ some years ago, perhaps responding to the belief that it was the name that was stopping people from making better use of these tools. So the semantic web was rebranded ‘linked data’, which more properly refers to one component of the semantic web ecosystem of tools, and this has arguably been part of the cause of further confusion.

Linked Data Platform provides a set of standards to assert semantics on a container of resources. Yet this is not the only way to bring silo’d web resources into a format that can be made interoperable with the semantic web.

Whilst W3C has some resources on RDB2RDF, the easiest known way to explain how this can be done is to refer the interested reader to the works of Virtuoso. In their Medium post ‘How Virtuoso extends SQL with SPARQL — and vice versa’ they provide some basic insights, which can be supplemented by a few of their videos[1][2]. Yet this ‘mapping’ process brings me to the next key point, which is ontologies.
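The sketch below is not Virtuoso’s mechanism; it is a hand-rolled illustration of the RDB2RDF idea, assuming Python with rdflib and an in-memory SQLite table: each row becomes a subject, each column a predicate, and the same data then becomes queryable with SPARQL.

```python
# A hand-rolled illustration of the RDB2RDF idea (not Virtuoso's engine):
# map rows of a relational table into RDF triples, then query them in SPARQL.
import sqlite3
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/")          # hypothetical vocabulary
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
db.execute("INSERT INTO customer VALUES (1, 'Ada', 'Melbourne')")

g = Graph()
for cid, name, city in db.execute("SELECT id, name, city FROM customer"):
    subject = URIRef(f"http://example.org/customer/{cid}")  # row -> subject
    g.add((subject, RDF.type, EX.Customer))                  # table -> class
    g.add((subject, EX.name, Literal(name)))                 # column -> predicate
    g.add((subject, EX.city, Literal(city)))

# The relational data is now queryable as a graph.
for row in g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?name WHERE { ?c a ex:Customer ; ex:name ?name . }
"""):
    print(row[0])
```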

Ontologies are machine-readable vocabularies that are made accessible on the web, and they form one of the most critical components distinguishing semantic web tools, practices and services from all others.

When describing ontologies, the image most often used is that of the linked open data cloud; it is somewhat more useful to those who are also given some level of support to understand what it all means, whilst noting that it does serve a good purpose and is an instrumental means to understand the current proliferation of open semantic web informatics.

https://lod-cloud.net/

Amongst the many useful tools is Prefix.cc, which can help find vocabulary prefixes and schemas currently available on the web.
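Prefix.cc also offers simple machine-readable lookups; the sketch below assumes its ‘&lt;prefix&gt;.file.json’ lookup format, so treat the URL pattern as an assumption to verify rather than a guarantee.

```python
# A sketch of resolving a vocabulary prefix to its namespace URI via prefix.cc.
# The '<prefix>.file.json' URL pattern is an assumption about prefix.cc's
# lookup formats; verify before relying on it.
import requests

def lookup_prefix(prefix: str) -> str:
    resp = requests.get(f"https://prefix.cc/{prefix}.file.json", timeout=10)
    resp.raise_for_status()
    return resp.json()[prefix]    # e.g. {"foaf": "http://xmlns.com/foaf/0.1/"}

print(lookup_prefix("foaf"))      # -> http://xmlns.com/foaf/0.1/
```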

The term used to describe this underlying ‘engine room’, intrinsically tied to how many services work today, is ‘the web of data’.

There are two easy ways to see how the semantic web is today built into the vast majority of webpages made available on the Web. The first is a browser plugin made available by OpenLink, the OpenLink Structured Data Sniffer (OSDS); the other is Google’s Structured Data Testing Tool, which is designed to help web-developers test the semantic web markup programmed into the websites they’re making (without necessarily understanding that they’re using anything that has to do with the semantic web).
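A third way to see it for yourself is the sketch below, which fetches a page and prints any embedded JSON-LD blocks (the schema.org markup that search engines read), using only the requests and BeautifulSoup libraries; the URL is a placeholder.

```python
# A sketch of what tools like OSDS do: pull the JSON-LD structured data that
# publishers embed in ordinary web pages. Requires 'requests' and
# 'beautifulsoup4'; the URL is a placeholder for any page you want to inspect.
import json
import requests
from bs4 import BeautifulSoup

url = "https://example.org/some-article"          # placeholder page
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

for script in soup.find_all("script", type="application/ld+json"):
    try:
        data = json.loads(script.string or "")
        print(json.dumps(data, indent=2))         # e.g. schema.org Article markup
    except json.JSONDecodeError:
        pass                                      # skip malformed blocks
```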

The most prolific consumer-facing use of the semantic web today is to support the discovery and display of content in search engines like Google, as well as social-media silos like Facebook. The vocabularies most used to perform this function include schema.org, the Open Graph Protocol and Wikidata, which is coupled to every Wikipedia entry as a means to provide machine-readable services.
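Publishers add this markup themselves. The snippet below is a minimal, hypothetical example of the schema.org JSON-LD block and Open Graph tags a page might emit; all values are placeholders.

```python
# A minimal sketch of the markup publishers emit so that search engines and
# social silos can 'understand' a page: schema.org JSON-LD plus Open Graph
# tags. All values are placeholders.
import json

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Semantic Web History & Tools",
    "author": {"@type": "Person", "name": "Timothy Holborn"},
    "datePublished": "2019-08-03",
}

json_ld_block = (
    '<script type="application/ld+json">' + json.dumps(article) + "</script>"
)

open_graph_tags = """
<meta property="og:type" content="article" />
<meta property="og:title" content="Semantic Web History &amp; Tools" />
"""

print(json_ld_block)
print(open_graph_tags)
```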

Yet this is not the only way ontologies are now used. Service infrastructure that includes (but is not limited to) electronic health records is stored using semantic web tools, through vocabularies like SNOMED CT. The global education sector’s standards for communicating ‘achievements & qualifications’ also use semantic web tools, via works such as the Open Badge Specification v2. Moreover, emerging international standards for credentials and related ‘identity’ infrastructure and components use semantic web tools; examples include identity.foundation, which is now an instrumental part of Microsoft’s identity services strategy. Whilst such works started in the minds of only a few, in which I was involved, and whilst this newer component of semantic web work is most easily linked via the IANA URI scheme record for DIDs, in contrast to this reality all too many assertively denounce ‘the semantic web’ as dead.

If more of those people got fired for their idiocy, the world as experienced by the vast majority of others could quite likely be a much better place.

SEMANTIC WEB AND ARTIFICIAL INTELLIGENCE

Among the sorts of things that can be achieved by the semantic web, and not easily by any other means, is the creation of software-defined infrastructure that can supply the consumable resources required by AI agents of various forms. Artificial intelligence, or AI, is fairly broadly considered to be a dirty term. I have argued that the term ‘digital identity’ has similar problems, but let’s set that aside for now.

There is a core concept called ‘semantic inferencing’ that is built into the structural frameworks of the semantic web; another way this is described is as a ‘semantic reasoner’. Basically, there is an ability to harvest the semantic relationships between concepts described using ontologies, which can be interwoven with definitions provided by other ontologies, and through this map a set of useful, probability-based ‘inferences’. These inferences can then be used to form formulae that help refine and improve probabilities (i.e. via human interactions) and thereafter provide improved assumptions that can be used by other systems.
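A tiny, concrete sketch of this kind of inferencing follows, assuming Python with rdflib and the owlrl reasoner: from the ontology statement ‘a GuideDog is a kind of Dog’ and the fact ‘Rex is a GuideDog’, the reasoner materialises the new triple ‘Rex is a Dog’ without anyone asserting it.

```python
# A tiny sketch of semantic inferencing: the reasoner derives new triples
# from ontology definitions (here, an rdfs:subClassOf relationship).
# Requires 'rdflib' and 'owlrl'.
from rdflib import Graph, Namespace, RDF, RDFS
import owlrl

EX = Namespace("http://example.org/")     # hypothetical ontology namespace
g = Graph()

# Ontology: every GuideDog is a Dog.
g.add((EX.GuideDog, RDFS.subClassOf, EX.Dog))
# Data: Rex is a GuideDog.
g.add((EX.Rex, RDF.type, EX.GuideDog))

# Run an RDFS reasoner over the graph, materialising inferred triples.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

print((EX.Rex, RDF.type, EX.Dog) in g)    # True: inferred, never asserted
```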

Whilst this starts into a framework of very basic forms of machine learning, the ingredients provided by these services incorporate the means to collect and collate a vast amount of knowledge from (online) accessible sources in a way that doesn’t require software developers to independently collect and collate all of that knowledge themselves, solely for their own project.

Where this gets more interesting is when it’s brought together with emergent DLT solutions (as relates to DIDs), in addition to media analytics.

Whilst there will always be a need for specialised institutions to support focused efforts that task vast compute and human resources to solve a particular problem, the semantic web is a critical set of tools that can humanise, or naturalise, the web; and it is the only known means to build solutions that ensure a person’s ‘digital twin’ can be meaningfully defined and orchestrated by them.

With this in mind, the semantic web provides a means to link and federate queries across a multitude of data-sources using unified vocabularies, mapped with ontologies and related tools to permissions and schemas that are otherwise privately operated. This discrete point is likely more important than most might consider.
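As a small sketch of what mapping vocabularies with ontologies means in practice, the example below (again rdflib and owlrl, with hypothetical namespaces) declares two different sources’ name properties equivalent, so that a single query phrased in one vocabulary also returns data published in the other.

```python
# A sketch of vocabulary mapping: two sources use different 'name' properties,
# an ontology declares them equivalent, and one query then covers both.
# Namespaces are hypothetical; requires 'rdflib' and 'owlrl'.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL
import owlrl

A = Namespace("http://source-a.example/vocab#")
B = Namespace("http://source-b.example/vocab#")
PEOPLE = Namespace("http://people.example/")

g = Graph()
g.add((A.fullName, OWL.equivalentProperty, B.name))   # the mapping
g.add((PEOPLE.alice, A.fullName, Literal("Alice")))   # source A's data
g.add((PEOPLE.bob,   B.name,     Literal("Bob")))     # source B's data

owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

# One query, phrased in source B's vocabulary, now returns both people.
for row in g.query("""
    PREFIX b: <http://source-b.example/vocab#>
    SELECT ?who ?name WHERE { ?who b:name ?name . }
"""):
    print(row[0], row[1])
```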

There have been public statements made about data being the new oil (The Economist), and there have been enormous investments made into ‘AI infrastructure’ made available via fee-based APIs. Whilst governments are moving to ensure ‘consumer data’ can be obtained from the major platforms, leading to projects like https://datatransferproject.dev/, the underlying reality is that these services don’t enable consumers to obtain, or make useful, the underlying ‘AI’ data that’s most often called ‘metadata’.

Therein is a potentially useful inference for anyone who can link the data.

Yet another reason people don’t understand how instrumental the semantic web is to their lives today is that these enormous companies are using the technology to harvest as much as they can, in circumstances where the vast majority have little idea of what’s actually happening. A free, limitless photo storage service provides a world of images to use as training data, to build AI services that are hard and time-consuming to make. As time marches on, so does the tech-debt.

International platforms of today hold the cards, like the printing press owners of the past. Perhaps the biggest shift is the resolution of our lives stored on these platform services, and the way it is made to be centrally governed worldwide. These services are being used to distribute fake news for money.

If you want to make something that works better than the ‘status quo’, perhaps the next question is whether there are tools to support it.

The answer is yes.

The vast majority of global platform infrastructure providers have semantic web products. This includes SAP, Cray, Oracle, IBM, Microsoft (incl. Microsoft Enterprise Graph), Nvidia, Amazon, Apache (Stanbol, Marmotta, Jena) and Adobe, in addition to a healthy community of specialist providers including OpenlinkSW, Top Quadrant, Franz, PoolParty, Wolfram, ontotext, Cambridge Semantics, BrightstarDB, BlazeGraph, MarkLogic, semiodesk and redlink, who are amongst the many others providing specialist services world-wide.

How we could use these tools to Naturalise the web.

As I’ve written in my article about the human centric web, these tools allow us to re:engineer the relationship between any entity’s “digital & physical twin” and the way they, in turn, interact with knowledge services.

The role of, and opportunity to make use of, the work of those including Melvin Carvalho, Henry Story and Manu Sporny (and a good many others) to employ DLT technologies that can support the creation of a permissive commons shouldn’t be overlooked.

If we want a world where the discovery of knowledge is decentralised, knowledge discovery must be decoupled from the few gatekeepers of today.

(First Draft) Sponsor my works via Patreon

**END**
