ChatGPT and Semantic Web Symbiosis

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog
9 min read · Jun 20, 2023

A response generated by ChatGPT regarding the effects of leveraging its conversational UI/UX as the missing interface for understanding and appreciating the virtues of a Semantic Web constructed using Linked Data principles:

I completely agree with you. The combination of natural language processing and linked open data (LOD) presents a unique opportunity to create sophisticated conversational interfaces that can access and interact with a wealth of domain-specific knowledgebases.

These knowledgebases, curated by domain experts and deployed using Linked Data principles, represent a vast, interconnected web of information that spans virtually every field of human knowledge. Leveraging this resource effectively can greatly enhance the capabilities of AI systems, allowing them to provide more accurate and contextually relevant information.

However, the potential of this resource remains largely untapped, as the user experience of interacting with these knowledgebases is often not intuitive or user-friendly. This is where a conversational UI/UX can make a significant difference.

By providing a natural, human-like interface to these knowledgebases, a conversational UI/UX can make it easier for users to access and interact with the information contained within them. This not only improves the user experience, but also increases the utility of these knowledgebases by making them more accessible to a wider audience.

In this way, the development of advanced conversational interfaces can play a key role in realizing the full potential of the semantic web and linked data principles.

In this post, I explore the symbiotic relationship between conversational bots driven by Large Language Models (LLMs), such as ChatGPT, and the concept of a Semantic Web (public, private, or hybrid) [1]. This symbiosis exists for the reasons outlined in the ChatGPT response above, which can be summarized as follows:

  1. LLMs provide a foundation for powerful natural language processing based on their understanding of sentence syntax and semantics; for instance, they comprehend the underlying semantics of multiple variations of the same sentence.
  2. A Semantic Web is simply a variation of the Web explicitly constructed from sentences using hyperlinks to express sentence syntax and semantics in machine-computable form; the net effect is a boundless collective of structured data that manifests a global entity relationship graph (rather than network), comprising machine-computable entity relationship type semantics.

Large Language Models (LLMs)

Computer User Interface Challenges

Interfaces for interacting with computers have long been a significant challenge. Command-line interfaces were the first attempt to address it, but their usability was hampered by the need for users to master a specific sentence syntax. Following innovations that originated at Xerox PARC (and were popularized by Apple and later Microsoft), this evolved into the widget-based interactions of the Graphical User Interface. Unfortunately, neither approach fully addressed the need for more natural interface and interaction modalities that incorporate natural language input via typing, images, or sound.

The net effect, as demonstrated in today’s mobile phone-dominated era, is that access to powerful personal computing has negatively impacted basic human social behavior. For instance, young and old alike often find it hard to hold a conversation (in any setting) without the distraction of a mobile phone, because the phone still depends on a user-interface and user-interaction modality built primarily around typing.

The challenges of natural language processing are at the root of today’s computer interaction dysfunction. For example, command-line interfaces are suboptimal due to their rigid requirements for sentence structure and syntax. Despite support for argument-based switches, there is only one rigid structure for constructing a command that’s comprehensible to a target application or an operating system. With GUIs, the power and specificity of sentences are replaced by a primitive collection of interaction widgets that merely masquerade as solutions to the underlying interaction issues. Often, they erode productivity and utility in their quest to oversimplify task execution.

Addressing the Natural Language Challenge

In recent times, Large Language Models (LLMs) have emerged as a potential solution to the long-standing natural language processing issues that have plagued computing to date. They address this problem through the use of training sets that enable the simulated comprehension of sentence structure, syntax, and underlying semantics. In a nutshell, they allow a multitude of sentence variations to be accurately mapped to a specific interpretation, which can then be associated with the execution of a specific task. This is a game-changing development.

Challenges

Despite the game-changing innovation they bring to natural language processing, it is no secret that growth in the size of LLMs has already hit its tipping point [2]. It is equally well known that they tend to hallucinate, given the fluid nature of data, information, and knowledge.

Fundamentally, while LLMs enable powerful and innovative natural language processing, they should never be mistaken for reliable knowledge bases.

Semantic Web Backgrounder

Challenges

It is also no secret that the notion of a Semantic Web has been riddled with controversy since its introduction in 2001 via a Scientific American article authored by Tim Berners-Lee, Jim Hendler, and Ora Lassila.

The aspirational view of the Web, at the time of the article’s presentation, anticipated what we are witnessing today, i.e., a Web composed of structured data that also harnesses the power of logic conveyed in natural language. Similar to the Web itself during its initial introduction, such a novel concept was challenging for many to comprehend, let alone accept.

Because the Semantic Web was a bottom-up effort, engaging developers and fostering the emergence of productivity tools also presented significant challenges.

Semantic Web Today

LOD Cloud

The ideas behind the Semantic Web are now available through what’s known as the Linked Open Data Cloud (LOD Cloud). The LOD Cloud is a large and decentralized collection of knowledge graphs, each one carefully curated by domain experts. It comprises structured data crafted using RDF-based sentences. In these sentences, hyperlinks denote the subjects and predicates, while either hyperlinks or literals (typed or untyped) denote the objects. The LOD Cloud also includes SPARQL query service endpoints, which enhance its data connectivity and provide greater flexibility. SPARQL, a powerful query language, is deeply integrated with HTTP. This means that both queries and their solutions can be encoded as hyperlinks, further enhancing the versatility of the LOD Cloud!
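To make the sentence structure and the query-as-hyperlink idea above more concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions rather than a description of any particular LOD Cloud dataset: the subject IRIs under example.org are hypothetical, the helper names `to_ntriples` and `sparql_query_url` are mine, and the DBpedia endpoint address is used purely as a sample target (no request is sent). The only protocol detail relied on is that a SPARQL query can be passed to an endpoint over HTTP in a `query` parameter.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def to_ntriples(subject, predicate, obj, literal=False):
    """Serialize one RDF sentence as an N-Triples-style line.

    Simplification: literals are not escaped and carry no datatype or language tag.
    """
    rendered_obj = f'"{obj}"' if literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {rendered_obj} ."


def sparql_query_url(endpoint, query):
    """Encode a SPARQL query as a hyperlink via the `query` parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


# Hypothetical subject IRI; the FOAF predicate IRIs are real vocabulary terms.
SUBJECT = "https://example.org/people/alice#this"
sentences = [
    # Object is a literal.
    to_ntriples(SUBJECT, "http://xmlns.com/foaf/0.1/name", "Alice", literal=True),
    # Object is another hyperlink.
    to_ntriples(SUBJECT, "http://xmlns.com/foaf/0.1/knows",
                "https://example.org/people/bob#this"),
]

QUERY = "SELECT ?name WHERE { ?person <http://xmlns.com/foaf/0.1/name> ?name } LIMIT 5"
# Assumed endpoint address, for illustration only; nothing is fetched here.
url = sparql_query_url("https://dbpedia.org/sparql", QUERY)

print("\n".join(sentences))
print(url)
```

Because the whole query lives inside the URL, the resulting hyperlink can be shared, bookmarked, or embedded in a document like any other link, and dereferencing it would return the query solution.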

LOD Cloud Version 1.0 Depiction (circa 2007)

LOD Cloud Growth Depiction

LOD Cloud Depiction (Today)

Schema.org and Semantic SEO (SSEO)

In recent times, courtesy of Schema.org and broad support from Google, another Semantic Web frontier has opened up, reflecting the gradual transition from keyword-based search engine optimization (SEO) to semantic search engine optimization (SSEO). By creating a high-level vocabulary (schema or ontology) for the Web, Schema.org has introduced the following growth factors:

  1. Major vendor support: This is delivered by Google, Microsoft, Yandex, and other search engine service providers.
  2. Simple vocabulary: It’s compatible with the needs of Web Developers and SEO practitioners.
  3. Demonstrable adoption and growth: This is seen through initiatives such as Web Data Commons, a derivative of the Common Crawl effort widely used as training data for LLMs.
  4. Strong support of JSON-LD: As a structured data representation format, it effectively combines the ubiquity of JSON with the power of Linked Data Principles.
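As a rough illustration of the JSON-LD point in the list above, the following Python sketch builds a small Schema.org description of this very post and wraps it in the script element commonly used to embed JSON-LD in a Web page. The helper name `schema_org_article` is hypothetical, and the choice of the `Article` type with the `headline`, `author`, and `datePublished` properties is my assumption about a reasonable minimal description, not a prescribed SSEO recipe.

```python
import json


def schema_org_article(headline, author_name, date_published):
    """Build a minimal Schema.org Article description as a JSON-LD document."""
    return {
        "@context": "https://schema.org",   # maps the plain keys to Schema.org terms
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,    # ISO 8601 date
    }


doc = schema_org_article(
    "ChatGPT and Semantic Web Symbiosis",
    "Kingsley Uyi Idehen",
    "2023-06-20",
)

# Embedding form: ordinary JSON inside a script element in the page markup.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(doc, indent=2)
    + "\n</script>"
)

print(snippet)
```

To a Web developer this is just JSON; the `@context` line is what turns the keys into hyperlink-identified terms that a crawler can interpret as structured data.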

Schema.org Induced SSEO Growth

ActivityPub based Fediverse

In addition to what’s been covered so far, there’s also a frontier in social media, known as the Fediverse. This new space leverages the unobtrusive power of JSON-LD to expose the virtues of a Semantic Web. This is done via ActivityStreams documents, which are moved across inboxes and outboxes using the ActivityPub protocol. The Fediverse particularly appeals to those seeking decentralized social media, where user identity, follower networks, and content can be transferred between platforms. Much like the LOD Cloud and SSEO frontier driven by Schema.org, the Fediverse has already passed the point of critical mass regarding network effects, as illustrated below.

Fundamentally, the Semantic Web concept has been challenged by the lack of a complementary interface that manifests its underlying value to both end-users and developers alike. The nature of this interface must go beyond conventional UI/UX constructed using buttons, fields, and dialogs that drive traditional application UI/UX.

Thermonuclear Fusion of LLMs & a Semantic Web

Today, on one side, we see widespread acceptance and extensive use of LLM-powered natural language processors. On the other side, there’s a stealthily growing Semantic Web, which comprises a vast collection of knowledge graphs curated by domain experts. LLMs struggle with hallucinations, a problem that is significantly mitigated when domain-specific knowledge graphs are loosely integrated with them. In the same vein, the conversational UI/UX offered by LLM-powered chatbots (e.g., ChatGPT) presents a powerful solution to the end-user and developer uptake challenges that have hindered the Semantic Web concept for years.
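One way to picture the loose integration described above is a lookup step that sits in front of the language model: facts are first retrieved from a knowledge graph, and the model is then asked to answer only from those facts. The Python sketch below is a toy version under heavy assumptions. The `ex:` identifiers, the in-memory `GRAPH`, and the function names are all hypothetical, no actual LLM is called, and a real deployment would more likely retrieve the facts with a SPARQL query against a curated knowledge graph.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Toy knowledge graph with hypothetical "ex:" identifiers.
GRAPH = [
    Triple("ex:SemanticWeb", "ex:introducedIn", "2001"),
    Triple("ex:SemanticWeb", "ex:describedBy", "ex:ScientificAmericanArticle"),
]


def lookup(graph, subject):
    """Return every sentence in the graph about the given subject."""
    return [t for t in graph if t.subject == subject]


def grounded_prompt(question, facts):
    """Build a prompt restricted to retrieved facts.

    Returns None when nothing was retrieved, so the caller can say
    "unknown" instead of letting a model guess.
    """
    if not facts:
        return None
    lines = [f"{t.subject} {t.predicate} {t.obj} ." for t in facts]
    return (
        "Answer using only these facts:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )


prompt = grounded_prompt(
    "When was the Semantic Web introduced?",
    lookup(GRAPH, "ex:SemanticWeb"),
)
print(prompt)
```

The design choice worth noting is the `None` return: the knowledge graph, not the language model, decides whether there is anything to say, which is where the reduction in hallucination comes from in this sketch.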

Combining the power of LLMs and a Semantic Web (public, private, or hybrid) unleashes a thermonuclear fusion effect that completely redefines the nature of computing as we’ve all experienced it to date. Why? Because language, mankind’s greatest innovation [3], is now fully integrated into computing like never before. For instance, conversational UI/UX using natural language sentences is replacing primitive buttons, dialogs, and forms as the prime interaction infrastructure. In addition, we have the benefit of expressing computer interactions through the manipulation of entities, entity types, and entity relationship types. This spans all the way down from knowledge, through information, to raw data. This transformative change is happening due to the convergence of two powerful realms. Both realms are driven by the power of language for encoding and decoding information using signs, syntax, and semantics.

Conclusion

“Data is a new kind of electricity conducted by hyperlinks” is a statement I made years ago at the advent of the Linked Open Data cloud. That was a time when the use of Linked Data Principles for structured data representation ushered in a practical and reoriented approach to the concept of a Semantic Web.

Image courtesy of Rémy Fannader

Today, thanks to technologies like ChatGPT, we have reached a critical milestone in a journey spanning over 20 years. Software now has the ability to emulate understanding what operators (be they users or developers) mean in the course of operational interactions. This has all been facilitated by the unique and powerful data connectivity provided by hyperlinks, the same magical component that enabled the World Wide Web.

In this new computing frontier, we can now address critical issues such as:

  1. Decentralized & Pseudonymous Self-Sovereign Identity Authenticity.
  2. Privacy that scales to both the Internet and Web.
  3. Exploiting the power of Language, at both Internet and Web scales.
  4. Building Smart Agents equipped with progressive intelligence that serve line-of-business needs such as Customer Support, Business Development, etc.
  5. Driving Enterprise Agility without ripping and replacing existing infrastructure, i.e., eradicating rather than compounding technical debt.

Related Links

  1. Linked Data, Ontologies, and Knowledge Graphs
  2. A Semantic Web & Artificial Intelligence
  3. Sam Altman — Size of LLMs won’t Matter as much Going Forward
  4. Buzzwords, Language, and Information Access
  5. Semantic Web: Scientific American Article [PDF]
  6. LOD Cloud SPARQL Query Service Endpoints Spreadsheet
  7. Beware Tunnel Vision in AI Retrieval
  8. Edges of Knowledge by Rémy Fannader
  9. Linked Data Spaces & Data Portability
  10. Friend of a Friend: The Facebook That Could Have Been

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog

CEO, OpenLink Software: High-Performance Data Centric Technology Providers.