Buzzwords, Language, and Information Access

Kingsley Uyi Idehen
OpenLink Software Blog
6 min read · Apr 29, 2016

Language is mankind’s greatest tool. It enables us to capture, create, and share data in a variety of contexts: information (data in the context of perception) and knowledge (data in the context of comprehension).

These days, in a world made up of marketeer, end-user, domain-expert, and programmer profiles, marketeers and programmers increasingly dominate the discourse, which eventually produces new but frequently meaningless terminology with a low signal-to-noise ratio, commonly referred to as buzzwords.

Buzzword Problem

Today, we have an ever-increasing collection of buzzwords that swirl around the same ("simple") problem — access to networked information. Words intended to be meaningful in this conversation are bad enough, since — unlike terms — they don’t implicitly resolve the meaning of what they identify; imagine the destructive effect often-meaningless buzzwords can have on comprehension of a complex and challenging topic!

Buzzwords are blurring discovery, understanding, and exploitation of Open Standards-based efforts related to Information Access. For example, the Big Data, Data Lake, and Graph Database memes increasingly obscure practical applications of open standards such as SPARQL, RDF, Linked Open Data, and the notion of a Semantic Web.

  • Big Data meme — fundamentally about the evolution of data-related challenges across dimensions such as volume, velocity, variety, and veracity (or verity).
  • Data Lake meme — fundamentally about dealing with the “Variety” challenge associated with the initial “Big Data” meme.
  • NoSQL meme — fundamentally about repudiating the notion (peddled by Relational Database Management System (RDBMS) vendors) that SQL-oriented Relational Tables are the sole basis for what constitutes a Relational Database Document and/or RDBMS application.
  • Graph Database meme — fundamentally about an alternative representation of entity relationship types (a/k/a relations) as entity → attribute (2-tuple; i.e., attribute name and attribute value are conflated) or entity → attribute → value (3-tuple; i.e., attribute name and attribute value are distinct parts of the tuple) statements, which can be represented in pictorial form using graph (or network) diagrams, as opposed to entity relationships represented as records in a table.
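The contrast in the Graph Database item, between a record in a table and 2- or 3-tuple statements, can be sketched in a few lines of Python. This is only an illustration: the row contents, the `id` key column, and the `name=value` convention used to conflate an attribute with its value are all assumptions made for the example, not something any particular product prescribes.

```python
# Hypothetical table row: one record describing one entity.
row = {"id": "alice", "name": "Alice", "knows": "bob"}


def row_to_3_tuples(record, key="id"):
    """Split a record into entity -> attribute -> value statements."""
    entity = record[key]
    return [(entity, attr, value) for attr, value in record.items() if attr != key]


def to_2_tuples(statements):
    """Conflate attribute name and value into a single 'attribute' part."""
    return [(entity, f"{attr}={value}") for entity, attr, value in statements]


three_tuples = row_to_3_tuples(row)
two_tuples = to_2_tuples(three_tuples)

for statement in three_tuples:
    print(" -> ".join(statement))
for statement in two_tuples:
    print(" -> ".join(statement))
```

Each tuple is one edge in a graph diagram, which is why the same data can be drawn as a network instead of being laid out as rows and columns.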

Big Data and Data Lakes are ultimately challenged the most by “Data Variety” at the levels of location and actual representation. Graph Databases are typically stuck with 2- or 3-tuple graphs, little to no use of HTTP URIs, and vendor-specific query languages instead of SPARQL.

The real problem described by all these memes boils down to the same quest for fluid and agile access to data, information, and knowledge, without document content formats, operating systems, computers, or network protocols functioning as silo vectors.

The Challenges of (Open Standards-based) Information Access

The rest of this post focuses on the issue of information access as a major challenge for which solutions — in the form of networks — have evolved over the years. By avoiding buzzwords, we can arrive at a clear understanding of what exists in the form of open standards for solving the fundamental challenges associated with information access.

DNS-based Computer Networks

Initial problem — Scattered Paper Documents containing useful information (data in the context of perception) were stored on distinct computers that weren’t connected by a network.

Source: http://www.governorsolutions.com/wp-content/uploads/2014/04/mps-papers-falling.png

Solution — A DNS-based Computer Network hosting Digital Documents made accessible across machines. These machines were identified by Canonical Names (CNAMES) courtesy of the DNS protocol. The net effect: CNAMES served as the canonical Data Source Name of relevance and focus with regard to information access.

HTTP-based Document Networks

Initial problem — Access was needed to digitized documents across distinct computers on a DNS-based network (e.g., the public Internet or various private networks).

Solution — An HTTP-based Document Network (e.g., the World Wide Web or various private variants), where HyperLinks (commonly referred to as URLs) serve dually as Document Location Addresses (Locators) and the canonical Data Source Names (Identifiers) of relevance and focus.

Fundamentally, the structure of an HTTP URL provides abstraction that makes CNAMES irrelevant, with regard to information access.
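To make the abstraction concrete, here is a small sketch using Python's standard `urllib.parse` module. The URL is a made-up example on the reserved `example.org` domain. The point is that one HyperLink bundles the protocol, the machine name, and the document path, so whoever follows the link never has to handle the machine name separately.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Break an HTTP URL into the parts it bundles together."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # access protocol
        "host": parts.hostname,   # machine name resolved via DNS
        "path": parts.path,       # document location on that machine
    }


# Hypothetical document URL, used only for illustration.
url = "http://example.org/reports/2016/summary.html"
info = describe_url(url)

for part, value in info.items():
    print(f"{part}: {value}")
```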

The Linked Open Data Cloud

Initial problem — Access was needed to what was mentioned in digitized documents across an HTTP-based document network (such as the World Wide Web or a private network).

Solution — A Data Network, where HyperLinks (HTTP URIs — generic Names as opposed to Document Address-oriented URLs) function as Entity Names that provide the new Data Source Name of relevance and focus.

Fundamentally, the structure of an HTTP URI, including the “#”-based fragment ID, which serves as an indexical to anything (not just a section of an HTML document), provides additional abstraction that makes Document Address/Location-based Names (i.e., HTTP URLs) irrelevant with regard to information access.
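A brief sketch of the name/address distinction, again using the standard `urllib.parse` module. The two entity names below are hypothetical. Both are distinct Names, yet stripping the fragment ID shows that they share a single Document Address.

```python
from urllib.parse import urldefrag

# Hypothetical entity names: two different things named within one document.
alice = "http://example.org/people#alice"
bob = "http://example.org/people#bob"

alice_doc, alice_fragment = urldefrag(alice)
bob_doc, bob_fragment = urldefrag(bob)

print(alice_doc, alice_fragment)
print(bob_doc, bob_fragment)
print("same document:", alice_doc == bob_doc)
print("same entity:", alice == bob)
```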

A Semantically-Enhanced Data Network (a/k/a A Semantic Web)

Initial problem — Access was needed to each part of a sentence contained in a digitized document, across a Data Network (the public Linked Open Data cloud or a private variant).

Solution — A Semantically-enhanced Data Network, where the combination of HyperLinks (HTTP URIs; generic Names) and the abstract language of RDF facilitates construction of a sentence network or graph.

Basically, Hyperlinks provide signs (that function as Names) while the RDF Language provides a notation-agnostic abstract framework that enables systematic use of signs, syntax (the structure of subject predicate object), and semantics (the role meanings of subject, predicate, object) for constructing sentences.

Abstract RDF Language-based Sentence Representation using Entity-Attribute-Value (EAV) terminology

Abstract RDF Language-based Sentence Representation using Subject-Predicate-Object (SPO) terminology
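The two terminologies label the same three-part structure, which can be sketched as follows. The `Sentence` type and the `example.org` names are assumptions made for illustration. The `foaf:knows` predicate is a term from the FOAF vocabulary, and the output line follows the N-Triples notation, which is just one of several notations the abstract RDF language can be written in.

```python
from typing import NamedTuple


class Sentence(NamedTuple):
    """One RDF sentence; the field names follow SPO terminology."""
    subject: str    # EAV: entity
    predicate: str  # EAV: attribute
    object: str     # EAV: value


def to_ntriples(sentence):
    """Write a sentence whose three parts are all HTTP URIs as one N-Triples line."""
    return f"<{sentence.subject}> <{sentence.predicate}> <{sentence.object}> ."


# Hypothetical entity names; the predicate comes from the FOAF vocabulary.
s = Sentence(
    subject="http://example.org/people/alice#this",
    predicate="http://xmlns.com/foaf/0.1/knows",
    object="http://example.org/people/bob#this",
)

# The same three slots, read with EAV terminology.
entity, attribute, value = s

print(to_ntriples(s))
```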

In this kind of network, Hyperlinks are still the canonical Data Source Name of relevance and focus, but, if Linked Data principles are exploited, you have the added benefit of every entity name resolving to its own description document; in other words, the description of any part of a sentence is just one click away!
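The "one click away" behaviour can be approximated with a toy lookup. This is a sketch under heavy assumptions. The `DOCUMENTS` dictionary is a hypothetical in-memory stand-in for the network, and the `example.org` names are invented. A real client would issue an HTTP GET for the document part of the name instead of reading a dictionary.

```python
from urllib.parse import urldefrag

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical stand-in for the network: document URL -> sentences it contains.
DOCUMENTS = {
    "http://example.org/people/alice": [
        ("http://example.org/people/alice#this", FOAF_NAME, "Alice"),
        ("http://example.org/people/alice#this", FOAF_KNOWS,
         "http://example.org/people/bob#this"),
    ],
    "http://example.org/people/bob": [
        ("http://example.org/people/bob#this", FOAF_NAME, "Bob"),
    ],
}


def describe(entity_name):
    """Resolve an entity name to the sentences that describe it."""
    document_url, _fragment = urldefrag(entity_name)  # name -> document address
    sentences = DOCUMENTS.get(document_url, [])       # simulated lookup, no HTTP
    return [s for s in sentences if s[0] == entity_name]


alice = describe("http://example.org/people/alice#this")
# Follow the object of the 'knows' sentence: one more "click".
friend = next(obj for _, pred, obj in alice if pred == FOAF_KNOWS)
bob = describe(friend)

print(alice)
print(bob)
```

Because the object of one sentence is itself a resolvable name, following it yields another description, and so on across the network.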

Conclusion

Buzzwords are killing us! Instead of increasing agility, they introduce inertia to efforts to understand technology and its impact on society. They bring opacity where transparency is desperately needed, by making terminology incomprehensible.

As terminology becomes more incomprehensible, it becomes increasingly difficult to build a knowledge-driven society, where individuals, enterprises, and governments alike are equipped with the tools and skills required for cost-effective and productive use of technology.

Bottom line, knowledge (data in the context of comprehension) is power, but it depends enormously on language, mankind’s greatest innovation — a tool for systematic use of signs, syntax, and semantics for encoding and decoding information (data in the context of perception).

Kingsley Uyi Idehen
OpenLink Software Blog

CEO, OpenLink Software — High-Performance Data Centric Technology Providers.