Insight into the Semantic Web and why it is important today

On August 6th, 1991, Tim Berners-Lee announced in a post on Usenet that his magnificent creation, the World Wide Web (WWW), was going public. From that moment, the way people perceived information was revolutionized.

What made this web of servers so special was that it made finding data far easier: it linked many data sources through what we call hypertext, creating a network in which we could look for information without worrying about where the data was stored.

We typed keywords, clicked the search button and got a list of links to dive into.

Its greatest advantage was that it abstracted away the tedious machine layer: users no longer needed to know which server held the data.

A revolution took place, and this era came to be called the “Information Era”. Just as many researchers predicted, the amount of data grew so large that data came to be considered a resource, and today many regard it as the most valuable one.

— — — — — — — — Semantic Web — — — — — — — — —

In 2001, an illuminating article was published in Scientific American, written by Tim Berners-Lee together with James Hendler and Ora Lassila. It described a vision that the father of the World Wide Web had already presented in 1994 at the first WWW Conference:

The Semantic Web

“A new form of web content that is meaningful to computers will unleash a revolution of new possibilities” (Tim Berners-Lee)

The idea came as a solution to a limitation of computers: machines did not understand the content of the web. As a YouTube video explained, they knew that a link existed, but they did not understand how that link related to the data on each web page.

They did not grasp what we now call “knowledge”.

As data grew, surfing through links became so overwhelming that researchers looked for a way to shift search from documents to knowledge.

The greatest advantage of the Semantic Web is that it abstracts away the tedious document and application layers, giving direct access to knowledge.

The Semantic Web is an extension of the World Wide Web through standards set by the World Wide Web Consortium (W3C). These standards promote common data formats and exchange protocols on the Web, most fundamentally the Resource Description Framework (RDF). According to the W3C, “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.” The Semantic Web is therefore regarded as an integrator across different content, information applications and systems.
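To make the idea of RDF as a common data format concrete, here is a minimal Python sketch of the RDF triple model (subject, predicate, object), serialized as N-Triples, one of RDF's standard syntaxes. It is an illustration, not how any particular RDF library represents data: the DBpedia and FOAF IRIs are used only as examples, and `http://example.org/created` is a hypothetical predicate. The point is that two independently produced data sets that use the same IRIs can be merged by a plain set union.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[IRI, Literal]


@dataclass(frozen=True)
class Triple:
    subject: IRI
    predicate: IRI
    obj: Term


def term_to_nt(term: Term) -> str:
    """Render one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape characters that N-Triples does not allow raw inside a string literal.
    escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize triples as N-Triples, one '<s> <p> <o> .' line each (sorted for stable output)."""
    lines = [f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.obj)} ."
             for t in triples]
    return "\n".join(sorted(lines))


# Illustrative IRIs: DBpedia resources and the FOAF 'name' property;
# http://example.org/created is a hypothetical predicate.
TBL = IRI("http://dbpedia.org/resource/Tim_Berners-Lee")
WWW = IRI("http://dbpedia.org/resource/World_Wide_Web")
FOAF_NAME = IRI("http://xmlns.com/foaf/0.1/name")
CREATED = IRI("http://example.org/created")

# Two "applications" publish data about the same resource independently.
app_a = {Triple(TBL, FOAF_NAME, Literal("Tim Berners-Lee"))}
app_b = {Triple(TBL, CREATED, WWW), Triple(TBL, FOAF_NAME, Literal("Tim Berners-Lee"))}

# Because both use the same IRIs, merging is a set union;
# the statement they share collapses into a single triple.
merged = app_a | app_b
print(to_ntriples(merged))
```

A real system would rely on an RDF library's parsers and serializers rather than hand-rolled string formatting; the sketch only shows why shared identifiers make data reusable across application boundaries.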

— — — — — Architecture of Semantic Web — — — — —

The diagram below shows the architecture of the Semantic Web as a layer cake, along with its different components.

Semantic Web Stack

The main layers of the Semantic Web stack are:

  • XML provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within. XML is not at present a necessary component of Semantic Web technologies in most cases, as alternative syntaxes exist, such as Turtle. Turtle was long a de facto standard and became a formal W3C Recommendation in 2014, as part of RDF 1.1.
  • XML Schema is a language for providing and restricting the structure and content of elements contained within XML documents.
  • RDF is a simple language for expressing data models, which refer to objects (“web resources”) and their relationships. An RDF-based model can be represented in a variety of syntaxes, e.g., RDF/XML, N3, Turtle, and RDFa. RDF is a fundamental standard of the Semantic Web.
  • RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalization hierarchies of such properties and classes.
  • OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. “exactly one”), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
  • SPARQL is a protocol and query language for semantic web data sources.
  • RIF is the W3C Rule Interchange Format. It is an XML language for expressing Web rules that computers can execute. RIF comes in multiple versions, called dialects, including the RIF Basic Logic Dialect (RIF-BLD) and the RIF Production Rule Dialect (RIF-PRD).
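The “semantics” that RDF Schema adds can be illustrated through inference: a reasoner derives statements that nobody wrote down. The sketch below is a toy forward-chaining reasoner for only two of the RDFS entailment rules, rdfs11 (`rdfs:subClassOf` is transitive) and rdfs9 (an instance of a class is also an instance of its superclasses). It assumes triples are plain Python tuples of prefixed names, and the `ex:` terms are hypothetical; real reasoners implement many more rules, far more efficiently.

```python
# "rdf:type" and "rdfs:subClassOf" stand for the real RDF/RDFS terms;
# everything under the "ex:" prefix is hypothetical example data.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply rules rdfs11 and rdfs9 repeatedly until no new triple appears."""
    graph = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in graph if p == SUBCLASS}
        typed = {(s, o) for s, p, o in graph if p == RDF_TYPE}
        new = set()
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for x, a in typed:
            for a2, b in subclass:
                if a == a2:
                    new.add((x, RDF_TYPE, b))
        if new <= graph:  # fixed point: nothing new can be derived
            return graph
        graph |= new


data = {
    ("ex:ComputerScientist", SUBCLASS, "ex:Scientist"),
    ("ex:Scientist", SUBCLASS, "ex:Person"),
    ("ex:Alice", RDF_TYPE, "ex:ComputerScientist"),
}
inferred = rdfs_closure(data) - data
for triple in sorted(inferred):
    print(triple)
```

Running it derives that `ex:Alice` is also an `ex:Scientist` and an `ex:Person`, and that `ex:ComputerScientist` is a subclass of `ex:Person`, although none of these triples appears in the input.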

[Some layers, such as Cryptography, Trust (establishing how trustworthy a source of information is) and the user interface for Semantic Web applications, have not been fully realized yet.]

— — — — — — — — — Linked Data — — — — — — — — —

In his 2009 TED Talk, Tim Berners-Lee focused on Linked Data, a method of publishing and sharing data over the Internet as if it were one open, distributed database.

Linked Data had to solve two problems:

- The packaging problem: what is the best way to express data on the web? Many options existed: HTML, JSON, XML, CSV and RDFa.

- The linking problem: how do we link this data together?

The diagram below shows how data in different formats can be transformed and integrated into Linked Data (LD) data sets, producing an integrated data set that is accessed through SPARQL, a query language and protocol for the RDF data model on the web.

Linked Data Architecture
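The pipeline in the diagram can be sketched in Python under heavy simplifying assumptions. Two hypothetical sources, a CSV file and a JSON document, are converted into triples that share one IRI scheme under the made-up namespace `http://example.org/`; that shared scheme is what links them. The merged graph is then queried with a tiny matcher for basic graph patterns, the core of a SPARQL `WHERE` clause. This is a toy illustration, not a SPARQL engine.

```python
import csv
import io
import json

EX = "http://example.org/"  # hypothetical namespace shared by both sources

# Two hypothetical sources in different "packaging" formats.
CSV_SOURCE = """id,name,city
alice,Alice,paris
bob,Bob,berlin
"""

JSON_SOURCE = """[
  {"id": "paris", "label": "Paris", "country": "France"},
  {"id": "berlin", "label": "Berlin", "country": "Germany"}
]"""


def csv_to_triples(text):
    """Each CSV row becomes a person; the city column becomes a link to a city IRI."""
    for row in csv.DictReader(io.StringIO(text)):
        person = EX + "person/" + row["id"]
        yield (person, EX + "name", row["name"])
        yield (person, EX + "livesIn", EX + "city/" + row["city"])


def json_to_triples(text):
    """Each JSON object becomes a city, minted with the same IRI scheme the CSV links use."""
    for obj in json.loads(text):
        city = EX + "city/" + obj["id"]
        yield (city, EX + "label", obj["label"])
        yield (city, EX + "country", obj["country"])


def match(pattern, triple, binding):
    """Unify one triple pattern with one triple; terms starting with '?' are variables."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:  # variable already bound to another value
                return None
        elif p != t:
            return None
    return result


def query(graph, patterns):
    """Evaluate a basic graph pattern by joining the patterns one at a time."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


# Integrate: the union of both converted sources forms one linked data set.
graph = set(csv_to_triples(CSV_SOURCE)) | set(json_to_triples(JSON_SOURCE))

# Roughly: SELECT ?name ?country WHERE { ?p ex:name ?name . ?p ex:livesIn ?c . ?c ex:country ?country }
rows = query(graph, [("?p", EX + "name", "?name"),
                     ("?p", EX + "livesIn", "?c"),
                     ("?c", EX + "country", "?country")])
for r in sorted(rows, key=lambda r: r["?name"]):
    print(r["?name"], r["?country"])
```

The query joins a person from the CSV file with a city from the JSON document only because both converters mint the same city IRI; agreeing on shared identifiers is the heart of the linking problem.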

Examples of semantic knowledge bases and related technologies:

  • One of the best-known semantic knowledge bases is the Google Knowledge Graph, which consists of nodes (entities) connected to each other by edges that represent semantic relationships.
Example of Google Knowledge Box
  • Freebase was a project acquired by Google: a database of structured data collected from the web. Google shut the service down in August 2016.
  • The Facebook Open Graph protocol was created by Facebook, inspired by Dublin Core, Microformats and RDFa. According to its official website, it enables any web page to become a rich object in a social graph. A social graph is a graph that depicts the personal relations of internet users; in short, it is a model or representation of a social network, where the word “graph” is taken from graph theory. The social graph has been referred to as “the global mapping of everybody and how they’re related”.
  • DBpedia provides public data infrastructure for a large, multilingual semantic knowledge graph built from Wikipedia. The project extracts structured content from Wikipedia articles and publishes it as RDF, giving better access to that knowledge and making it possible to integrate it into other knowledge graphs.
Knowledge Graph Architecture
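A knowledge graph of nodes and labeled edges can answer questions such as “how are these two entities related?”. A minimal sketch, assuming an in-memory list of (subject, relationship, object) edges whose facts are taken from this article: a breadth-first search that follows edges in both directions returns the shortest chain of relationships between two entities. Inverse steps are marked with a `^` prefix, a notation loosely borrowed from SPARQL property paths; the helper names are hypothetical.

```python
from collections import deque

# Edges are (subject, relationship, object); the facts come from this article.
EDGES = [
    ("Tim Berners-Lee", "created", "World Wide Web"),
    ("Tim Berners-Lee", "co-authored", "The Semantic Web (2001)"),
    ("James Hendler", "co-authored", "The Semantic Web (2001)"),
    ("Ora Lassila", "co-authored", "The Semantic Web (2001)"),
    ("Semantic Web", "extends", "World Wide Web"),
    ("W3C", "standardizes", "RDF"),
    ("Semantic Web", "builds on", "RDF"),
]


def build_index(edges):
    """Adjacency lists walkable in both directions; inverse steps are prefixed with '^'."""
    index = {}
    for s, rel, o in edges:
        index.setdefault(s, []).append((rel, o))
        index.setdefault(o, []).append(("^" + rel, s))
    return index


def explain_relation(index, start, goal):
    """Breadth-first search for the shortest chain of relationships linking two entities."""
    queue = deque([start])
    came_from = {start: None}  # node -> (previous node, relationship used to reach it)
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while came_from[node] is not None:
                prev, rel = came_from[node]
                path.append(f"{prev} -[{rel}]-> {node}")
                node = prev
            return list(reversed(path))
        for rel, neighbour in index.get(node, []):
            if neighbour not in came_from:
                came_from[neighbour] = (node, rel)
                queue.append(neighbour)
    return None  # the two entities are not connected


for step in explain_relation(build_index(EDGES), "James Hendler", "World Wide Web"):
    print(step)
```

For `James Hendler` and `World Wide Web` it prints a three-step chain through the 2001 article and Tim Berners-Lee. Production graph databases answer such questions with indexed path queries rather than a scan of a Python list.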

— — — — — Enterprise Knowledge Graph — — — — —

Inspired by Google's and other knowledge graphs, an enterprise knowledge graph (EKG) is an enterprise-wide graph created by an organization to serve as its centralized source of integrated knowledge and inference.

The movement to EKGs has recently been accelerated by two developments. The first is the addition of the Neptune graph database to Amazon’s database portfolio. The second is the funding of cloud and on-premises graph systems from startups such as TigerGraph and others in the Bay Area. Many graph architects from Google, LinkedIn and Facebook (all of which use graph databases) are now venturing out on their own to develop solutions for the enterprise.

EKGs still face an interesting set of problems that must be solved before they can truly be considered enterprise-class. Transactions, bitemporal versioning, role-based access control, auditing, scalability and high availability are all areas where traditional open-source proof-of-concept systems struggle, and academics tend not to be interested in these real-world problems.

Enterprise Knowledge Graph example
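Bitemporal versioning, one of the enterprise requirements listed above, means recording two timelines for every fact: when it was true in the real world (valid time) and when the graph knew about it (transaction time). The following Python sketch shows one common way to model this, with hypothetical employee data; it is not how Neptune, TigerGraph or any other product implements it. A correction never overwrites history: the old record is marked as retracted and new records are added, so an auditor can still ask what the graph believed on any past date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BitemporalFact:
    subject: str
    predicate: str
    obj: str
    valid_from: date                     # when the fact became true in the real world
    valid_to: Optional[date]             # when it stopped being true (None = still true)
    recorded_at: date                    # transaction time: when the graph stored this record
    retracted_at: Optional[date] = None  # when the record was superseded (None = current)


def as_of(facts, valid_on: date, known_on: date):
    """Facts true on `valid_on`, according to what the graph had recorded by `known_on`."""
    return [
        f for f in facts
        if f.valid_from <= valid_on and (f.valid_to is None or valid_on < f.valid_to)
        and f.recorded_at <= known_on
        and (f.retracted_at is None or known_on < f.retracted_at)
    ]


# Hypothetical history: Alice joined Sales in 2020; her move to Marketing on
# 2021-03-01 was only entered into the graph on 2021-03-10 (a late correction).
facts = [
    # Original record, retracted when the correction arrived.
    BitemporalFact("ex:alice", "ex:worksIn", "ex:Sales",
                   date(2020, 1, 1), None, date(2020, 1, 5), date(2021, 3, 10)),
    # Corrected history, both records stored on 2021-03-10.
    BitemporalFact("ex:alice", "ex:worksIn", "ex:Sales",
                   date(2020, 1, 1), date(2021, 3, 1), date(2021, 3, 10)),
    BitemporalFact("ex:alice", "ex:worksIn", "ex:Marketing",
                   date(2021, 3, 1), None, date(2021, 3, 10)),
]

before = as_of(facts, valid_on=date(2021, 3, 5), known_on=date(2021, 3, 6))
after = as_of(facts, valid_on=date(2021, 3, 5), known_on=date(2021, 4, 1))
print([f.obj for f in before])  # what the graph believed on 2021-03-06
print([f.obj for f in after])   # what it believes once the correction is recorded
```

Asked on 6 March 2021 where Alice worked on 5 March, the graph answered Sales; after the late correction is recorded on 10 March, the same question returns Marketing, while the earlier answer remains reproducible for audit.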