With all the excitement around Deep Learning and Blockchain it is easy to miss the outlook for other “enabling technologies”, especially those ignored or forgotten by industry for being too academic or niche. Gartner recently published its Hype Cycle for Emerging Technologies 2018, and just days ago Tim Berners-Lee announced his world-dominating plan for decentralizing the web.
It may not be immediately obvious but both those stories have one thing in common: the Semantic Web.
Both knowledge graphs (from Gartner’s report) and RDF (from said world-domination plan’s Solid project) are integral to the Semantic Web vision and technology stack. This is why experts disagree any part of that is still a hype or that it’s actually “dead” because, while the ultimate vision may not have materialized yet, its byproducts are still very much in action.
Even the idea for web decentralization might have been silently brewing in the linked data principles, which can be thought of as an ideological framework to RDF.
Perhaps the time is ripe to take one small step back for the web, and look at where the grand vision for it has been all this time. We will lazily rely on Gartner’s Hype Cycle graphs (pun not intended) throughout the years since the vision’s inception, coupled with my personal academic research and industry application experience, plus much wisdom gleaned from the web itself, to present a short overview of the not-so-brief history of the Semantic Web.
At time t=0 we have TBL’s famous announcement of his vision for an intelligent web in a 2001 Scientific American article.
In that same year, Gartner timely introduced the term Semantic Web as a new entry in its Emerging Technologies report, predicting it would be industrially productive in 5–10 years. Like many technological milestones, the foundations were already present at the time, owing to years of research in symbolic artificial intelligence (vs. today’s probabilistic AI) that gave us the expert systems and knowledge bases of the 80’s.
Aside from XML, the abstract data modelling language RDF was specified in 1999 (that’s almost 20 years to date). The knowledge representation and reasoning (KRR) specialization of AI set the precedent for describing ontologies (think intelligent schema), with the simple RDF-S and advanced OWL ontology languages being introduced around the same time.
Gaining some traction in about 4–5 years, we see mention of a Corporate Semantic Web in 2005 and 2006. This is right about the time when the RDF query language SPARQL and procedural rules language SWRL (complementing OWL’s declarative rules) were being drafted.
RDF graph databases or triplestores such as Franz’s commercial AllegroGraph and Kingsley Uyi Idehen-led open-source Virtuoso were coming on to the scene. Dave Beckett, a key player in the field, had already established his C RDF libraries and developers were using them in interesting ways.
Although there is no Gartner report to show for it, the period 2006 to 2010 marks the most exciting time yet for semantic (web) technologies. Peak of inflated expectations of the vision, perhaps?
Freebase was in full swing, and would later go on to be acquired by Google and become the foundation for its very own proprietary Knowledge Graph. SPARQL v1.1,OWL v2 and SPIN were introduced to provide much needed RDF querying flexibility, ontology intelligence and reasoning capabilities.
The years also saw the publication of three important books in the field: my personal favourite Foundations of Semantic Web Technologies, Semantic Web for the Working Ontologist, and Programing the Semantic Web (one of the few using Python, at a time when the field was dominated mostly by Java and still is).
Universities were also integrating key topics into their degree programmes, and professional institutes started offering certifications. In other words, quite a lot was happening relatively, and this was reflected in the activities of both academia and industry.
Post-2010 is a different story altogether, and one that involves some disappointment. Software tools became abandoned, startups grew quiet, and organizations stopped publishing RDF (meta)data in favour of JSON. The Semantic Web idea started to receive some hate, surprisingly from direct contributors of the community, like Manu Sporny.
Gartner rightly placed Semantic Web approaching the trough of disillusionment in 2012, the eventual inevitable dip proven by Linked Data being at the bottom of the trough in 2015.
Some good things did come out of this, though: the multi-industry schema.org vocabulary project for embeddable metadata (what you’d know as rich snippets or semantic SEO), the RDF serialization (or alternative, if that’s your stance) JSON-LD, the truly OWL2-compliant triplestore Stardog, and the book Learning SPARQL (Bob DuCharme’s quite the blogger, by the way).
The practicality of a Semantic Web still gets some bad rep today as having a depressing situation due to a steep learning curve. Nothing could be farther from the truth — RDF is dead simple, particularly in its Turtle form. Any non-trivial application, no matter the technology, is going to have a learning curve.
The last several years of disillusionment appears to have given rise to pervasive misconceptions and misunderstandings, but all is not lost.
Many organizations may have abandoned XML and RDF publishing, but those for whom data is a high priority have also prioritized these standards. The European Union (EU) is a large and complex multi-party governing body, and it is evident from recent initiatives that they have been successfully exploiting such technologies.
With the help of experts and the community, TopQuadrant grew SPIN into SHACL, which is supposed to supersede OWL’s declarative rules system and (lack of) data validation. The problem of semantic data at scale also got a candidate solution in the form of Linked Data Fragments.
Even graph databases of the alternative LPG kind treat RDF as a first-class citizen, and promise to give us the semantics we need without the complexities or shortcomings of declarative languages and description logics, such as monotonicity and the open-world-assumption.
So, is that all? Not in the least. The trendy technologies of the last two years frequently championed by startup programmes and the media today are quite supplemental to the Semantic Web. Machine learning is a useful technique for supplying probabilistic data for reasoning and inference, and a decentralized ledger system could benefit linked data, or vice-versa.
TL;DR: The Semantic Web is an ideal, not a killer technology by itself. There is no one killer application either, but many past, present and future innovations. It exists through the standards and technologies that are the culmination of various focused initiatives. In other words, in 2018, the Semantic Web is right where we need it to be.