The “Fake News” Meme

Kingsley Uyi Idehen
OpenLink Software Blog
Nov 28, 2016


Source: https://radio.adelaide.edu.au/wp-content/uploads/2016/11/fake-news.jpg

Social Media is abuzz about the notion of “Fake News” and its impact on recent events such as Brexit and the US Elections, to name just a couple. Irrespective of political persuasion, we can all agree that fallibilities in this regard have been greatly magnified across Facebook, Google, and Twitter.

Unfortunately, though there is a lot of commentary offering solutions, little of it sheds much light on what's really happening here, or why.

Fake News is simply a consequence of the low-quality structured metadata embedded in Web Pages. Fundamentally, neither humans nor machines (software, bots, etc.) can make sense of Web Content: the familiar Who, What, Where, and When provenance pattern fails as a basis for calculating the Why of any given post.

Situation Analysis

We have a ubiquitous “Web of Documents,” connected by hyperlinks, that contains information (contextualized data) about People, Places, Events, and a variety of other Things. These documents are distributed via publishing networks that include both conventional (or “Old”) and social (“New”) media.

General Problem

We are generally comfortable with using hyperlinks to identify (or name) Web Documents, but most remain quite confused (on a good day!) about using hyperlinks to identify People, Places, Events, and other Things (or Entities). Even less well understood is the idea of using these same hyperlinks as words (specifically, terms) when constructing digital sentences with the familiar subject, predicate, object structure.
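
To make this concrete, here is a single digital sentence expressed in RDF-Turtle notation. The example.com hyperlinks are hypothetical placeholders; the predicate hyperlink comes from the FOAF vocabulary:

# One digital sentence: Subject, Predicate, and Object, each identified by a hyperlink
<http://example.com/people/alice#this>       # Subject: a Person
  <http://xmlns.com/foaf/0.1/knows>          # Predicate: the "knows" relationship
    <http://example.com/people/bob#this> .   # Object: another Person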

Identity Problem

Web content publishers haven't quite understood the consequences of ambiguously identifying entities (i.e., not using unambiguous identifiers) when creating and publishing documents. Even worse, they lack tools for creating provenance-oriented document metadata that clearly reveals the Who, What, Where, and When dimensions of any chunk of information (i.e., Who was Where When they wrote What).

You cannot determine “Why did they write this?” if you don’t possess the complementary information (provenance-oriented metadata) regarding the Who, What, Where, and When of the writing. It simply isn’t possible, and therein lies the kernel of this problem, now massively scaled by the ubiquitous Web.
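
As a sketch of what such provenance-oriented metadata can look like, the Who, What, Where, and When dimensions map naturally onto terms from the Schema.org vocabulary (the example.com hyperlinks below are hypothetical placeholders):

@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#thisArticle>
  schema:author          <http://example.com/people/alice#this> ;   # Who
  schema:about           <http://example.com/topics/brexit#this> ;  # What
  schema:locationCreated <http://example.com/places/london#this> ;  # Where
  schema:dateCreated     "2016-11-28"^^xsd:date .                   # When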

Proprietary identifiers and opaque “Black Box” algorithms do not offer any useful magic for addressing this fundamental challenge; they simply worsen the problem, as the “Fake News” problem demonstrates.

Solution

Though unknown to most, we already have open standards (from the W3C) aimed at addressing the issues of verifiable identity and of creating document content (information) that is understandable by both humans and machines — using sentences constructed from hyperlinks.

At OpenLink Software, we’ve made a commitment to implement those standards across our product portfolio.

Here’s a simple step-by-step guide for creating provenance-oriented metadata, and information in general; a minimal RDF-Turtle sketch follows the list:

  1. Revisit what you learned in elementary school about the nature and structure of sentences.
  2. Uniquely identify the subjects of sentences using hyperlinks.
  3. Uniquely identify the predicates (verbs) of sentences using hyperlinks.
  4. Use hyperlinks or literals (in any natural language you like) as the objects of sentences.
  5. Use collections of sentences (each made up of a subject, predicate, and object) to encode information of any kind.
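
Putting those steps together, here is a minimal RDF-Turtle sketch (the example.com hyperlinks are hypothetical placeholders):

# Step 2: the Subject, identified by a hyperlink
<http://example.com/posts/1#this>
  # Step 3: the Predicate (verb), identified by a hyperlink
  <http://schema.org/about>
    # Step 4: the Object, here a hyperlink; a quoted Literal would also work
    <http://example.com/topics/fake-news#this> .

# Step 5: additional sentences about the same Subject encode more information
<http://example.com/posts/1#this>
  <http://schema.org/author>
    <http://example.com/people/alice#this> .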

These simple “best practices” are enabled by the W3C’s RDF Language, and are exploitable using a variety of notations, e.g., JSON-LD, RDF-Turtle, RDF-N-Triples, RDF/XML, and others.
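
For instance, the same sentence can be written in RDF-N-Triples, where every term is a fully spelled-out hyperlink, or in RDF-Turtle, where a prefix abbreviates hyperlinks (the example.com identifiers remain hypothetical):

# RDF-N-Triples: one sentence per line, full hyperlinks throughout
<http://example.com/posts/1#this> <http://schema.org/about> <http://example.com/topics/fake-news#this> .

# RDF-Turtle: the same sentence, with a prefix abbreviating the schema.org hyperlink
@prefix schema: <http://schema.org/> .
<http://example.com/posts/1#this> schema:about <http://example.com/topics/fake-news#this> .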

Reviving in the digital realm what we already do in the physical realm (i.e., encoding and decoding information represented as sentences) is the critical kernel around which open, provenance-oriented algorithms can be effectively built, en route to providing you with the tools you need to discern the “Why?” of the content you encounter on the Web.

Open Vocabularies such as Schema.org provide a powerful means by which both humans and machines may understand the nature of the Subjects, Predicates (verbs or properties), and Objects used in digital sentences, building up a corpus of provenance metadata associated with Web documents.

Simple Example

In English:

“This Blog Post is about FakeNews.”

Using Hyperlink-based Digital Sentences, expressed here in RDF-Turtle notation, where sentence Subjects and Predicates are identified by Hyperlinks, and Objects may be identified by either Hyperlinks or Literals (typed or untyped):

@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#thisPost>
  a schema:BlogPosting ;
  schema:dateCreated "2017-10-31"^^xsd:date ;
  schema:about <https://twitter.com/hashtag/FakeNews#this> ;
  schema:author <https://medium.com/@kidehen#this> ;
  schema:publisher <https://medium.com/#this> ;
  schema:url <> .

Using our Structured Data Sniffer browser extension — a piece of software that understands sentences constructed using Hyperlinks, and supports a variety of notations — to visualize a translation of the sentences written in the example above:

Metadata injected into HTML Document from my Subject, Predicate, Object sentences while editing this post.
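
For readers wondering how such sentences get inside an HTML document in the first place, here is one sketch, assuming a Turtle script island of the kind the Structured Data Sniffer can detect alongside other notations such as JSON-LD:

<script type="text/turtle">
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#thisPost>
  a schema:BlogPosting ;
  schema:dateCreated "2017-10-31"^^xsd:date ;
  schema:about <https://twitter.com/hashtag/FakeNews#this> .
</script>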

Conclusion

For years, verifiable identity has been a known problem afflicting the Web & Internet. Unfortunately, the implications of this problem for individual privacy, content provenance, and society at large have been poorly reflected in Social Media solutions, both desktop and mobile.

There are existing open standards (as outlined in the “Solution” section above) that can address this problem, and they may be adopted and implemented today without obliterating existing business models. In fact, remembering that nowhere in the real world does privacy cost $0.00, the cyberspace marketplace seems a ripe opportunity!

We cannot continue to accept solutions from vendors that compromise our individual privacy and our views of the world, at the massive scales enabled by the Web & Internet.

Related

  1. Empathy to Democracy
  2. Huffington Post has a Fake News problem
  3. Call for Cooperation against Fake News
  4. Fake News: Be Careful What You Wish For
  5. Role of Technology in Presidential Election
  6. Media in the Age of Algorithms
  7. Mark Zuckerberg is the most powerful person on earth, but is he responsible?
  8. How Fake News Goes Viral: A Case Study — from The New York Times
  9. Fake News is just the beginning — from The Washington Post
  10. Fake News and how we caused it — by Morten Rand-Hendriksen
  11. Silicon Valley has an Empathy Vacuum
  12. Covering Politics in a “Post-Truth” America — by Susan B. Glasser
  13. After Peak Marketing
  14. Real Ads, Fake News, Real Confusion
  15. Did Media Literacy Backfire?
  16. Hacking The Attention Economy
  17. The Trust Project
  18. Fake News is not the only problem
  19. Fake news and a 400-year-old problem: we need to resolve the ‘post-truth’ crisis
  20. Computational Propaganda
  21. Understanding Data
  22. Verifiable Identity controlled by You, at Web-Scale
  23. Smart Agents, Sentences, and Artificial Intelligence
  24. A Semantic Web & Artificial Intelligence
  25. Semantic Search Engine Optimization
  26. Citation & Annotation Tool Screencast — a demonstration of how existing open standards aid content provenance improvement.

Kingsley Uyi Idehen

CEO, OpenLink Software: High-Performance, Data-Centric Technology Providers.