The “Fake News” Meme

Kingsley Uyi Idehen
OpenLink Software Blog
Nov 28, 2016


Source: https://radio.adelaide.edu.au/wp-content/uploads/2016/11/fake-news.jpg

Social Media is abuzz about the notion of “Fake News” and its impact on recent events such as Brexit and the US Elections, to name just a couple. Irrespective of political persuasion, we can all agree that fallibilities in this regard have been greatly magnified across Facebook, Google, and Twitter.

Unfortunately, though there is a lot of commentary offering solutions, little of it sheds much light on what's really happening here, or why.

Fake News is simply a consequence of the low-quality structured metadata embedded in Web Pages. Fundamentally, neither humans nor machines (software, bots, etc.) can make sense of Web Content: the familiar Who, What, Where, and When provenance pattern fails as a basis for calculating the Why of any given post.

Situation Analysis

We have a ubiquitous “Web of Documents,” connected by hyperlinks, that contains information (contextualized data) about People, Places, Events, and a variety of other Things. These documents are distributed via publishing networks that include both conventional (or “Old”) and social (“New”) media.

General Problem

We are generally comfortable with using hyperlinks to identify (or name) Web Documents, but most remain quite confused (on a good day!) about using hyperlinks to identify People, Places, Events, and other Things (or Entities). Even less well understood is the idea of using these same hyperlinks as words (specifically, terms) when constructing digital sentences with the familiar subject, predicate, object structure.
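
To make this concrete, here is a single digital sentence expressed in RDF-Turtle notation. The example.com hyperlinks are hypothetical placeholders; the predicate hyperlink comes from the FOAF vocabulary:

# One digital sentence: Subject, Predicate, and Object, each identified by a hyperlink
<http://example.com/people/alice#this>       # Subject: a Person
  <http://xmlns.com/foaf/0.1/knows>          # Predicate: the "knows" relationship
    <http://example.com/people/bob#this> .   # Object: another Person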

Identity Problem

Web content publishers haven't quite understood the consequences of ambiguously identifying entities (i.e., not using unambiguous identifiers) when creating and publishing documents. Even worse, they lack tools for creating provenance-oriented document metadata that clearly reveals the Who, What, Where, and When dimensions of any chunk of information (i.e., Who was Where When they wrote What).

You cannot determine “Why did they write this?” if you don’t possess the complementary information (provenance-oriented metadata) regarding the Who, What, Where, and When of the writing. It simply isn’t possible, and therein lies the kernel of this problem, now massively scaled by the ubiquitous Web.
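
As a sketch of what such provenance-oriented metadata can look like, the Who, What, Where, and When dimensions map naturally onto terms from the Schema.org vocabulary (the example.com hyperlinks below are hypothetical placeholders):

@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#thisArticle>
  schema:author          <http://example.com/people/alice#this> ;   # Who
  schema:about           <http://example.com/topics/brexit#this> ;  # What
  schema:locationCreated <http://example.com/places/london#this> ;  # Where
  schema:dateCreated     "2016-11-28"^^xsd:date .                   # When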

Proprietary identifiers and opaque “Black Box” algorithms do not offer any useful magic for addressing this fundamental challenge; they simply worsen the problem, as the “Fake News” problem demonstrates.

Solution

Though unknown to most, we already have open standards (from the W3C) aimed at addressing the issues of verifiable identity and of creating document content (information) that is understandable by both humans and machines — using sentences constructed from hyperlinks.

At OpenLink Software, we’ve made a commitment to implement those standards across our product portfolio.

Here’s a simple step-by-step guide for creating provenance-oriented metadata, and information in general; a minimal RDF-Turtle sketch follows the list:

  1. Revisit what you learned in elementary school about the nature and structure of sentences.
  2. Uniquely identify the subjects of sentences using hyperlinks.
  3. Uniquely identify the predicates (verbs) of sentences using hyperlinks.
  4. Use hyperlinks or literals (in any natural language you like) as the objects of sentences.
  5. Use collections of sentences (each made up of a subject, predicate, and object) to encode information of any kind.
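
Putting those steps together, here is a minimal RDF-Turtle sketch (the example.com hyperlinks are hypothetical placeholders):

# Step 2: the Subject, identified by a hyperlink
<http://example.com/posts/1#this>
  # Step 3: the Predicate (verb), identified by a hyperlink
  <http://schema.org/about>
    # Step 4: the Object, here a hyperlink; a quoted Literal would also work
    <http://example.com/topics/fake-news#this> .

# Step 5: additional sentences about the same Subject encode more information
<http://example.com/posts/1#this>
  <http://schema.org/author>
    <http://example.com/people/alice#this> .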

These simple “best practices” are enabled by the W3C’s RDF Language, and are exploitable using a variety of notations, e.g., JSON-LD, RDF-Turtle, RDF-N-Triples, RDF/XML, and others.
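
For instance, the same sentence can be written in RDF-N-Triples, where every term is a fully spelled-out hyperlink, or in RDF-Turtle, where a prefix abbreviates hyperlinks (the example.com identifiers remain hypothetical):

# RDF-N-Triples: one sentence per line, full hyperlinks throughout
<http://example.com/posts/1#this> <http://schema.org/about> <http://example.com/topics/fake-news#this> .

# RDF-Turtle: the same sentence, with a prefix abbreviating the schema.org hyperlink
@prefix schema: <http://schema.org/> .
<http://example.com/posts/1#this> schema:about <http://example.com/topics/fake-news#this> .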

Reviving in the digital realm what we already do in the physical realm (i.e., encoding and decoding information represented as sentences) is the critical kernel around which open, provenance-oriented algorithms can be effectively built, en route to providing you with the tools you need to discern the “Why?” of the content you encounter on the Web.

Open Vocabularies such as Schema.org provide a powerful means by which both humans and machines may understand the nature of the Subjects, Predicates (verbs or properties), and Objects used in digital sentences, building up a corpus of provenance metadata associated with Web documents.

Simple Example

In English:

“This Blog Post is about FakeNews.”

Using Hyperlink-based Digital Sentences, expressed here in RDF-Turtle notation, where sentence Subjects and Predicates are identified by Hyperlinks, and Objects may be identified by either Hyperlinks or Literals (typed or untyped):

@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#thisPost>
  a schema:BlogPosting ;
  schema:dateCreated "2017-10-31"^^xsd:date ;
  schema:about <https://twitter.com/hashtag/FakeNews#this> ;
  schema:author <https://medium.com/@kidehen#this> ;
  schema:publisher <https://medium.com/#this> ;
  schema:url <> .

Using our Structured Data Sniffer browser extension — a piece of software that understands sentences constructed using Hyperlinks, and supports a variety of notations — to visualize a translation of the sentences written in the example above:

Metadata injected into HTML Document from my Subject, Predicate, Object sentences while editing this post.
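
For readers wondering how such sentences get inside an HTML document in the first place, here is one sketch, assuming a Turtle script island of the kind the Structured Data Sniffer can detect alongside other notations such as JSON-LD:

<script type="text/turtle">
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#thisPost>
  a schema:BlogPosting ;
  schema:dateCreated "2017-10-31"^^xsd:date ;
  schema:about <https://twitter.com/hashtag/FakeNews#this> .
</script>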

Conclusion

For years, verifiable identity has been a known problem afflicting the Web & Internet. Unfortunately, the implications of this problem for individual privacy, content provenance, and society at large have been poorly reflected in Social Media solutions, both desktop and mobile.

There are existing open standards (as outlined in the “Solution” section above) that can address this problem, and they may be adopted and implemented today without obliterating existing business models. In fact, remembering that nowhere in the real world does privacy cost $0.00, the cyberspace marketplace seems a ripe opportunity!

We cannot continue to accept solutions from vendors that compromise our individual privacy and our views of the world, at the massive scales enabled by the Web & Internet.

Related

  1. Empathy to Democracy
  2. Huffington Post has a Fake News problem
  3. Call for Cooperation against Fake News
  4. Fake News: Be Careful What You Wish For
  5. Role of Technology in Presidential Election
  6. Media in the Age of Algorithms
  7. Mark Zuckerberg is the most powerful person on earth, but is he responsible?
  8. How Fake News Goes Viral: A Case Study — from The New York Times
  9. Fake News is just the beginning — from The Washington Post
  10. Fake News and how we caused it — by Morten Rand-Hendriksen
  11. Silicon Valley has an Empathy Vacuum
  12. Covering Politics in a “Post-Truth” America — by Susan B. Glasser
  13. After Peak Marketing
  14. Real Ads, Fake News, Real Confusion
  15. Did Media Literacy Backfire?
  16. Hacking The Attention Economy
  17. The Trust Project
  18. Fake News is not the only problem
  19. Fake news and a 400-year-old problem: we need to resolve the ‘post-truth’ crisis
  20. Computational Propaganda
  21. Understanding Data
  22. Verifiable Identity controlled by You, at Web-Scale
  23. Smart Agents, Sentences, and Artificial Intelligence
  24. A Semantic Web & Artificial Intelligence
  25. Semantic Search Engine Optimization
  26. Citation & Annotation Tool Screencast — a demonstration of how existing open standards aid content provenance improvement.

Kingsley Uyi Idehen

CEO, OpenLink Software: High-Performance, Data-Centric Technology Providers.