Provenance Is Not A Region In France

Published in

The Cagle Report

May 18, 2017

How do you know what’s true? This seemingly simple question underpins a surprisingly large number of the issues that we as information producers and consumers face every day. In the political sphere, memes and alt-facts have become commonplace, threatening to engulf us all in a morass of disinformation. Data needed for finance and business analytics passes through dozens or even hundreds of intermediate points, and attempts to mine data from social media introduce biases and raise questions about how valid such data really is. Even in very carefully controlled environments, the question of where information comes from should be central to anyone working with data, though all too often it isn’t.

Provenance is a term that in data circles means the specific history of a given piece of information. In the age before computers, such provenance was typically relegated to academic works. A bibliographic citation in a book would indicate where a particular assertion, table or quotation came from. If you retrieved the particular work being referenced, then you could look up the source of the information and the context in which it was used. Such a cited work might itself have its own citations, pointing to earlier works. In effect, citations created a web of trust: at any point you could locate all of the previous works and, just as importantly, evaluate the sources of those works for their validity.

As such, a provenance trail established the degree of authority that a work had. It didn’t necessarily guarantee truthfulness (an author is simply someone who creates a given work), but because of the complexity and time-consuming nature of publishing, it was more likely that there had been a peer review process to ensure that the work bore some reasonable semblance to what really happened. The role of authority is critical, because in general we are reliant upon others to provide information that we could not obtain ourselves: we weren’t there at the time, we weren’t witnesses. An untrustworthy witness throws the whole provenance chain into doubt.

The English legal system is built upon authoritative testimony, attesting to the significance of circumstantial evidence, to the perceptions of witnesses at the time, and to character assessments by those who were involved with one or another of the plaintiffs or defendants. Ledgers developed as a way of recording transactions, not just for managing current accounts but also to provide an authoritative record of previous transactions. It is one reason that accountants are held to such high levels of trust (and are penalized so heavily if they fail in their duty to properly record such transactions): they are authoritative witnesses.

News organizations, before they became entertainment, were judged on their reliability and their adherence to the truth. A yellow newspaper (a term usually traced to the “Yellow Kid” comic strip that ran in the rival New York papers of Joseph Pulitzer and William Randolph Hearst during their circulation war of the 1890s) was one that made up stories with no authoritativeness and no legitimacy. Such stories were often sensational and salacious, and quite frequently they not only crossed the line into libel but routinely romped and stamped on that line until it was blurred beyond all recognition.

The rise of radio and television produced a new type of news. A newspaper had a comparatively high bandwidth: any single issue might carry dozens of stories and was delivered daily. But any such news was always, implicitly, a day old, and was only open to those who were proficient readers. Radio (and later television) news, on the other hand, had a much lower bandwidth: you could fit perhaps a dozen stories, tops, into a news program. However, human beings are narrative creatures. A person telling a story was compelling in ways that print could never be, especially if that person could also be seen speaking. Radio and television news activated our hindbrains, the part of the brain that bypasses the filter of rational thought and goes straight into memory.

There were giants in the radio and television field (Edward R. Murrow and Walter Cronkite are two that come to mind) who championed the need for provenance in their work, but there were also others who saw the potential for social engineering on a massive scale by using the new media to bypass this provenance trail. It is perhaps not surprising that in the current culture wars, those most likely to hold beliefs without substantiation are also those who were most heavily reliant upon radio and television for their news.

The Internet changes that equation. The key aspect of the Internet is the hyperlink. A hyperlink (the underlined blue text you see in a browser that lets you “click” to a different “page”) is in fact a form of citation. This makes it possible to better determine the provenance of an article, though it doesn’t necessarily ensure the truthfulness of an article’s source. What it does enable you to do, however, is aggregate sources and determine, based upon contextual records, how likely it is that a given domain reflects known facts. It also becomes possible to crowd-source the process of determining the legitimacy of a domain as an authority. Snopes and Politifact have built their business models on providing ratings about the trustworthiness of a given assertion or political record, and others are emerging as the market for legitimacy gains steam.

What such rating systems do is provide a mechanism for quantifying trust. This mechanism requires a combination of several factors: crowd-sourcing reliability metrics so that biases can be more readily identified and factored out, establishing a baseline history of news providers to determine the degree to which they change over time (due to changing editorial policies or ownership), and analyzing all of this in light of where the consensus lies for a given piece of information.
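As a rough illustration of how those three factors might fit together, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Rating` record, the 0-to-1 score scale, the use of a per-year median as the "consensus", and the function names. It is not a description of how Snopes, Politifact or any other rating organization actually computes its scores.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Rating:
    """One crowd-sourced reliability rating for a source (hypothetical schema)."""
    rater: str
    year: int
    score: float  # assumed scale: 0.0 = unreliable, 1.0 = fully reliable


def yearly_consensus(ratings):
    """Median score per year; a median damps a few strongly biased raters."""
    by_year = {}
    for r in ratings:
        by_year.setdefault(r.year, []).append(r.score)
    return {year: median(scores) for year, scores in sorted(by_year.items())}


def trust_summary(ratings):
    """Return (latest consensus, drift since the earliest year on record)."""
    history = yearly_consensus(ratings)
    years = list(history)
    latest = history[years[-1]]
    drift = latest - history[years[0]]  # negative drift = reliability fell
    return latest, drift


# Illustrative data only: rater "c" is an outlier in 2015.
example = [
    Rating("a", 2015, 0.8), Rating("b", 2015, 0.9), Rating("c", 2015, 0.1),
    Rating("a", 2017, 0.5), Rating("b", 2017, 0.4), Rating("c", 2017, 0.6),
]
latest, drift = trust_summary(example)
```

The median is used here only because it is a simple way to keep one outlying rater from dominating the result. A real system would need something far more careful, including weighting raters by their own track records.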

This is not “news” in the traditional sense but something more along the lines of “pre-news”: determining the authority of a given source so that information can be provided in a relatively unbiased manner. Not surprisingly, the modern-day analogs of yellow journalism do everything they can to discredit these authoritative metric organizations. The next wave of contemporary journalism will almost certainly be a battle between these two forces, just as similar battles play out in other areas such as political polling. Most campaigns maintain two polling operations: one to provide (or in some cases create) sentiment data that shows their candidate doing well, the other to keep the internal ‘books’ that show how the campaign is really doing.

The hottest technology in the fintech space, blockchain, deserves a mention here as well. Blockchain is intriguing because, while it is usually seen as a means of recording financial transactions, it also provides mechanisms for identifying provenance information: who made a claim, when the claim was made, what the basis of that claim was, and so forth. Because blockchains are distributed, they can in effect record when changes are made to a set of transactions and can be used to ascertain when spurious data is introduced.

Again, this does not guarantee that the transactions themselves (the assertions of certain acts) are true; it only indicates when and where the assertions were made. However, even that is a huge step forward, especially if some measure of trustworthiness can also be attached to the transaction. People do not like to lie when such a lie can be recorded, because it makes it much harder for them to deny they made the lie in the first place. There are very real legal and financial penalties for those lies, including fines and incarceration.
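The core idea can be sketched with a simple hash chain in Python. This is deliberately not a real blockchain: there is no distribution, consensus or signing, and the record fields (`who`, `when`, `claim`, `basis`) and function names are assumptions chosen to mirror the questions above. It shows only how linking each record to the hash of the previous one makes later alterations detectable, without saying anything about whether the claims themselves are true.

```python
import hashlib
import json


def entry_hash(body):
    """SHA-256 over a canonical JSON encoding of an entry's contents."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_claim(chain, who, when, claim, basis):
    """Append a claim record linked to the hash of the previous entry."""
    body = {
        "who": who,
        "when": when,
        "claim": claim,
        "basis": basis,
        "prev": chain[-1]["hash"] if chain else None,
    }
    chain.append({**body, "hash": entry_hash(body)})


def first_tampered(chain):
    """Return the index of the first inconsistent entry, or None if intact."""
    prev = None
    for i, entry in enumerate(chain):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or entry_hash(body) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None
```

Appending a few claims and then editing the `claim` text of an earlier entry causes `first_tampered` to return that entry's index, because its stored hash no longer matches its contents.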

Semantics, on the other hand, makes it possible to make assertions about assertions. If blockchain is the ledger, semantic data provides the contextual metadata for the assertions it records, in effect creating tools to link those assertions to other assertions about the things being recorded.
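One way to picture "assertions about assertions" is to give every subject-predicate-object statement its own identifier, so that other statements can use that identifier as their subject. The Python sketch below does this with plain tuples rather than a real semantic toolkit. The `Assertion` type, the predicate names (`assertedBy`, `derivedFrom`) and the sample data are all hypothetical, chosen only to illustrate the idea.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    """A subject-predicate-object statement with its own identifier."""
    id: str
    subject: str
    predicate: str
    obj: str


def about(assertions, target_id):
    """Return every assertion whose subject is another assertion's id."""
    return [a for a in assertions if a.subject == target_id]


def provenance_trail(assertions, start_id, predicate="derivedFrom"):
    """Follow `predicate` links from one assertion back to its sources.

    Only the first matching link is followed at each step, and a visited
    set guards against cycles.
    """
    by_id = {a.id: a for a in assertions}
    trail, seen, current = [], set(), start_id
    while current in by_id and current not in seen:
        seen.add(current)
        trail.append(current)
        links = [a.obj for a in about(assertions, current)
                 if a.predicate == predicate]
        current = links[0] if links else None
    return trail


# Illustrative data: a1 is a claim, a2 and a3 are claims about a1.
facts = [
    Assertion("a0", "acme", "filedReport", "annual-report-2016"),
    Assertion("a1", "acme", "reportedRevenue", "10M"),
    Assertion("a2", "a1", "assertedBy", "analyst-jane"),
    Assertion("a3", "a1", "derivedFrom", "a0"),
]
```

Here `about(facts, "a1")` returns the two metadata assertions `a2` and `a3`, and `provenance_trail(facts, "a1")` walks from the revenue claim back to the filing it was derived from.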

This has significant consequences for business. As we move further into the reputation economy, a reputation for honesty and factual reliability will become a distinct competitive advantage, especially in an increasingly virtualized world. Indeed, in many respects, such an authoritative score will very likely factor into everything from company valuations and stock prices to the customer bases available to a company.

Already, an informal word-of-mouth system can have a major impact upon the viability of companies (cf. the recent scandal that engulfed United for forcibly ejecting a paying customer because of a logistics error, at one point wiping roughly a billion dollars off the company’s market valuation). Expect provenance to play a major part in the reputation economy as it becomes more readily determinable.

Provenance is crucial. I hope, in a subsequent article, to look at the mechanics and modeling of provenance, but the idea behind it is already becoming a major factor in the way that business is done in the twenty-first century.

Kurt Cagle is the editor of The Cagle Report on LinkedIn and the Metaphorical Web.

Written by Kurt Cagle