Hitchhiker’s Guide to Analytics — Truth

The truth, the whole truth, and the Single Source of Truth

Greg Anderson
Published in Creative Analytics
6 min read · Jun 20, 2017

I wish Prak were around.

I mean, not really. He was pretty annoying most of the time, especially if you happened to mention frogs. Or Arthur Dent.

Prak is (or was, or will be, or wiollen haven been) the source of truth. Terribly useful chap to have around, when you can get to him.

We’ll talk about Prak next time.

Truth

If you’ve been following my recent articles, you’ve seen me circling the idea of just how far data can be trusted. I’ve been edging constantly inward, trying to find a useful thesis on how data reliability is defined or determined.

If you haven’t been following my work, what are you doing with your life?

You’ve been arguing with the Nutri-Matic again, haven’t you?

It started when I drew what I thought were noteworthy parallels between the pervasiveness of data in our society and the presence of the Force to the Jedi.

After I got back from that galaxy far, far away, I plugged back into the Matrix, digging through its code to find the bottom of the proverbial rabbit hole.

I took the question “Why do we trust the data?” and ran with it.

It seemed like an interesting line of thinking to pursue. There’s certainly been a lot of thought dedicated to it and no shortage of opinions about it.

As it so happens, the concept of data reliability isn’t as well-defined as you might think. It’s not really a line of thinking so much as a neo-Gordian knot woven from nihilism and carbon nanotube Moebius strips.

Your earbud cables and power cords can’t scare me now

In a higher dimension of which we know nothing, the mighty king Alexander bellows with rage, but that’s not important now. He’s just bitter because he knows that cutting the original knot was a punk move.

Single Source of Truth

If you want to claim that your analytic solution or report is correct, you will need one of two things: a trusted data source for comparison or a complete and utter lack of any such thing.

When multiple data sources exist (and they will), you will need to address variances between them. Your phone displays a different time than your computer. Your license says that you’re 25 pounds lighter than that lying scale at the gym. Your bank statement has a different balance than your checkbook.

Bigger data sets mean bigger variances. The POS terminal software says you made $2.13 million in floor sales last quarter, but the nightly receipt batches add up to $2.09 million.
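To put a number on that gap, here is a minimal sketch in Python. The two totals come from the example above; the function name `variance` and the choice to express the gap as a share of the receipt total are my own assumptions, not a standard.

```python
from decimal import Decimal

# Figures from the example above, in dollars.
pos_total = Decimal("2130000")      # POS terminal software
receipt_total = Decimal("2090000")  # nightly receipt batches


def variance(a, b):
    """Return (absolute gap, gap as a percentage of b).

    Hypothetical helper for illustration only.
    """
    gap = a - b
    return gap, gap / b * 100


gap, pct = variance(pos_total, receipt_total)
print(f"Gap: ${gap:,.0f} ({pct:.2f}% of the receipt total)")
```

That works out to $40,000, or a little under 2% of the receipt total. Whether that is noise or a problem depends entirely on which of the two systems you have agreed to believe.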

When you have multiple systems, one of them will usually be declared the single source of truth (SSOT). Which one? Usually the most annoying one. But after some amount of discussion and argument and negotiation, one data source among many will be declared to be ‘correct’.

You might also hear about multiple versions of truth (MVOT), a newer entry in the buzzword lexicon. It acknowledges the simple fact that separate systems, tracking the same data, using the same logic, can report different results.

But what if there is no single source of truth?

Work quickly. Do your job properly, document your logic, and get it published.

If no one can identify a single source of truth, create it.

Just make sure you know what you’re doing. And be ready to justify it.

Back to the Guide

The owners of Megadodo Publishing faced a similar conundrum with the Hitchhiker’s Guide to the Galaxy. Even putting aside the constant complaints from the publishers of the Encyclopedia Galactica, the universe is a big place. Things change. People change. Sometimes, things change into people.

Guide reporters are also notoriously lazy, but they’re nothing compared to the editors. Updates are sent through without review, forgotten entirely, or made up on the spot. They have deadlines to keep, after all.

So they looked at the two options and selected option C. They ignored, disregarded, and discredited the competition.

The Hitchhiker’s Guide to the Galaxy is an indispensable companion to all those who are keen to make sense of life in an infinitely complex and confusing Universe, for though it cannot hope to be useful or informative on all matters, it does at least make the reassuring claim that, where it is inaccurate, it is at least definitively inaccurate.

In cases of major discrepancy, it’s always reality that’s got it wrong.

They hung a sign in the lobby and everything.

This has led to some interesting consequences.

For instance, when the editors of the Guide were sued by the families of those who had died as a result of taking the entry on the planet Traal literally (it said “Ravenous Bugblatter Beasts often make a very good meal for visiting tourists” instead of “Ravenous Bugblatter Beasts often make a very good meal of visiting tourists”), they claimed that the first version of the sentence was the more aesthetically pleasing, summoned a qualified poet to testify under oath that beauty was truth, truth beauty, and hoped thereby to prove that the guilty party in this case was Life itself for failing to be either beautiful or true.

The judges concurred, and in a moving speech held that Life itself was in contempt of court and duly confiscated it from all those there present before going off to enjoy a pleasant evening’s ultragolf.

Case closed, as it were. If you think that sounds like an outrageous use of the SSOT paradigm, you will one day join those of us who only wish it were.

So… what?

Despite the obvious drawbacks and concerns of bias, you always want to have a clearly defined single source of truth (SSOT).

Yes, you do.

For an analytic solution to have any meaning or carry any weight, you need a defined SSOT against which to audit and validate your work.

It’s not going to be 100% accurate. It’s not even going to be 100% complete. It might not be the best of the available options.

I will tell you one thing that many project managers will deny to their dying breath, especially in finance and healthcare analytics. Quite often, the determination will be entirely arbitrary. When it’s not, it is a decision made by necessity, by default, or to satisfy some personal agenda.

Would it save you a lot of time if I just gave up and went mad now?

If you can affect the decision, then try to make it a good one. Don’t base your determination on the factors I just told you everybody uses. Be better.

The SSOT is necessary because it gives you a standard for comparison. Without that, you can create the smartest analytic solution in all of time and space, and you’re still asking the stakeholder to accept your results on faith.
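Here is roughly what a standard for comparison buys you, sketched in Python. Everything in it is hypothetical: the function name `audit_against_ssot`, the store names, the dollar figures, and the 0.5% tolerance are assumptions made up for illustration, not anyone's real audit process.

```python
def audit_against_ssot(report, ssot, tolerance=0.005):
    """Compare report figures to the SSOT, key by key.

    Returns a dict of keys that disagree: missing on either side, or
    differing by more than `tolerance` (a fraction of the SSOT value).
    Hypothetical helper for illustration only.
    """
    problems = {}
    for key in sorted(set(report) | set(ssot)):
        if key not in ssot:
            problems[key] = "not in SSOT"
        elif key not in report:
            problems[key] = "missing from report"
        else:
            expected, actual = ssot[key], report[key]
            allowed = abs(expected) * tolerance
            if abs(actual - expected) > allowed:
                problems[key] = f"expected {expected}, got {actual}"
    return problems


# Made-up quarterly sales by store, in dollars.
ssot = {"store_a": 710_000, "store_b": 690_000, "store_c": 690_000}
report = {"store_a": 710_000, "store_b": 702_500, "store_d": 15_000}

for key, issue in audit_against_ssot(report, ssot).items():
    print(f"{key}: {issue}")
```

With these made-up numbers, store_a passes and the other three are flagged. The point is not the arithmetic. The point is that none of those flags can exist until somebody has said which source counts as correct.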

Don’t ask for their faith. Earn their trust.

Greg Anderson

Founder of Alias Analytics. New perspectives on Analytics and Business Intelligence.