Lies, Damned Lies & Beautiful Statistics
There’s a bit of a worrying trend developing around the world: people are dismissing data and statistics unless the information is “100% accurate”. Data and the resulting statistics are too easily cast aside when we don’t trust the source or its interpretation.
That attitude is summed up by a well-known quote from around 1900:
“There are three kinds of lies: lies, damned lies and statistics”
- Attributed to Benjamin Disraeli, former British Prime Minister (popularised by Mark Twain, author)
That statement has merit, but it’s not the whole story.
I’m not sure if we can ever get to perfect data.
Reading data is largely a matter of interpreting it, and interpretation is subject to all the biases, preconceived ideas and personal experiences that come with being human.
What I am sure of is that having data, even imperfect data, is better than not having any data at all.
Looking to data for guidance is better than relying on rhetoric, feelings or opinions — many will agree that these last three aren’t making the world a better place at the moment.
With that said, the two most common arguments I’ve seen for dismissing data are:
- The data is imperfect, or
- Data can be used to purposefully mislead.
Let’s look a bit closer at each of these:
1. The data is imperfect
In most practical applications, the data you gather will not be 100% accurate. Aiming for 100% accuracy is often prohibitively expensive.
For example, if you are collecting data from thousands of manufacturing sites across the globe, then you will likely run into issues such as:
- Not all sites respond timeously
- Language barriers make interpreting requirements and results difficult
- Different understandings of what data to include
- There will probably be local circumstances that the data gatherer and analyst are not aware of
And there could be a million other things wrong with it! There is practically no upper limit to how much you could spend making sure that the data is 100% accurate.
This is where the concept of materiality comes into play. In short, materiality refers to whether a change in the information significantly affects the outcome. If you decide that including or excluding some data will not “materially” affect the outcome, then you are making a judgement on the materiality of that information. Determining materiality is a matter of exercising your professional judgement, although some industries have standards to help professionals determine it.
Another way to think about materiality is to ask: is the data “good enough” to base a decision on? Don’t mistake this question for a justification to accept poor data. Treat it instead as a prompt to check whether a slightly different dataset would change the outcome. Usually it won’t.
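To make that prompt concrete, here is a minimal Python sketch of a materiality check. The decision rule (“go ahead if the average defect rate is below 5%”), the threshold and the site figures are all hypothetical, invented purely for illustration. The idea is simply to make the decision with and without the doubtful data and see whether it flips.

```python
# Minimal materiality check. The decision rule, threshold and figures
# below are hypothetical and exist only to illustrate the idea.
from statistics import mean

THRESHOLD = 0.05  # hypothetical: go ahead if the average defect rate is below 5%


def decision(rates):
    """Return True ('go ahead') if the average rate is below the threshold."""
    return mean(rates) < THRESHOLD


def is_material(complete, suspect):
    """Suspect data is material if including it flips the decision."""
    return decision(complete) != decision(complete + suspect)


# Hypothetical defect rates from sites that reported cleanly...
clean_sites = [0.021, 0.034, 0.028, 0.040, 0.019, 0.031]
# ...and from sites whose figures we are unsure about.
doubtful_sites = [0.060, 0.045]

print(is_material(clean_sites, doubtful_sites))  # → False

# Stress test: what if the doubtful figures are really twice as bad?
print(is_material(clean_sites, [2 * r for r in doubtful_sites]))  # → False
```

Here the average stays under the threshold whether or not the doubtful sites are included, even if their figures are doubled, so those figures are not material to this particular decision. With a different threshold or different numbers the check could come out the other way, and that is when it is worth spending money on better data.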
Missing or inaccurate data is even less of a problem for decision-making when you are looking at trends. The direction of a general rising or falling trend will hardly ever be reversed by a few individual points of poor data.
*Cough cough* Climate change deniers *Cough cough*
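To see why, here is a minimal Python sketch using made-up yearly readings (not real data of any kind). It fits an ordinary least-squares trend line to a clean series, to the same series with one badly wrong reading, and to the same series with one year missing, then compares the slopes.

```python
# Made-up yearly readings, used only to show how a trend line reacts
# to one bad or missing data point.


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


years = list(range(2010, 2020))
# Hypothetical readings with a steady upward trend.
readings = [10.0, 10.4, 10.9, 11.1, 11.6, 12.0, 12.3, 12.9, 13.2, 13.7]

# Same series, but one year recorded badly (a typo, a broken sensor, ...).
bad_readings = readings.copy()
bad_readings[7] = 6.0

# Same series with one year missing altogether.
gap_years = years[:3] + years[4:]
gap_readings = readings[:3] + readings[4:]

for label, xs, ys in [
    ("clean", years, readings),
    ("one bad point", years, bad_readings),
    ("one missing point", gap_years, gap_readings),
]:
    print(f"{label:>18}: slope = {ols_slope(xs, ys):+.3f} per year")
```

In this made-up series the badly wrong reading roughly halves the slope, but it stays positive, and the missing year barely matters. How much a single bad point moves the line depends on how far off it is and how far it sits from the middle of the range; points near the middle have very little leverage on the slope.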
You will find that imperfect data is usually not a problem. If you understand what the issue is and how it might affect the result, then 9 times out of 10 you will still be able to confidently make a decision based on what you have available.
Your professional judgement in determining materiality is more valuable than being able to interpret a perfect data set!
This leads us to the other common argument for dismissing data.
2. The data is misleading
Or, to put it more accurately, the interpretation of data can be misleading. Data and statistics have no doubt been abused to purposefully mislead people since about a minute after the very first statistic was calculated.
Even so, bad or misleading data is still better than rhetoric, feelings or opinions.
Opinions and feelings can’t be scrutinised. Data can be.
Any set of statistics that is worth considering can be scrutinised. If the data says crime is down, you can scrutinise it. When you’re told that immigrants are costing the country millions, scrutinise the data. You are free to make your own judgement on both the data and the original interpreter!
Long live data!
I hope you are one step closer to appreciating how important data and statistics are and how useful they can be, even when they are imperfect.
This post originally appeared on my blog.