The trap of being driven solely by data

Willy Braun
daphni chronicles
3 min read · Jun 16, 2017


There is not a single day when I don’t hear things such as “data doesn't lie, people do”. This ideology of the neutrality of data is everywhere around us in the tech world.

However, we tend to forget that using data is far from preventing subjectivity or guaranteeing truth. Huge biases hide at both the collection and the interpretation stages.

One of the issues is well illustrated by the story of the drunk under a streetlamp (nicknamed the “streetlight effect”):

A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys, and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them there, and the drunk replies no, he lost them in the park. The policeman asks why he is searching here, and the drunk replies, "this is where the light is".

We tend to select the data that are easiest to collect and to draw inferences only from the data we already have.

And even when we try to design a proper data-driven approach, with randomized experiments and paired indicators that capture both the intended effect and its potential counter-effect (e.g. productivity and quality, or satisfaction), we remain prone to confusing correlation with causation.
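To make that concrete, here is a minimal, purely illustrative Python sketch (not from the original article, numbers invented): two metrics that have nothing to do with each other still correlate strongly, simply because both trend upward.

```python
# Illustrative sketch: two independent, drifting metrics end up highly
# correlated even though neither one causes the other.
# Requires Python 3.10+ for statistics.correlation (Pearson's r).
import random
import statistics

random.seed(42)

def random_walk(n, drift=1.0, noise=2.0):
    """A drifting random walk, i.e. a metric that 'grows' on its own."""
    value, series = 0.0, []
    for _ in range(n):
        value += drift + random.gauss(0, noise)
        series.append(value)
    return series

# Two completely independent series, each with its own upward drift.
revenue = random_walk(200)
signups = random_walk(200)

# The correlation is high, yet there is no causal link: they merely share a trend.
print(f"correlation: {statistics.correlation(revenue, signups):.2f}")
```

A dashboard showing these two curves side by side would look like a causal story; it is only a shared trend.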

We underestimate the role of randomness, and we forget or misinterpret variables at the core of the phenomenon we observe. Two quick examples: (1) A startup launches in several countries and sees tremendous revenue growth, so everyone concludes that the golden formula is to launch in several countries. But the true causal parameters might be the quality of the product and the arrival of a new regulatory framework (hardly the kind of variable that shows up in Google Analytics). (2) A startup ships a new feature and notices a huge jump in traffic. The team thinks it has cracked it. But maybe the activity is simply cyclical, and the coincidence between the growth spike and the feature shipment is mere luck.
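The second example is easy to simulate. A minimal, hypothetical sketch (the traffic model and all numbers are invented for illustration): traffic that is purely seasonal, with a feature shipped just as the seasonal upswing begins, produces a flattering before/after comparison even though the feature has no effect at all in the model.

```python
# Illustrative sketch: purely seasonal weekly traffic, no feature effect
# modelled anywhere, yet a naive before/after comparison around the launch
# week shows a sizeable "lift".
import math
import random

random.seed(7)

def weekly_traffic(week):
    """Yearly seasonal cycle plus noise; the feature plays no role here."""
    seasonal = 10_000 + 3_000 * math.sin(2 * math.pi * (week - 26) / 52)
    return seasonal + random.gauss(0, 500)

traffic = [weekly_traffic(w) for w in range(52)]

launch_week = 26  # the feature ships right as the seasonal upswing starts
before = sum(traffic[launch_week - 4:launch_week]) / 4
after = sum(traffic[launch_week:launch_week + 4]) / 4

print(f"avg traffic, 4 weeks before launch: {before:,.0f}")
print(f"avg traffic, 4 weeks after launch:  {after:,.0f}")
print(f"apparent lift from the feature:     {after / before - 1:+.1%}")
```

An A/B test, or at least a comparison against the same weeks of the previous year, would expose the illusion; a simple before/after chart on a dashboard would not.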

Let’s not go into too much detail. My goal is not to catalogue all the traps of data analysis (it would take too long, and I’m not qualified).

Data sciences are not called sciences for nothing, and they rely on many interrelated disciplines. We shouldn’t throw in the towel and rely on intuition alone, but we should keep in mind that if people lie, so does data. As the old quip has it, very much in the spirit of Darrell Huff’s How to Lie With Statistics: "if you torture the data long enough, it will confess to anything."

And we should be careful when we select metrics, because they have a real effect on the world, especially when people know what is being measured: metrics create incentive structures.

I’m a huge fan of what Daniel Yankelovich wrote in Corporate Priorities: A Continuing Study of the New Demands on Business. His four steps summarize very well what we should keep in mind:

The first step is to measure whatever can be easily measured. This is OK as far as it goes.

The second step is to disregard that which can't be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading.

The third step is to presume that what can't be measured easily really isn't important. This is blindness.

The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide.
