A Philosophical Approach to Data Intelligence

Jonas Scherer
Jan 10, 2018

Simply put, this article is intended to make you think more about your data. As we go, you’ll realise that it’s always good to hold off on your conclusions a little and dig deeper into the data to find more valuable information. Whether you are a researcher, manager, analyst or scientist, the purpose of this article is to make you think more, and I sincerely hope it helps you reach better conclusions.

In an article posted on Aeon’s website, Alan Hájek, professor of philosophy at the Australian National University in Canberra, proposes a tool kit for thinking more like a philosopher. I have written this article to propose a way to analyze the data we work with and extract more value from it. The tools below are a compilation from Alan Hájek’s article; all I have done is apply the concepts to data analysis. I have also provided some fake statements that will let us practice questioning everything.

Introduction

Let’s think about data and what this data could bring us. We usually use data sets to provide a summary of things to report. I’m going to look at this from the perspective of an e-commerce site.
What does it sell? It doesn’t matter.
We’re thinking only about the data here, okay?

So, we have a website where we track users’ actions, like items bought and viewed, and we have lots of information about those users, like their age, gender, address, etc. The main purpose of all this data gathering is generally to provide us with some insight, right?

Let’s say we have 3 years of historical data to analyze and we need to report something, or even question some conclusions drawn from this very data.
What analysts usually do:

  • Provide a sum of some aspect, like: total items bought, total sales per month, etc;
  • Classification and ordering: top buyers, top product categories, etc.
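To make this concrete, here is a minimal sketch of that kind of report in plain Python. The `orders` records, their field layout and the helper `total_by` are all made up for illustration; they are not taken from any real system.

```python
from collections import defaultdict

# Hypothetical order records: (buyer, month, category, amount)
orders = [
    ("ana", "2017-01", "shirts", 40.0),
    ("bob", "2017-01", "shoes", 90.0),
    ("ana", "2017-02", "shoes", 60.0),
    ("cid", "2017-02", "shirts", 25.0),
]

def total_by(records, key_index):
    """Sum the amount (last field) grouped by the field at key_index."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[-1]
    return dict(totals)

# "Sum of some aspect": total sales per month
sales_per_month = total_by(orders, 1)

# "Classification and ordering": buyers ranked by total spend
spend_per_buyer = total_by(orders, 0)
top_buyers = sorted(spend_per_buyer, key=spend_per_buyer.get, reverse=True)

print(sales_per_month)
print(top_buyers)
```

Everything here is a group-by followed by a sum or a sort, and nothing in it questions the result.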

This is not data intelligence. We are only grouping and classifying things in order to hand a poor report to a manager, CEO, CFO, or other senior employee of a company.

In order to have some intelligence added to it, we must actually think more about the data.

Let’s begin.
First of all, let’s go through some concepts:

#Concept 1 — Confirmation Bias

The tendency to look for and to recall evidence that confirms - and does not go against - one’s beliefs and hypotheses.

#Concept 2 — Congruence Bias

The tendency to accept a belief or hypothesis without adequately testing alternative hypotheses.

All the tools below are meant to help us avoid the biases above. As data scientists, analysts and managers, we need to look for evidence that disconfirms our conclusions and constantly test alternative hypotheses. That being said, some tools might seem quite silly, but some of them will surely help us in some situations.

So do not be afraid to question both your data and your conclusions, no matter how silly these questions might be. The point here is to come up with something that tries to disconfirm our conclusions. Here are our tools:

#Tool 1 — Multi-existence or Non-existence

From Alan Hájek’s tool kit: “a definite description — typically comes with an assumption that there is exactly one X. We might be able to challenge that assumption, in two ways: perhaps there is more than one X; perhaps there are no Xs.”

Let’s take a look at the statement below from our sample website data:

“The data from our website shows that the highest-price buyers in the category ‘shirts’ are 40+ year-old males”

With our Multi-existence tool we could simply ask:

  • Are there highest-price buyers aged 40+ in another category?
  • How many categories do we have whose highest-price buyers are aged 40+?

With our Non-existence tool:

  • Is the data set large enough? The conclusion above could have been drawn from only 100 records, in which case the data set is too small to support it;
  • Are there many male buyers aged 40+? Looking at our 3 years of historical data, we might find that this group represents only 0.5% of our revenue over those years. In that case the group is not relevant to us and its presence in our report is useless.

These are really good tools for contesting specific measurements, and they let us expand our view of the data.
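Here is a rough sketch of how both checks could look in plain Python. The `purchases` records, the `is_male_40_plus` predicate and the two helpers are invented for this example, and "highest-price buyer" is read here as "the single most expensive purchase in a category", which is only one possible interpretation.

```python
# Hypothetical purchases: (gender, age, category, amount)
purchases = [
    ("M", 45, "shirts", 120.0),
    ("M", 52, "shoes", 200.0),
    ("F", 31, "shirts", 60.0),
    ("F", 28, "shoes", 80.0),
    ("M", 23, "shirts", 40.0),
]

def is_male_40_plus(purchase):
    gender, age, _category, _amount = purchase
    return gender == "M" and age >= 40

def segment_summary(records, in_segment):
    """Non-existence check: (record count, share of total revenue) for a segment."""
    segment = [r for r in records if in_segment(r)]
    total = sum(r[3] for r in records)
    share = sum(r[3] for r in segment) / total if total else 0.0
    return len(segment), share

def categories_led_by(records, in_segment):
    """Multi-existence check: categories whose most expensive purchase is in the segment."""
    best = {}
    for r in records:
        if r[2] not in best or r[3] > best[r[2]][3]:
            best[r[2]] = r
    return sorted(c for c, r in best.items() if in_segment(r))

count, share = segment_summary(purchases, is_male_40_plus)
led = categories_led_by(purchases, is_male_40_plus)

print(f"{count} records, {share:.0%} of revenue")
print("categories led by this segment:", led)
```

A tiny `count` or a negligible `share` suggests the statement is not worth reporting, and more than one category in `led` suggests it is not specific to shirts.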

#Tool 2 — Contrast Class

Alan Hájek says that “when evaluating some claims, mentally highlight each key term and run through its contrast class, the set of relevant alternatives”.

Here is an example of application:

“25 year-old women are good buyers”

Here I am going to highlight each key term and its particular “as opposed to” question:

  • 25 year-old women are good buyers, as opposed to men?
  • 25 year-old women are good buyers, as opposed to those who are younger than 25?
  • 25 year-old women are good buyers, as opposed to bad buyers?

This could lead us to even more complex aggregated questions, such as:

  • Are there good buyers among men?
  • Are there good buyers who are older than 25?
  • How many men pay as much as those women? What are their characteristics?
  • How many people older than 25 pay as much as those good buyers? What are their characteristics?
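The questions above can be sketched as one comparison per contrast class. The `buyers` records, the labels and the `average_spend` helper are hypothetical, and "good buyer" is approximated here by average total spend, which is my assumption and not a definition from the data.

```python
from statistics import mean

# Hypothetical buyers: (gender, age, total_spent)
buyers = [
    ("F", 25, 300.0),
    ("F", 25, 260.0),
    ("M", 25, 310.0),
    ("F", 19, 90.0),
    ("F", 40, 280.0),
    ("M", 41, 120.0),
]

def average_spend(records, predicate):
    """Mean total_spent of buyers matching predicate, or None if nobody matches."""
    amounts = [spent for gender, age, spent in records if predicate(gender, age)]
    return mean(amounts) if amounts else None

# The claim plus one contrast class per highlighted key term
contrasts = {
    "women aged 25 (the claim)": lambda g, a: g == "F" and a == 25,
    "men aged 25": lambda g, a: g == "M" and a == 25,
    "women under 25": lambda g, a: g == "F" and a < 25,
    "women over 25": lambda g, a: g == "F" and a > 25,
}

report = {label: average_spend(buyers, test) for label, test in contrasts.items()}

for label, value in report.items():
    print(f"{label}: {value}")
```

If a contrast class spends as much as the group in the claim, the claim says less than it seemed to.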

#Tool 3 — Causation relation and Similar reasoning

From Alan Hájek’s tool kit: “causation seems to be a two-place relation: smoking a pack of cigarettes a day causes lung cancer — so far, so good. But consider: smoking a pack of cigarettes a day, as opposed to three or four, causes lung cancer? That doesn’t sound so good. If anything, relative to those alternatives, smoking (only) one pack a day seems to help prevent lung cancer. So it seems that causation is at least a three-place relation: C causes E relative to C’. Similar reasoning suggests that it is even four-place: C rather than C’ causes E rather than E’.”

Here is a statement provided by our data:

“The data from our website shows that 80% of the buyers with the highest average price are 40+ year-old males”

Apply your tool and think a bit more about this data:

  • Causation relation: male users aged 40+ tend to buy more expensive items. Is it true? How big is this group? What is their conversion rate?
  • Reasoning relation: male users aged 40+, as opposed to women under 40, tend to buy more expensive items. Is it true? Do women in this group tend to buy less expensive items?
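One way to sketch the "C rather than C’" comparison is to put the two groups side by side, sizes included. The `purchases` records and the `compare` helper are invented for this example, and a difference in means is only an observed association, not proof of causation.

```python
from statistics import mean

# Hypothetical purchases: (gender, age, item_price)
purchases = [
    ("M", 44, 150.0),
    ("M", 61, 90.0),
    ("F", 35, 130.0),
    ("F", 29, 70.0),
    ("F", 38, 100.0),
    ("M", 30, 60.0),
]

def compare(records, group_c, group_c_prime):
    """Compare size and mean item price of group C against the alternative C'.

    Assumes both groups are non-empty (statistics.mean raises on empty input).
    """
    c = [price for g, a, price in records if group_c(g, a)]
    c_prime = [price for g, a, price in records if group_c_prime(g, a)]
    c_mean, c_prime_mean = mean(c), mean(c_prime)
    return {
        "c_size": len(c),
        "c_prime_size": len(c_prime),
        "c_mean": c_mean,
        "c_prime_mean": c_prime_mean,
        "difference": c_mean - c_prime_mean,
    }

result = compare(
    purchases,
    group_c=lambda g, a: g == "M" and a >= 40,       # C: males aged 40+
    group_c_prime=lambda g, a: g == "F" and a < 40,  # C': women under 40
)
print(result)
```

Swapping in a different `group_c_prime` changes the question being asked, which is exactly the point of the tool.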

#Tool 4 — Extreme check

From Alan Hájek’s tool kit: “You might be facing a huge search space. Where should you look first? Here’s an easier sub-problem: check extreme cases to see whether any counterexamples lurk there — the first case, or the last, or the biggest, or the smallest, or the smelliest, or any similar superlative (always being aware of the definite descriptions!) Does the claim still hold there? This should drastically reduce your search space, as it now just involves the ‘corners’ or ‘edges’ of the original space.”

Let’s change the statement a bit to question our data:

“Which profile is our best buyer?”

Now the extreme checking:

  • Who is the biggest buyer?
  • Who is the smallest buyer?
  • Who are the top 10 buyers for each gender?
  • How much did the oldest/youngest person spend?
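These extreme checks are cheap to run. Below is a small sketch over made-up `buyers` records; the field layout and the `top_by_gender` helper are assumptions for illustration, and I use a top 2 instead of a top 10 only because the sample is tiny.

```python
# Hypothetical buyers: (name, gender, age, total_spent)
buyers = [
    ("ana", "F", 34, 420.0),
    ("bob", "M", 71, 35.0),
    ("cid", "M", 19, 510.0),
    ("dee", "F", 52, 15.0),
    ("eve", "F", 27, 230.0),
]

def top_by_gender(records, n):
    """Names of the top n buyers by total_spent for each gender."""
    result = {}
    for gender in sorted({r[1] for r in records}):
        group = [r for r in records if r[1] == gender]
        ranked = sorted(group, key=lambda r: r[3], reverse=True)
        result[gender] = [r[0] for r in ranked[:n]]
    return result

biggest = max(buyers, key=lambda r: r[3])   # biggest buyer
smallest = min(buyers, key=lambda r: r[3])  # smallest buyer
oldest = max(buyers, key=lambda r: r[2])    # how much did the oldest spend?
youngest = min(buyers, key=lambda r: r[2])  # how much did the youngest spend?
top_two = top_by_gender(buyers, 2)

print("biggest:", biggest[0], "smallest:", smallest[0])
print("oldest spent:", oldest[3], "youngest spent:", youngest[3])
print(top_two)
```

In this toy data the biggest buyer is also the youngest person, the kind of corner case that can contradict a tidy "best buyer profile".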

#Tool 5 — Reductio ad absurdum reasoning

From Alan Hájek’s tool kit: “suppose that the claim is false, and show that this leads to a contradiction. This provides a proof of the claim, one in which the claim is established conclusively by that reasoning. There’s an instance of reductio ad absurdum reasoning for you. Gaunilo then parodies it: consider the concept of the perfect island. A greater island cannot be conceived. Now, suppose that this island does not exist. Then a greater island could be conceived — namely, one with the island’s greatness and that exists. Contradiction. So the island exists. But this is absurd. So we should reject the ontological argument, which employs parallel reasoning — it ‘proves too much’. ‘Proves too much’ reasoning is a form of analogical reasoning.”

Schematically:

  • Entity X has properties F, G, H, …
  • Entity Y also has properties F, G, H, and also I.
  • Therefore, (plausibly) entity X also has property I.

Using this, we could look at a statement about our data:

“Women on our site who are in their 30s and have already bought products from categories X, Y and Z also buy product N”

And try to apply the same parallel reasoning to it:

“Men on our site who are in their 30s and have already bought products from categories X, Y and Z also (plausibly) buy product N”

Nice, huh?
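Whether the parallel statement actually holds is something the data can answer. Here is a minimal sketch, assuming made-up `customers` records where each customer carries the set of categories and products they bought; the `rate_of_n` helper and its parameters are hypothetical.

```python
# Hypothetical customers: (gender, age, set of categories/products bought)
customers = [
    ("F", 33, {"X", "Y", "Z", "N"}),
    ("F", 37, {"X", "Y", "Z", "N"}),
    ("F", 31, {"X", "Y", "Z"}),
    ("M", 35, {"X", "Y", "Z", "N"}),
    ("M", 32, {"X", "Y", "Z"}),
    ("M", 39, {"X", "Y", "Z"}),
    ("M", 36, {"X"}),
]

def rate_of_n(records, gender, required=frozenset({"X", "Y", "Z"}), target="N"):
    """Among customers of `gender` in their 30s who bought everything in `required`,
    return the share who also bought `target` (None if nobody qualifies)."""
    eligible = [
        bought
        for g, age, bought in records
        if g == gender and 30 <= age <= 39 and required <= bought
    ]
    if not eligible:
        return None
    return sum(target in bought for bought in eligible) / len(eligible)

women_rate = rate_of_n(customers, "F")
men_rate = rate_of_n(customers, "M")

print(f"women: {women_rate:.0%}, men: {men_rate:.0%}")
```

If the two rates are close, the analogy is plausible. If they are far apart, the shared properties X, Y and Z are not what drives purchases of N, and the original statement deserves a second look too.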

Conclusion

To have data intelligence we need to think more about the data and not just sum things up, order or group them together. I hope this helps you one of these days.

Let me know your thoughts!
