Mingus the dog knows how to use data better than a lot of people I know.
Let me explain.
Every time I enter the bathroom near our front entry, our dog Mingus comes crashing down the stairs.
You might call this the Powder Room* Heuristic. For me, it means that if I close that bathroom door, the dog runs downstairs. For Mingus, it means that hearing the door close signals that it’s worthwhile for him to get down there.
Mingus knows that my wife and I usually take the prudent step of using the facilities before going out. And “going out” often means that he gets to come, too.
Possibly, he just wants to get ready. Possibly, he thinks that his adorable presence will persuade us to take him with us. There is no way for us to know his motivation for sure.
Conversely, he doesn’t know why we closed the door. All he knows is that the closing of the door often leads to his going for a walk.
“Ah, Mingus,” you might say. “Correlation is not causation!”
Of course not. Whenever possible, you want to research the cause. If he could, Mingus — napping in his bed — might simply call down, “Hey, Dave, are you going out? Can I come?”
But he can’t. He has to work with inadequate data.
For the purposes of data science (or of science in general), the question is: what can we observe that leads to a given, reproducible result? If we can measure this, we can predict future outcomes. Since Galileo and Newton, you can tell me, quite precisely, how long it will take an object to fall to the ground from, say, the Tower of Pisa. If you know its mass, you can tell me the momentum with which it will hit the ground.
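If you want to see how little "why" that prediction needs, here is a minimal sketch in Python. It assumes a drop from rest, no air resistance, standard gravity of 9.81 m/s², and a tower height of roughly 56 metres; the height and the function names are my own illustrative assumptions, not measured values.

```python
import math

G = 9.81  # assumed standard gravity near Earth's surface, in m/s^2
TOWER_HEIGHT_M = 56.0  # assumed, approximate height of the Tower of Pisa


def fall_time(height_m: float, g: float = G) -> float:
    """Seconds to fall height_m from rest, ignoring air resistance."""
    return math.sqrt(2 * height_m / g)


def impact_speed(height_m: float, g: float = G) -> float:
    """Speed in m/s on reaching the ground, ignoring air resistance."""
    return math.sqrt(2 * g * height_m)


def impact_momentum(mass_kg: float, height_m: float, g: float = G) -> float:
    """Momentum in kg*m/s at impact: mass times impact speed."""
    return mass_kg * impact_speed(height_m, g)


if __name__ == "__main__":
    print(f"Fall time: {fall_time(TOWER_HEIGHT_M):.2f} s")
    print(f"Impact speed: {impact_speed(TOWER_HEIGHT_M):.1f} m/s")
    print(f"Momentum of a 2 kg object: {impact_momentum(2.0, TOWER_HEIGHT_M):.1f} kg*m/s")
```

Notice that mass never appears in the fall time. It only matters once you ask how hard the object arrives.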
The question of why something happens is fundamentally irrelevant. That’s why we call it the Theory of Gravity. We know how it works, but we have no idea why. But the correlation is strong enough that we can reliably take action without knowing why gravity exists.
It wasn’t always like this. From before Aristotle until the Scientific Revolution, humanity worried about causality, about the purposes that drove things to behave as they did. And scientific progress was very, very slow.
Today, scientists no longer focus on why things happen, but rather on what is happening.
Mingus the Dog gets this. By bounding down the stairs at the sound of the closing powder room door, he maximizes his chance of realizing one of his favourite things: going out. And, if he’s wrong, running down the stairs is fun anyway! (That is beside my point, though I suspect it does get to the question of Mingus’s motivation.)
As an example, let’s take a great data analysis problem of our time: climate change. There are three elements here that interest me, and they help explain why my dog is a better data scientist than a lot of people.
First, the case for anthropogenic climate change remains circumstantial. That doesn’t make me doubt it, because the likelihood of it being coincidental is infinitesimal. Scientists have enough comparative data to demonstrate that the recent rate of temperature change is convincingly anomalous, and that human activity (specifically, burning of fossil fuels) is the obvious changed variable.
Second, and more important for the point I’m trying to make, the phenomenon is what it is. Even if there is a tiny chance that something besides human activity is causing the problem, so what? It makes total sense to behave as if the highly correlated hypothesis is correct and take action in the hope that we can avoid worsening the situation. (Besides, many economic models project growth from green energy outpacing the costs of eliminating carbon, so acting is a good idea either way, just as Mingus enjoys running down the stairs even if he doesn’t get to go out.)
Third, no one should care why the earth is reacting this way. Is it sentient? Is it feeling disrespected and thus motivated to raise its seas? Honestly, all that matters is what is happening, not cosmic speculation about what motivations might be behind it.
Strong correlation makes it worthwhile to test the hypothesis, and usually to take action. This is especially the case with health issues. For example, no controlled experiment has ever assigned people to smoke in order to prove that smoking causes cancer. But the correlation is overwhelming.
It seems to me that anyone who uses data to make decisions — which is all of us — needs to keep a few ideas in mind, whether designing a user experience or making a life decision. Each of these ideas is a topic for another post (if not for a book), but briefly…
- Work with the data you have. Get as much as you can, of as many kinds as possible. In designing a UX, you want to understand people’s attitudes and comprehension as well as their behaviour. But if all you have is behaviour (things like usage data or purchase history) then you have to be like Mingus and go with it. This is one of the great lessons of machine learning, which is at its most powerful when all we have is raw behaviour data, and we cannot ask about attitudes or comprehension.
- Analyze carefully. What can you discern from the data? Don’t tell yourself elaborate stories, filling in the blanks with made-up details. Behavioural economists call this the conjunction fallacy: the more details you add, the better it sounds, and the less likely it is to be true. Storytelling is great for convincing people, and also for misleading them. At the same time, you have to be careful not to jump to conclusions based on recent experience (the availability heuristic). Look for clear, if sometimes unexpected, patterns.
- Take action. Once you have a good hypothesis, act on it. What distinguishes people who make an impact from the rest of us is that they take action. Too many times, we waste time and effort worrying about why things are happening, and we hesitate. As long as we can be confident that there won’t be a negative unintended outcome (first, do no harm), it’s time to get going.
It’s about increasing the likelihood of a good outcome.
A friend of mine is visiting, and he just went into the downstairs bathroom and closed the door.
Guess who then ran down the stairs?
“Okay, Mingus. Let’s go for a walk.”
* For you civilized people outside of North America, a powder room is a half-bath, a small room with a toilet and a sink.