Making the World Better with Data (in 4 Steps)
A Solutionist’s Playbook
“Data” has, for some time now, been quite a buzzword. Governments talk about how they want to be “Data-Centric”. Corporates talk about being “Data-Driven”. In both public and private organizations, “Chief Data Officer” or equivalent designations are, at the very least, loudly discussed.
Sadly, while “Data” has featured in much talk, for most people, it has underdelivered. We have all this data, but the problems data promised to solve haven’t gone away; and in many cases, the problems have multiplied.
Hence, it’s not unreasonable to throw one’s arms in the air, be honest, declare that data doesn’t work, and walk away. I, however, disagree. Not because I’m a “data optimist” who believes we need to continue to “believe” in data, no matter what. But instead, because I see why data has underperformed and see a realistic path to make data deliver.
Step 1: Data-Driven Definition
All pieces of “data” are descriptions of the real world. Hence, one can use them to describe and define problems.
For example, we can define an “underweight” individual as one whose Body-Mass-Index (BMI, weight in kilograms divided by the square of height in metres) is below some threshold, say 18.5.
While words like “underweight” are vague and subjective, data gives them an objective meaning. This objectivity helps communication and agreement. While your “underweight” might not be my “underweight”, your 18.5 BMI is a lot closer to my 18.5 BMI. If I tell you, “Don’t drive too fast”, you might not know what that means. Is 60 KMPH too fast? 100 KMPH? However, if I tell you, “Don’t exceed 80 KMPH”, you know exactly what I mean.
Hence, the first step to solve a problem with data is to define it in terms of data.
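To make the “underweight” example concrete, here is a minimal sketch in Python. The function and constant names are my own illustrative choices. The 18.5 threshold is the one used in the example above, and BMI is taken as weight in kilograms divided by height in metres squared.

```python
# Threshold from the example in the text (kg/m^2); the name is hypothetical.
UNDERWEIGHT_BMI_THRESHOLD = 18.5


def bmi(weight_kg: float, height_m: float) -> float:
    """Body-Mass-Index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_underweight(weight_kg: float, height_m: float,
                   threshold: float = UNDERWEIGHT_BMI_THRESHOLD) -> bool:
    """Data-driven definition: underweight means BMI strictly below the threshold."""
    return bmi(weight_kg, height_m) < threshold
```

With this definition, two people who disagree about what “underweight” means can still agree on whether `is_underweight(50, 1.75)` is true.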
Step 2: Data-Driven Measurement
A data-driven definition of a problem makes the problem objectively recognizable. The next step is to recognize concrete instances of the problem.
For example, we could survey Sri Lanka to understand where people with BMIs below 18.5 live. Following such a survey, we might conclude that 11% of Sri Lankan adults and 37% of children are underweight.
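A measurement like this could be sketched as follows. The records below are made-up placeholders (hypothetical region names and BMI values, not real survey data), and the function name is my own. The point is only that, once the definition is fixed, measuring is a matter of counting.

```python
from collections import defaultdict


def underweight_share_by_region(records, threshold=18.5):
    """Return {region: fraction of surveyed people with BMI below threshold}."""
    totals = defaultdict(int)
    under = defaultdict(int)
    for region, bmi_value in records:
        totals[region] += 1
        if bmi_value < threshold:
            under[region] += 1
    return {region: under[region] / totals[region] for region in totals}


# Hypothetical survey records: (region, BMI). Invented for illustration only.
survey = [
    ("Region A", 17.2), ("Region A", 21.0), ("Region A", 18.4), ("Region A", 24.5),
    ("Region B", 19.9), ("Region B", 22.3), ("Region B", 16.8), ("Region B", 27.1),
    ("Region B", 20.5),
]

shares = underweight_share_by_region(survey)
for region, share in shares.items():
    print(f"{region}: {share:.0%} underweight")
```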
Sadly, many self-described data-driven organizations stop with Step 2. Definition and measurement are relatively easy and highly lucrative. Many “research” organizations specialize in this, conveniently stopping short of solving the problem; charlatans selling expensive speedometers for cars without engines.
Step 3: Data-Driven Ownership
Step 3 is not exactly about starting the engine but about pledging to start the engine and go some distance. Once a problem has been defined and measured, anyone wishing to solve it must take ownership of at least part of the problem.
For example, if you want to “solve part of the underweight problem”, you might pledge, say, “reduce underweight-ness in children from 37% to 30% in the next five years” or pledge the same goal in some specific part of Sri Lanka (say the Uva Province).
Now, you might have heard plenty of people (often politicians) going around saying things like this. However, the “pledge” is only the first part of Data-Driven Ownership. The second and more important part is “Skin in the game”. Not only should you pledge to solve part of the problem, but you must also pledge a significant amount of your own resources and reputation to solving it.
The “your own” part is very important. A research organization or a politician who pledges the country’s (i.e. not their own) resources has no “Skin in the game”.
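The pledge half of this step can itself be written down as data. The sketch below is one way to do it, with a hypothetical `Pledge` class using the 37% to 30% in five years example from above. The straight-line path between baseline and target is my assumption, not part of the pledge itself, and the “Skin in the game” half is deliberately not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pledge:
    metric: str
    baseline: float  # share at the start, e.g. 0.37
    target: float    # share pledged at the end, e.g. 0.30
    years: int       # time allowed to get there

    def milestone(self, year: int) -> float:
        """Share expected after `year` years, ASSUMING a straight-line path."""
        if not 0 <= year <= self.years:
            raise ValueError("year must be between 0 and the pledge length")
        return self.baseline + (self.target - self.baseline) * year / self.years

    def on_track(self, year: int, measured: float) -> bool:
        """True if the measured share is at or below the milestone for that year."""
        return measured <= self.milestone(year)


pledge = Pledge("underweight share among children", baseline=0.37, target=0.30, years=5)
print(pledge.on_track(2, 0.35))  # measured 35% after two years
print(pledge.on_track(2, 0.33))  # measured 33% after two years
```

A pledge in this form can be checked against each new measurement, which is what separates it from a slogan.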
Step 4: Data-Driven Iteration
Once one has taken ownership of a problem or part of it, the next step is to solve it.
All big problems have small solutions. Obviously, not one small solution, but a large collection of small solutions that meld together to solve the bigger problem. For example, solving Sri Lanka’s underweightness problem involves solving the problem for each underweight person and, perhaps, for each meal of each underweight person.
Small problems have at least two benefits. Firstly, the cost of failing is disproportionately smaller for small problems. Secondly, it is easier to apply data to solve small problems; small problems come in large numbers, so each attempted solution yields many observations, and whatever the data says is more “Statistically Significant”.
If anyone tells you that big problems need big solutions, they are either politicians or classical economists. Neither has any skin-in-the-game and hence won’t care about failing. Nor does either understand Statistical Significance.
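One possible reading of the Statistical Significance point, sketched below: a small problem repeats many times, so it produces many observations, and the uncertainty around a measured share shrinks as the number of observations grows. The snippet uses the standard error of a proportion, sqrt(p(1 - p) / n). The sample sizes are arbitrary illustrations, and the 37% is the figure from the earlier example.

```python
import math


def standard_error(share: float, n: int) -> float:
    """Standard error of a measured proportion from n independent observations."""
    if not 0 <= share <= 1 or n <= 0:
        raise ValueError("share must be in [0, 1] and n must be positive")
    return math.sqrt(share * (1 - share) / n)


# Same measured share, different numbers of observations (arbitrary sizes).
for n in (100, 10_000):
    print(f"n={n}: 37% plus or minus about {standard_error(0.37, n):.1%}")
```

A hundred times more observations cuts the uncertainty by a factor of ten, which is why many small, repeated trials tell you more than one big one.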
Invariably, the more successful problem solver is not the smarter problem solver but the faster problem solver. The faster you can iterate through a large number of small candidate solutions, the more likely you are to solve the big problem.
This final step is the most important part of Data-Driven problem-solving. The problem-solving process is not one monolith but an iterative process.
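The iterative process can be sketched as a simple loop: try a small candidate solution, measure again, keep it only if the measurement improved, and move on. Everything named below (the candidate names, their effects, the `run_trial` stand-in) is hypothetical and invented for illustration. In practice, `run_trial` would be a real-world trial followed by a fresh measurement.

```python
def iterate(candidates, run_trial, baseline):
    """Try each small candidate in turn; keep it only if the measured share drops."""
    current = baseline
    kept = []
    for candidate in candidates:
        measured = run_trial(candidate, current)
        if measured < current:  # improvement: a lower underweight share
            kept.append(candidate)
            current = measured
    return kept, current


# Hypothetical effects on the underweight share, invented for this sketch.
effects = {"school meal": -0.02, "leaflet": 0.0, "home garden": -0.01}


def run_trial(candidate, current_share):
    """Stand-in for a real trial plus re-measurement."""
    return current_share + effects[candidate]


kept, final_share = iterate(list(effects), run_trial, baseline=0.37)
print(kept, f"{final_share:.0%}")
```

The loop is deliberately dumb. The speed of getting through candidates matters more than the cleverness of any single one.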
Concluding Questions
To summarize, if anyone tells you that they are “Data-Driven”, you should ask them the following questions:
- What is the problem you are trying to solve, and what is its data-driven definition?
- What is the extent of the problem, measured in data-driven terms?
- What subset of the problem are you solving? And what is your own skin-in-the-game? (described in data-driven terms, obviously)
- What is the data-driven iterative process that you’ve set up to solve the problem?