Data can be deceptive

Most of us come across a lot of data points on a daily basis. Our brains are capable of interpreting simple data and making sense of it. We work out how much a product costs and how much we value it to make a good purchase decision. A student can figure out how many marks he needs to pass an examination. A salesman can figure out how many more units he needs to sell to meet his monthly targets. For complex calculations that our brains cannot handle, we have designed systems to do so. But sometimes we fail to understand the data presented to us, or we misinterpret it. Sometimes there is more to data than meets the eye. I would like to take a very simple example to explain this.

This is a story of Abraham Wald and World War II. Wald was a mathematician with far greater data-analysis abilities than most of us. During the war he worked with the Statistical Research Group (SRG), a classified program that yoked the assembled might of American statisticians to the war effort: something like the Manhattan Project, except the weapons being developed were equations, not explosives. So here’s the question. You don’t want your planes to get shot down by enemy fighters, so you armor them. But the armor makes the planes heavier, and heavier planes are less maneuverable and use more fuel. Armoring the planes too much is a problem; armoring them too little is a problem. One needs to find the correct balance between the two. This is where the mathematicians come into play: their job is to determine the optimum amount of armor.

The military came to the SRG with some data they thought might be useful. When American planes came back from engagements over Europe, they were covered in bullet holes. But the damage wasn’t uniformly distributed across the aircraft. There were more bullet holes in the fuselage, not so many in the engines.

[Figure: bullet-hole data for planes returning from the war]

The officers saw an opportunity for efficiency: you can get the same protection with less armor if you concentrate the armor on the places with the greatest need, where the planes are getting hit the most. So how do you determine how much more armor needs to be put on specific parts of the plane?

Wald gave a straight answer to the problem. He said the armor doesn’t go where the bullet holes are. It goes where the bullet holes aren’t: on the engines.

This sounds quite counterintuitive to a lot of us, because the numbers show that the bullet holes are concentrated on the fuselage rather than the rest of the plane, which suggests that is where the planes are being hit the most. But there is a big flaw in this line of thinking. Let’s take a step back and look at the problem again.

We can fairly assume that when the planes are being hit by bullets, the probability of any given section being hit (considering almost all sections are of roughly the same visible size) is about the same. That makes us wonder: where are the missing holes on the engine?

Wald was pretty sure he knew. The missing bullet holes were on the missing planes. The reason the planes were coming back with fewer hits to the engine is that the planes that got hit in the engine weren’t coming back. Meanwhile, the large number of planes returning to base with a thoroughly Swiss-cheesed fuselage is pretty strong evidence that hits to the fuselage can (and therefore should) be tolerated. If you go to the recovery room at a hospital, you’ll see a lot more people with bullet holes in their legs than people with bullet holes in their chests. But that’s not because people don’t get shot in the chest; it’s because the people who get shot in the chest don’t recover.
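To see the effect in numbers, here is a minimal Monte Carlo sketch in Python. The sections, the four-hits-per-plane setup, and the survival rates are all made up purely for illustration; the point is only the mechanism. Bullets land uniformly across sections, but a hit to the engine is far more likely to be fatal, so the holes counted on returning planes pile up on the fuselage.

```python
import random

# A minimal sketch of survivorship bias. The sections and per-hit
# survival rates below are invented for illustration; they are not
# the historical figures.
SECTIONS = ["engine", "fuselage", "fuel system", "wings"]
SURVIVAL_PROB = {"engine": 0.25, "fuselage": 0.95,
                 "fuel system": 0.60, "wings": 0.90}

random.seed(42)
all_hits = {s: 0 for s in SECTIONS}       # what an all-seeing observer would count
returned_hits = {s: 0 for s in SECTIONS}  # what the officers actually counted

for _ in range(100_000):
    # Bullets land uniformly: every section is equally likely to be hit.
    hits = [random.choice(SECTIONS) for _ in range(4)]
    for s in hits:
        all_hits[s] += 1
    # The plane comes home only if it survives every hit it took.
    if all(random.random() < SURVIVAL_PROB[s] for s in hits):
        for s in hits:
            returned_hits[s] += 1

for s in SECTIONS:
    print(f"{s:12s} hits on all planes: {all_hits[s]:6d} "
          f"hits on returning planes: {returned_hits[s]:6d}")
```

Across all planes the hits come out roughly even, but among the returners the engine count collapses: most of the engine hits flew away with the planes that never made it back.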

We often interpret only part of the data available to us, and doing so can lead to badly incorrect conclusions. We have to look at the full spectrum of data to really understand what works and what doesn’t. We cannot simply ignore a piece of information and draw a conclusion from whatever data happens to be in front of us.

Another important mistake we tend to overlook is making careless assumptions. A lot of us would have assumed that the likelihood of a plane surviving a hit is the same regardless of where it was hit. Sometimes we make these assumptions subconsciously. This assumption leads us to believe that we just need to look where the number of holes is greatest and protect those parts better. But the logic fails because the underlying assumption was wrong: planes have very different likelihoods of survival depending on where they are hit.
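Continuing the same hypothetical sketch from above, the two decision rules point in opposite directions. The naive rule armors the section with the most observed holes; Wald’s logic, granting the uniform-hit assumption, armors the section with the fewest, because the deficit marks the hits that kept planes from coming home.

```python
# Naive rule: put armor where the returning planes show the most holes.
naive_choice = max(returned_hits, key=returned_hits.get)

# Wald's rule: under uniform hits, every section should show a similar
# count, so the section with the biggest deficit is where hits are fatal.
wald_choice = min(returned_hits, key=returned_hits.get)

print("Naive armor placement:", naive_choice)  # fuselage in this sketch
print("Wald's armor placement:", wald_choice)  # engine in this sketch
```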

A simple check we can all do to avoid misinterpreting the data we come across every day is to ask ourselves two important questions.

  1. Is the data complete, or is it at least representative of the complete data?
  2. Am I making accurate assumptions?

Asking these two questions can turn out to be very productive and help you separate sound conclusions from flawed ones.

The example of Abraham Wald and the case of the missing bullet holes is taken from Jordan Ellenberg’s book “How Not to Be Wrong: The Power of Mathematical Thinking”.