Statistics: The Art of Deception

Harshit Mangwani
Published in The Startup
8 min read · Dec 8, 2019

Our decision making is influenced by the numbers that surround us. We come across claims like “80% of professionals recommend our product” or “4 out of 5 men choose our cream for skincare”, usually followed by a statistical presentation: graphs, pie charts or histograms. Do we ever bother to scrutinize them? What are the odds that we will test their veracity every time they are presented?

More often than not, the pictures and numbers win. Visualizations can be dangerous territory. The unconscious brain interprets the picture first, and then the message gets laid on top of it, whatever it may be and however implausible it may sound. After all, who can argue with cold hard numbers?

There are many facets to the presentation of statistics, which makes it easy to lie with them and get away with it. Let’s have a look at some of them.

1. Preattentive Visual Attributes
Properties That Stand Out in a Visual Representation

We have a brilliant ability to spot patterns, outliers, colors, sizes and shapes quickly and unconsciously, thanks to what are called preattentive visual properties. A preattentive visual property is processed in sensory memory without our conscious action. It takes less than half a second for the eye and the brain to process a preattentive property of an image. This is a great asset when searching for patterns in data, but it is also open to misuse. A well-meaning designer uses these properties to make a chart easier to understand, saving the viewer from consciously processing all the data in short-term memory, which requires more effort. A misleading one can use the same properties to steer attention toward whatever they want noticed.

2. Choice of 3-D Pie Charts

Consider the two 3-dimensional pie charts:

The only difference between the two charts is that they are rotated 130 degrees relative to one another. Spinning the chart changes which slices show up in front and which in back, and that skews what the human eye perceives as larger. The reader has to judge areas, and the 3D perspective makes the slices in front occupy a disproportionate share of the chart. That’s not a natural thing to do.

If your goal is to make something look larger, use a 3D pie chart and highlight it in front. If you want your audience to understand the data, find an alternative.

3. Distortion in Scale

Both the graphs above represent the same data, except that they have different starting points on the vertical axis. When the axis does not start at zero, minor differences between values can appear much more significant than they really are. Effectively, the presenter has zoomed in on the region they are interested in. If the graph were drawn with the axis starting at zero, the heights of the bars would be proportional to the values being represented.
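To put a number on the distortion, here is a minimal Python sketch. The function name `apparent_ratio` and the values 50 and 55 are made up for illustration; they are not taken from the charts in this article. It computes how much taller one bar is drawn than the other for a given starting point of the vertical axis.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn heights of bars b and a when the axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


# Hypothetical values, chosen only to illustrate the effect.
low, high = 50.0, 55.0

honest = apparent_ratio(low, high, baseline=0.0)      # axis starts at zero
truncated = apparent_ratio(low, high, baseline=45.0)  # axis starts at 45

print(f"axis from 0:  second bar looks {honest:.2f}x as tall")
print(f"axis from 45: second bar looks {truncated:.2f}x as tall")
```

With these assumed numbers, a real difference of 10% is drawn as a bar twice as tall once the axis starts at 45.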

Graphs can be doctored by altering the horizontal axis too. The following curve shows the increase in unemployment in the US after the Great Recession began in 2007.

The unemployment curve above displays its time intervals as equally spaced. A closer look at their values shows that this is not the case. The second space is equivalent to 6 months, while the third space represents 15 months. This kind of manipulation hides any important data or event that may have occurred during the longer 15-month period. In effect, the graph shows unemployment increasing almost linearly with time. With consistently spaced data, the representation looks like this:

It is obvious that unemployment increased immediately after the recession and leveled off after some time, before it eventually started falling.
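The spacing trick can be sketched in a few lines of Python. The helper `x_positions` and the observation dates below are hypothetical and only mimic the pattern of uneven gaps described above; they are not the data behind the unemployment chart. The sketch compares where points land when they are spaced evenly with where they land when spaced by elapsed time.

```python
from datetime import date


def x_positions(dates, proportional=True):
    """Map sorted dates to x-coordinates between 0 and 1.

    proportional=False spaces the points evenly (the misleading layout);
    proportional=True spaces them by elapsed days.
    """
    if len(dates) < 2:
        raise ValueError("need at least two dates")
    if not proportional:
        steps = len(dates) - 1
        return [i / steps for i in range(len(dates))]
    total_days = (dates[-1] - dates[0]).days
    if total_days <= 0:
        raise ValueError("dates must be sorted and span more than one day")
    return [(d - dates[0]).days / total_days for d in dates]


# Hypothetical observation dates with gaps of 6, 15 and 6 months.
observed = [date(2008, 1, 1), date(2008, 7, 1), date(2009, 10, 1), date(2010, 4, 1)]

even = x_positions(observed, proportional=False)
scaled = x_positions(observed, proportional=True)

for d, e, s in zip(observed, even, scaled):
    print(f"{d}  even={e:.2f}  scaled={s:.2f}")
```

In the evenly spaced layout every gap gets the same width, so the 15-month stretch is squeezed into the same space as a 6-month one. Plotting against the scaled positions restores the true shape of the curve.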

4. Use of Cumulative Data Set

A classic example of deception using statistics is displaying cumulative values. Many people opt to create cumulative graphs of things like the number of users, revenue, downloads, or other important metrics. For example, instead of showing a graph of annual revenue, one might choose to display a running total of revenue earned to date. Let’s see how this might look:

The curve heading upward suggests that the firm is doing alright. However, the decreasing slope (the graph is concave down) reveals that the annual revenue is falling. The actual data points give rise to the following trend.

Now things are a lot clearer. Revenues have been declining for the past ten years! If we scrutinize the cumulative graph, it’s possible to tell that the slope is decreasing as time goes on, indicating shrinking revenue. However, it’s not immediately obvious, and the graph is incredibly misleading, because a running total of non-negative values can never go down.
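A quick way to check a cumulative chart is to difference it back into per-period values. The sketch below uses invented revenue figures and two hypothetical helpers, `to_cumulative` and `to_periodic`, to show that a steadily shrinking series still produces a running total that rises every year.

```python
from itertools import accumulate


def to_cumulative(values):
    """Running total of a series of per-period values."""
    return list(accumulate(values))


def to_periodic(cumulative):
    """Undo a running total: recover the per-period values by differencing."""
    if not cumulative:
        return []
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


# Hypothetical annual revenue, falling every year.
annual_revenue = [100, 90, 80, 70, 60]

running_total = to_cumulative(annual_revenue)
recovered = to_periodic(running_total)

print("annual:    ", annual_revenue)
print("cumulative:", running_total)
print("recovered: ", recovered)
```

The running total climbs from 100 to 400 even though every single year is worse than the one before it. Differencing the totals brings the decline back into view.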

5. Correlation and Causation

Correlation tells us how strongly two variables are linearly related and change together. It does not tell us the why or the how behind the relationship; it only says that the relationship exists. Correlation is what we fall back on when we can’t see under the covers. So, the less information we have, the more we are forced to rely on correlations. Similarly, the more information we have, the more transparent things become and the better we are able to see the actual causal relationships.

Causation goes a step further than correlation. It says that a change in the value of one variable will cause a change in the value of another, which means one variable makes the other happen. It is also referred to as cause and effect.

In many cases, correlations arise from coincidence or from a hidden third variable. Just because it seems like one factor is influencing the other, it doesn’t mean that it actually does.

For example, ice-cream sales do not cause an increase in cases of heat stroke, nor the other way around. Still, they are correlated: both increase during summer, because temperature is a third variable affecting the first two. But a headline saying something like:

“Recent studies have found ice-creams are responsible for increased cases of heat strokes”

can be justified using this correlation while glossing over the question of causation.
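The ice-cream example can be reproduced with a hand-rolled Pearson correlation. All the figures below are invented for illustration: both series are built from the same temperature values plus a little noise, and neither is computed from the other. The function `pearson_r` is a plain textbook implementation, written out here so the sketch needs no extra libraries.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: temperature is the hidden third variable.
temperature = [20, 22, 25, 28, 31, 34, 36]
sales_noise = [3, -2, 1, -3, 2, -1, 0]
stroke_noise = [-0.5, 0.5, 0.0, 0.5, -0.5, 0.0, 0.5]

# Each series depends on temperature only, never on the other series.
ice_cream_sales = [10 * t + n for t, n in zip(temperature, sales_noise)]
heat_strokes = [0.5 * t + n for t, n in zip(temperature, stroke_noise)]

r = pearson_r(ice_cream_sales, heat_strokes)
print(f"correlation between ice-cream sales and heat strokes: {r:.2f}")
```

The two series come out strongly correlated even though, by construction, neither has any effect on the other. The correlation is real; the causal headline is not.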

6. Colors

One of the more popular features of mapping software is the ability to create heat maps, where different colors are used to distinguish between individual values. How these colors are arranged on a map can have a direct impact on how an audience interprets the values.

For instance, using an abrupt contrast in colors, such as going from a dark shade of blue to a light yellow, can make a viewer believe there is a more drastic change in the values than there really is. Conversely, a map that displays little color contrast can give the impression that there is very little difference between the mapped values, when in reality the exact opposite might be true.

7. The Framing Effect

The framing effect is the principle that our choices are influenced by the way they are framed through different words, settings, and situations. Framing bias occurs when people make a decision based on the way the information is presented, as opposed to just on the facts themselves. The same facts presented in two different ways can lead people to make different judgments or decisions.

Which one of these products would you pick: a ‘95% effective’ condom or a ‘5% failure’ condom? 80% fat-free yogurt or yogurt with 20% fat? Most people would be more likely to choose the first option in both cases, even though the two choices are identical. The standard economic model predicts that people will always make the same choice if given the same outcomes, by maximizing expected utility. In practice, however, different wordings, settings, and situations have a powerful effect on decision-makers.

8. Percentage and Percentage Points

Let’s say the high school dropout rate in a certain country increases from 5% to 10%. Is that a 5% increase or a 100% increase? If you are making $5 an hour and you get a 100% raise, you will be at $10 an hour. So which one is it? Which one paints a clearer picture: did the dropout rate increase by 5% or by 100%?

The analysis becomes much easier if the actual rates are known. If the dropout rate is 1 in 1,000,000 and it increases to 2 in a million, the statement that dropout rates have doubled, or increased by 100%, is technically correct but misleading. It makes the problem seem worse than it is. In fact, the rate went from 0.0001% to 0.0002%. Here, ‘percentage points’ make a lot more sense than percentages: one can simply state the result as an increase of 0.0001 percentage points.
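The two ways of reporting the same change can be written out in Python. The function names `percentage_point_change` and `relative_change_percent` are made up for this sketch; both take rates that are already expressed in percent.

```python
def percentage_point_change(old_rate: float, new_rate: float) -> float:
    """Absolute change between two rates given in percent, in percentage points."""
    return new_rate - old_rate


def relative_change_percent(old_rate: float, new_rate: float) -> float:
    """Change relative to the old rate, expressed in percent."""
    if old_rate == 0:
        raise ValueError("relative change is undefined when the old rate is 0")
    return (new_rate - old_rate) / old_rate * 100


# The two dropout examples from the text, as (old %, new %).
examples = [(5.0, 10.0), (0.0001, 0.0002)]

for old, new in examples:
    points = percentage_point_change(old, new)
    relative = relative_change_percent(old, new)
    print(f"{old}% -> {new}%: +{points:g} percentage points, +{relative:.0f}% relative")
```

Both examples are a 100% relative increase, yet one is a jump of 5 percentage points and the other of 0.0001. Reporting only the relative figure hides that difference.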

All right, what about prevention and cure? It all comes down to the kind of thinking that is slow and requires a bit of effort. A few very simple measures come to the rescue, derived from the very points discussed above:

  1. Always keep in mind that some data might be missing.
  2. Always consider that a hidden variable may be affecting two correlated quantities. Correlation is not the same as causation.
  3. The credibility of the data source is of utmost importance.
  4. Check whether the observation supported by the statistics makes sense, or whether it is too good to be true.
