Replica of a graph displayed on a national news network

The Ethics of Data Visualization

Peter Haferl
5 min read · Jul 29, 2019

As the modern economy develops and produces ever more data, that data informs decisions ranging from business and marketing to politics and policy. With big data increasingly relevant to nearly every person’s life, a heavy burden rests on data scientists to analyze and portray data in an honest and ethical way, especially in government policy and healthcare, where lives are quite literally on the line.

Ethical responsibilities are present at every step of a data scientist’s workflow. During data analysis, for example, several questions may arise: Should I delete this entry during data clean-up? Am I succumbing to confirmation bias? Is my analysis unfairly biased against a certain group? However, the concerns don’t end once the data has been investigated and the analysis is judged fair and honest. In fact, the next step holds murkier ethical quandaries than the last, and this is when data scientists must be most vigilant about their own deceptive tendencies.

The delivery and presentation of results, particularly through data visualization, is a step where a data scientist may behave unethically without even realizing it. We, as data scientists, have a responsibility to portray our findings in a clear, unbiased way, and the decisions we make during this process can skew the message our audience receives. The purpose of this article is to draw attention to various pitfalls of the visualization process and to urge constant awareness of how graphs and figures could be perceived.

There are two main ways a figure with accurate data can mislead a viewer: by delivering an exaggerated or understated message, or by imparting the reverse of the message contained in the data.

The most common ways a visualization can exaggerate or understate a message are truncated axes, aspect ratio (particularly for line graphs), and portraying quantity in less-than-obvious ways (e.g. using the radius rather than the area of a circle in a bubble plot). The examples that follow demonstrate how these techniques can deceive the audience.

Below is a pair of bar graphs displaying the exact same data:

Example of truncated axis deception. Replica of graph shown on major news network.

Here, the graph on the left was used to emphasize the difference in the top tax rate if the Bush-era tax breaks were ended. Although the data presented is the same, the relative areas of the bars differ greatly. The audience is left with the impression that the difference is much greater than it really is. To avoid this, one should always start the y-axis of a bar graph at 0 so that the relative areas are accurate.
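The size of the exaggeration can be put into numbers. The sketch below compares how tall the larger bar is drawn relative to the smaller one for two choices of y-axis baseline. The rates (35% and 39.6%), the truncated baseline of 34, and the helper `drawn_ratio` are all assumptions chosen for illustration; they are not read off the figure.

```python
def drawn_ratio(small, large, baseline=0.0):
    """Ratio of the two bar heights as drawn when the y-axis starts at `baseline`."""
    if baseline >= small:
        raise ValueError("baseline must be below the smaller value")
    return (large - baseline) / (small - baseline)


# Hypothetical values for illustration only (not taken from the figure).
current_rate = 35.0
proposed_rate = 39.6

true_ratio = drawn_ratio(current_rate, proposed_rate)             # axis starts at 0
truncated_ratio = drawn_ratio(current_rate, proposed_rate, 34.0)  # axis starts at 34

print(f"axis from 0:  taller bar is {true_ratio:.2f}x the shorter one")
print(f"axis from 34: taller bar is {truncated_ratio:.2f}x the shorter one")
```

With these assumed numbers, a difference of about 13% in the data is drawn as a bar more than five times as tall once the axis is truncated.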

Another example is illustrating quantity as a linear dimension rather than an area. This is particularly relevant in bubble plots:

The graph on the left appears to show a much larger increase from the smaller bubble to the larger bubble than the graph on the right. When the radius, rather than the area, is scaled to the value, the perceived difference is much larger than the difference in the data. This is because a circle’s area is proportional to the square of its radius, so the relationship is not linear, and the audience is led to believe the increase is larger than it is. When using size as a third dimension in a graph, one should always use area to represent the value, no matter the shape.
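The square relationship is easy to check by hand. The following is a minimal sketch, assuming two hypothetical values where one is twice the other; `radius_for_value` and `area` are illustrative helpers, not part of any plotting library.

```python
import math


def radius_for_value(value, scale=1.0):
    """Radius that makes the circle's area proportional to `value`."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return scale * math.sqrt(value / math.pi)


def area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2


small, large = 10.0, 20.0  # hypothetical values; the larger is 2x the smaller

# Misleading encoding: the radius itself is set to the value.
area_ratio_radius_encoded = area(large) / area(small)

# Honest encoding: the area is set to the value.
area_ratio_area_encoded = area(radius_for_value(large)) / area(radius_for_value(small))

print(area_ratio_radius_encoded)  # about 4: a 2x change is drawn with 4x the area
print(area_ratio_area_encoded)    # about 2: the drawn area tracks the data
```

Many plotting libraries take a marker size argument, and it is worth checking in their documentation whether that argument is interpreted as an area or as a diameter before passing raw values to it.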

The final example of exaggeration or understatement discussed here is a less talked-about one, aspect ratio:

The graphs here show the same data plotted as a line chart, one with an aspect ratio of 2:1 and the other 4:1. The angle of the line is clearly steeper in the 2:1 depiction, which can convince the viewer that the slope is greater in that case. One would need to examine the graphs much more closely to notice that the slope is in fact the same, even though the drawn angles differ. One should always pay attention to the aspect ratio used in figures and be as consistent as possible, especially when presenting them side by side.
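The gap between slope and drawn angle can be sketched numerically. The example below assumes a hypothetical line that spans the whole plot with a data slope of 1, and treats the aspect ratio as width:height; `drawn_angle_degrees` is an illustrative helper.

```python
import math


def drawn_angle_degrees(dx, dy, x_range, y_range, width, height):
    """Angle on the page of a segment that rises `dy` over `dx` in data units.

    `x_range` and `y_range` are the axis spans in data units;
    `width` and `height` are the plot dimensions in any common unit.
    """
    run = dx / x_range * width      # horizontal distance on the page
    rise = dy / y_range * height    # vertical distance on the page
    return math.degrees(math.atan2(rise, run))


# Hypothetical line spanning the full plot: rises 10 units over 10 units (slope 1).
line = dict(dx=10.0, dy=10.0, x_range=10.0, y_range=10.0)

angle_2_to_1 = drawn_angle_degrees(**line, width=2.0, height=1.0)
angle_4_to_1 = drawn_angle_degrees(**line, width=4.0, height=1.0)

print(f"2:1 plot: {angle_2_to_1:.1f} degrees")
print(f"4:1 plot: {angle_4_to_1:.1f} degrees")
```

Under these assumptions the same slope is drawn at roughly 26.6 degrees in the 2:1 plot and roughly 14.0 degrees in the 4:1 plot, which is why two charts of the same data can leave different impressions.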

All of the examples above are techniques that can deceive an audience into believing that a message is stronger than the data shows, and each is unethical in its own right. However, the next example is even worse, as it shows a case where the viewer may walk away with the reverse message:

The effect shown here is that of inverting the y-axis. It appears that the x and y variables have a negative relationship, when in fact the data has a positive relationship. An inverted y-axis is non-standard, and one should always use the conventional Cartesian orientation in any sort of graph so that the audience does not walk away with the opposite of the message the data shows.
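To make the reversal concrete, here is a minimal sketch that maps a small, hypothetical data set onto page heights with and without an inverted y-axis and compares the sign of the fitted slope. The data, the axis limits of 0 to 10, and the helpers `slope` and `to_page_height` are assumptions for illustration.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def to_page_height(ys, y_min, y_max, inverted=False):
    """Map data values to a 0..1 height on the page (0 = bottom, 1 = top)."""
    span = y_max - y_min
    heights = [(y - y_min) / span for y in ys]
    return [1.0 - h for h in heights] if inverted else heights


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical data with a positive relationship
ys = [2.0, 4.0, 5.0, 8.0]

data_slope = slope(xs, ys)
standard = slope(xs, to_page_height(ys, 0.0, 10.0))
flipped = slope(xs, to_page_height(ys, 0.0, 10.0, inverted=True))

print(data_slope > 0, standard > 0, flipped > 0)
```

The data and the standard plot both trend upward, while the inverted plot trends downward on the page by exactly the same amount, so only the axis labels reveal the true direction.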

All four of these deception techniques measurably influence the viewer’s reading of the message, as shown in an empirical study performed by Pandey et al. (2015). These four are by no means an exhaustive list of deceptive techniques in data visualization. As data scientists, we must think hard about the possibility that our graphs, however well intentioned, are deceiving our audience in an unethical manner. After all, we will often be leaned on as the experts on the data we are presenting, and we have a responsibility to display our findings in as clear and honest a way as possible.

References:

  1. Acharya, Govind, and Hunter Whitney. Coursera, University of California, Davis, www.coursera.org/lecture/dataviz-design/practicing-good-ethics-in-data-visualization-YqHHo.
  2. Skau, Drew. “A Code of Ethics for Data Visualization Professionals.” Visually, visual.ly/blog/a-code-of-ethics-for-data-visualization-professionals/.
  3. OBrien, Shaun. “Interpretations of Data in Ethical vs. Unethical Data Visualizations.” Arizona State University, 2017.
  4. Pandey, Anshul Vikram, et al. “How Deceptive Are Deceptive Visualizations?” Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15), 2015, doi:10.1145/2702123.2702608.

Peter Haferl

Analytical chemist-turned-data scientist with a passion for data-driven investigations and decision making.