Improve Your Visualization Skills Using Tufte’s Principles of Graphical Design

Grab his book to learn deep insights on how to take your graphics to the next level

Rajesh Sigdel
Published in Nightingale
6 min read · Sep 10, 2020

Photo by Isaac Smith on Unsplash

Every now and then, we encounter graphs and charts that fail to represent the spirit of the underlying data. This may be hard to believe given how far statistics and technology have advanced, but “junky charts” still often find their way into scientific literature and media outlets.

One recent example is a bar graph from the Georgia Department of Public Health (Figure 1), in which the dates on the x-axis were placed out of chronological order so that COVID-19 cases in the state appeared to follow a downward trajectory. (You can read more about it in The Atlanta Journal-Constitution.)

Figure 1: Coronavirus bar chart from the Georgia Department of Public Health, shared by State Rep. Scott Holcomb (D) who noted the discrepancy on his Facebook page.

Given the sensitivity of the information, this error by the Georgia Department of Public Health could have had damaging consequences. The department later corrected the mistake. But why do these graphical errors keep occurring in the first place? An answer can be found in The Visual Display of Quantitative Information, first published in 1982 by Dr. Edward Tufte.

Figure 2: Book cover of The Visual Display of Quantitative Information.

Tufte’s work was a major breakthrough in the field of visualization and changed how illustrators perceive statistical graphics. He cites numerous examples of substandard charts throughout the book. He also lays out principles for effectively narrating, investigating, and summarizing data through graphical design. Reading the book will help you refine your graphs, which in turn will help you become a better data scientist or analyst.

Let us first discuss the reasons for graphical distortions. Tufte describes three causes of inferior graphical work.

  1. Lack of Quantitative Skills of Professional Artists: Illustrators with little or no experience in statistics often lack competency in analyzing quantitative evidence. They tend to think of charts and graphs in terms of “creative, concept, and style” rather than aiming to capture the essence of the data. This leads them to focus on “beautifying data” in ways that compromise “statistical integrity”.
  2. The Doctrine that Statistical Data is Boring: Designers of inept graphics often treat statistics as “boring” and “tedious”. Because of this misconception, they unnecessarily dress up the evidence in their datasets with decorative styles. Tufte notes that the doctrine of boring data also serves political ends by promoting certain interests over others (page 80).
  3. The Doctrine that Graphics are Only for Unsophisticated Readers: Illustrators who believe in this doctrine think that readers are not sophisticated enough to follow the words in the text. This leads them to unnecessarily beautify and animate their graphs to entertain their readers.

These erroneous presumptions lead to two problems: “graphical distortions” and “over-decoration”. Graphical distortions can be described as “chart lies”, and over-decoration is captured by the term “chartjunk”. Let us discuss what Tufte means by these terms.

Chart Lies: Chart lies can be quantified using the lie factor. The formula for the lie factor is given by

Lie factor = (size of effect shown in graphic) / (size of effect in data)

In simple words, the lie factor is the ratio between the size of an effect as the graphic shows it and the size of that effect in the numbers themselves. If the ratio is 1, the graphic represents the data faithfully. A ratio greater than 1 exaggerates the effect, and a ratio less than 1 understates it.
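To make the ratio concrete, here is a minimal Python sketch. It assumes that the “size of effect” is measured as the relative change between two values, and the function names are my own. The numbers are the ones usually quoted from Tufte’s fuel-economy example, where a standard rising from 18 to 27.5 miles per gallon was drawn with lines growing from 0.6 to 5.3 inches; please check them against the book before reusing them.

```python
def relative_change(first: float, second: float) -> float:
    """Size of an effect, measured as the change relative to the first value."""
    # Assumes first is non-zero; a zero baseline has no relative change.
    return abs(second - first) / abs(first)


def lie_factor(graphic_first: float, graphic_second: float,
               data_first: float, data_second: float) -> float:
    """Ratio of the effect shown in the graphic to the effect in the data."""
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    return shown / actual


# Line lengths in inches (graphic) versus miles per gallon (data).
lf = lie_factor(0.6, 5.3, 18.0, 27.5)
print(f"Lie factor: {lf:.1f}")  # prints "Lie factor: 14.8"
```

In this example the graphic shows an effect almost fifteen times larger than the one in the data, which is far from the ideal value of 1.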

Presenting a graph as evidence for something other than what it actually depicts also falls under the category of chart lies. Recently, the graph of the ratio between confirmed deaths and confirmed cases, also known as the case fatality rate, has been put forward to support the argument that Coronavirus cases in the United States are under control. A declining case fatality rate does not support that claim. It indicates that progress has been made in treating COVID-19 patients, not that the spread of the virus has slowed. The graph of the ratio between confirmed deaths and the total population, known as the mortality rate, is a better way of communicating the impact of the virus on a community.
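The difference between the two ratios is easy to see with a little arithmetic. The sketch below uses entirely hypothetical numbers for an imaginary community of one million people (they are not real COVID-19 data), and the function names are my own.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Confirmed deaths divided by confirmed cases."""
    return deaths / cases


def mortality_per_100k(deaths: int, population: int) -> float:
    """Confirmed deaths per 100,000 people in the community."""
    return deaths * 100_000 / population


POPULATION = 1_000_000  # hypothetical community

# Hypothetical cumulative counts at two points in time.
earlier = {"cases": 1_000, "deaths": 50}
later = {"cases": 10_000, "deaths": 200}

for label, snapshot in (("earlier", earlier), ("later", later)):
    cfr = case_fatality_rate(snapshot["deaths"], snapshot["cases"])
    mortality = mortality_per_100k(snapshot["deaths"], POPULATION)
    print(f"{label}: CFR = {cfr:.1%}, deaths per 100k = {mortality:.0f}")
```

With these made-up counts, the case fatality rate falls from 5% to 2% while deaths per 100,000 people rise from 5 to 20. A chart of the first ratio alone would look reassuring even though the impact on the community has grown.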

Chartjunk: We may recall viewing a figure in a newspaper or in scientific literature that is heavily decorated with busy grid lines, cluttered information, or dark background colors. These extra decorations, which do not provide any new information to the reader, are called chartjunk. Some graphic designers create overly complex visualizations by adding chartjunk to demonstrate their artistic flair. Graphics, ideally, should be easy to understand. Very few readers want to go through puzzle-solving maneuvers simply to make sense of a visualization.

The most common form of chartjunk is the Moiré effect, where the artist plays with different graphical patterns that create the appearance of vibration in the chart. Moiré vibrations distract from the design, and the reader often has to gaze back and forth between the legend and the graph, which creates a “physiological tremor to the eye”.

Figure 3: The Moiré effect creates a physiological tremor to the eye where the reader has to gaze back and forth at the legend and the bar chart. Illustration taken from the 2nd edition of the book (page 120).

What methods and techniques can we apply to eliminate chartjunk from our graphics and improve our design? To answer this question, we first need to understand the concept of the data-ink ratio. Tufte describes the data-ink ratio as the ratio between the non-erasable core ink of a graphic and the total ink used in the graphic. Mathematically, it can be expressed as:

Data-ink ratio = (data ink) / (total ink used to print the graphic)

Data ink is the ink that carries the non-redundant information in a chart. If it were removed, the chart would lose its main content. A good practitioner strives for a high data-ink ratio by deleting non-data ink from the display. It is also important to remember that graphics might not be the best option for representing small datasets. In those circumstances, a simple table with background information about the data may suffice.
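As a rough illustration of the ratio, the sketch below treats a chart as a list of elements, each with an amount of ink and a flag saying whether it counts as data ink. The `ChartElement` class, the element names, the ink amounts, and the choice of which elements count as data ink are all hypothetical. In practice, ink is hard to measure, so treat this as a way of thinking about the ratio rather than a measurement tool.

```python
from dataclasses import dataclass


@dataclass
class ChartElement:
    name: str
    ink: float     # hypothetical amount of ink, in arbitrary units
    is_data: bool  # True if erasing the element would lose information


def data_ink_ratio(elements: list[ChartElement]) -> float:
    """Share of the chart's total ink that presents data."""
    total_ink = sum(e.ink for e in elements)
    data_ink = sum(e.ink for e in elements if e.is_data)
    return data_ink / total_ink


cluttered = [
    ChartElement("bars", 40.0, True),
    ChartElement("value labels", 10.0, True),
    ChartElement("heavy grid lines", 30.0, False),
    ChartElement("dark background", 15.0, False),
    ChartElement("3D shading", 5.0, False),
]

# Erase most non-data ink, keeping only a light grid for reference.
cleaned = [e for e in cluttered if e.is_data]
cleaned.append(ChartElement("light grid lines", 5.0, False))

before = data_ink_ratio(cluttered)
after = data_ink_ratio(cleaned)
print(f"Data-ink ratio before: {before:.2f}, after: {after:.2f}")
# prints "Data-ink ratio before: 0.50, after: 0.91"
```

Removing the background, the shading, and most of the grid raises the ratio without touching the bars or their labels, which is the kind of editing the principle encourages.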

Edward Tufte provided unique insights about data visualization that remain relevant today. The principles I discussed in this article only scratch the surface of the deep insights on visualization that Dr. Tufte’s book provides.

Tufte, however, is not without his critics. Some of the principles that he espoused, especially the data-ink ratio and chart lies, are difficult to quantify. Another principle that has received some criticism is the concept of chartjunk. A study by Bateman et al. (2010) concluded that embellished charts may actually be easier to remember than the kind of simple graphs that Edward Tufte advocated in his work. Their study also reported that people’s accuracy in describing the embellished charts was the same as their accuracy in describing simple charts. This suggests that chartjunk might not be as detrimental as Edward Tufte believed.

Figure 4: Example from the Bateman et al. (2010) study, demonstrating how embellishment in charts can help people to remember the chart’s content.

Despite these criticisms, I still recommend that all aspiring data analysts and data scientists add this book to their library. At the end of the day, even if some of Tufte’s principles are controversial, I believe that they still have intrinsic value that cannot be overlooked.

I would like to thank Jason Forrest for helping me improve this article. For more posts like this, you can follow me on Twitter.

Rajesh Sigdel
Research Assistant and Doctoral Candidate at the University of North Carolina at Greensboro