Five Ideas for Achieving Clarity in Data Visualization

Spring and Fall (1880) by Gerard Manley Hopkins, written using continuous script (left)

Until the seventh century, Latin text was written using continuous script. For those unfamiliar with the language, the ambiguity of that style of writing proved a tad confusing. It’s just as well that some Irish copyists came along and introduced spaces, word-breaks and punctuation marks, all of which contributed to making the text less prone to misinterpretation.

There was a similar drive for clarity in fine printing in the 18th century, due to John Baskerville’s work. His innovation was the development of types that were simple and readable, a stark contrast to the elaborately decorated and scarcely legible ones that were common in Europe at the time. Alan Bartram puts it best,

“…such simplicity, even minimalism, was revolutionary. It was a defining moment in bookmaking, ridding it of the irrelevant, flowery decoration of which we have seen so much hitherto. The repercussions were to be felt not only in Britain, but in continental Europe, and even in America.” - Bartram, Five Hundred Years of Book Design

Clarity, it turns out, is a valuable quality in fields where people are in the business of informing. In data visualization, one way to achieve clarity is to ask,

“How much am I asking the reader to invest in order to understand this graphic, and is the payoff worth it?”

Ideally, the reader would understand our graphic with minimal investment. And if that is not possible, we’ll want to make sure that we repay whatever cognitive debt we incur with increased value.

Let’s start with the first case: graphics that require minimal investment.

When you look at the chart to the left, you can quickly tell what the takeaway message is, the reason being that it’s a type of line chart, and most people know how line charts work.

We know how they work because they’ve become part of our educational conditioning, you might say, and so we understand them by virtue of their ubiquity — they’ve been around for over two centuries.

We also have studies that lend some support to that belief, like a recent survey by the Pew Research Center, which showed a scatter plot to a sample of American adults and found that 63% of respondents understood what the chart meant. That number went up when they controlled for education level.

So the first idea for increasing clarity is to use chart types that people already know.

Idea 1: Use charts that people already know

There is another thing in that first chart that we see in this one too. We have a group of lines that have similar angles and then another line with a noticeably different angle, and we find that we’re instinctively drawn to this one line. This is probably due to the brain’s ability to detect salience.

When we see things around us, the brain will ignore things that are similar and focus instead on things that are distinct. It’s a biological warning system that, in more distressing times, helped us survive our harsh surroundings.

That’s useful to know. Maybe we ought to write it down as a second idea — make the thing that we want the reader to immediately see stand out.

Idea 2: Make things we care about stand out
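
One way to apply this idea in code is to settle the styling before plotting: give the series of interest a saturated color and a heavier stroke, and mute everything else. The following is a minimal sketch in Python. The function name `emphasis_styles`, the series names, and the specific colors and widths are illustrative assumptions, not something prescribed by the charts discussed here.

```python
def emphasis_styles(series_names, highlight,
                    accent="#d62728", muted="#bbbbbb"):
    """Return a style per series so that only `highlight` stands out.

    The colors and line widths are illustrative defaults, not a standard.
    """
    if highlight not in series_names:
        raise ValueError(f"unknown series: {highlight!r}")
    styles = {}
    for name in series_names:
        is_focus = name == highlight
        styles[name] = {
            "color": accent if is_focus else muted,
            "linewidth": 2.5 if is_focus else 1.0,
            # A higher zorder lets the highlighted series sit on top.
            "zorder": 2 if is_focus else 1,
        }
    return styles


# Hypothetical series names, for illustration only.
styles = emphasis_styles(["A", "B", "C", "D"], highlight="C")
```

The keys match keyword arguments that matplotlib's `plot` accepts (`color`, `linewidth`, `zorder`), so each dictionary could be passed along as `ax.plot(x, y, **styles[name])`, though the same mapping would work with any plotting library.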

A third thing to note about these charts is that we tend to take it as read that elements with the same color, such as dots, lines and legend labels, belong together. This intuition has been studied in psychology and is part of a set of laws of visual perception known as the Gestalt principles of grouping.

Gestalt Law of Similarity, de.wikipedia.org

For instance, when we have visual elements that look like each other, we might perceive them to be part of the same group. Similarly, when we have elements that are close to each other, we might associate them with the same group.

So a third idea is for us to know what the reader’s a priori notions of perception are and make sure our graphic is attuned to them.

Idea 3: Be cognizant of how the reader perceives visual elements

A lot of interesting data is hierarchical and multi-dimensional, which means that traditional representations like line charts and bar charts might not always be fit for purpose. The question therefore becomes, is there a general approach for choosing the right representation for complex data?

We can do something similar to what engineers do: identify a set of questions that we want our graphic to answer and a set of qualities that we want it to afford, and then evaluate representations based on the degree to which they satisfy those requirements. Contrast this with the alternative that one occasionally sees, which is to take a neat-looking representation and then try to make the data fit it. Starting from requirements is much more defensible.

To give you an example, say we want to visualize how teams are distributed in an organization and we care about the two qualities and the five questions shown below. By thinking through our options, we might end up with the dependency matrix as the ideal choice.

Choosing a representation, given five questions that we want to answer and two qualities that we care about
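
The evaluation described above can be sketched as a small scoring exercise. This is only one possible way to do it, under loud assumptions: the requirement labels (`Q1`, `Q2`, `quality_1`), the candidate representations other than the dependency matrix, and every score are hypothetical placeholders, not the actual questions, qualities or results from this example.

```python
def rank_representations(scores, weights=None):
    """Rank candidate representations by how well they meet requirements.

    `scores` maps representation -> {requirement: score in [0, 1]}.
    `weights` optionally maps requirement -> relative importance.
    """
    ranked = []
    for name, by_requirement in scores.items():
        total = sum(
            score * (weights or {}).get(req, 1.0)
            for req, score in by_requirement.items()
        )
        ranked.append((name, total))
    # Highest total first; the sort is stable, so ties keep input order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical requirements and scores, for illustration only.
candidates = {
    "node-link diagram": {"Q1": 1, "Q2": 0, "quality_1": 0},
    "treemap": {"Q1": 0, "Q2": 1, "quality_1": 1},
    "dependency matrix": {"Q1": 1, "Q2": 1, "quality_1": 1},
}
ranking = rank_representations(candidates)
```

The point of writing it down this way is less the arithmetic than the discipline: the requirements are stated first, and each representation has to earn its place against them.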

Implementing a matrix might end up looking like the image below. With adequate text and annotations, this prototype proved to be a useful exploratory tool.

An interactive dependency matrix of organizational structure, github.com/almossawi/d3-matrix
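
For readers unfamiliar with the representation, here is a rough sketch of the data structure behind a dependency matrix. It is not the code from the prototype above. The team names, the relationships, and the convention that a cell marks "row depends on column" are all assumptions made for illustration.

```python
def dependency_matrix(dependencies):
    """Build a square 0/1 matrix from (source, target) pairs.

    Row i, column j is 1 when labels[i] depends on labels[j].
    """
    labels = sorted({name for pair in dependencies for name in pair})
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for source, target in dependencies:
        matrix[index[source]][index[target]] = 1
    return labels, matrix


# Hypothetical teams and relationships, for illustration only.
deps = [("web", "platform"), ("mobile", "platform"), ("platform", "infra")]
labels, matrix = dependency_matrix(deps)

for label, row in zip(labels, matrix):
    print(f"{label:>8} " + " ".join(str(cell) for cell in row))
```

Each row and column corresponds to one team, so adding a team grows the grid without introducing the edge crossings that a node-link drawing would.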

We can summarize this methodical way of thinking by saying that the path from the familiar to the novel ought to be fraught with caution and reflection.

Idea 4: The path from the familiar to the novel ought to be fraught with caution and reflection

When I was a schoolboy, I had a book that explained the human body’s various systems — skeletal, nervous, digestive and so on — through a set of tinted plastic sheets. You would see different facets of the body as you flipped through the book’s pages and the various systems would slowly combine to show the body’s true complexity.

That approach to complexity is known as abstraction, and good abstraction clarifies things by hiding the right detail.

The screen below is part of a marketing campaign that we ran in 2014 to coincide with the release of Firefox 29. The goal here was to share with a broad audience of about 10 million people what aspects of the web people cared most about, and we did that through abstraction.

The Web We Want, webwewant.mozilla.org

The first graphic on the page is a summary metric that shows what the world thinks of each of six values.

The second graphic unpacks our data a bit more, by showing what respondents in each country think of each of those values.

Then the third graphic introduces some basic descriptive statistics like min, median and max in order to give a sense of the spread for each of those values per country. For instance, here we see what people in the US think of privacy, and then we show countries that care less about privacy than the US and those that care more about it.
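
The statistics in that third graphic are simple to compute. The sketch below shows the general shape using Python's standard library. The country labels other than the US and all of the scores are invented for illustration and are not the campaign's data; the helper names `spread` and `compare_to` are likewise hypothetical.

```python
from statistics import median


def spread(values):
    """Return the min, median and max of a collection of scores."""
    ordered = sorted(values)
    return {"min": ordered[0], "median": median(ordered), "max": ordered[-1]}


def compare_to(reference, scores):
    """Split countries into those scoring below and above `reference`."""
    base = scores[reference]
    lower = sorted(c for c, s in scores.items() if s < base)
    higher = sorted(c for c, s in scores.items() if s > base)
    return lower, higher


# Hypothetical scores for a single value (say, privacy); not real data.
privacy = {"US": 50, "Country A": 40, "Country B": 70,
           "Country C": 30, "Country D": 60}
summary = spread(privacy.values())
lower, higher = compare_to("US", privacy)
```

Here `summary` gives the overall spread for one value, while `lower` and `higher` list the countries that would appear on either side of the reference country.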

This kind of flow from least granular to most granular is meant to ensure that the average user isn’t overwhelmed. And we do that not by adding detail, but by removing detail. Our fifth and final idea for achieving clarity is therefore to hide the right detail.

Idea 5: Hide the right detail

I’ll leave you with a line by Wittgenstein, “We make to ourselves pictures of facts.” Anything that we create is some model, some interpretation of reality. Even a photo, which some might consider a disinterested artifact, is an interpretation, seeing as the photographer controls the composition, the framing, the exposure, the colors and so on.

This observation, moving as it is, is also terrifying. It instills self-doubt. It makes one obsess about every frame, every design decision, every editorial call. And yet, it contains within it an empowering realization: every one of us is in the unique position to remodel ideas through how we present them, and by doing so, contribute to making those ideas accessible to new audiences. Design for clarity.

almossawi.com · Twitter