From Good Charts to Great Visualizations
Principles to transform good chart design to great infovis
Ryan Bales has done an excellent job of outlining a recommended set of Do’s and Don’ts for designing charts and, more generally, visualizing data. These simple recommendations go a long way toward ensuring that charts and graphs are better designed and more effective. However, as with anything in design (design being the art and science of creating a solution), a large part of the success of even the simplest chart lies in its conceptual integrity.
In this article, I highlight the conceptual counterparts to Ryan Bales’ list, examining the key conceptual principles that designers must consider when designing charts and, more broadly, when working with data.
Appropriate > Common
Most novel visualization types were born out of a need to convey a certain statistic or insight from data. Some were conceived for accuracy and precision, others for relative comparison and quick summarization.
Some chart types are used more often than others, owing mostly to the ease with which they are created and reproduced, or the efficiency with which they communicate a central point. These include familiar chart types that we see on television or in newspapers — pie charts, line charts, and occasionally a scatter plot. Users see these chart types often, so reading them becomes common knowledge, which makes them easy to use.
When William Playfair first described the pie chart, he wrote:
“Each circular figure represents that country, the name of which is engraved under it, and all are arranged in order according to their extent…
The countries stained green are maritime powers; those stained of a pale red are only powerful by land…
The figures marked directly above the circles (as 5 over Russia and 14 over Sweden) indicate the number of persons living on each square mile of country. The figures within the circles show the number of square miles in the countries they represent.”¹
This wasn’t a completely fleshed-out visualization type; it needed a great deal of context to convey its point. It was novel because it used circles to communicate proportionality, and wedges to show the categorization of its components. Today, pie charts are so commonplace that their readability is a given; it is common knowledge that pies show part-to-whole proportionality using the angles of wedges at the center of the circle.
However, it remains incredibly hard to read values in a pie chart with precision unless a label is given. It also remains hard to compare individual wedges to each other, or to compare multiple pie charts that are of different sizes or that show absolute numbers. A bar chart is much better suited to facilitating such comparisons, and is just as easily read. Sometimes, for the sake of simplicity, especially when a pie chart has many wedges, we use donut charts. This removes the angles at the center of the pie from view, making them harder to discern; a donut chart can easily be misread as a radial stacked bar chart, and comparing multiple donut charts is harder still.
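The difference between the two encodings can be made concrete with a quick sketch. The values and category names below are hypothetical, chosen only to illustrate why wedge angles are hard to compare while bar lengths on a shared baseline are not:

```python
# Hypothetical values for four categories, for illustration only.
values = {"A": 24, "B": 26, "C": 22, "D": 28}
total = sum(values.values())

# Pie encoding: each value becomes a wedge angle at the center (degrees).
angles = {k: 360 * v / total for k, v in values.items()}

# Bar encoding: each value becomes a length on a common baseline,
# scaled here to a 0-100 axis.
lengths = {k: 100 * v / total for k, v in values.items()}

for k in values:
    print(f"{k}: {angles[k]:.1f} deg wedge vs {lengths[k]:.1f} units of bar")
```

The wedges differ by only a handful of degrees, which the eye judges poorly; the bars differ by the same proportions, but lengths measured from a shared baseline are easy to rank and compare.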
Drawing a line between two points in a scatter plot does not automatically change it into a line chart; similarly, filling in the area under a curve does not convert a line chart into an area chart. The construction of these charts depends entirely on the nature of the data in question. The key difference between scatter plots and line charts is that the former usually denotes discrete values, while the latter represents continuous values.
One of the most important elements of a line chart is the slant of the line, used to denote change, often over time; it follows one observation as its value changes. As the name suggests, the star of the area graph is the area under the curve. While it is based on the line chart, the area under the curve represents volume and accumulation, taking both axes into account, which makes it a more natural choice for seeking out distributions or the shape of a dataset.
Using an appropriate chart type, determined by data structure, data type, and the key message to communicate, should take precedence over using a common one. If a common chart type — pie, donut, bar, column, line, or area chart — appears to be the best way to visualize a dataset, then the form’s popularity is an added advantage. However, far too many datasets have been represented too simply, too poorly, or too sensationally because the wrong visualization strategy (or chart type) was used. The purpose of a chart is to tame the complexity of a data table within context.
Visual elements inherit meaning and encodings
A chart comprises visual elements from a finite, exhaustive list.
- Axes, Ticks, Labels
- Marks of different shapes (dots, squares, etc.)
- Lines, of varying stroke widths or colors or textures
- Areas/Regions/Surfaces, with colors or textures
- A repeating mark of different sizes, with colors or textures
Similarly, there is an exhaustive list of visual metrics used to read charts and graphs — position, length, angle, area, and color among them.
When designing charts, encoding the value of an observation into any combination from these two lists assigns meaning to the visual element created. Users have to learn these assignments (visual mappings, or encodings) and spend cognitive effort to decode them. Visual encodings can be obvious and apparent to the designer of a chart, but users have to invest significantly to learn to decode and read it. Changing the meaning of a visual encoding system increases this cognitive cost, making the chart harder to follow. Encoding absolute values and relative values with the same visual system can make the same dataset look very different and reveal completely different insights; it can also confuse the reader. Consistency needs to be applied not just to visuals, but also to the metrics within and across charts that co-occur.
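To see how the same data reads differently under absolute versus relative encoding, consider this small sketch. The sales figures and category names are made up purely for illustration:

```python
# Hypothetical yearly figures: [product_x, product_y] units sold.
sales = {"2019": [120, 30], "2020": [240, 90]}

# Absolute encoding: plot product_y's raw counts per year.
absolute_y = {year: v[1] for year, v in sales.items()}

# Relative encoding: plot product_y's share of each year's total.
relative_y = {year: 100 * v[1] / sum(v) for year, v in sales.items()}

# Absolute view: product_y tripled (30 -> 90), a dramatic rise.
# Relative view: its share only moved from 20% to ~27% of the total.
print(absolute_y, relative_y)
```

Both encodings are truthful, yet a reader shown one chart in each system, styled identically, could easily draw contradictory conclusions — which is exactly why the encoding must stay consistent across co-occurring charts.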
The advantage of using visualizations that already exist (as opposed to finding novel ways to represent data) is that encoding systems are already in place; designers need not reinvent the wheel. The disadvantage is that visual elements come with baggage: there is a pre-existing connotation attached to each visual element that must be considered when designing a visualization. Lines joining observations on a chart imply dependence or relationships; colored regions read as multiplicative, not additive. Using a family of blues (or any other graduated color palette) on unrelated variables creates the illusion of relation. Using the same color scheme on charts depicting different objects or observations becomes misleading. There are many things to consider when designing charts; a well-advocated principle is that the evidence lies in the data. Charts, graphs, and visualizations should present the evidence in the numbers in a truthful, meaningful way that can be scrutinized and easily replicated.
Charts are Statistical Evidence
A basic understanding of statistics is helpful (if not essential) for any designer who designs charts. We say a good chart is one that is built on the data; to evaluate whether or not a chart is really good, we need to understand the data. This is where statistics, the language of data, and statistical thinking come in.
Statistics is the science of making decisions under uncertainty.
— Savage, 1954²
Think of statistical thinking as the data-driven counterpart of design thinking. Where designers rely on a deep understanding of audiences³ to make decisions, statisticians rely on data. Both forms of thinking value evidence for decision making; one is largely quantitative, the other highly qualitative.
In UI/UX design, we conduct interviews, card sorts, user studies, focus groups, and the like, and synthesize them into stories, user journeys, and flows. Based on these stories, we craft prototypes, conduct tests, and iterate. Our design intuition is guided by evidence in the form of these user stories and journeys. In statistical thinking, synthesized evidence takes the form of data visualizations. The patterns and insights revealed in visualizations are used to convince an entity (people, organizations, etc.) of a belief (or hypothesis, or claim) and to act upon it in a certain way. As such, transforming data into a visual requires conceptual and statistical rigor comparable to the rigor of a design process when solving complex problems. Many great data visualizers and information designers operate at the intersection of statistical thinking and design thinking to innovate and create.
A designer need not be a data scientist, but a basic know-how of data types, summaries, and different types of analysis becomes a powerful tool for evaluating the visual design of a chart or graph. A strong statistical intuition helps prevent errors and misleading representations, and also helps identify interesting patterns that should be surfaced in a visualization.
There are datasets that have near-identical summaries but are very different; relying on these summaries alone would likely result in poor decision-making. Here, visual evidence is used to highlight the unique differences between datasets whose summaries are nearly identical. Visual designers can use their skills to make visualizations more effective and efficient, but the subject matter originates from a different domain expertise. The need for better-designed charts and dashboards has become more evident, and designers are in a position to respond and act. A basic but strong statistical foundation is central to a better-designed chart or dashboard.
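The classic demonstration of this point is Anscombe’s quartet (the precursor to the Matejka and Fitzmaurice work cited below). The sketch below computes summary statistics for the quartet’s first two datasets using only the standard library; the numbers are Anscombe’s published values:

```python
import statistics

# Anscombe's quartet, datasets I and II: same x values, different y
# values, yet near-identical summary statistics.
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def corr(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / ((len(a) - 1) * statistics.stdev(a) * statistics.stdev(b))

# Means, variances, and correlations agree to two decimal places...
print(statistics.mean(y1), statistics.mean(y2))          # both ~7.50
print(statistics.variance(y1), statistics.variance(y2))  # both ~4.13
print(corr(x, y1), corr(x, y2))                          # both ~0.816
# ...yet plotted, dataset I is a noisy linear trend while dataset II
# is a clean parabola. Only a chart reveals the difference.
```

The summaries are indistinguishable at reporting precision; the plots are unmistakably different. This is precisely the case where a chart, not a summary table, is the statistical evidence.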
- Playfair, William. The statistical breviary; shewing, on a principle entirely new, the resources of every state and kingdom in Europe; illustrated with stained copper-plate charts, representing the physical powers of each distinct nation with ease and perspicuity. T. Bensley, J. Wallis [etc., etc.], 1801.
- Savage, Leonard J. The foundations of statistics. Courier Corporation, 1972.
- “Design Thinking.” IDEO U. N.p., n.d. Web. 09 May 2017.
- Matejka, Justin, and George Fitzmaurice. “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing.” Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM, 2017.
This post was originally a comment on a post on UXdesign.cc. It was rewritten (nicely, with context) as a complete Medium post in itself after showing up on Digital Dialects.