Using color in data visualization and information design

Published in Data Science at Microsoft, Apr 13, 2020

Color is an important tool in the construction of information dashboards and data visualizations. In natural and built environments, color can signal danger, attention, attraction, and other states. Analysts and data scientists come to color with specific goals in data visualization and information design: Making useful distinctions, directing the user’s attention, and perhaps communicating or reinforcing a brand identity, among others. Color can be employed to help accomplish all of these.

But color brings challenges, too. Because it can dominate the visual field, it may distract instead of enhance, taking center stage over the information it is intended to convey. In extreme cases the use of color can interfere with comprehension, leading to misinterpretation or at least ambiguity. Some differences in color may also not be discernible by those who are color-blind.

Used effectively, color increases the informative value of a visualization, and even the appeal of how the information is presented, without dominating. Achieving this means that color fades into the background, along with the rest of the design or data visualization elements, so that the message is what stands out and is remembered.

Metaphorically speaking, instead of bursting into a crowded room and shouting “COLOR!” so that everyone stops and stares, I want to enter the room, join and help guide the conversation, and mention color as part of the dialogue. In this article I share ways to think about color and show some ways to use color to achieve specific information design goals.

An example of color use in wayfinding

Consider roadway signs. Although ubiquitous, they are such an integrated part of the roadside landscape that it’s easy to notice them without actually seeing them. They guide, inform, advise, and warn — all in a small amount of space in which they can be noticed, decoded, understood, and acted upon quickly, at a glance and at a distance. These purposes are probably already familiar: Almost all of them are the same as for information design and data visualization.

Their use of color is especially relevant. According to the Manual on Uniform Traffic Control Devices used in the United States, the colors of a roadway sign indicate the kind of information it contains. Regulatory signs typically use black, white, and sometimes red. Warning signs typically use yellow and black. Guide signs most often use green and white. Color is applied with a clear intent for motorists to quickly take away specific information so they can act on it, as shown at left in Figure 1 below.

*Figure 1: The customary U.S. roadway sign at left uses color purposefully and deliberately for quick comprehension and action, in contrast to the one at right.*

Just as color can communicate purposefully in roadway signs, it can do the same for us in information and data visualization. In this article I look at ways I have used color to communicate key information in dashboards and data visualizations I have developed.

Start with a single color

Color choices abound in modern tools, whether, for example, in a package such as Power BI Desktop or in code with R and ggplot2. Default settings are often set to display color from across the spectrum or may otherwise make it easy to use several colors in a single visualization. But starting with multiple colors just because they’re available may increase the user’s cognitive load (the working memory resources necessary to process information) by diminishing clarity.

In my last article, on using grids to improve information dashboards, I recommended considering information design and user experience goals when starting on information design and data visualization problems. A significant consideration in this approach is to determine what users should take away from the visualization being constructed. Color is a powerful way to help accomplish this. When I use it to direct attention or make a distinction, I move from merely reporting data to telling a story about what’s important, and I increase the impact of the information so it can be acted upon.

To start down this path, I begin with a single color (which, not incidentally, also makes the information design process easier as there’s one fewer initial design variable to manage). I choose a medium level of shading for that color, i.e., neither over- nor under-saturated. In addition to being easier on the eyes, starting with a medium shade enables other shades of that color to be used for contrasts. I choose a darker or more saturated version of the same color to make a distinction or draw attention, and a lighter or less saturated version of the same color to de-emphasize an element or help it fade into the background. This approach is shown in the first two graphs in Figure 2 below.
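As a minimal sketch of the single-color approach described above, the following Python derives a darker and a lighter variant of one base color by shifting its lightness while leaving hue and saturation alone. The base color, the size of the shift, and the helper names (`shade`, `bar_colors`) are illustrative assumptions, not values taken from the figures in this article.

```python
import colorsys

def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of floats in [0, 1]."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))

def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))

def shade(hex_color, lightness_shift):
    """Return the same hue with its HLS lightness shifted (negative = darker)."""
    hue, lightness, saturation = colorsys.rgb_to_hls(*hex_to_rgb(hex_color))
    lightness = min(1.0, max(0.0, lightness + lightness_shift))
    return rgb_to_hex(colorsys.hls_to_rgb(hue, lightness, saturation))

BASE = "#5b9bd5"  # hypothetical medium blue, chosen only for illustration

palette = {
    "emphasis": shade(BASE, -0.20),  # darker: draws attention
    "base": BASE,                    # medium: the default for most elements
    "muted": shade(BASE, +0.20),     # lighter: fades into the background
}

def bar_colors(categories, highlight):
    """One color per bar: the emphasis shade for `highlight`, the base shade otherwise."""
    return [palette["emphasis"] if c == highlight else palette["base"]
            for c in categories]

print(bar_colors(["A", "B", "C", "D", "E"], highlight="E"))
```

The resulting list could be handed to whatever charting tool is in use as the per-bar fill colors. Shifting HLS lightness is only one simple way to produce shades; a perceptually uniform color space would give more evenly spaced steps.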

An additional advantage of this approach is aesthetic: It eliminates concern about whether the colors look good together, a matter of what color theorists call color harmonies, of which there are several. This is because shades of a single color combine to form what is called a monochromatic color scheme, which naturally forms a visually comfortable presentation.

Contrast this approach with the one shown at right in Figure 2, which uses the ggplot2 scale_colour_hue() default color scheme with evenly spaced colors from the HCL color wheel, as described by Hadley Wickham in his book ggplot2, second edition. This graph increases cognitive load and takes the user longer to process, for two reasons. First, the multiple colors themselves don’t suggest what’s important; in fact, the red in one of the bars might be misinterpreted as noteworthy, which is not true in this case. Second, an additional element (the red box) has to be employed to draw attention.

*Figure 2: Ways of showing a distinction in bar charts: With purposeful color choices for a simplified presentation shown at left and in the middle, and with ambiguity, an extra element, and more cognitive load shown at right.*
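To make “evenly spaced colors” concrete, here is a small Python sketch that computes only the hue angles of such a palette. The starting angle of 15 degrees is an assumption based on the default documented for scale_colour_hue() (h = c(0, 360) + 15); the function name is hypothetical, and the sketch does not reproduce ggplot2’s conversion from HCL to displayable colors.

```python
def evenly_spaced_hues(n, start=15.0):
    """Return n hue angles (in degrees) spaced evenly around a 360-degree wheel.

    `start` is the angle of the first hue; 15 is assumed here to mirror
    the ggplot2 default and is not taken from this article.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    step = 360.0 / n
    return [(start + i * step) % 360 for i in range(n)]

# Five categories, as in a five-bar chart
print(evenly_spaced_hues(5))
```

Because the hues are assigned by position rather than by meaning, the first category lands near the red end of the wheel regardless of whether it is important, which is how an unintended “alarm” color can end up on an ordinary bar.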

Adding a second color

Using shades of a single color not only provides a way to make useful graphical distinctions, it also sets the stage for introducing a second color for a defined purpose. For example, Stephen Few describes the utility of using a red dot to flag items in data visualizations requiring user action in his book Information Dashboard Design.

I show an illustration of the red dot in Figure 3 below at left, which indicates the user is to take action for item E, which is below target. (Building on this approach, if I consistently combine this use of red with a specific shape, it helps serve the needs of color-blind users who are unable to distinguish red but can rely on seeing the shape.) Of course, red is often used to signal alarm as well. If no warning is intended, I either use a more “neutral” color for the dot, shown in Figure 3 in the middle, or I change other elements of the visual to highlight the element needing attention, shown in Figure 3 at right.

*Figure 3: Purposeful ways of drawing attention: Stephen Few’s red dot, a dot with a contrasting color when a warning is not intended, and a contrast in one of the elements of the visual.*
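The pairing of a color with a consistent shape can be sketched in Python as follows. This is an illustration under stated assumptions: the `Flag` type, the color value, the marker name, and the sample numbers are all hypothetical, and “below target” is taken to mean strictly less than the target.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Flag:
    """A visual cue that always pairs a color with a shape."""
    color: str
    marker: str

# Hypothetical alert cue: the color and the shape are always used together,
# so the shape still carries the signal when the color cannot be distinguished.
ALERT = Flag(color="#c00000", marker="circle")

def flag_items(actuals: Dict[str, float],
               targets: Dict[str, float]) -> Dict[str, Optional[Flag]]:
    """Map each item to ALERT if its actual value is below target, else None."""
    return {name: (ALERT if value < targets[name] else None)
            for name, value in actuals.items()}

# Illustrative data only: item E falls short of its target
actuals = {"A": 120, "B": 105, "C": 110, "D": 100, "E": 80}
targets = {name: 100 for name in actuals}

flags = flag_items(actuals, targets)
print([name for name, flag in flags.items() if flag is not None])
```

Keeping the cue in one place means that swapping red for a more neutral color, when no warning is intended, is a one-line change that applies everywhere the flag appears.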

I can use a second color to draw useful distinctions in other ways, too. Consider a line graph in which the lines overlap, as shown below in Figure 4 at left. If the graph contains only a couple of lines, I can use a baseline color of medium shading and a darker shade where I want to show what’s important. If the graph contains many lines, as shown in Figure 4 in the middle, I can introduce a second color for this purpose. By indicating emphasis in this way, I am helping direct attention and telling the user a story of what’s important while minimizing cognitive load, enabling the viewer to focus on the impact of the information itself instead of trying to figure out how to read the visualization, as is the case in the graph shown in Figure 4 at right.

*Figure 4: Ways of showing distinction in line charts: Purposeful use of one and two colors shown at left and in the middle readily reveals what’s important, while use of multiple colors shown at right obscures it.*
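One way to express this rule in code is sketched below in Python. The color values, the line widths, and the threshold for what counts as “many” lines are assumptions made for illustration; the style keys are named so they could be passed to a plotting call that accepts color, linewidth, and zorder arguments.

```python
BASELINE = "#9fb8d3"  # hypothetical medium blue for context lines
DARKER = "#2f5f8f"    # hypothetical darker shade of the same blue
SECOND = "#d9822b"    # hypothetical second color, used only for emphasis

def series_styles(names, highlight, many_lines=4):
    """Return a style dict per series, emphasizing only `highlight`.

    With fewer than `many_lines` series the emphasis is a darker shade of
    the baseline color; otherwise a second color is introduced.
    The threshold is an assumption, not a rule from the article.
    """
    if highlight not in names:
        raise ValueError(f"{highlight!r} is not one of the series")
    emphasis = SECOND if len(names) >= many_lines else DARKER
    styles = {}
    for name in names:
        if name == highlight:
            # Drawn thicker and on top so it is not hidden by overlapping lines
            styles[name] = {"color": emphasis, "linewidth": 2.5, "zorder": 2}
        else:
            styles[name] = {"color": BASELINE, "linewidth": 1.0, "zorder": 1}
    return styles

few = series_styles(["North", "South"], highlight="South")
many = series_styles(["A", "B", "C", "D", "E", "F"], highlight="C")
print(few["South"]["color"], many["C"]["color"])
```

Every series other than the highlighted one shares a single baseline style, so the context lines read as a group and the emphasized line carries the story.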

Using multiple colors

A natural question might be when to use multiple colors. I would answer that there is a better question to ask, one that goes back to my information design and user experience goals: For every color I want to use, starting with the first one, I ask whether it serves a purpose. In cases where that purpose is not obvious, or perhaps nonexistent, I stay with as few colors as I need to tell the story about what’s important. This eases comprehension and enables action.

An illustrative example of this approach can be seen in two presentations of a graph showing cumulative views of a set of web-based articles over time. In the graph at left in Figure 5 below I use blue to call attention to the trendline of one particular article. In this way I form a narrative that contrasts the performance of that article with the others, which are presented more for context than for their individual details. In the graph at right in Figure 5, I use multiple colors from the default in Excel, but this time with purpose: The aim of the narrative I construct here is to focus not on the performance of any one particular article, but on comparing and contrasting all of them. Note that I would not use the default Excel color scheme in this way without a specific intention like this one in mind, after first formulating my information design and user experience goals for the data visualization.

*Figure 5: Two charts, one at left with two colors and one at right with multiple colors. Each tells a different story about the same data.*

Another example of using color deliberately can be seen in two presentations of a heatmap. The purpose of a heatmap is to graphically show contrasts in a set of underlying data, which it achieves with a shading matrix. Heatmaps are often presented in multiple colors, but they can be interpreted more easily in cases when shades of a single color are used for the gradient, as shown in Figure 6 below at left.

This is because of the natural perceptual ordering offered by a single-color gradient. Because the purpose of conveying the information is best served with shades of a single color in a heatmap, there is no reason to introduce additional ones. In fact, the use of additional colors, as shown in Figure 6 at right, increases cognitive load and impedes the comprehension the heatmap is meant to support.

*Figure 6: Two heatmaps, one at left with a multi-shaded gradient of a single color and one at right with a multi-colored gradient, offer a contrast in ease of comprehension.*
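A single-color gradient of this kind can be sketched in Python by holding hue and saturation fixed and varying only lightness with the data value. The hue, saturation, and lightness endpoints below are illustrative assumptions, and HLS lightness is only a rough stand-in for perceived lightness.

```python
import colorsys

def single_hue_ramp(value, vmin, vmax,
                    hue=0.58, saturation=0.6, light=0.92, dark=0.30):
    """Map `value` to a hex color of one fixed hue; larger values are darker.

    `hue` is a fraction of the color wheel (0.58 is a blue). All default
    values are hypothetical choices made for this sketch.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Normalize to [0, 1] and clamp values that fall outside the range
    t = min(1.0, max(0.0, (value - vmin) / (vmax - vmin)))
    lightness = light + t * (dark - light)
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

# Illustrative 2 x 3 matrix of values
matrix = [[0, 5, 10],
          [2, 7, 4]]
cell_colors = [[single_hue_ramp(v, 0, 10) for v in row] for row in matrix]
print(cell_colors)
```

Because only one dimension of color changes, a reader can order any two cells at a glance: the darker cell holds the larger value.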

Cole Nussbaumer Knaflic discusses use of color in information presentation, and other related topics worth reading, in her book Storytelling with Data.

Summary

An effective approach to color in information design and data visualization is to use it to communicate meaning in ways that are purposeful and deliberate. Start with determining what users should take away from a visualization while considering information design and user experience goals. With these established, begin with a single baseline color and then add further colors incrementally, only as needed, focusing on delivering a visualization that supports the goals you’ve established and that clearly communicates what’s important.

This approach results in transforming data into information that stands out and is easy for the user to discern, leaving the design and color palette to fade into the background. For the data visualizer, it makes the information design process faster because fewer design variables are introduced that need to be managed. The result is an informed viewer who understands where it’s important to direct attention and take action.

For more information on visual information design, see my article on using grids to improve information dashboards:

Using a grid to improve information dashboards

Casey Doyle is on LinkedIn.