Color Advice for Data Visualization with D3.js
Using color in a post-category20 world
D3.js has done so much to advance the practice and theory of data visualization. Without much effort you can find examples of visualizations of algorithms, time, mechanical principles, gender discrimination, and yes, even the army of a not-actually-quite-so-small Frenchman moving through Russia.
As much good as it’s done, D3.js has also given an unruly mob of amateurs the ability to create really bad data visualization. I know, I’m one of them. My very first uses of D3 were misleading and silly, when not nearly illegal, uses of data visualization. I can’t condemn that, and in fact I think it’s the best way not only to learn D3 but for interlopers and outsiders to push the boundaries of the field and find new methods for the visual display of information.
But in one area D3 has acted as an anti-catalyst: the use of color. If you look at any collection of D3 examples, you will be struck by the following three colors:
Granted, they’re not the only colors you see; often you’ll still see CSS “blue”, “red” and “black”. And this isn’t just in D3: the Tableau palette is right up there on the list of color schemes to make your chart look like every other data visualization product ever made. One of the major changes in the newly released D3 v5 is to finally get rid of this “category20” color scale. Integrated into this version, and presumably there to console those in their time of loss, is d3-scale-chromatic, which is mostly a wrapper around the tried-and-true ColorBrewer scales by Cynthia Brewer.
The field of data visualization is replete with warnings that Color is Hard. But color is powerful. If you don’t feel capable of selecting a color scheme based on the fundamental principles of how humans perceive color, then what makes you think you can select between a hive plot and a Gantt chart?
Category20 might be dead but bad color use will never die. So here are some tips for those who want to actually improve their use of color.
- Avoid default color schemes. At the very least, use a ColorBrewer scale. If you really can’t figure out how to pick colors, use Colorgorical. It has issues, but it at least forces you to spend a moment considering and implementing color.
- If you’re using color to distinguish between fewer than 10 or 20 categories of data, don’t use a 10-category or 20-category scale. The first two colors of the old category20 scale were both blue, and when your audience thought those two things were related, that’s because their visual cortex told them they were.
- Stop using CSS primary colors. Not because there’s anything inherently wrong with “red” or “blue” but because it’s a sign that you’re probably not thinking about it. You should be thinking about the color of your elements; it shouldn’t be an offhand thing you assign arbitrarily.
- Likewise, if you want a color to be less saturated, don’t use opacity. Go out and find the RGB code for the less saturated version of that color. Transparency has combinatorial effects that you should not casually ignore just because you wanted a pastel palette.
- Make sure your palettes are made up of the same kind of colors, like pastels or high-key colors. Don’t know what that means? Go read up on color palettes and find out, because color is important and you can’t consider yourself a qualified data visualization practitioner if one of the main channels you’re using for conveying information is something about which you don’t even understand the most basic terminology.
- Be aware of what color-blindness is and try to use colorblind-safe colors. That doesn’t mean avoid red and green, but it does mean that if you’re encoding with red and green you should also provide other cues for the colorblind members of your audience. Similarly, not all colorblindness is the same, so when using “colorsafe” palettes make sure you understand just how safe they are.
- Don’t use interpolated color ranges to indicate quantity. Continuous color ramps are horrible for quantity. Learn to bucket your quantities using quantizing, quantiles, Ckmeans and the like, so you can use graduated color ramps (roll your own or use a ColorBrewer ramp).
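To make that last tip concrete, here is a minimal sketch of bucketing in plain JavaScript, with no D3 dependency so it runs anywhere. The function names `quantizeColor` and `quantileColor`, the five-step ramp and the sample data are all made up for illustration. D3’s own `d3.scaleQuantize` and `d3.scaleQuantile` do the same kind of job, though `d3.scaleQuantile` computes its thresholds differently from the simplified rank-based version here, so treat this as an approximation rather than a drop-in replacement.

```javascript
// Hypothetical light-to-dark ramp with five steps; swap in your own colors.
const ramp = ["#f1eef6", "#bdc9e1", "#74a9cf", "#2b8cbe", "#045a8d"];

// Hypothetical skewed dataset: four small values and one large outlier.
const data = [1, 2, 3, 4, 100];

// Quantizing: cut the [min, max] range into equal-width buckets,
// one bucket per color.
function quantizeColor(value, min, max, colors) {
  const t = (value - min) / (max - min); // position in the range, 0..1
  const index = Math.floor(t * colors.length);
  // Clamp so that value === max (t === 1) lands in the last bucket.
  return colors[Math.max(0, Math.min(colors.length - 1, index))];
}

// Quantiles (simplified, rank-based): bucket a value by how many data
// points fall below it, so each color covers a similar share of the data.
function quantileColor(value, values, colors) {
  const below = values.filter((d) => d < value).length;
  const index = Math.floor((below * colors.length) / values.length);
  return colors[Math.min(colors.length - 1, index)];
}

const byWidth = data.map((d) => quantizeColor(d, 1, 100, ramp));
const byRank = data.map((d) => quantileColor(d, data, ramp));

console.log(byWidth);
console.log(byRank);
```

With skewed data like this, equal-width buckets dump four of the five values into the lightest color, while rank-based buckets spread them across the whole ramp. Neither is automatically right, but either one gives your reader a handful of colors they can actually tell apart.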
While this post is a more general critique of the lack of thoughtfulness around color in data visualization, my strongest criticism relates specifically to the standard approach to color in D3.js. So how would you go about integrating a more thoughtful approach to color with D3? It’s easy. All those color scales in the examples you’re following are using either a bucketed scale or a continuous scale. If it’s a continuous scale for color, delete it and replace it with a bucketed scale. If it’s a bucketed scale, it takes in a value (set in the .domain()) and returns one of those twenty horribly overused colors. So any time you see something like:
const colors = d3.scaleOrdinal(AnyNameOfColorsIRecognize);
Just replace it with an array of colors you picked out yourself. Where would you get such an array? You could buy a copy of Color Index 2 by Krause and pick out a palette that you feel best suits the mood of the piece you’re working on. Or, if that’s too much, you could use Tristen Brown’s Color Picker based on the Chroma.js library to pick a ramp with the necessary stops. Or, if even that is still too much authorship, you could rely on the clustering algorithms over at i want hue to pick distinct colors for you (even giving the option of creating human-unreadable 30- or 50-color combinations).
Either way you do it, you’ll finally end up with an array of colors such as ["#A07A19", "#AC30C0", "#EB9A72", "#BA86F5", "#EA22A8"]. At that point, simply use that array as your range in your ordinal scale:
const colors = d3.scaleOrdinal().range(["#A07A19", "#AC30C0", "#EB9A72", "#BA86F5", "#EA22A8"]);
You can use it the exact same way you were using your original scale, and receive in return more distinctive color for the same data visualization product.
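If it helps to see what that scale is doing, here is a rough sketch in plain JavaScript of how an ordinal scale hands out colors when you give it a range but no domain. `makeOrdinalColor` and the `nodes` data are hypothetical names invented for this example, not part of D3, and the real `d3.scaleOrdinal` has more to it (an explicit `.domain()`, an `.unknown()` value) than this stand-in.

```javascript
// The palette from above; any array of CSS color strings works.
const palette = ["#A07A19", "#AC30C0", "#EB9A72", "#BA86F5", "#EA22A8"];

// Hypothetical stand-in for an ordinal scale with a range and no explicit
// domain: each new category gets the next color, in order of first use.
function makeOrdinalColor(range) {
  const seen = new Map(); // category -> order of first appearance
  return function color(category) {
    if (!seen.has(category)) seen.set(category, seen.size);
    // Wrap around when there are more categories than colors.
    return range[seen.get(category) % range.length];
  };
}

const colors = makeOrdinalColor(palette);

// Hypothetical data: nodes of a small network, colored by group.
const nodes = [
  { id: "n1", group: "core" },
  { id: "n2", group: "edge" },
  { id: "n3", group: "core" },
];

const fills = nodes.map((d) => colors(d.group));
console.log(fills);
```

In a real chart you would call the D3 scale the same way, typically inside something like `.attr("fill", d => colors(d.group))`. Note the wraparound: with five colors, a sixth category gets the first color again, which is one more reason to size your palette to your data.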
Is that the right color scheme for that network? That’s not easy to answer. It’s about as easy to answer as the question of whether a network is the right chart for that dataset. But the right color scheme isn’t the point, because there is no single right color for all your charts. Rather, think about color in your data visualization. Take it seriously, even if in the final product you do use an established color scale. Think about whether you’re trying to signal success or a warning and what kind of color you feel best conveys that. Don’t just grab a color palette because it’s the default. You’d never do that with a chart or a dataset, would you?
A version of this essay was originally posted on elijahmeeks.com in 2015.