All charts and graphs are scatter plots at heart

By Arches National Park [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons

All charts are scatter plots. In a perfect world this would be an unnecessary truism, but in our real world, where we get so easily distracted by details, it must be said from time to time. If you really want to understand data visualization, you have to keep it in mind all the time.

Last year, I published a book titled Data at Work. I couldn’t avoid discussing this right in chapter one, where I called the scatter-plot-at-heart a “protochart”, simply to reserve “scatter plot” for the display of the relationship between two meaningful variables. No matter what you call it, you can always reduce a chart to pairs of coordinates in a Cartesian plane. (Please note that I’m leaving out networks, maps and figurative visualization, which deserve a more nuanced approach.)

When you are fully aware of it, two things become apparent: chart types are meaningless, and you are free to do whatever you want (some restrictions apply).

Now, you can take advantage of certain basic templates and transformations people are familiar with (if it’s a time series, you can represent time along the x-axis, connect the dots and make a line chart). You should definitely learn about visual perception, and also adapt to your users’ goals and skills. Yes, there are some constraints, but you can interpret them creatively (please avoid ignorance-based creativity) and go beyond standard chart types.

Amanda Cox famously said that:

There’s a strand of the data viz world that argues that everything could be a bar chart. That’s possibly true but also possibly a world without joy.

Some people didn’t like the first sentence, but she’s demonstrably right (check the section “graph design solutions”). I often get back to this quote and play with it:

Not only is the sky the limit when you start your visualization with a scatter plot (is the night sky the ultimate scatter plot?), but this perspective can also help you overcome the limitations of your tools. For example, this square four-fold graph was made in Excel:

A square four-fold graph made in Excel

Making a round four-fold graph is trickier, but it can be done. All you have to decide is whether the benefit is worth the cost. If not, choose a different tool.

Design data

In a graph, you can identify data points, design objects (bars) and supporting objects (grid lines, labels). But there is a fourth category, often called “dummy data” (I now prefer to call it “design data”). Take a look at the chart below: the icons above the bars are read as subtitles, not actual data, but they are a data series, like all other data series. In other words, design data is data you use for design purposes. Again, this is an Excel chart.

A pictorial bar chart

In some cases, you can use tool-specific functions to achieve the same visual results, but there is an interesting side effect of using design data: the graph is much easier to replicate from tool to tool. If those four images were placed manually in Excel, you’d have to figure out how to do it in R, but if they are a series, you can use the same rationale in both Excel and R.
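To make the idea of design data a little more concrete, here is a minimal sketch in Python. It is not how the Excel chart above was built; the function name icon_series, the gap_ratio parameter and the sample values are all hypothetical. It only shows that the icon positions can be computed as an ordinary series, which is what makes the approach portable between tools.

```python
def icon_series(values, gap_ratio=0.08):
    """Build a helper ("design data") series that floats above each bar.

    Assumption: the gap is a fixed share of the tallest bar, so every
    icon sits at the same visual distance from the top of its bar.
    """
    gap = max(values) * gap_ratio
    # One (x, y) pair per bar: same category position, slightly higher y.
    return [(i, v + gap) for i, v in enumerate(values)]


bars = [12, 30, 21, 8]      # hypothetical data series
icons = icon_series(bars)   # design series: plot as markers or images
```

The icons list can then be drawn as a second series whose markers are replaced by images, in whatever tool you are using.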

A point in a plane has no dimensions (by definition). Whatever it becomes is up to you. It can be a large circle or an image, it can be an invisible placeholder for something else, it can be connected to an axis and become a bar, or connect to other points to display trends.
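As a rough illustration of that last sentence, the sketch below turns the same kind of point into either a bar or a line. The helper names point_to_bar and points_to_line, and the default bar width, are assumptions made for this example only.

```python
def point_to_bar(x, y, width=0.6):
    """Connect a point to the x-axis: return the closed outline of a bar."""
    half = width / 2
    return [
        (x - half, 0),  # bottom left, on the axis
        (x - half, y),  # top left
        (x + half, y),  # top right, level with the data point
        (x + half, 0),  # bottom right
        (x - half, 0),  # back to the start to close the shape
    ]


def points_to_line(points):
    """Connect points to each other: order them by x so the line reads as a trend."""
    return sorted(points)


bar_outline = point_to_bar(2, 5, width=1.0)
trend = points_to_line([(3, 7), (1, 4), (2, 5)])
```

In both cases the input is nothing more than pairs of coordinates; the bar and the line are design decisions layered on top.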

Putting it into practice

I’m not a designer (my life’s unrequited love) but I’m unable to stay handcuffed to Excel defaults. In other words, professionally I live in a barren desert between Excel and graphic design. Apparently, no one likes it here.

By Pierce, C.C. (Charles C.), 1861–1946 [Public domain], via Wikimedia Commons

To me, the communication part of data visualization is all about effectiveness, but to be effective you can’t overlook the emotional dimension. That’s what people are trying (and failing) to do when they add pseudo-3D effects to their graphs. There is a difference between elegant makeup and clown-level makeup.

With that in mind, I recently decided to explore more systematically what I’m able to do with Excel scatter plots (and other chart types, if necessary). There is a long history of people trying to push the limits of what can be done when making charts in Excel, Jon Peltier being the obvious reference, but my perspective is a bit different: more on the design side, less on specific Excel techniques.

The two charts above come from a portfolio page on my blog (I named it Excel Charts Shop because you can buy the files, if you’d like to use them with your data or are curious). The four-fold graph takes advantage of a connected scatter plot to display the squares, while the second one explores point placement. In some cases, you’ll need to calculate polar coordinates first, but in general it’s more about how you use design data in a Cartesian plane.
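Here is a hedged sketch of what those calculations might look like in Python. It is not the workbook from the portfolio page. It assumes each value is drawn as a shape whose area is proportional to the value (so the side or radius is the square root of the value), with one shape per quadrant; the names SIGNS, square_path, four_fold and quarter_circle are hypothetical.

```python
import math

# Assumed sign pattern for quadrants I to IV, counter-clockwise.
SIGNS = [(1, 1), (-1, 1), (-1, -1), (1, -1)]


def square_path(value, sx, sy):
    """Closed outline of a square anchored at the origin, area equal to value."""
    side = math.sqrt(value)
    return [(0, 0), (sx * side, 0), (sx * side, sy * side), (0, sy * side), (0, 0)]


def four_fold(values):
    """One connected-scatter path per quadrant (square version)."""
    return [square_path(v, sx, sy) for v, (sx, sy) in zip(values, SIGNS)]


def quarter_circle(value, quadrant, steps=16):
    """Round version: polar coordinates converted to Cartesian points."""
    radius = math.sqrt(value)            # area stays proportional to value
    start = quadrant * math.pi / 2       # quadrant 0 starts on the positive x-axis
    arc = []
    for k in range(steps + 1):
        angle = start + k * (math.pi / 2) / steps
        arc.append((radius * math.cos(angle), radius * math.sin(angle)))
    return [(0.0, 0.0)] + arc + [(0.0, 0.0)]


squares = four_fold([16, 9, 4, 1])       # hypothetical cell values
round_first = quarter_circle(4, quadrant=0)
```

Each path is just a list of points to be joined in order, which is all a connected scatter plot needs. The round version costs one extra step (the polar conversion) and many more points.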

I agree some graphs don’t look much like scatter plots. Take the bar chart, for example. The data points vary along a single axis, and we add a “design coordinate” (the categorical axis) so that it’s easier to identify / label them. A doughnut is a polar version of a stacked bar chart. Etc.
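A small Python sketch may help with the doughnut claim. It assumes the doughnut starts at 12 o’clock and runs clockwise, and the function names stacked_segments and to_ring_point are made up for this example. The first function computes the segments of a 100% stacked bar; the second bends any position along that bar onto a ring.

```python
import math


def stacked_segments(values):
    """Start and end of each segment in a 100% stacked bar, as fractions of the total."""
    total = sum(values)
    segments, start = [], 0.0
    for v in values:
        end = start + v / total
        segments.append((start, end))
        start = end
    return segments


def to_ring_point(fraction, radius):
    """Map a position along the stacked bar (0 to 1) onto a circle.

    Assumption: 0 is at 12 o'clock and the ring runs clockwise.
    """
    angle = math.pi / 2 - fraction * 2 * math.pi
    return (radius * math.cos(angle), radius * math.sin(angle))


segments = stacked_segments([1, 1, 2])   # hypothetical values
first_boundary = to_ring_point(segments[0][1], radius=1)
```

The same segment boundaries drive both charts; only the mapping from fraction to (x, y) changes.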

So, all charts are scatter plots at heart. That’s one of the reasons you shouldn’t forget the heart when designing them.