# GraphScape: Modeling Similarity & Sequence among Charts

A single chart is often not enough to understand data or to convey a story. So it’s not surprising that people use multiple charts in sequence to convey their findings and craft narratives. For example, a New York Times interactive article describes a global climate change agreement through a series of visualizations about countries that ratified or rejected the agreement.

However, not all sequences are created equal! Effective presentations involve a logical progression, with related charts shown in an appropriate order. In the example above, if one jumbled the order of charts, the result could be jarring and confusing to readers.

While skilled designers can manually craft effective sequences, we wondered what underlying principles might help guide (and even automate) such decisions. Considering this question drew us to a more fundamental question: *how might we model the relationships among charts?* This led us to develop **GraphScape**, a model for automatic reasoning about chart similarity and sequence. GraphScape models can be used to search the space of visualization designs, suggest related alternatives, and recommend chart sequence orders that audiences can more easily interpret.

Before diving into the details of GraphScape, let’s start with an example. To prepare for a trip from Seattle to Denver, we created charts comparing the April 2017 weather in each city. Here Denver is shown in blue and Seattle in orange.

First, we plot the number of sunny days in both cities. It is exciting that there are 3 times more sunny days in Denver! Next, we compare daily temperatures. Seattle appears to be much more consistent. We might need to bring full wardrobes to Denver... Lastly, we plot temperature against wind. The wind in Denver is much stronger than in Seattle!

As we visit each chart in turn, we try to use a sensible order. The first chart starts with a simple comparison based on two data fields, which seems to be a good starting point. The subsequent two charts each involve three data fields, and use the same encodings for both Region (color) and Temperature (y-axis), and so it makes sense for them to be adjacent.

If we scramble the order, the sequence requires more effort to understand. Charts with similar data fields and encodings are no longer adjacent. The viewer must repeatedly reconsider what data they are looking at as they view each plot.

There is also a challenge related to how many sequences one needs to consider. In this case, with three charts, there are only 6 possible orders. But as the number of charts increases, the number of orders explodes combinatorially! For a set of 6 charts, there are 720 possible sequence orders. Automated support might help designers identify reasonable sequence orders that they can then manually refine.
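The growth is factorial: n charts can be arranged in n! orders. A minimal sketch of the arithmetic follows; the chart labels and the helper name `count_orders` are ours, chosen only for illustration.

```python
from itertools import permutations
from math import factorial


def count_orders(n_charts: int) -> int:
    """Number of distinct orders in which n_charts charts can be shown."""
    return factorial(n_charts)


# The three weather charts from the example (labels are illustrative).
charts = ["sunny days", "daily temperature", "temperature vs. wind"]
orders = list(permutations(charts))

print(len(orders))  # every possible order of the three charts
for n in (3, 6, 10):
    print(n, count_orders(n))
```

Enumerating every order is feasible for a handful of charts, but it quickly becomes impractical, which is why a cost model that can rank candidate orders is useful.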

By modeling the relationships among charts, a GraphScape model can reason about the relative difficulty of interpreting one chart after having viewed another. Let’s take a look at how GraphScape works.

### GraphScape: A Directed Graph Model of Charts

GraphScape is a directed graph model of the visualization design space. Nodes represent Vega-Lite charts and edges represent atomic edit operations. When combined, these operations can express any transition between two given Vega-Lite charts. Edge weights represent estimates of how hard it is to follow those operations. We then compare transitions (moves from one chart to another chart) and sequences (ordered series of charts) by looking at these estimated transition costs.
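To make the idea concrete, here is a minimal sketch rather than GraphScape's implementation. It assumes a heavily simplified chart description (a mark plus channel-to-field encodings), hypothetical operation names, and made-up weights. It also takes a shortcut by treating each differing property as one edit operation instead of searching the graph.

```python
# Illustrative weights only; these are NOT GraphScape's estimated costs.
OP_COST = {
    "change_mark": 1.0,
    "add_field": 3.0,
    "remove_field": 3.0,
    "modify_field": 4.0,
}


def edit_operations(source, target):
    """List hypothetical edit operations that turn `source` into `target`."""
    ops = []
    if source["mark"] != target["mark"]:
        ops.append(("change_mark", "mark"))
    src, tgt = source["encoding"], target["encoding"]
    for channel in sorted(set(src) | set(tgt)):
        if channel not in src:
            ops.append(("add_field", channel))
        elif channel not in tgt:
            ops.append(("remove_field", channel))
        elif src[channel] != tgt[channel]:
            ops.append(("modify_field", channel))
    return ops


def transition_cost(source, target):
    """Sum the (illustrative) weights of the operations between two charts."""
    return sum(OP_COST[name] for name, _ in edit_operations(source, target))


# Simplified stand-ins for the three weather charts.
sunny = {"mark": "bar", "encoding": {"x": "city", "y": "sunny_days"}}
temps = {"mark": "line", "encoding": {"x": "date", "y": "temperature", "color": "city"}}
wind = {"mark": "point", "encoding": {"x": "wind", "y": "temperature", "color": "city"}}

print(edit_operations(temps, wind))
print(transition_cost(temps, wind), transition_cost(sunny, temps))
```

Under these toy weights, the two temperature charts are closer to each other than either is to the sunny-days chart, which matches the intuition that they belong next to each other.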

GraphScape enables us to do some interesting things such as searching for alternative designs given some criteria. Let’s say we want to draw a scatter plot, but see that it suffers from overplotting. We can use GraphScape to find similar designs that are more scalable — that is, that render the data using fewer points. Using a GraphScape model, we can conduct a graph search to find alternatives that reduce the number of visualized data values. A first candidate reduces the data via random sampling, by applying an *Add Filter* operation. Continuing the search, we also find another alternative: a binned scatter plot. GraphScape can suggest candidate charts, presented in order of increasing transition cost.
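The search described above can be sketched as a uniform-cost (Dijkstra-style) traversal that reports acceptable charts in order of increasing transition cost. The toy graph, the mark counts, the operation names, and the weights below are all hypothetical; only the overall idea comes from the text.

```python
import heapq

# Hypothetical toy graph: chart -> list of (edit operation, neighbor, weight).
GRAPH = {
    "scatter": [
        ("add_filter", "sampled scatter", 2.0),
        ("add_bin_x", "x-binned scatter", 1.5),
    ],
    "x-binned scatter": [("add_bin_y", "binned scatter", 1.5)],
    "sampled scatter": [],
    "binned scatter": [],
}

# Hypothetical number of marks each chart would render.
MARKS = {
    "scatter": 5000,
    "sampled scatter": 500,
    "x-binned scatter": 5000,
    "binned scatter": 100,
}


def find_alternatives(graph, start, is_acceptable):
    """Return (chart, cost) pairs for acceptable charts, cheapest first."""
    best = {start: 0.0}
    frontier = [(0.0, start)]
    results = []
    while frontier:
        cost, chart = heapq.heappop(frontier)
        if cost > best.get(chart, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        if chart != start and is_acceptable(chart):
            results.append((chart, cost))
        for _op, neighbor, weight in graph.get(chart, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return results


# Look for designs that render fewer marks than the original scatter plot.
alternatives = find_alternatives(
    GRAPH, "scatter", lambda chart: MARKS[chart] < MARKS["scatter"]
)
print(alternatives)
```

Because the frontier is a priority queue keyed on accumulated cost, candidates come out already sorted, so a caller can stop after the first few suggestions.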

### Building a GraphScape Model

GraphScape was built in three steps. First, we identified all atomic edit operations: the different types of changes that one could make to a chart, such as changing the type of mark or the visual aspect a data field is mapped to. Second, we ranked them by interpretation difficulty: how hard is it to follow those edit operations? To create the ranks, we self-administered a set of triplet comparisons in which a judge views an initial chart and then chooses which of two target charts is easier to understand given the initial chart. The result of each comparison provides a linear inequality among edit operations. Lastly, using these linear inequalities, we estimated costs (edge weights) for each operation using linear programming. Given these edge weights, we can calculate transition costs (a proxy for the “cognitive cost” of transitioning from one view to the next) as the sum of edge weights along the shortest path in the GraphScape model.
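As a rough sketch, the three steps can be written down as follows. The notation is ours, and the objective and the margin $\epsilon$ in the linear program are assumptions made for illustration; the text above only states that the comparisons yield linear inequalities and that costs are estimated with linear programming.

```latex
% c_o: unknown cost of edit operation o
% E(s \to t): the edit operations in the transition from chart s to chart t
% Comparison k: starting from chart s_k, the judge found t_k^{+} easier to follow than t_k^{-}
\sum_{o \in E(s_k \to t_k^{+})} c_o \;<\; \sum_{o \in E(s_k \to t_k^{-})} c_o
\qquad \text{for each comparison } k

% One possible linear program (objective and margin \epsilon > 0 are assumptions)
\min_{c} \; \sum_{o} c_o
\quad \text{s.t.} \quad
\sum_{o \in E(s_k \to t_k^{+})} c_o + \epsilon \;\le\; \sum_{o \in E(s_k \to t_k^{-})} c_o \;\; \forall k,
\qquad c_o \ge \epsilon \;\; \forall o

% Transition cost: total weight of the cheapest path between two charts
\mathrm{cost}(s \to t) \;=\; \min_{\text{paths } p \text{ from } s \text{ to } t} \; \sum_{o \in p} c_o
```

The margin turns each strict preference into a constraint a linear programming solver can handle, and the lower bound keeps every operation's cost positive.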

However, user studies convinced us that the sum of transition costs is insufficient for recommending good sequence orders. While the edge weights express chart-to-chart transitions, they do not capture global structure such as repeating elements (notice, for example, how the NYT example above repeatedly transitions from a map view to a line chart). In a formative study, we found that participants preferred (1) to start a sequence with entire (“raw”) data points rather than aggregated data, (2) to view time slices in chronological order, and (3) the use of subsequences with repeated transition patterns. To capture these preferences, we define a *sequence cost* using a number of terms: the sum of transition costs (including a transition from an “empty” chart to start the sequence), a term to penalize filtered subsets shown in an unsorted order, and a term to discount the sequence cost when repeated transition patterns are used.
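The sketch below shows one way the three terms could be combined and then used to pick an order by brute force. It is a toy under stated assumptions, not the formula from our paper: the charts, the stand-in `transition_cost`, the way repeated patterns are detected, and the values of `unsorted_penalty` and `repeat_discount` are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy charts: a view type plus the year each chart is filtered to.
CHARTS = {
    "map 2015": ("map", 2015),
    "line 2015": ("line", 2015),
    "map 2016": ("map", 2016),
    "line 2016": ("line", 2016),
}


def transition_cost(a, b):
    """Illustrative stand-in for a shortest-path cost; `a` is None for the empty chart."""
    if a is None:
        return 1.0
    (view_a, year_a), (view_b, year_b) = CHARTS[a], CHARTS[b]
    return 2.0 * (view_a != view_b) + 1.0 * (year_a != year_b)


def pattern(a, b):
    """Simplified signature of a transition, used to spot repeated patterns."""
    return (CHARTS[a][0], CHARTS[b][0])


def sequence_cost(order, unsorted_penalty=2.0, repeat_discount=0.5):
    order = tuple(order)
    steps = list(zip((None,) + order, order))
    # Term 1: sum of transition costs, starting from the empty chart.
    total = sum(transition_cost(a, b) for a, b in steps)
    # Term 2: penalize time slices shown out of chronological order.
    years = [CHARTS[chart][1] for chart in order]
    total += unsorted_penalty * sum(b < a for a, b in zip(years, years[1:]))
    # Term 3: discount transitions whose pattern has already appeared.
    seen = set()
    for a, b in steps[1:]:
        if pattern(a, b) in seen:
            total -= repeat_discount
        seen.add(pattern(a, b))
    return total


chronological = ["map 2015", "line 2015", "map 2016", "line 2016"]
shuffled = ["map 2015", "map 2016", "line 2015", "line 2016"]
print(sequence_cost(chronological), sequence_cost(shuffled))

# Brute force over all orders; only practical for a small number of charts.
best = min(permutations(CHARTS), key=sequence_cost)
print(best, sequence_cost(best))
```

With these made-up weights the shuffled order has the cheaper transitions but loses once the chronological penalty is added, which is the kind of global preference that chart-to-chart costs alone cannot express.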

### GraphScape Matches User Preferences

To evaluate GraphScape, we conducted a controlled experiment in which participants manually ranked a set of chart sequences. We then compared GraphScape’s recommendations (based on sequence cost) with participant preferences. When compared against other hand-crafted chart sequences, GraphScape’s top recommendation received the highest rating from participants in all but one case. More generally, GraphScape’s overall ranking of sequences strongly correlates with participants’ ranks. Using only the sum of transition costs or only our sequence weighting term did not perform as well: when used together, both terms make important contributions to improving sequence recommendation quality.

However, much future work remains. GraphScape currently considers edits to a chart specification with no notion of the underlying data. Incorporating data statistics or semantics might further improve our model. For example, adding a color encoding for a variable with only a few distinct values will be easier to interpret than adding one for a variable with hundreds of category values.

In summary, GraphScape provides a model for automatically computing relationships between charts, and can be applied to search for alternative designs and recommend sequences for presentation. For additional applications, as well as more details about our methods and experiments, see our CHI 2017 paper. Source code for building and using GraphScape models is available at the GraphScape GitHub repository.

*This article was written by Younghoon Kim, with input from Kanit (“Ham”) Wongsuphasawat, Jessica Hullman, and Jeffrey Heer.*