Data Changes Everything: How Data Visualization Design and Interface Design are Different
TL;DR: It’s easy to assume that the tools and approaches used for general software design apply equally to data visualization design. But data visualization design and interface design are often deeply and fundamentally distinct from one another. We learned this the hard way when we turned our research lab into a collaborative data visualization design studio for a few years. Data permeates visualization interfaces in ways that pose challenges at every stage of the design process. These challenges are even greater within large visualization teams. By reflecting on and articulating these challenges, we hope to inspire new, powerful data visualization design tools and communication processes.
Vis design and the myth of the full-stack
Ah, the “full-stack” data visualization designer/developer: a single person who is at once an expert in perceptual systems and evaluation methods, a deft web graphics programmer, a statistics ninja, and an empathetic human-centered designer who crafts a captivating data story.
Have we missed anything?
The list of skills needed to compose a compelling data visualization will only get longer as the craft matures and creative and technical ambitions rise ever higher. There will be fewer and fewer individual designer/developers and more and more designer+developer teams. These teams will need specialized tools to support their work and their communication.
Interface design tools for data visualization
In user interface design, the model of the designer+developer team is a familiar one. While designing interfaces was once the domain of software developers, there are now entire teams of interface designers who do not need programming skills to excel at their jobs. Interface designers have plenty of tools that let them create and test interface designs without writing a single line of code. Importantly, these tools also help designers to communicate with developers: they produce design assets, specifications, and code that can be incorporated into the final, functional systems that developers create. These tools are now powerful enough that even individual designer/developers can find them a useful addition to their workflows.
Sadly, for visualization design, these kinds of handoff tools just don’t exist yet. Data visualization design is fundamentally different from interface design in one key way — the data — and therefore requires a different set of tools.
The shape, scale, and content of data permeate every aspect of visualizations, from the appearance of their on-screen elements to their interactions. As a result, visualization design tools need to do everything that interface design tools do — plus give designers tools for understanding their data, designing with it, and communicating those data-driven designs to others. In fact, trying to do visualization design and development in larger teams today quickly reveals a plethora of ways in which contemporary interface design tools poorly serve the data-driven nature of visualization design.
Over the past three years, we have experienced these challenges first hand as we transformed part of our research team at the University of Calgary’s Interactions Lab into a functioning data visualization design studio. Since mid-2016, a team of designers, post-docs, students, and interns has worked full-time alongside data experts at the Canada Energy Regulator (CER — previously known as the National Energy Board of Canada) and developers at Calgary software house Vizworx to create public-facing visualizations of four complex energy-related datasets — all of which have since been published and widely promoted by the CER.
This multi-team setup required us to be vigilant about communication. It highlighted numerous shortcomings in the tools available for creating and communicating about visualization designs. At the same time, it created a sort of “living lab” that let us observe the entire visualization design and development process. We were able to pinpoint situations in which introducing dynamic data creates challenges for traditional interface design and handoff tools. Reflecting on this multi-year experience, we identified challenges in all phases of the visualization creation process: from understanding and characterizing data to design and development. We describe three of those challenges here (the rest are in our IEEE InfoVis 2019 paper).
Data updates can wreak havoc on late-stage designs
Perhaps unsurprisingly, data updates can have cascading effects on all phases of design precisely because all aspects of the design are rooted in the shape and meaning of the data.
We experienced this first-hand while creating the visualization below. It shows, for individual Canadian provinces, a prediction of how the share of energy demand across various energy sources might change over the next several decades. Critically, however, the first dataset that the data provider handed to the design team was one that predicts changes in total energy production, not demand. The design team spent much of the design process building an understanding of this energy production data. They identified the categories of values (different sources of energy, such as coal, oil, or renewables) and the range of values across the provinces and the years. The team then focused design efforts on creating a visualization that would let viewers compare the data for different provinces. (It did so by showing the relative percentage change per energy source because the data values for different provinces sometimes differed by orders of magnitude, making them difficult to compare directly.)
Late in this design process, however, the data provider replaced this dataset with a new one that showed data for end-use energy demand, not production. The design team briefly checked to make sure the new dataset did not break the limits of the existing design, and it was fine. The formatting was similar and the ranges of values did not exceed the previous ranges. Here is where the problem began: while this dataset was similar in many ways, one key difference went unnoticed. It turns out that when you look at energy sources from the perspective of end-use demand, the sources that most people typically think of as renewables (solar and wind) end up lumped in with electricity. (This is because we are measuring demand for energy from the electric grid as a whole.) This meant that the already small renewables category from the production dataset became even smaller, effectively showing 0% change in demand for renewable energy over the next 25 years. While the visualization faithfully translates the data values, it lacks any visual clues that explain the discrepancy between what viewers expect and what they see in the visualization (“how can there be no change in Canada’s demand for renewables?”). This can be confusing for those who do not already understand the nuanced differences between energy production and demand.
So what’s the challenge here? The visualization design process is fundamentally based on the shape and semantics of the data. Any changes to data run the risk of invalidating design decisions that have already been made. As seen in our example, subtle changes to the semantics of data can easily go unnoticed and cause confusion. Furthermore, changes that introduce new outliers or new data distributions are notoriously challenging for visualization designers because they can completely reshape visualizations or render them illegible. Despite the deep impact that data changes can have, this impact often isn’t obvious to the people providing the data, and the changes themselves often aren’t obvious to the designers.
Going forward, better data characterization tools could help address these issues by assisting designers in identifying changes between old datasets and new ones. Simple tools that identify changes in data column names, values, and distributions could be a great first step. Going even further, tools that use current data to simulate possible future data updates could help designers consider designs that are more robust to future changes. For example, such tools could suggest several alternative quantitative data distributions, or they might simulate changes to the ways in which values are distributed across different categories. Changes to data semantics are harder to solve with automated tools and highlight why tight coordination between data providers and designers remains critical throughout the entire design process.
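To make the "simple tools" idea a little more concrete, here is a minimal Python sketch of a dataset comparison. It is only an illustration under stated assumptions, not a tool we built: the function names (category_shares, diff_datasets), the column names, and all of the numbers are hypothetical, and it assumes non-empty datasets with one categorical column and one numeric column. Alongside column and range checks, it compares each category's share of the total, which is the kind of shift a range check alone can miss.

```python
from collections import defaultdict


def category_shares(rows, category, value):
    """Return each category's share of the summed value column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category]] += row[value]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: amount / grand_total for name, amount in totals.items()}


def diff_datasets(old_rows, new_rows, category, value):
    """Summarize structural and distributional changes between two datasets.

    Rows are plain dicts (for example from csv.DictReader, with the numeric
    column already converted to float). Both datasets must be non-empty.
    """
    old_columns = set().union(*(row.keys() for row in old_rows))
    new_columns = set().union(*(row.keys() for row in new_rows))
    old_values = [row[value] for row in old_rows]
    new_values = [row[value] for row in new_rows]
    old_shares = category_shares(old_rows, category, value)
    new_shares = category_shares(new_rows, category, value)
    return {
        "columns_added": sorted(new_columns - old_columns),
        "columns_removed": sorted(old_columns - new_columns),
        "categories_added": sorted(new_shares.keys() - old_shares.keys()),
        "categories_removed": sorted(old_shares.keys() - new_shares.keys()),
        "value_range": {
            "old": (min(old_values), max(old_values)),
            "new": (min(new_values), max(new_values)),
        },
        # (old share, new share) for every category present in both datasets
        "share_changes": {
            name: (old_shares[name], new_shares[name])
            for name in sorted(old_shares.keys() & new_shares.keys())
        },
    }


# Made-up numbers, chosen only to illustrate the idea.
old = [
    {"region": "A", "source": "oil", "amount": 100.0},
    {"region": "A", "source": "renewables", "amount": 10.0},
    {"region": "B", "source": "oil", "amount": 5.0},
    {"region": "B", "source": "renewables", "amount": 1.0},
]
new = [
    {"region": "A", "source": "oil", "amount": 95.0},
    {"region": "A", "source": "electricity", "amount": 20.0},
    {"region": "A", "source": "renewables", "amount": 1.0},
    {"region": "B", "source": "oil", "amount": 5.0},
    {"region": "B", "source": "electricity", "amount": 2.0},
    {"region": "B", "source": "renewables", "amount": 1.0},
]
for key, change in diff_datasets(old, new, "source", "amount").items():
    print(key, change)
```

In this made-up example the new values stay inside the old range, so a quick range check would pass, yet the report shows a new "electricity" category and a much smaller share for "renewables". A report like this cannot explain why the semantics changed, but it could prompt the conversation with the data provider.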
Data and interactions create unexpected edge cases
It can be challenging for designers to anticipate and test all possible combinations of interactive inputs that a visualization might receive. As a result, it can be hard to predict situations in which a chart design or data mapping may break.
For example, during the design of a visualization of energy imports and exports, our design team mocked up a mirrored bar chart that served as the central focus of the visualization. It showed average quarterly prices for electricity imports into Canada (orange, top) and exports to the United States (blue, bottom). The bar chart was nestled into a tight space between tile maps of the two countries, which viewers could use to filter the chart to show only particular states and provinces. The design team expected the bar chart to fit well within the allotted space based on mockups of the full dataset that were made using some basic charting tools.
The surprise came once the developers had a working implementation and the teams were able to test various combinations of filters and aggregation options. It turned out that, under some filter combinations, several bars extended well beyond the chart’s intended bounds. This happened because filtering can remove the most commonly occurring values, so the remaining values might have an average that far exceeds the expected range. In our example, the combinations of filters that led to this subset were too obscure for the design team to discover during basic data characterization. By the time anyone noticed this issue, it was too late to change the width and height of the bar chart. (Ultimately, we opted to use a scale break.)
So what’s the challenge here? In interactive data visualizations, combinations of different data operators and filters can easily change the distribution of values that the visualizations need to accommodate. This can lead to edge cases that reveal themselves only after that portion of the visualization is already developed and hooked up to real data. In turn, this will often require additional redesigns.
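One way to picture this problem is as a brute-force search over filter selections. The Python sketch below is a hedged illustration, not the process we used: the function name find_overflowing_filters, the field names, the axis limit, and the prices are all hypothetical. It averages the values left after each possible selection and reports the selections whose average exceeds the limit the chart was designed for. Because it tries every subset of filter options, it is only practical for a small number of options.

```python
from itertools import combinations
from statistics import mean


def find_overflowing_filters(rows, filter_key, value_key, axis_max):
    """Return (selection, average) pairs whose average exceeds axis_max.

    Every non-empty subset of the filter options is tried, so the number of
    checks doubles with each additional option.
    """
    options = sorted({row[filter_key] for row in rows})
    overflowing = []
    for size in range(1, len(options) + 1):
        for selection in combinations(options, size):
            chosen = [row[value_key] for row in rows if row[filter_key] in selection]
            average = mean(chosen)
            if average > axis_max:
                overflowing.append((selection, average))
    return overflowing


# Made-up prices: region "c" has a single, unusually high value.
rows = [
    {"region": "a", "price": 30.0},
    {"region": "a", "price": 35.0},
    {"region": "b", "price": 28.0},
    {"region": "b", "price": 30.0},
    {"region": "c", "price": 120.0},
]

# The unfiltered average fits comfortably under a limit of 60,
# but some filtered subsets do not.
print("unfiltered average:", mean(row["price"] for row in rows))
for selection, average in find_overflowing_filters(rows, "region", "price", 60.0):
    print(selection, average)
```

With these made-up numbers, the mockup of the full dataset looks fine, while selecting region "c" alone or together with region "a" pushes the average past the limit. A design tool could run a check like this in the background, or sample selections when there are too many to enumerate.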
This is especially challenging because current graphic design tools and interface design tools like Adobe Illustrator or Adobe XD don’t support the kind of data-driven sketching and prototyping necessary to identify these issues. On the other hand, testing these types of edge cases using a coded prototype effectively requires implementing the entire visualization! Relying on coded prototypes also sacrifices much of the expressiveness and efficiency that designers might gain from using interactive design tools. Fortunately, recent research projects like Data Illustrator, Charticulator, DataInk, and more suggest a way forward and hint at a future where designers might be able to quickly sketch and test these kinds of data-driven interactions using more expressive graphical tools.
Discrepancies between the design and development versions are hard to spot
Finally, we observed again and again just how difficult it can be to verify implemented versions of a visualization and determine that the data has been mapped correctly to the visuals. As a design is implemented, differences can emerge due to a variety of factors including bugs, data updates, inconsistencies in the initial designs, and misinterpretations of the data mapping. For example, while implementing a bubble chart, a member of the development team mistakenly coded the “size” of the circles by varying their diameter rather than their area.
Many would call this a “Visualization 101” mistake: because a circle’s area grows with the square of its diameter, sizing bubbles by diameter exaggerates differences between values (a value twice as large gets a bubble with four times the area). However, the difference between area and diameter is quite subtle and difficult to spot in isolation. It took the other designers and developers several iterations to notice the issue.
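The difference between the two mappings is easy to state in code. The following Python sketch is purely illustrative: the function names and the numbers are hypothetical, not taken from the project's implementation, and it assumes non-negative values. It computes a bubble radius both ways and prints how much of the largest bubble's area each result covers.

```python
import math


def radius_by_area(value, max_value, max_radius):
    """Radius chosen so that the circle's AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


def radius_by_diameter(value, max_value, max_radius):
    """The mistaken mapping: radius (and so diameter) proportional to value."""
    return max_radius * value / max_value


def circle_area(radius):
    return math.pi * radius ** 2


# Made-up example: a value that is one quarter of the maximum.
value, max_value, max_radius = 25.0, 100.0, 40.0
full_area = circle_area(max_radius)
for mapping in (radius_by_area, radius_by_diameter):
    radius = mapping(value, max_value, max_radius)
    print(mapping.__name__, radius, circle_area(radius) / full_area)
```

Under the area mapping, a value that is a quarter of the maximum gets a bubble with a quarter of the maximum area. Under the diameter mapping it gets only one sixteenth, so small values look far smaller than they should. Comparing two radii side by side makes the gap obvious, whereas a single rendered chart rarely does.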
Similarly, we have found that small details like the sizing and alignment of non-chart elements can be challenging to replicate consistently between initial design documents and final implemented visualizations. In interface design, communicating about such details is already supported within tools like Adobe XD or InVision. Such tools give developers a head start — by exporting accurate measurements and designed assets in formats that can be incorporated directly into final applications. In our design team’s experience, the lack of support for deeply data-driven designs within interface design tools had the team crafting most visualization mockups manually or through partial prototyping. Thus the visualization developers often had to re-implement interface elements entirely from scratch. This creates new opportunities for errors and inconsistencies that can be hard for either the designers or the developers to catch.
So what’s happening? Many details of visualization designs, including details of how data maps to visual elements and more standard interface design details, are difficult to communicate and even more difficult to verify. Without memorizing the underlying data behind the charts or performing exhaustive visual comparisons between mockups and final visualizations, it’s often only through luck that data mapping or interface implementation errors are caught. Semi-automated tools for visually comparing design documents and implemented visualizations could help catch many of these errors and are a promising area for future work. However, detecting differences and inconsistencies in interactions, animations, and other dynamic properties of visualizations remains a big challenge.
Forward visualization! (Design and the future of vis)
As a community — visualization researchers and practitioners alike — we can gain a lot from thinking about how to support growing visualization teams in which not all members are programming or visualization design experts. We need to be considering how deeply data is integrated into our visualizations and the processes that create them. How do we understand and convey the impact that a data update can have on our designs? How do we help designers more easily identify the combinations of interactions that can break their designs? And how do we ensure that our data has been mapped correctly when we aren’t the ones writing the code?
You may be thinking, why not simply require all visualization designers to have coding skills? This has certainly been the default option in the absence of appropriate tools. However, the benefit of bringing more diverse minds into the data visualization community is clear. Consider the explosion of creativity when journalists and data visualization specialists work together in newsrooms. Or consider the incredible appeal of tools like Tableau that support so many people in creating visualizations. Better visualization-specific design tools have the power to invite more people into the community, enabling their creativity and multiplying all of our efforts to make the world’s data more understandable.
Want to learn more?
Our IEEE VIS 2019 paper describes a few more challenges, including those of designers understanding developers’ technical constraints, prototyping data-dependent interactions, and formally communicating data mappings. A pre-print is freely available on arXiv. You can also learn more about the energy visualization project on our project page.