Evaluating Visualization Authoring Systems through Critical Reflections

This article is a group effort, with contributions from the authors of our InfoVis’19 paper.

TLDR: A new generation of visualization authoring systems has emerged in the past few years. Designed to support a common goal, these systems vary in terms of their visualization models, system architectures, and user interfaces. What are the strengths and weaknesses of these systems? How do we choose the right tools to build our visualization? We propose to use critical reflections as a method to compare these systems.

Chris is a visualization enthusiast with a background in graphic design and some basic knowledge of data processing in Excel. An avid follower of numerous visualization design blogs and podcasts, Chris likes to apply the design knowledge and practices learned from these sources when making data visualizations. Template-based tools like Excel charts are not flexible enough for novel visualization designs. Programming toolkits like D3 are powerful, but they also seem quite daunting to someone like Chris, who has never coded before. Drawing tools like Adobe Illustrator are great for sketching, but lack functionality for encoding data as visual properties of graphical elements.

What Chris needs is a tool that provides the flexibility to go beyond chart templates, does not require programming expertise, and yet supports automated mapping from data to visuals in a WYSIWYG (What You See Is What You Get) interface. Such a tool would be useful in many scenarios: sketching mockups to brainstorm design ideas, explaining step by step how a visualization works, and quickly producing figures for an article, to name a few. To address this need, a new generation of interactive visualization authoring systems has emerged over the past few years, including Lyra, Data Illustrator, Charticulator, Data-Driven Guides, DataInk, Datylon, and more.

At first glance, these systems appear quite similar: they all aim to enable the creation of expressive visualizations in a graphical user interface without programming. Upon closer inspection and hands-on experimentation, however, these systems vary significantly in terms of functionality and user interface. For example, some provide a Pen Tool for drawing free-form shapes, which is absent in other systems; data binding is done through drag-and-drop in some systems, and through drop-down menus in others. As these systems were released, people on social media began to wonder how they compare to each other. Andy Kirk, for example, called this an “arms race” of datavis tools, and later discussed these tools on a Data Stories podcast. Our goal is to offer some insight into how to judge this “arms race”.

Evaluating These Authoring Systems

How do these systems compare to each other? Which systems are better for which visualizations or tasks? These questions matter to users deciding which system to adopt. They are also of interest to researchers and developers trying to understand the strengths and weaknesses of these systems; such understanding can inform the design and development of similar systems in the future.

Evaluating these complex systems is no easy task, though. Several traditional evaluation methods are available, but each of them tells only part of the story. For example, many of the tools include an example gallery showcasing the visualizations that can be created, but these examples tell us little about the experience of interacting with the tools. Usability studies can reveal difficulties encountered by users, but they typically involve predefined authoring tasks that may not reflect what users do in the real world. Longitudinal deployment can gather valuable feedback on how the systems are used in real-world settings over a period of time, but this approach takes a long time and requires an active user community.

Furthermore, these methods do not offer a direct comparison between the systems. We can, of course, design lab studies that compare a few systems by asking participants to perform a set of predefined tasks, but the outcomes of such studies can be biased by a number of factors. First, it is hard to ensure comparable training, since the amount and quality of training materials vary greatly among these systems. Second, the choice of tasks influences the fairness of the comparison.

Critical Reflections as an Evaluation Method

Given the limitations of the above evaluation methods, is there another way to compare these systems? As the creators of three of them (Lyra, Data Illustrator, and Charticulator), we know every detail of our own tools, better than anyone else. We therefore wondered whether critical reflections could serve as an evaluation method. Critical reflections are modeled after design critiques, a popular process in the design community for analyzing how effectively design choices meet their goals. Unlike critiques, which occur throughout an iterative design process, critical reflections are retrospective: they take place after the systems have been built, and they allow us to understand the trade-offs made in the design of these systems.

The critical reflection process went as follows: the creators of all three systems met weekly for one- to two-hour video conference meetings over the course of three months. During these meetings we directly compared the systems: commenting on our design and implementation choices, reflecting on practical feedback from the user community, and identifying missed or unexplored research directions. In critical reflections, a candid attitude is essential. We decided to leave no stone unturned by evaluating every possible visualization authoring task: instantiating and customizing marks, composing glyphs, generating path points and segments, linking glyphs, scoping data for glyphs, mapping data values to visual properties, managing scales, creating axes and legends, laying out items, nesting layouts, and applying coordinate systems.

To structure these conversations, we began with the eight evaluative dimensions proposed by Ren et al., and we documented the discussions by recording meeting notes in a shared online folder. At times, comparing the systems required preliminary reflection on each individual system; each team carried out these reflections as “take-home” tasks before the next weekly meeting and added the results to the shared folder. These individual activities provided the time needed to consider exhaustively the ways in which each system met, or fell short of, our defined objectives.

Expressivity and Learnability

Through this process, we collectively identified expressivity and learnability as two primary dimensions in the design of visualization authoring systems. Expressivity refers to the range of visualizations a tool can create, i.e., “Can I create the visualization I want using this system?” Learnability refers to how easy the system is to learn and use.

Comparison of the three systems begins at the level of system components along these two dimensions. We find instances where systems have incorporated each other’s approaches (e.g., Charticulator offers two distinct data binding mechanisms drawn from Lyra and Data Illustrator, respectively), areas of broad overlap (e.g., the set of marks available for use), and radically different approaches (e.g., the degree to which scales are exposed, and how complex layouts are interactively specified).

These differences result in significant trade-offs between expressivity and learnability. For example, at one end of the spectrum, Lyra lets authors manually construct a scale independent of any data binding interaction; at the other end, Charticulator does not distinguish a scale from its axis or legend. Data Illustrator lies in between: when an author clicks the bind icon, they are prompted to create a new scale, or to reuse an existing scale if the field has been used before. Such differences have clear implications for expressivity. In Data Illustrator, merging the scale functions enables authoring visualizations such as Gantt charts. Authors using Lyra have greater control over scale functions, and can reuse or merge scales in a more fine-grained fashion than in Data Illustrator. However, this expressivity comes with a non-trivial complexity cost: ongoing feedback for Data Illustrator and Lyra indicates that authors struggle to understand the role that scales play.

Lyra’s scale listing and configuration panel (top-left), Data Illustrator’s direct manipulation controls for axes and legends (top-right), and Charticulator’s scale configuration panel (bottom).
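If you have never worked with scales outside of a GUI, the distinction between a scale and its axis or legend can feel abstract. The TypeScript sketch below uses D3 purely for illustration (none of the three systems require any code): a scale is simply a reusable mapping from data values to visual values, and an axis or legend is a visual readout generated from that mapping.

```typescript
import { scaleLinear, scaleOrdinal } from "d3-scale";

// A scale is a function that maps data values to visual values.
// It exists independently of any axis or legend.
const x = scaleLinear()
  .domain([0, 100]) // data space
  .range([0, 500]); // pixel space

const color = scaleOrdinal<string>()
  .domain(["apples", "oranges"])
  .range(["#e41a1c", "#377eb8"]);

console.log(x(50));           // 250: the same scale can drive many marks
console.log(color("apples")); // "#e41a1c"

// An axis or legend is a visual representation of a scale.
// Lyra exposes scales like `x` directly for reuse and editing;
// Charticulator surfaces only the resulting axis or legend;
// Data Illustrator sits in between.
```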

A summary of the system components and the video demonstrations are available at https://vis-tools-reflections.github.io/

Author, Data, and Task: Shared Assumptions and Future Directions

Through these reflections, we also distilled three fundamental assumptions that underlie and cut across all three systems. In terms of the author, the systems assume that users are comfortable structuring input data and exhibit a level of computational thinking. In addition, some systems assume familiarity with other tools: given Data Illustrator’s and Charticulator’s ties to Adobe and Microsoft, respectively, they may attract authors who are already familiar with other products from these organizations. Attaining a better understanding of how such skills transfer when learning our systems is an important direction for future research.

In terms of data, all three systems assume that the data is formatted as a stand-alone file, is static, and is relatively small (typically hundreds of rows). More importantly, the dataset needs to be formatted in a particular way that may not be clear to authors a priori. In particular, all three systems expect datasets to be structured in a long format (often referred to as “tidy,” as opposed to wide). These systems should consider adding higher-level scaffolding that automatically infers or suggests appropriate transformations when necessary, analogous to their existing data binding mechanisms, which automatically infer definitions for scales and guides. Alternatively, users can turn to other systems (e.g., Tableau Prep) that already provide such capabilities.
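To make the long-versus-wide distinction concrete, here is a minimal TypeScript sketch with made-up numbers. It reshapes a small “wide” table (one column per year) into the “long” form these systems expect, where every row is a single observation; this is the kind of transformation a tool like Tableau Prep, or a few lines of code, can perform before the data ever reaches an authoring system.

```typescript
// A hypothetical "wide" table: one row per country, one column per year.
const wide = [
  { country: "Canada", "2018": 120, "2019": 135 },
  { country: "Mexico", "2018": 80, "2019": 95 },
];

// The same data in "long" (tidy) form: one row per country-year observation,
// which is the shape all three authoring systems expect.
const long = wide.flatMap(({ country, ...years }) =>
  Object.entries(years).map(([year, value]) => ({ country, year, value }))
);

console.log(long);
// [ { country: "Canada", year: "2018", value: 120 },
//   { country: "Canada", year: "2019", value: 135 }, ... ]
```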

Finally, these systems are authoring tools, not design tools. In other words, if users want to explore the visualization design space, our systems are of little help: they begin with a blank canvas, a tabula rasa. They are better suited for tasks where users already have a specific chart design in mind. How might future versions of these systems allow authors to explore and combine divergent design ideas? And how might they incorporate visualization design recommendations?

Conclusion

Analysis results from our critical reflections are available at https://vis-tools-reflections.github.io/

Using critical reflections as an evaluation method, we performed a thorough analysis of the components of three visualization authoring systems (Lyra, Data Illustrator, and Charticulator). Potential users like Chris can use the results of this analysis to inform their choice of authoring tool. Visualization researchers may also find critical reflections a useful method for evaluating similar systems designed to support a common goal.

--

Zhicheng "Leo" Liu
Multiple Views: Visualization Research Explained

Research scientist at Adobe, interested in information visualization, visual analytics and cognitive science.