Toward Effective Data Journeys in Exploratory Visual Analysis

Introducing ChartSeer and other major solutions that assist with the exploratory visual analysis of data

Jian Zhao

Published in Nightingale · 7 min read · Dec 17, 2020

The ChartSeer system. — *ChartSeer: Interactive Steering Exploratory Visual Analysis with Machine Intelligence*

Imagine that you are facing an unfamiliar dataset or problem for which you have no goals or only vague hypotheses. To gain more information and insights, you will need to conduct exploratory visual analysis (EVA) of your data. Similar to the concept of exploratory data analysis (EDA), EVA is an open-ended, iterative process in which you identify interesting questions, inspect data with visualizations, and make informed decisions about how to continue the analysis. You are making a journey.

A child is wandering on white sand, leaving footprints. — Photo by Annie Spratt on Unsplash

However, this is not a smooth journey, because you are exploring the unknown. You need to constantly determine promising branches of investigation within a large parameter space: which data variables to explore and which types of charts to use. Furthermore, you can easily get lost and fail to keep an overview of the current analysis.

Let me give you a concrete scenario. Suppose that you are exploring a dataset of product purchase records, where each record contains 15 attributes (e.g. price, volume). You might have some vague hypotheses about pricing trends, so you create a number of two-variable charts (e.g. a line chart comparing prices over time). After making these charts, you become stuck: there are thousands of options, and you do not know which attributes to explore next. For example, two-variable charts alone have 15×14 variable combinations, each with five potential chart marks (e.g. circle), six potential encoding channels for variables (e.g. position, color), and numerous potential aggregation methods (e.g. binning). Complicating things further, current EVA tools (e.g. Jupyter Notebook) give you no view of the interrelations between charts in this large analysis space or of the variables that have already been explored. Moreover, in practice, these problems compound once you hand off your half-completed analysis to colleagues. Now, you need a guided journey.
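To get a feel for how quickly this space grows, here is a rough back-of-the-envelope count in Python. The attribute, mark, and channel counts come from the scenario above. The rule that the two variables must occupy two different channels is my own simplifying assumption, and aggregation methods are left out entirely, so the real number is larger.

```python
from math import perm

n_attributes = 15  # attributes per purchase record (from the scenario)
n_marks = 5        # potential chart marks
n_channels = 6     # potential encoding channels

# Ordered pairs of distinct attributes: 15 x 14.
variable_pairs = perm(n_attributes, 2)

# Assumption: each of the two variables gets its own channel: 6 x 5.
channel_assignments = perm(n_channels, 2)

# Two-variable charts before any aggregation choice is considered.
charts_without_aggregation = variable_pairs * n_marks * channel_assignments

print(variable_pairs)              # 210
print(charts_without_aggregation)  # 31500
```

Even under these restrictive assumptions, there are tens of thousands of candidate two-variable charts to choose from.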

Journeys end in lovers meeting.
— William Shakespeare, “Twelfth Night” (1601) act 2, sc. 3, l. [42]

A number of academic works aim to facilitate an analyst’s data journeys. Here, I briefly introduce a few, but this is not an exhaustive list. In general, the approaches in these works either leverage empirical observations, heuristics, or rules to generate appropriate charts based on certain constraints (e.g. APT, Voyager, Draco), or employ machine learning and/or data-driven methods to train models that recommend meaningful charts from data (e.g. DeepEye, Data2Vis, VizML).

APT is one of the earliest systems. It introduced a compositional algebra to enumerate the space of charts and rank them using expressiveness and effectiveness criteria, based on results from studies of human perception. This laid the foundation for many follow-up works, including the “Show Me” feature in Tableau.

Voyager, built on CompassQL and Vega-Lite, searches and ranks charts using similar criteria. The system also supports browsing of charts and covers a broader search space, suggesting data variables to explore from a partial chart specification. Its later version, Voyager 2, supports “wildcards” (i.e. a “*” that allows analysts to specify multiple charts in parallel) for recommending relevant charts.
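The wildcard idea can be illustrated with a few lines of Python. This is only a sketch of the concept, not CompassQL's actual query format or API: the flat dictionary layout, the `expand_wildcards` helper, and the field names `discount` and `rating` are all hypothetical.

```python
from itertools import product

WILDCARD = "*"

def expand_wildcards(partial_spec, candidates):
    """Enumerate every concrete spec that a partial spec with wildcards stands for."""
    keys = list(partial_spec)
    # A wildcard slot contributes all of its candidates; a fixed slot only its own value.
    options = [
        candidates[key] if partial_spec[key] == WILDCARD else [partial_spec[key]]
        for key in keys
    ]
    return [dict(zip(keys, combo)) for combo in product(*options)]

# "Show price on x against any of these variables on y."
partial = {"mark": "point", "x": "price", "y": WILDCARD}
candidates = {"y": ["volume", "discount", "rating"]}

specs = expand_wildcards(partial, candidates)
for spec in specs:
    print(spec)
```

One partial specification expands into three concrete charts, which a recommender could then rank before showing them to the analyst.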

The Voyager system.

Draco is another chart recommendation tool that models visualization design knowledge as a collection of constraints and rules. Using ASP (Answer Set Programming), the tool expresses the constraints in analysts’ queries and searches for “optimal” solutions.
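To make the constraint-based idea more tangible, here is a minimal sketch in plain Python rather than ASP. It is not Draco's rule set or solver: the candidate charts, the single hard constraint, and the two weighted soft constraints are invented for illustration. Hard constraints filter out invalid charts, and soft constraints add a cost when a design guideline is violated.

```python
candidates = [
    {"mark": "bar", "x_type": "nominal", "y_type": "quantitative"},
    {"mark": "line", "x_type": "nominal", "y_type": "quantitative"},
    {"mark": "point", "x_type": "quantitative", "y_type": "quantitative"},
    {"mark": "bar", "x_type": "quantitative", "y_type": "quantitative"},
]

# The analyst's query: the x variable is nominal.
query = {"x_type": "nominal"}

def satisfies_query(chart):
    return all(chart[key] == value for key, value in query.items())

# Hard constraints must hold for a chart to be considered at all.
hard_constraints = [satisfies_query]

# Soft constraints: (violation test, cost). Weights are made up for this sketch.
soft_constraints = [
    (lambda c: c["mark"] == "line" and c["x_type"] == "nominal", 3),
    (lambda c: c["mark"] == "bar", 1),
]

def cost(chart):
    return sum(weight for violated, weight in soft_constraints if violated(chart))

valid = [c for c in candidates if all(check(c) for check in hard_constraints)]
best = min(valid, key=cost)
print(best["mark"], cost(best))  # bar 1
```

A real ASP solver searches the space declaratively instead of enumerating candidates up front, but the division into hard rules and weighted preferences is the same idea.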

The Draco system.

The above rule-based approaches, however, are limited by the perceptual principles applied. They may face the cold start problem (i.e. no history to use for recommendations), costly creation of rules (for generating charts), and the combinatorial explosion of charts. With the recent advances in machine learning, several data-driven approaches have been proposed.

Data2Vis aims to build an end-to-end pipeline that generates charts directly from data. It employs sequence-to-sequence recurrent neural networks (RNNs) to transform a data file in JSON format into a chart in the Vega-Lite specification, treating both the input and the output as sequences of characters.
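The "sequences of characters" framing is easy to show in Python. The sketch below only covers the data preparation step, not the neural network, and it is not Data2Vis's code: the example record, the chart specification, and the character vocabulary are assumptions made for illustration.

```python
import json

# A hypothetical data record and the chart specification it is paired with.
data_row = {"price": 9.5, "volume": 120}
spec = {
    "mark": "point",
    "encoding": {"x": {"field": "price"}, "y": {"field": "volume"}},
}

# Both sides are serialized to text and treated as plain character sequences.
source_text = json.dumps(data_row, sort_keys=True)
target_text = json.dumps(spec, sort_keys=True)

# Character-level vocabulary shared by the input and output sequences.
vocab = {ch: i for i, ch in enumerate(sorted(set(source_text + target_text)))}
inverse_vocab = {i: ch for ch, i in vocab.items()}

def encode(text):
    return [vocab[ch] for ch in text]

def decode(ids):
    return "".join(inverse_vocab[i] for i in ids)

source_ids = encode(source_text)  # what a seq2seq model would read
target_ids = encode(target_text)  # what it would learn to emit

print(len(source_ids), len(target_ids))
print(decode(target_ids))
```

A sequence-to-sequence model would be trained to map `source_ids` to `target_ids`. Nothing in this representation forces the output to be valid JSON, which is one reason grammar-aware models are attractive.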

The Data2Vis system.

DeepEye combines rule-based methods with machine learning models to classify and rank visualizations. It leverages a binary classifier to determine whether a chart is good or bad, and uses a supervised learning-to-rank model to select appropriate charts, relying on expert knowledge for partial ordering.

VizML learns visualization design from a large corpus of dataset-chart pairs collected from the Plotly Community Feed. It extracts features from each dataset and five key design choices from each chart, and then optimizes the models to predict the design choices based on the features.
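As a rough illustration of what a dataset-chart training pair might look like, here is a small Python sketch. The features below (length, number of unique values, mean, standard deviation) and the single design choice are my own simplified stand-ins, not the actual feature set or design choices used by VizML.

```python
from statistics import mean, pstdev

def column_features(values):
    """Compute a few simple, illustrative features for one data column."""
    numeric = all(isinstance(v, (int, float)) for v in values)
    features = {
        "length": len(values),
        "n_unique": len(set(values)),
        "is_numeric": numeric,
    }
    if numeric:
        features["mean"] = mean(values)
        features["std"] = pstdev(values)
    return features

# A hypothetical dataset and the design choice observed in its chart.
dataset = {"price": [2.0, 4.0, 6.0], "category": ["a", "b", "a"]}
design_choice = {"mark": "bar"}

features = {name: column_features(column) for name, column in dataset.items()}
training_example = (features, design_choice)

print(features["price"])
print(features["category"])
```

A model trained on many such pairs learns to predict the design choice from the features alone.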

The VizML system.

If you have read this far, you already have a sense of the challenges in creating charts in EVA and of the existing solutions. Have we got an effective guided journey? Partially, yes. The above systems help analysts navigate the large parameter space of charts by recommending or generating meaningful candidates, with certain constraints on the visual design or data variables. This significantly reduces your effort, for example in deciding which data variables to explore and which types of charts to use. However, it is still easy to get lost due to the lack of a holistic view of the analysis landscape. Without this view, you may give improper inputs to the recommendation engine, producing unwanted, noisy results that lead you down ineffective paths within EVA. It is like searching Google with inaccurate keywords: the results are less specific and your process is less effective.

But how do we provide a picture of the current EVA landscape and use that to inform future decisions, while offering on-demand recommendations? My colleagues and I have worked on addressing these problems, and recently proposed ChartSeer, a system that helps analysts steer their EVA through the dynamic visualization and recommendation of charts.

ChartSeer employs a meta-visualization approach to summarize the charts that have been created, allowing analysts to easily identify clusters, trends, and gaps in the current analysis. In the visual summary, each chart is represented as a circular glyph. The inner circle displays the mark type (e.g. “L” for a line chart) of the chart and the outer circle is a donut chart with each colored segment indicating a data variable in the chart. Moreover, gray bubbles with text annotations are used to show clusters of charts.

Visual summary of charts in ChartSeer. — Visual summary of charts with different layout parameters, from promoting more data variable similarity (left) to more visual encoding similarity (right).

An analyst could utilize this visual summary to gain knowledge about the current analysis landscape, and further obtain on-site chart recommendations by interacting with the system. They could peek into “holes” in the current analysis (e.g. by clicking on an empty space of the visual summary), so that appropriate charts will be automatically generated within local regions of interest.

Chart recommendation in ChartSeer. — The chart recommendation process in ChartSeer. (a) Sampling in the area specified by an analyst. (b) Reverse projection to high-dimensional vector space that characterizes the charts. (c) Decoding the vector to Vega-Lite representation. (d) kNN searching to obtain data variables in nearby charts.

To make the above functions possible, ChartSeer leverages state-of-the-art deep learning models, Grammar Variational Autoencoders (GVAE), to obtain a mapping between charts and vectors in a semantic space, and uses this mapping to create chart meta-visualizations and enable interactive recommendations. GVAE has an encoder-decoder architecture, where the encoder converts charts (in the Vega-Lite format) into numerical vectors and the decoder does the opposite. GVAE operates on parse trees governed by a context-free grammar (CFG), where the trees are the Vega-Lite specification of charts (i.e. a nested JSON). One benefit of the GVAE, compared to a traditional VAE, is that it is more likely to generate valid data samples with the decoder.
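To illustrate what "operating on parse trees" can mean for a nested JSON chart, the following Python sketch flattens a small Vega-Lite-like specification into a sequence of production rules. The toy grammar is derived on the fly from the dictionary keys and is purely an assumption for illustration. It is not the grammar used in ChartSeer.

```python
def productions(node, symbol="spec"):
    """Return the pre-order list of production rules for a nested dict."""
    if isinstance(node, dict):
        keys = sorted(node)
        # Non-terminal rule: a symbol expands into its child symbols.
        rules = [f"{symbol} -> {' '.join(keys)}"]
        for key in keys:
            rules.extend(productions(node[key], key))
        return rules
    # Terminal rule: a symbol expands into a literal value.
    return [f"{symbol} -> {node!r}"]

spec = {
    "mark": "line",
    "encoding": {"x": {"field": "date"}, "y": {"field": "price"}},
}

rules = productions(spec)
for rule in rules:
    print(rule)
```

A grammar-based autoencoder works with rule sequences like this one instead of raw characters, so the decoder can be restricted to rules that are legal at each step. That restriction is what makes valid outputs more likely.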

Based on the vectors generated by the encoder, chart similarity can be derived to compute relations, clusters, and distributions of user-created charts for visualizing the analysis landscape. Moreover, using the decoder, charts can be automatically generated as recommendations from the vectors in a region of interest specified by an analyst.
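Here is a minimal Python sketch of how vectors can drive both similarity and the nearest-neighbor lookup of data variables. The two-dimensional vectors, chart names, and variable lists are made up for illustration. In ChartSeer the vectors come from the trained encoder and have more dimensions.

```python
from math import dist

# Hypothetical chart embeddings (real ones would come from the encoder).
chart_vectors = {
    "price_over_time": (0.0, 0.0),
    "volume_over_time": (1.0, 0.0),
    "price_vs_volume": (0.0, 2.0),
    "rating_histogram": (5.0, 5.0),
}

# Data variables used by each chart.
chart_variables = {
    "price_over_time": ["date", "price"],
    "volume_over_time": ["date", "volume"],
    "price_vs_volume": ["price", "volume"],
    "rating_histogram": ["rating"],
}

def nearest_charts(point, k):
    """Return the names of the k charts closest to a point (Euclidean distance)."""
    return sorted(chart_vectors, key=lambda name: dist(point, chart_vectors[name]))[:k]

# A point the analyst picked in an empty region of the summary.
neighbours = nearest_charts((0.2, 0.1), k=2)
suggested_variables = sorted({v for name in neighbours for v in chart_variables[name]})

print(neighbours)           # ['price_over_time', 'volume_over_time']
print(suggested_variables)  # ['date', 'price', 'volume']
```

Charts that sit close together in the vector space are treated as similar, and the variables of the nearest charts give a recommendation its local context.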

The GVAE model in ChartSeer. — ChartSeer characterizes data charts using encoding (i.e. e1, e2, e3) and decoding (i.e. d1, d2, d3) processes that are based on GVAE.

Overall, with the visual summary and interactive recommendation in ChartSeer, we can get an informative overview of our previous pathways and plan future steps effectively with the help of machine learning. I hope you enjoy further exploring the challenges of, and solutions to, providing guided journeys in EVA.

The source code associated with ChartSeer is publicly available here, alongside its academic publication.

Dr. Jian Zhao is an Assistant Professor at the Cheriton School of Computer Science at the University of Waterloo. His research interests include Information Visualization, Human-Computer Interaction, Visual Analytics, and Data Science. His work contributes to the development of advanced interactive visualizations that promote the interplay of humans, machines, and data. For more information, please visit his personal website.
