Insight Machines: The Past, Present, and Future of Visualization Recommendation
TLDR: Visualization recommendation systems suggest useful insights to help users more effectively explore and understand their data. In this blog post, we give a brief history of why these systems were developed, examine where we are today, and outline open challenges for future research.
Imagine that you are given a standard enterprise dataset with hundreds of attributes about customers, products, employees, and sales. How would you start to analyze this dataset? With the tools available today, you would most likely begin by generating individual visualizations corresponding to different parts of the data. Along the way, you are likely to ask yourself how best to construct each visualization: Should it be a bar chart or a line chart? Should this attribute be plotted on the x-axis or encoded as color? How should the charts be laid out to facilitate comparisons between different values?
These questions are just a glimpse of the thousands of manual decisions that go into how analysts make sense of data through visualizations. During this painstaking and unguided process, it is often unclear which set of decisions will lead to a visualization exhibiting interesting insights. As a result, analysts can miss important insights hidden in their datasets. On top of that, this process is often overwhelming for analysts who are not familiar with their dataset or who lack the visualization or data expertise to dig through the data themselves.
Wouldn’t it be great if there were an intelligent assistant that could guide us directly to relevant insights in the data?
To address the problem of information overload in exploratory analysis, visualization recommendation (VisRec) systems have been developed to automate certain aspects of visualization design. The goal of visualization recommendation systems is to assist users with the myriad choices that go into visual data exploration, such as “Which attributes should I visualize?”, “What type of visualization should I present my data with?”, “What types of data transformations do I need to apply on my data?”, and “How should I visually encode my data?”.
A Brief History of Visualization Recommendation
To understand where VisRec is today, it is useful to understand the landscape of systems that have been developed in the past. We have organized these related systems in the interactive timeline below. This timeline is by no means comprehensive, but is intended to provide a flavor of the capabilities supported by VisRec systems.
Our timeline is broken down into two tracks: data-based and encoding-based recommenders. Data-based recommenders focus on addressing the problem of selecting what data to visualize to discover interesting insights: What subsets of the data should I look at? What attribute or combination of attributes should I examine? Encoding-based recommenders answer the question of how to visualize the data of interest: Given a set of attributes of interest, how should I design and visually encode my data to generate a visualization? What visualization type and graphical marks should I select? What channels should I assign different attributes to?
Historically, encoding-based recommendation systems originated from the need to help people who are new to visualization graph their data more easily by selecting effective visualization design choices for them. Some of these systems leverage heuristics and rules from best practices in visualization design (e.g., ShowMe suggests a discrete line chart when at least one date attribute and one quantitative attribute are selected). Others have used logic and constraints to formally represent knowledge from visual perception experiments (APT). More recently, machine learning-based approaches (Draco, VizML) have also been used to infer appropriate visualization design choices.
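To make the rule-based flavor concrete, here is a minimal sketch of a heuristic encoding recommender. Only the first rule mirrors the ShowMe example above; the field-type labels, the `Field` and `recommend_mark` names, and the remaining rules are illustrative assumptions, not Tableau's actual rule set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    kind: str  # assumed type labels: "date", "quantitative", or "categorical"


def recommend_mark(selected: list[Field]) -> str:
    """Pick a chart type from the types of the selected fields (hypothetical rules)."""
    kinds = [f.kind for f in selected]
    n_date = kinds.count("date")
    n_quant = kinds.count("quantitative")
    n_cat = kinds.count("categorical")
    # Rule taken from the ShowMe example in the text.
    if n_date >= 1 and n_quant >= 1:
        return "line"
    # The rules below are illustrative assumptions.
    if n_cat >= 1 and n_quant >= 1:
        return "bar"
    if n_quant >= 2:
        return "scatter"
    if n_quant == 1:
        return "histogram"
    return "table"
```

For instance, selecting a date field and a sales field yields `"line"`, while a region field plus sales yields `"bar"`. Rules are ordered, so the first match wins; real systems layer many more conditions on top of this pattern.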
On the other hand, data-based recommendation systems emerged from the challenges of exploring large multidimensional datasets. In particular, navigating the space of pairwise relationships between attributes can be overwhelming, as the number of possible pairwise relationships (displayed as 2D scatterplots) grows quadratically with the number of attributes in the dataset. To address this problem, researchers developed a set of measures called scagnostics that characterize salient features of a scatterplot (e.g., outlying, clumpy, monotonic) as a proxy for how useful the scatterplot will be to the analyst. Subsequent VisRec systems (ScagExplorer, AutoVis) employed these metrics to guide users towards interesting patterns in the space of possible scatterplots. Beyond pairwise relationships between attributes, analysts often apply different filters to their data to compare how patterns change across subsets (e.g., what factors lead to products with a very high return rate?). This space of possible data subsets (e.g., U.S.-based sales, sales for customers aged 20–29, Chicago-based sales for books, non-U.S. sales for electronics that cost more than $500, etc.) is often much larger than the space of pairwise attribute combinations. As a result, VisRec systems have also investigated how to effectively explore the space of data subsets (VisPilot, Discovery-Driven OLAP).
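The sketch below ranks every attribute pair by a monotonicity score, in the spirit of scagnostics-guided exploration. As an assumption, monotonicity is approximated by the squared Spearman rank correlation over all points; the original scagnostic definition first trims outliers, which is omitted here. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no monotonic trend
    return cov / sqrt(vx * vy)


def monotonicity(x, y):
    """Squared Spearman correlation: rank-transform both columns, then Pearson."""
    return pearson(ranks(x), ranks(y)) ** 2


def rank_scatterplots(table):
    """Score all n*(n-1)/2 attribute pairs, most monotonic first."""
    scores = [((a, b), monotonicity(table[a], table[b]))
              for a, b in combinations(table, 2)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

With 20 attributes this already scores 190 scatterplots, which illustrates why the quadratic growth makes manual browsing impractical and an automatic ranking useful.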
The Current State of Affairs
Given the promise of VisRec systems, where are we today? Do we have the magic bullet that can simplify and automate visual data exploration?
Unfortunately, VisRec systems like those discussed above are still in their infancy. Beyond the ShowMe feature in Tableau and Explore in Google Sheets, VisRec systems remain unknown to most practitioners, let alone adopted in their day-to-day workflows. In this section, we discuss two key challenges to VisRec adoption, highlighting lessons learned from past systems and remaining open questions in the area.
Current VisRec does not support the tremendous diversity in users’ analytical goals and intents.
Vartak et al. draw an analogy between VisRec and movie recommenders like Netflix, where users can select and browse movies of interest and the recommender accounts for a diverse set of factors, including content characteristics, task, browsing history, and user preferences. While VisRec systems have progressed towards a less myopic view of what to recommend, they are still far less mature than movie recommenders in supporting diverse user needs in practical applications (e.g., VisRec might suggest something that is statistically interesting but irrelevant to the user’s interests and task goals).
What has worked to support diverse goals and intents?
By recognizing the diversity in analytic goals and tasks, both encoding- and data-based recommenders have gradually shifted away from the assumption that a single best visualization is waiting to be discovered, towards a more collective, multi-faceted notion of visualization utility based on user intent, goals, and the sequence of past actions.
Encoding-based recommenders have traditionally focused on automating the design of a single visualization given a complete set of user specifications. In a study of how novices construct visualizations, Grammel et al. found that many participants had difficulty translating their questions into effective visual mappings. When conveying their visualization intents, participants naturally omitted certain aspects of the visualization specification, such as what operations to perform on the data or how the data should be visualized. Because conversational utterances are ambiguous and incomplete, natural-language visualization interfaces (DataTone, Eviza, ShapeSearch) support partial specification, and recent VisRec systems likewise allow users to partially specify the attributes and filters of interest. For example, Small Multiples Large Singles and Voyager offer different collections of alternative visualizations (e.g., adding an attribute, changing axis variables). Moreover, Behavior-Driven VisRec and Dziban leverage implicit signals from past interactions to ensure that the recommended visualization is consistent with the intended analytic goals. For example, Dziban recommends visualizations that are similar to a prior visualization the user has seen.
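As a rough illustration of recommending alternatives around a partially specified view, the sketch below derives "related views" from the user's current chart by adding one unused attribute or swapping the axes, the two kinds of alternatives mentioned above. The channel list, the dictionary spec format, and `related_views` are hypothetical simplifications, not Voyager's actual implementation.

```python
from typing import Dict, List, Optional

Spec = Dict[str, Optional[str]]  # channel -> attribute (None if unassigned)

CHANNELS = ["x", "y", "color"]  # assumed set of encoding channels


def related_views(current: Spec, attributes: List[str]) -> List[Spec]:
    """Generate alternative specs one small step away from the current view."""
    alternatives: List[Spec] = []
    used = {a for a in current.values() if a is not None}
    free = [c for c in CHANNELS if current.get(c) is None]
    # Alternative 1: place each unused attribute on the first free channel.
    if free:
        for attr in attributes:
            if attr not in used:
                alt = dict(current)
                alt[free[0]] = attr
                alternatives.append(alt)
    # Alternative 2: swap the x and y axes when both are assigned.
    if current.get("x") and current.get("y"):
        alt = dict(current)
        alt["x"], alt["y"] = current["y"], current["x"]
        alternatives.append(alt)
    return alternatives
```

Keeping each alternative one edit away from the current view is a design choice that keeps recommendations anchored to what the user has already specified.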
Similarly, among data-based recommenders (i.e., systems that suggest what data to look at), we see a divide between systems that provide a single type of insight for a narrowly defined analytic task and systems that showcase a more diverse “bag” of insights. Single-task systems, including Discovery-Driven OLAP, Automatic Selection of Partitioning Variable, SeeDB, Zenvisage, and VisPilot, are designed for a single type of analytical task, such as finding the most interesting variable to partition a scatterplot by or finding the most interesting subsets to filter on. Multi-task recommenders, on the other hand, are often based on simple, standard statistical metrics of interestingness (such as outliers and correlation). The contribution of these systems is often not the interestingness metrics themselves, but the interactions and capabilities around generating, searching, and browsing insights. Example systems include Rank-by-Feature, ScagExplorer, AutoVis, VizDeck, Foresight, DataSite, and DIVE, where the bags of insights are displayed as an automatically generated dashboard.
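To give a flavor of how a single-task, subset-oriented recommender can score candidate views, here is a sketch of deviation-based ranking in the style of SeeDB: each (dimension, measure) view is aggregated over a target subset and over the full data, both are normalized into distributions, and views whose distributions differ most rank highest. The Euclidean distance, the SUM aggregate, and all names are assumptions for illustration; other distance functions and aggregates are equally plausible.

```python
from collections import defaultdict
from math import sqrt


def aggregate(rows, dim, measure):
    """SUM(measure) GROUP BY dim, normalized so the values sum to 1."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[dim]] += r[measure]
    s = sum(totals.values())
    return {k: v / s for k, v in totals.items()} if s else {}


def deviation(p, q):
    """Euclidean distance between two normalized distributions (assumed metric)."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def rank_views(rows, predicate, dims, measures):
    """Rank (dimension, measure) views by how much the target subset deviates."""
    target = [r for r in rows if predicate(r)]
    scored = []
    for d in dims:
        for m in measures:
            u = deviation(aggregate(target, d, m), aggregate(rows, d, m))
            scored.append(((d, m), u))
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

In the toy data used to check this sketch, store A sells only books, so the category view deviates strongly from the overall data while the region view does not deviate at all.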
What’s next for supporting diverse goals and intents?
Supporting diverse user intent and information needs in VisRec systems is a non-trivial problem, as there is often a tradeoff between the “expressiveness”, or flexibility, of an interface and its usability (e.g., more options → increasing interface complexity). The challenge is to design natural interfaces that seamlessly adapt to user behavior and proactively anticipate their information needs.
Continuing our earlier enterprise example, an HR manager might only be interested in information related to employees and their performance. So while an anomalous trend in last month’s sales of a particular product might interest someone in Marketing, it would not interest the HR manager. Even with the same dataset, different users often have different questions. How can we design VisRec systems that learn user intent? While there is some past work on learning from interaction history, how do we develop personalized VisRec that models a more nuanced and complex set of user preferences, behavior, and characteristics?
Another open question is how to design expressive languages that enable users to steer the recommendations. Let’s say that our HR manager knows that employee performance in the New York branch is evaluated using a more reliable set of metrics. How can the user effectively communicate such pre-existing or domain knowledge to the system to facilitate a more productive interaction, even without an immediate inquiry in mind?
VisRec fails to provide users with a sense of agency, trust, and safety necessary for data-driven decision making.
Recent studies of user experiences with VisRec indicate that visualizations recommended by current tools are often simply nice-to-haves and, compared to manual approaches, largely lack the strength needed for practical use in decision making. This is partly because existing VisRec systems provide neither a sense of validation or guarantee nor an effective mechanism for communicating the rationale behind the recommendations to the human analyst. For example, current VisRec systems return a list of suggested visualizations without conveying what space of visualizations they searched, which portions of this space they did not cover, and why these recommendations were selected.
What has worked to provide agency, trust, and safety?
Previous systems vary in how much manual input they ask users to provide to drive the recommendations. In the timeline, we saw some systems take a completely manual approach to VisRec, requiring users to specify everything except the aspect to be recommended. We have also seen fully automated systems that operate on the dataset without any user input. Recent trends in VisRec have converged on mixed-initiative interactions, in which the user’s current state is defined by the data fields they have selected. This state, in turn, drives the recommendations that are displayed, giving users a steerable mechanism for browsing relevant recommendations. Instead of leaving users with an arbitrary, automatically generated bag of statistical insights, the mixed-initiative approach encourages an iterative dialogue between the user and the VisRec system.
Alongside these developments in giving users more agency through interaction, there has also been work on formalizing grammars and languages for VisRec. Heer explains that such a domain-specific language “provides a shared medium in which both people and machines reason about and formulate actions”. Examples include CompassQL for Voyager and ZQL for Zenvisage. These query languages are built on top of visualization specification languages, such as Vega-Lite and the Grammar of Graphics, which allow users to fully specify a visualization (e.g., encode a bar chart with “Average Sales” on the y-axis and “Product Type” on the x-axis). VisRec query languages, in contrast, allow for partial specification of visualization intent (e.g., show me any visualization involving “Average Sales”).
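The core idea behind partial specification can be sketched as wildcard expansion: any property marked as unknown is filled in with every value from its domain, producing the set of full specifications that a recommender would then rank. The `"?"` marker, the dictionary spec format, and `expand` are hypothetical and do not reproduce CompassQL's or ZQL's actual syntax.

```python
from itertools import product

WILDCARD = "?"  # hypothetical wildcard marker, not CompassQL's actual syntax


def expand(partial, domains):
    """Enumerate every full spec consistent with a partial spec.

    Properties set to WILDCARD range over domains[prop]; all other
    properties keep the value the user fixed.
    """
    keys = list(partial)
    choices = [domains[k] if partial[k] == WILDCARD else [partial[k]] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*choices)]
```

For “show me any visualization involving Average Sales”, the mark and x-axis become wildcards while the y-axis is fixed; with three candidate marks and three candidate x fields this yields nine full specs. A real system would then prune and rank these candidates rather than show all of them.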
What’s next for providing agency, trust, and safety?
While the field has made strides towards a specification mechanism that supports greater user control, there remain a number of open challenges in providing users with a sense of agency, trust, and safety over the results from VisRec systems.
In his CHI 2019 paper, Correll cautions against using VisRec tools as “p-hacking machines”. By this, he means that the relative ease of obtaining insights with VisRec tools, especially for users with minimal statistical literacy, may inadvertently lead users to draw biased or false conclusions about their data. In other words, when users don’t have to think as hard about what they want to see or how to see it, there is a danger that their conclusions about the data will also be less thoughtful. In addition, users may be unaware of the thousands of analysis paths that the VisRec tool may have searched through before arriving at the recommended insight. The more comparisons a statistical process makes, the more likely it is to produce spurious results that look interesting simply by chance, which can lead to unreliable and misinformed interpretations of such insights. Initial work on interventions against the dangers of VisRec includes correcting for false discoveries that arise from testing multiple hypotheses via a correction constant, and ensuring that users do not miss important context for comparison during drill-down data exploration (VisPilot).
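The multiple-comparisons problem is easy to quantify. The sketch below computes the chance of at least one spurious “insight” across many independent tests, and implements two standard corrections: Bonferroni, which divides the significance level by a constant (the number of tests), and the Benjamini-Hochberg procedure, which controls the false discovery rate. Which correction a particular VisRec system applies is not specified here; these are textbook procedures shown for illustration, and the independence assumption is a simplification.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Flag a test only if its p-value clears alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag discoveries while controlling the false discovery rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank k whose sorted p-value is <= (k / m) * alpha.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    keep = set(order[:cutoff])
    return [i in keep for i in range(m)]
```

With 100 independent tests at the conventional 0.05 level, `familywise_error(0.05, 100)` is roughly 0.994, meaning that a tool searching 100 views is almost guaranteed to surface at least one pattern that is pure chance unless it corrects for the search.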
There remain plenty of problems in providing trust and safety guarantees in VisRec. For example, a skeptical analyst might wonder: How do I know that the insights provided by VisRec are “complete” (i.e., covering all the things that I can learn from the dataset)? What is the coverage of the types of insights that could be recommended and what additional analysis would I have to do on my own? Can I trust that the recommender is not missing out on any insights that I actually care about? From a system designer's perspective, the challenge is how we can convey this notion of a guarantee, without requiring users to know every intricate detail of how the VisRec tool works.
The design of such safety mechanisms is critically tied to the more philosophical question of what VisRec is and what it aims to provide: Should VisRec be regarded as a vitamin, supplementing the user’s exploratory analysis with additional insights on the side, with minimal guarantees on the final outcomes of the analysis? Or should it be treated more like a drug, with a set of rigorously tested guarantees towards confirmatory insights, at the cost of a greater potential for harm?
Another approach to promoting trust and safety in VisRec is through explanations. Explanations in most VisRec tools, if present at all, are limited to templated textual descriptions of the visualization produced via natural-language generation (NLG), e.g., “This line chart shows that the average sales in Pacific Northwest have increased 40% in the month of August”. These explanations are “local” in the sense that they are tied to individual recommendations, rather than revealing a more “global”, big-picture view of the model’s behavior. VisRec systems should consider leveraging the complementary strengths of global and local explanations to describe not just what a recommendation is, but also what the VisRec system is doing and what it is capable of doing.
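The contrast between local and global explanations can be sketched with two templates. `explain_trend` reproduces the kind of templated, per-chart sentence quoted above; `explain_search` is a hypothetical example of a global statement about what the recommender searched and how it ranked results. Both function names and templates are illustrative assumptions, not the output format of any particular tool.

```python
def explain_trend(region, measure, prev_value, curr_value, period):
    """Local explanation: describe the change shown in one recommended chart."""
    change = (curr_value - prev_value) / prev_value * 100
    direction = "increased" if change >= 0 else "decreased"
    return (f"This line chart shows that the {measure} in {region} "
            f"have {direction} {abs(change):.0f}% in the month of {period}.")


def explain_search(n_candidates, n_shown, metric):
    """Global explanation (hypothetical): summarize the search behind the list."""
    return (f"Showing the top {n_shown} of {n_candidates} candidate charts, "
            f"ranked by {metric}.")
```

Pairing the two lets a user read what a single chart says while also seeing how much of the search space the recommendations actually represent.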
A Future for VisRec
With the advent of data-driven decision making, we are no longer bottlenecked simply by how much data we can process, but more often by how much information a human analyst can consume. Visualization recommendation systems offer a silver lining by suggesting potentially insightful visualizations to show the analyst. The many examples described above illustrate the immense diversity of VisRec tools and their capabilities, spanning research in visualization, databases, and HCI.
Challenges remain in designing better user experiences that help users manage, navigate, and make sense of this large space of visualization alternatives. Going forward, VisRec systems need to become more receptive to diverse and often ill-defined user intents and goals. Given the potential dangers of VisRec and its role in democratizing data science, VisRec systems bear some responsibility to inform and educate users through explanations, and to provide users with the tools to make well-informed, actionable decisions with their data. These upcoming developments bring us towards a constructive and sustainable future for human-machine collaboration in data science.
Thanks to Jessica Hullman and Aditya Parameswaran for feedback on this piece.