Let’s Talk about Natural Language Interfaces for Data Visualization
TL;DR: A new suite of natural language-based visualization systems has begun to emerge. These systems allow users to directly ask questions about their data and help them answer those questions by suggesting or modifying visualizations. But what are the potential advantages of such systems? What makes them challenging to design and use?
In this post, I summarize recent research on natural language interfaces for data visualization that addresses these questions. I also give my two cents on what may be interesting directions to explore going forward, discussing the potential role of system-generated natural language and the value of complementing visual data analysis with question answering.
Given the prevalence of smart speakers like Amazon Echo and voice-based assistants like Siri and Alexa in our everyday lives, interacting with devices using natural language has become quite the norm. With the increasing demand for data visualization tools and the varying levels of expertise among their users, it comes as no surprise that both commercial vendors and academic researchers have begun to explore natural language interfaces (NLIs) for data analysis with visualization. Although the idea of using language to create visualizations was explored as far back as 18 years ago, recent advances in natural language processing (NLP), coupled with the maturity of data visualization as a field, have led to renewed interest in the topic in the last few years.
Prominent visualization software vendors, including IBM, Microsoft, and most recently Tableau, now offer NLIs that allow users to communicate their questions about the data directly in natural language. At a high level, given a user question or command, these systems try to interpret the underlying intent (e.g., comparing trends, applying a filter) and the parameters of that intent (e.g., data field names, data values), and then map them to appropriate system actions such as creating a new visualization or updating an existing chart. Figure 2 shows an example of this in Tableau’s Ask Data system.
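To make the interpret-then-act loop a bit more concrete, here is a deliberately tiny sketch. It assumes a hypothetical keyword table for intents and a hypothetical set of field names for a sales dataset; real systems such as Ask Data rely on far richer parsers, and none of the names below come from any vendor’s API.

```python
from dataclasses import dataclass, field

# Hypothetical intent keywords (checked in order) and field names for a sales dataset.
INTENT_KEYWORDS = {"compare": "compare", "trend": "trend", "over time": "trend", "only": "filter"}
DATA_FIELDS = {"sales", "profit", "state", "region", "year"}


@dataclass
class ParsedQuery:
    intent: str
    fields: list = field(default_factory=list)


def parse(query: str) -> ParsedQuery:
    """Interpret the intent and its parameters (here, only field names) from a query."""
    text = query.lower().replace("?", "")
    # First matching keyword wins; with no match, fall back to a plain "show".
    intent = next((v for k, v in INTENT_KEYWORDS.items() if k in text), "show")
    fields = [tok for tok in text.split() if tok in DATA_FIELDS]
    return ParsedQuery(intent, fields)


def to_action(q: ParsedQuery) -> dict:
    """Map an interpreted query onto a (hypothetical) system action."""
    if q.intent == "trend":
        return {"action": "create_chart", "mark": "line", "fields": q.fields}
    if q.intent == "filter":
        return {"action": "update_chart", "fields": q.fields}
    return {"action": "create_chart", "mark": "bar", "fields": q.fields}
```

For example, `to_action(parse("Show profit by state"))` yields a bar-chart action over `profit` and `state`. Keyword matching like this breaks down quickly, which is exactly where the challenges discussed below come from.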
But before we get into the nitty-gritty of how these systems work, let’s take a step back and think about why we might want to use natural language as a way to interact with visualization systems in the first place.
Why natural language?
Systems like Tableau and Power BI are powerful visualization tools that allow people to interactively pose and answer analytical questions. However, these tools often present a steep learning curve and require people to translate the questions in their minds into operations or actions supported by the tool. Many users, particularly novices, struggle with this “translation” of questions into tool-specific actions. Natural language can help overcome some of these issues by letting users express their questions in their own terms. A study by Tableau’s research team compared their initial NLI prototype to Tableau Desktop and found that participants refined given visualizations faster with the NLI than with Tableau Desktop. Both that study and a study we conducted with a speech- and touch-based network visualization tool also showed that participants preferred natural language when the standard interface would have required multiple steps to complete a task, interrupting their analytic flow.
So, if natural language is an easier way to get started with a visualization tool and is also faster than the current style of input for certain scenarios, what prevents it from being the primary way we interact with visualization tools today?
What are the challenges in supporting natural language input? And what are visualization researchers doing to tackle these challenges?
Unsurprisingly, implementing NLIs is challenging: we need to build software that first interprets human language the way another person would and then performs an appropriate set of actions based on that interpretation. Below I list a subset of the challenges these systems pose from a design and usability standpoint, and highlight how researchers are trying to address some of them in the context of NLIs for data visualization.
Ambiguity and Underspecification: User questions are often ambiguous and underspecified. For example, imagine you are exploring an Olympic Games dataset with details about all medals won by different countries over several decades. While exploring this dataset with an NLI, you say “show me medals for hockey and skating by country.” This seemingly simple query contains multiple ambiguities. The word “medals” can map either to the total number of medals or to specific types of medals (e.g., gold). Similarly, “hockey” and “skating” can each refer to different sports (e.g., ice hockey vs. field hockey, or figure skating vs. speed skating). Even if all of these ambiguities were resolved, there is still the question of which visualization to use (e.g., a grouped vs. a stacked bar chart).
One proposed solution for exposing these ambiguities and helping users resolve them is multimodal input (i.e., leveraging another form of input such as a mouse). In this approach, given an input query, the system tries to identify ambiguous phrases using a suite of string matching and word association algorithms. It then displays these ambiguous phrases through interactive widgets (referred to as ambiguity widgets), allowing users to refine their queries and resolve the ambiguity. This idea was first presented in DataTone, a system by Adobe. Figure 3 shows how DataTone handles the ambiguous query about the Olympic Games dataset.
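A minimal sketch of the detect-then-ask idea follows. DataTone combines string matching with word association algorithms; this sketch only does exact word matching against data values, and the dataset values and function names are hypothetical.

```python
# Hypothetical column values from an Olympic medals dataset.
SPORTS = ["Ice Hockey", "Field Hockey", "Figure Skating", "Speed Skating", "Swimming"]
MEASURES = ["Total Medals", "Gold Medals", "Silver Medals", "Bronze Medals"]


def find_ambiguities(query: str, vocabulary: list) -> dict:
    """Return {query word: candidate data values} for words that match more than one value."""
    widgets = {}
    for word in query.lower().replace(",", "").split():
        matches = [v for v in vocabulary if word in v.lower().split()]
        if len(matches) > 1:
            widgets[word] = matches  # each entry would back one dropdown "ambiguity widget"
    return widgets


def resolve(query: str, word: str, choice: str) -> str:
    """Rewrite the query once the user picks a value from a widget."""
    return query.replace(word, choice)


query = "show me medals for hockey and skating by country"
widgets = find_ambiguities(query, SPORTS + MEASURES)
query = resolve(query, "hockey", widgets["hockey"][0])  # user picks "Ice Hockey"
```

Here “medals”, “hockey”, and “skating” each produce a widget, while unambiguous words pass through untouched. Choosing the chart type (grouped vs. stacked) would need a separate widget that this sketch does not model.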
A problem closely related to ambiguity is underspecification. Whereas ambiguity arises when the system has multiple options to choose from, underspecification refers to cases where the input query lacks enough information (e.g., attributes, or keywords that help infer intent) for the system to make a decision. People naturally tend to be imprecise when asking questions, frequently posing incomplete or underspecified queries. For instance, when exploring a dataset about the population and economic metrics of different countries, one might wonder about the gross domestic product (GDP) of different countries and ask “What’s the correlation of GDP?” While this question may be clear to the user, from the system’s standpoint it is both ambiguous (e.g., there might be several GDP columns, perhaps one per year) and underspecified (e.g., the correlation of GDP against what?).
Addressing such issues, researchers at Tableau recently devised a set of techniques to infer missing details in a query based on the meaning and usage frequency of different data attributes as well as constraints imposed by the system’s supported operations. Using these inference techniques, given the underspecified query “What’s the correlation of GDP?”, the system can generate a scatterplot visualizing GDP per Capita and Life Expectancy since they were the most popular combination of attributes explored for the same dataset (Figure 4).
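The frequency-based part of that inference can be sketched in a few lines. This is not Tableau’s implementation: it assumes a hypothetical usage log of attribute pairs and a hypothetical list of numeric columns, ignores attribute semantics, and uses only one operational constraint (a correlation needs two numeric fields).

```python
from collections import Counter

# Hypothetical numeric columns and a hypothetical log of attribute pairs
# that earlier users plotted together for this dataset.
NUMERIC_FIELDS = ["GDP per Capita", "GDP Total", "Life Expectancy", "Population"]
USAGE_LOG = [
    ("GDP per Capita", "Life Expectancy"),
    ("GDP per Capita", "Life Expectancy"),
    ("GDP per Capita", "Population"),
    ("GDP Total", "Population"),
]


def infer_correlation(mention: str):
    """Complete "What's the correlation of <mention>?" with the most frequently used pair."""
    # Ambiguity: every numeric column whose name contains the mention is a candidate.
    candidates = {f for f in NUMERIC_FIELDS if mention.lower() in f.lower()}
    # Constraint: a correlation needs two numeric fields, so only logged
    # numeric pairs that include one of the candidates are considered.
    pairs = Counter(p for p in USAGE_LOG if candidates & set(p))
    if not pairs:
        return None
    (x, y), _ = pairs.most_common(1)[0]
    return {"chart": "scatterplot", "x": x, "y": y}
```

With this made-up log, `infer_correlation("GDP")` fills in the missing attribute with Life Expectancy and proposes a scatterplot, mirroring the behavior described above.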
Preserving context to support an analytic flow: To support natural language input, it is not enough for a system to accept one-off commands that each produce a visualization. During visual data analysis, users often need to iterate on their questions and refine existing visualizations, diving deeper into specific aspects of a chart or adding new visualizations to the current view. Supporting such actions means the system should be able to sustain a “conversation” between the user and the data. A key component of supporting a conversation is interpreting the context in which a query is posed. In other words, in addition to interpreting the current query, the system also needs to consider previously issued queries (to identify the data attributes and values used) and the active view (e.g., visualizations, colors) to answer questions effectively. For instance, in Figure 2, if the user followed up the question “What is the profit for each state?” with “What about different cities?”, the system needs to understand that the user is implicitly still referring to “Profit” and accordingly adjust the choropleth map to color cities instead of states. As a first step toward supporting such interactions, research systems have employed conversational centering techniques from pragmatics, the subfield of linguistics that studies how context contributes to meaning.
Consider the example in Figure 5, where a user is exploring a Seattle house price dataset. The user first says “houses less than 1M in Ballard,” and in response the system applies two filters: one on sale price and one on the neighborhood “Ballard.” Next, with the previous query in mind, the user simply says “townhomes,” implying that the system should show townhomes in the Ballard neighborhood that cost less than 1M. To capture this implicit meaning, the system preserves (technically, retains) the neighborhood and price filters and changes (technically, shifts) the focus from all houses in the dataset to only townhomes. Combining the retain and shift operations updates the chart as the user expects.
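A minimal sketch of retain and shift over a dictionary of filter slots is shown below. The slot names, listings, and helper functions are hypothetical, and “shift” here is simply overwriting a slot the new utterance mentions; the research systems decide per attribute whether a new mention replaces or adds to an earlier filter, which this sketch does not attempt.

```python
def update_context(state: dict, utterance_slots: dict) -> dict:
    """Retain every slot the follow-up does not mention; shift the ones it does."""
    merged = dict(state)            # retain: carry over earlier filters
    merged.update(utterance_slots)  # shift: overwrite slots named in the new utterance
    return merged


def apply_filters(state: dict, rows: list) -> list:
    """Keep rows under the price cap that match every other filter slot exactly."""
    cap = state.get("max_price", float("inf"))
    return [
        r for r in rows
        if r["price"] < cap
        and all(r[k] == v for k, v in state.items() if k != "max_price")
    ]


# Hypothetical listings from a Seattle house price dataset.
LISTINGS = [
    {"home_type": "Townhome", "neighborhood": "Ballard", "price": 850_000},
    {"home_type": "House", "neighborhood": "Ballard", "price": 950_000},
    {"home_type": "Townhome", "neighborhood": "Fremont", "price": 700_000},
    {"home_type": "Townhome", "neighborhood": "Ballard", "price": 1_200_000},
]

# "houses less than 1M in Ballard"
state = update_context({}, {"max_price": 1_000_000, "neighborhood": "Ballard"})
# "townhomes": price and neighborhood are retained, home type is shifted in
state = update_context(state, {"home_type": "Townhome"})
```

The same merge would cover the “What about different cities?” follow-up if the state also held a measure slot (retained as Profit) and a geographic-level slot (shifted from state to city).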
Discoverability: A key challenge faced by users of NLIs (particularly new users) is that they are unsure what the system is capable of doing (i.e., which operations it can perform) and whether it expects them to conform to a specific language structure. This uncertainty about what can be asked, and how, is commonly referred to as a lack of discoverability. Although advances in natural language understanding let users phrase their intended operations more freely, I would argue that discovering “what” can be done remains an open problem. Compared to the other challenges, discoverability has received relatively little attention in current visualization NLIs. The most common approach current systems take to aid discoverability is autocomplete, as seen in Microsoft’s Power BI (Figure 6). However, one study found that this approach gives users a false sense of the system’s ability to interpret more complex queries. Furthermore, autocomplete is limited to text input and does not work well for spoken commands.
A more recent approach to aiding the discoverability of natural language commands is command suggestions. As shown in Figure 2, Tableau’s Ask Data system helps users start interacting with the system by suggesting sample commands. We also recently proposed a general framework that suggests natural language commands as a way to enhance the discoverability of natural language input in speech-based multimodal interfaces. While the idea of suggesting commands is simple on the surface, deciding which commands to show, and when and how to present them, are important design considerations that warrant additional research. Furthermore, the effects of such suggestions during visual data analysis (e.g., whether they encourage or discourage users from thinking of new questions) have yet to be investigated. Given the increasing popularity of speech-based UIs, devising more ways to address discoverability remains an open area for research.
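To see why autocomplete can only reveal what a system already covers, consider this sketch of template-based completion. The templates and field names are hypothetical, not Power BI’s grammar: completions are generated by filling known templates with field names and matching them against the typed prefix.

```python
from itertools import product

# Hypothetical command templates and field names the system can interpret.
TEMPLATES = ["show {measure} by {dimension}", "show {measure} over time", "sort by {measure}"]
FIELDS = {"measure": ["sales", "profit"], "dimension": ["region", "state"]}


def expand(template: str):
    """Yield every concrete command obtained by filling the template's placeholders."""
    keys = [k for k in FIELDS if "{" + k + "}" in template]
    for combo in product(*(FIELDS[k] for k in keys)):
        yield template.format(**dict(zip(keys, combo)))


COMMANDS = sorted(cmd for t in TEMPLATES for cmd in expand(t))


def autocomplete(prefix: str, limit: int = 5) -> list:
    """Return up to `limit` known commands that start with what the user has typed."""
    p = prefix.lower()
    return [c for c in COMMANDS if c.startswith(p)][:limit]
```

Typing “show pro” surfaces three polished completions, yet nothing outside the template set can ever appear, and the whole mechanism depends on a typed prefix, which is why it transfers poorly to speech.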
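One possible answer to “which commands to show” is to ground suggestions in the active view, so they stay relevant while also revealing operations the user may not know about. The sketch below is a hypothetical heuristic for illustration only; it is not how Ask Data or our framework chooses suggestions, and the view and schema structures are assumptions.

```python
from itertools import chain, zip_longest


def suggest_commands(view: dict, schema: dict, k: int = 3) -> list:
    """Propose up to k follow-up commands grounded in the active view (hypothetical heuristic)."""
    measure, dimension = view["measure"], view["dimension"]
    # Break the current measure down by dimensions the view does not show yet.
    breakdowns = [f"show {measure} by {d}" for d in schema["dimensions"] if d != dimension]
    # Swap in measures the user has not looked at yet.
    swaps = [f"show {m} by {dimension}" for m in schema["measures"] if m != measure]
    # Interleave both kinds so each is represented even when k is small.
    mixed = [c for c in chain.from_iterable(zip_longest(breakdowns, swaps)) if c is not None]
    return (mixed + [f"sort by {measure}"])[:k]


view = {"measure": "profit", "dimension": "state"}
schema = {"dimensions": ["state", "region", "category"], "measures": ["profit", "sales"]}
```

Even this toy version has to make the design decisions mentioned above: how many suggestions to show (`k`), which kinds to prioritize, and when to refresh them as the view changes.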
Emerging Themes and Future Directions
Natural language as i̵n̵p̵u̵t̵ output: Until now, I have only discussed natural language as an input modality (i.e., as a way for users to communicate with the system). An emerging theme in visualization research is to complement visualizations with natural language and use language as an output modality (i.e., as a way for systems to communicate with users). This idea is being explored commercially by companies such as Narrative Science and Automated Insights that offer services to “summarize” visualizations in plain text. We also recently presented a system called Voder (see Figure 7) that automatically infers key data facts from a visualization and allows users to interact with these facts to highlight them in the visualization.
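A minimal sketch of the fact-to-sentence idea, with each fact linked to the marks it should highlight, is shown below. Voder’s actual fact types, ranking, and phrasing are richer; the data, measure name, and templates here are hypothetical.

```python
def data_facts(values: dict, measure: str) -> list:
    """Derive simple facts from category -> value data, each linked to the marks it concerns."""
    top = max(values, key=values.get)
    bottom = min(values, key=values.get)
    share = values[top] / sum(values.values())
    return [
        {"text": f"{top} has the highest {measure} ({values[top]:,}).", "highlight": [top]},
        {"text": f"{bottom} has the lowest {measure} ({values[bottom]:,}).", "highlight": [bottom]},
        {"text": f"{top} accounts for {share:.0%} of total {measure}.", "highlight": [top]},
    ]


# Hypothetical bar chart data: sales per region.
facts = data_facts({"East": 120, "West": 300, "South": 80}, "sales")
```

The `highlight` field is what would let a user hover over a sentence and see the corresponding bar emphasized in the chart.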
Current work on this topic largely focuses on how best to identify key facts and on building natural language generation (NLG) models that present them in a communicable manner. However, given growing concerns about the ethical dimensions of automated analysis in visualization research, I believe an interesting opportunity lies in using NLG to communicate the logic behind the system’s actions, so that users can build trust in the system and make decisions more confidently. In my opinion, this opportunity is not specific to NLIs; it is an area where natural language research can contribute to data analytics and visualization tools in general.
Complementing Visual Data Analysis with Question Answering: While this post mostly shows queries that create or modify visualizations, not every query during data analysis needs a visualization as a response. For instance, looking at the map in Figure 2, one might ask “What were sales in California last year?” or “What are the total sales across regions?” In such cases, all the user wants is the sales value. Although the system could generate visualizations in response to these questions, it is probably better for it to return the actual value rather than only a chart that the user must interpret to get the answer.
Going forward, I hypothesize that to truly support a cycle of visual analysis through natural language, we need systems that can not only render a visualization that “contains” an answer but also provide direct responses when appropriate. Building such tools warrants further research at the intersection of two kinds of systems: data visualization systems that can identify the best chart in response to a question, and question answering systems, specifically NLIs for databases, that can directly compute and present values in response to user questions.
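As a rough illustration of mixing direct answers with charts, here is a sketch that assumes the question has already been parsed into filters and an optional grouping field (the parsing step, the data, and treating “last year” as 2019 are all hypothetical). The rule is simple: if the query collapses to a single aggregate, return the number; if it asks for a breakdown, return a chart specification.

```python
# Hypothetical sales records.
SALES = [
    {"state": "California", "year": 2018, "sales": 450},
    {"state": "California", "year": 2019, "sales": 520},
    {"state": "Texas", "year": 2019, "sales": 310},
]


def respond(filters: dict, group_by=None) -> dict:
    """Answer directly when the query reduces to one value; otherwise return a chart spec."""
    rows = [r for r in SALES if all(r[k] == v for k, v in filters.items())]
    if group_by is None:
        # Single aggregate: give the user the number itself.
        return {"type": "answer", "value": sum(r["sales"] for r in rows)}
    # Breakdown requested: aggregate per group and hand the result to a bar chart.
    totals = {}
    for r in rows:
        totals[r[group_by]] = totals.get(r[group_by], 0) + r["sales"]
    return {"type": "chart", "mark": "bar", "data": totals}


# "What were sales in California last year?" (assuming last year = 2019)
answer = respond({"state": "California", "year": 2019})
```

A real system would also need to decide when a value alone is enough and when a chart adds useful context, which is part of the open research question raised above.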
This post only briefly discusses some of the ongoing research on NLIs for data visualization. A more comprehensive review of a subset of the systems described in this post along with additional research opportunities can be found in our paper.
If you find this topic exciting, let’s keep talking (both to systems and each other). You can learn more about our work on natural language and multimodal interfaces for data analysis with visualization on the Information Interfaces Research Group project page or my personal website.