Verbalizing Vertices

Turning data in scatterplots into spoken words.

Sagar Chand
VisUMD
Dec 14, 2022


Image by MidJourney (v4).

What if we could use spoken words to describe data? A recent article by Henkin and Turkay studies how people describe visualizations in words. The aim is to make visualizations more efficient and easier to understand, and to fuel the development of visualization tools that use natural language processing to create the visualizations.

The article consists of two studies. In the first, participants are shown a series of scatterplots and asked to verbalize the relationship in each one. In the second, the researchers use the findings of the first study to write a description for each scatterplot, and then ask participants to match a randomized list of scatterplots to those descriptions. Because this topic has not yet been heavily researched within the field, the researchers wanted to close this gap in understanding so that the next step can be taken in evolving the software and tools we use. The results also help data scientists develop better visualizations and descriptions, since they offer insight into how people interpret the relationships in a scatterplot.
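To get a feel for the kind of stimuli involved, here is a minimal sketch of how scatterplot data with a chosen correlation strength could be generated. This is my own illustration and not the authors' procedure: the function names `correlated_points` and `pearson`, the sample size, and the use of normally distributed values are all assumptions. The target values -0.8, 0.0, and 0.8 are picked to span the range of scatterplots the article mentions.

```python
import math
import random


def correlated_points(r, n, seed=0):
    """Return n (x, y) pairs whose expected Pearson correlation is r.

    Hypothetical helper: y is a mix of x and independent noise,
    weighted so that the population correlation equals r.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r * r) * noise
        points.append((x, y))
    return points


def pearson(points):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)


# Print the target next to the correlation actually measured in the sample.
for target in (-0.8, 0.0, 0.8):
    pts = correlated_points(target, n=2000)
    print(target, round(pearson(pts), 2))
```

The measured value will only be close to the target, not equal to it, because each sample is random. Feeding the points to any plotting library would give scatterplots that look like the ones a participant might be asked to describe.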

The goal of this research is to establish empirical evidence that can inform decisions in designing and developing multimodal data analysis approaches. To that end, the researchers set three goals: gaining an overview of the language constructs and the vocabulary used to describe the visualizations (G1), identifying the characteristics of the data and the visualizations that participants refer to (G2), and understanding the relationship between these characteristics and the language observed in the studies (G3).

After pre-processing the data and conducting a syntactic analysis, a summarization, and a collocation analysis, they were able to extract important insights from the data. They discovered that the verbalizations of the scatterplots yielded two categories of answers: concepts and traits. Table 1 below shows the concepts they extracted from the verbalizations, and Table 2 shows the traits.

Table 1: Definition, words and examples of answers of the concepts identified through the analysis of results.
Table 2: Definition, words and examples of answers for the traits identified through the analysis of results.

Based on these tables, they found clear trends for each trait and concept as the correlation shown in the scatterplots varies from -0.8 to 0.8. This is useful because it shows that participants tended to use the same words to describe specific relationships in a scatterplot. During their analysis they also found many collocations, which are word combinations that occur frequently within the data. Using the collocations, they plotted the concepts against the traits to look for patterns or trends. The researchers found a pattern in how people interpret the visualizations, which accomplishes one of their goals.
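To make the idea of collocations concrete, here is a small sketch that counts adjacent word pairs across a handful of responses. The responses are made up for illustration and are not data from the study, and `bigram_counts` is a hypothetical helper name. Counting raw pair frequencies is the simplest possible approach; I am not claiming it is the method the authors used.

```python
import re
from collections import Counter


def bigram_counts(responses):
    """Count adjacent word pairs (bigrams) across a list of text responses."""
    counts = Counter()
    for text in responses:
        # Lowercase and keep only runs of letters and apostrophes as words.
        tokens = re.findall(r"[a-z']+", text.lower())
        # Pair each word with the word that follows it.
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Invented example verbalizations, not taken from the paper.
responses = [
    "a strong positive correlation",
    "weak positive correlation between the variables",
    "there is a strong positive relationship",
    "no clear trend",
]

counts = bigram_counts(responses)
for pair, frequency in counts.most_common(3):
    print(" ".join(pair), frequency)
```

Pairs that keep showing up, such as a strength word followed by a direction word, are the kind of combination that can then be lined up against concepts and traits.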

In the second study, the researchers used their findings from the first study to develop descriptions of various scatterplots using the concepts and traits. The second study served as a check on the findings of the first. Since the experiments were so similar, they conducted a comparative analysis to see how the interpretation of concepts and traits varies between the visualization-to-verbalization and verbalization-to-visualization settings. Even though each study started from a different side (verbalization or visualization), the words used to describe the visualizations were very similar, so the second study was able to reaffirm the findings of the first.

In conclusion, this research study aimed to develop an understanding of which characteristics of data and charts are relevant when people interpret data visualizations. Through the visualization-to-verbalization study, the researchers developed a taxonomy of concepts and traits, along with the responses associated with each category. The verbalized descriptions they built from those concepts and traits also resonated with the participants in the verbalization-to-visualization study. They achieved their goals in both studies and were able to confirm their findings thanks to the similar results. They also suggest a variety of practical implications, discussing how this study can contribute to future improvements in technology, software, and tools. As I mentioned in the introduction, new technology, software, and tools give you a better understanding of what you can and cannot do, which in turn fuels the evolution of the technology.

Citation

  • Henkin, Rafael, and Cagatay Turkay. “Words of Estimative Correlation: Studying Verbalizations of Scatterplots.” IEEE Transactions on Visualization and Computer Graphics, vol. 28, no. 4, 2022, pp. 1967–1981, https://doi.org/10.1109/tvcg.2020.3023537.
