VizNet: Towards a Large-Scale Visualization Learning and Benchmarking Repository

Kevin Hu, PhD · Published in ACM CHI · May 6, 2019

This article summarizes a paper authored by Kevin Hu, Snehalkumar ‘Neil’ S. Gaikwad, Madelon Hulsebos, Michiel A. Bakker, Emanuel Zgraggen, César Hidalgo, Tim Kraska, Guoliang Li, Arvind Satyanarayan, and Çağatay Demiralp. The paper will be presented at CHI 2019 on Tuesday, May 7, 2019, at 16:00 in the session Visualization Systems and Repositories.

VizNet enables data scientists and visualization researchers to aggregate data, enumerate visual encodings, and crowdsource effectiveness evaluation metrics.

Takeaway

VizNet is a large-scale corpus of over 31 million datasets compiled from the web, open data repositories, and online visualization platforms. Researchers can use VizNet to conduct experiments with real-world data, assess the ecological validity of synthetic data, and compare design techniques against a common baseline.

The Need for Visualization Repositories

Large-scale databases such as WordNet [1] and ImageNet [2] provide the data needed to train and test machine learning models, as well as a common baseline for evaluation, experimentation, and benchmarking. They have proven instrumental in pushing the state-of-the-art forward in language modeling and computer vision.

Research on graphical perception, however, often relies on ad hoc or synthetically generated datasets that do not share the characteristics of data found in the wild. To date, insufficient attention has been paid to designing and engineering a centralized, large-scale repository for evaluating the effectiveness of visual designs. This motivates building a large-scale corpus for learning, evaluating, and benchmarking measures of perceptual effectiveness.

Characterizing Real-World Data

We introduce VizNet, a large-scale corpus of over 31 million datasets compiled from the web, open data repositories, and online visualization platforms.

We find that real-world datasets are small: the median table has 17 rows and 3 columns. Of the columns in the corpus, 51% are categorical, 44% quantitative, and only 5% temporal. About half of the columns are best described by a normal, log-normal, or power-law distribution. Summary statistics and distributions are shown below.

Summary statistics (top) and distributions (bottom) of the four source corpora and the VizNet 1M corpus. In the top table, we report the median number of rows and columns. The Distribution column lists the three most frequent column distributions, abbreviated as Norm = normal, L-N = log-normal, Pow = power law, Exp = exponential, Unif = uniform, and Und = undefined. The bottom part of the figure contains distributions describing columns, datasets, and the entire corpus. The bars outlined in red represent three-column datasets and the subset that contains one categorical and two quantitative fields.
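To give a concrete sense of this kind of profiling, here is a minimal Python sketch of how one might classify column types and fit candidate distributions with pandas and SciPy. The type rules, candidate list, and Kolmogorov-Smirnov threshold are illustrative assumptions, not the paper's actual pipeline.

```python
# Sketch of profiling a tabular dataset in the spirit of the VizNet summary
# statistics; thresholds and candidate distributions are illustrative.
import numpy as np
import pandas as pd
from scipy import stats

def column_type(series: pd.Series) -> str:
    """Classify a column as temporal, quantitative, or categorical."""
    if pd.api.types.is_datetime64_any_dtype(series):
        return "temporal"
    if pd.api.types.is_numeric_dtype(series):
        return "quantitative"
    return "categorical"

def best_fit_distribution(values: np.ndarray, alpha: float = 0.05) -> str:
    """Fit candidate distributions and keep the best one passing a K-S test."""
    candidates = {
        "normal": stats.norm,
        "log-normal": stats.lognorm,
        "power law": stats.powerlaw,
        "exponential": stats.expon,
        "uniform": stats.uniform,
    }
    best_name, best_p = "undefined", alpha
    for name, dist in candidates.items():
        try:
            params = dist.fit(values)
            _, p_value = stats.kstest(values, dist.name, args=params)
        except Exception:
            continue
        if p_value > best_p:
            best_name, best_p = name, p_value
    return best_name

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column's type and, if quantitative, its best-fit distribution."""
    rows = []
    for name, col in df.items():
        ctype = column_type(col)
        dist = (best_fit_distribution(col.dropna().to_numpy())
                if ctype == "quantitative" else None)
        rows.append({"column": name, "type": ctype, "distribution": dist})
    return pd.DataFrame(rows)
```

Running `profile` over each table in a corpus and aggregating the results would yield per-column type and distribution counts analogous to those reported above.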

Utility of VizNet for Data Scientists and Visualization Researchers

We demonstrate VizNet’s viability as a platform for conducting online crowdsourced experiments at scale by replicating the Kim and Heer (2018) study on the effects of task and data distribution on the effectiveness of visual encodings [3], and extending it with an additional task: outlier detection.

Experiment interface for the Compare Values task. Following Kim and Heer (2018), we considered four visualization tasks informed by the Amar et al. (2005) taxonomy of low-level analytic activities. Two were value tasks: Read Value and Compare Values asked participants to read and compare individual values. The other two were summary tasks: Find Maximum and Compare Averages required identifying or comparing aggregate properties. Each task was formulated as a binary, two-alternative forced-choice question, with the two alternatives generated using the procedure described in the prior study.
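As a rough illustration of the two-alternative forced-choice setup, the sketch below builds a Compare Values question from a small table with one categorical and two quantitative fields. The helper name, sampling, and question wording are assumptions for illustration; the actual alternatives followed the procedure in Kim and Heer (2018).

```python
# Illustrative 2AFC question generation for a "Compare Values" task;
# not the exact procedure used in the study.
import random
import pandas as pd

def compare_values_question(df: pd.DataFrame, category_col: str, value_col: str):
    """Pick two categories and ask which has the larger value."""
    a, b = random.sample(list(df[category_col].unique()), 2)
    value_a = df.loc[df[category_col] == a, value_col].iloc[0]
    value_b = df.loc[df[category_col] == b, value_col].iloc[0]
    prompt = f"Is the {value_col} of {a} greater than that of {b}?"
    answer = bool(value_a > value_b)
    return prompt, answer

# Toy three-column table (one categorical, two quantitative), mirroring the
# dataset shape sampled for the experiment.
toy = pd.DataFrame({
    "country": ["A", "B", "C"],
    "gdp": [3.2, 1.8, 2.5],
    "population": [10, 40, 25],
})
print(compare_values_question(toy, "country", "gdp"))
```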

While largely in line with the original findings, our results do exhibit several statistically significant differences as a result of our more diverse backing datasets. These differences inform our discussion on how crowdsourced graphical perception studies must adapt to and account for the variation found in organic datasets.

As the VizNet corpus grows, assessing the effectiveness of these (data, visualization, task) triplets, even using crowdsourcing, will quickly become time- and cost-prohibitive. To contend with this scale, we conclude by formulating effectiveness prediction as a machine learning task over these triplets. Our results suggest that machine learning offers a promising method for efficiently annotating VizNet content.
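As a sketch of what such a learning task could look like, the snippet below featurizes hypothetical (data, visualization, task) triplets and trains a scikit-learn classifier to predict whether crowdworkers answer a question correctly. The feature set, field names, and model choice are assumptions for illustration, not the paper's setup.

```python
# Minimal sketch of effectiveness prediction over (data, visualization, task)
# triplets; features and labels here are illustrative placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def featurize(triplet: dict) -> list:
    """Turn one (data, visualization, task) triplet into a feature vector."""
    data, vis, task = triplet["data"], triplet["visualization"], triplet["task"]
    return [
        data["n_rows"],
        data["cardinality"],   # number of distinct categorical values (assumed)
        data["entropy"],       # entropy of the quantitative field (assumed)
        vis == "bar", vis == "line", vis == "scatter",
        task == "read_value", task == "find_maximum", task == "compare_averages",
    ]

def train_effectiveness_model(triplets, labels):
    """labels: 1 if most workers answered the triplet correctly, else 0."""
    X = np.array([featurize(t) for t in triplets], dtype=float)
    y = np.array(labels)
    model = GradientBoostingClassifier()
    scores = cross_val_score(model, X, y, cv=5)  # rough accuracy estimate
    return model.fit(X, y), scores.mean()
```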

Conclusions

  1. VizNet provides a common baseline for comparing visualization design techniques and for developing benchmark models and algorithms to study graphical perception at scale.
  2. We demonstrate how machine learning models can offer a promising method for efficiently annotating (data, visualization, task) triplets at scale.
  3. VizNet highlights the opportunities and challenges of replicating prior work in human-computer interaction and visualization research.

Acknowledgments

We thank Alex Johnson for providing access to the Plotly API, Robert Kosara for providing the Many Eyes data, and the authors of [4] for scraping and providing access to open data repositories.

References

[1] George A Miller. 1995. WordNet: a lexical database for English. Commun. ACM 38, 11 (1995), 39–41.

[2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In CVPR.

[3] Younghoon Kim and Jeffrey Heer. 2018. Assessing Effects of Task and Data Distribution on the Effectiveness of Visual Encodings. Computer Graphics Forum (Proc. EuroVis) (2018).

[4] Sebastian Neumaier, Jürgen Umbrich, and Axel Polleres. 2016. Automated Quality Assessment of Metadata across Open Data Portals. Journal of Data and Information Quality (2016).
