VizML: A Machine Learning Approach to Visualization Recommendation

This article summarizes a paper by Kevin Hu, Michiel A. Bakker, Stephen Li, Tim Kraska, and César Hidalgo. The paper will be presented at CHI 2019, a conference on human-computer interaction, on Tuesday, 7 May 2019 at 11:00 in the Machine Learning and Visualization session.

Data collection, preprocessing, and analysis flow.

Takeaway

We collect one million dataset-visualization pairs from Plotly to train neural networks that predict visualization design choices. These models perform on par with humans at predicting the crowdsourced consensus on visualization type.

Automating Data Visualization through Recommendation

Visualization is a powerful medium for exploring and communicating data. Whether they are embedded within interactive articles or assembled into dashboards, visualizations teach us about the world and help us make informed decisions.

But creating visualizations remains a highly manual task in a world where automation seems to be everywhere. This gap led us to ask: could we develop a system that takes a dataset and then makes the design decisions a human would make? One that chooses whether to present the data as a bar, line, or scatter plot? One that decides what to put on the x-axis versus the y-axis?

Training Recommender Systems on Public Data

Visualization recommenders are already here. You may have encountered recommendation features like Show Me in Tableau or Recommended Charts in Excel that suggest different visualization types. You may have encountered standalone tools like Voyager and DIVE. No matter the form they take, rule-based systems encode guidelines like “use position to show differences in a quantitative variable” and “prefer bar charts to pie charts.”

Machine learning-based systems learn these guidelines directly from data. The issue is that, in order to train and evaluate machine learning models, you need data. Object recognition models learn from images and annotations. Language translation models learn from text in one language translated to text in another. Visualization recommender systems require datasets and corresponding design choices.

To gather these dataset-visualization pairs, we looked to Plotly, one of our favorite online visualization platforms. Plotly lets users create visualizations with a drag-and-drop chart builder and programming libraries, then publish them to the Community Feed. Using the Plotly API, we collected one million dataset-visualization pairs.

Screenshot of the Plotly Community Feed.

To describe the characteristics of a dataset, we extract features like the number of categorical fields and the average correlation between quantitative fields. We do the same for visualizations by extracting design choices made by the Plotly user. One example of a design choice is whether they chose a bar, line, or scatter chart. Another is whether they encode a column on the x-axis or the y-axis.
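To make the idea of dataset features concrete, here is a minimal sketch in plain Python. It is not the feature extraction pipeline from the paper: the function names, the rule for deciding that a column is quantitative, the use of Pearson correlation, and the example table and design choices are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def is_quantitative(column):
    """Assumption: a non-empty column is quantitative if every value is an int or float."""
    return bool(column) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in column
    )


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, use 0 here
    return cov / sqrt(var_x * var_y)


def extract_dataset_features(table):
    """Compute two illustrative features from a {column_name: values} table."""
    quantitative = [col for col in table.values() if is_quantitative(col)]
    pairs = list(combinations(quantitative, 2))
    mean_corr = sum(pearson(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "num_categorical_fields": len(table) - len(quantitative),
        "mean_quantitative_correlation": mean_corr,
    }


# Hypothetical dataset-visualization pair (made-up values, not Plotly data).
table = {
    "city": ["A", "B", "C", "D"],
    "year": [2015, 2016, 2017, 2018],
    "sales": [10.0, 20.0, 30.0, 40.0],
}
design_choices = {"type": "line", "x": "year", "y": "sales"}

features = extract_dataset_features(table)
print(features, design_choices)
```

In this sketch the feature dictionary plays the role of the model input and the design choices play the role of the labels to predict.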

Performance of Learned Recommender Systems

We show that neural networks trained on dataset features predict visualization design choices with high accuracy. As shown in the table below, neural networks predict visualization type with a surprising 89.4% accuracy and axis encoding with 83.1% accuracy. Simpler models like logistic regression and random forests perform worse than neural networks but still far above random chance.
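To give a sense of what "above random chance" means for a classification task like this one, the sketch below compares a model's accuracy against two simple reference points: uniform random guessing and always predicting the most common class. The labels and predictions are made up for illustration and are not results from the paper, and the helper names are our own.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of examples where the predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def uniform_chance_accuracy(actual):
    """Expected accuracy when guessing uniformly at random among the observed classes."""
    return 1 / len(set(actual))


def majority_baseline_accuracy(actual):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)


# Made-up labels for six hypothetical charts.
actual = ["bar", "line", "scatter", "line", "bar", "line"]
predicted = ["bar", "line", "scatter", "bar", "bar", "line"]

print("model accuracy:    ", accuracy(predicted, actual))
print("uniform chance:    ", uniform_chance_accuracy(actual))
print("majority baseline: ", majority_baseline_accuracy(actual))
```

The majority baseline is often the more demanding comparison when classes are imbalanced, since a model can beat uniform chance simply by favoring the most common chart type.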

Visualization design choice prediction accuracies for the neural network and four baseline models.

While our models accurately predict the design choices of Plotly users, what if the Plotly users didn’t make the best choices? Assessing the effectiveness of a visualization is an open research question. One scalable method is to ask different people and then see if they agree. So we randomly selected 99 datasets, visualized each as a bar, line, and scatter plot, then computed the consensus of crowdsourced workers. When it comes to predicting this consensus, neural networks perform on par with humans and outperform other visualization recommenders.
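One simple way to turn crowdsourced responses into a consensus label is a plurality vote, sketched below. This is an assumption made for illustration: the paper's actual procedure for establishing consensus may differ, and the dataset identifiers, votes, and predictions here are invented.

```python
from collections import Counter


def consensus_label(votes):
    """Return the label with the most votes, or None if there are no votes or a tie for first."""
    counts = Counter(votes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def agreement_with_consensus(predictions, votes_by_dataset):
    """Fraction of datasets with a clear consensus where the prediction matches it."""
    scored = 0
    correct = 0
    for dataset_id, votes in votes_by_dataset.items():
        label = consensus_label(votes)
        if label is None:
            continue  # skip datasets where the workers did not agree
        scored += 1
        correct += predictions[dataset_id] == label
    return correct / scored if scored else 0.0


# Invented votes from hypothetical workers for three hypothetical datasets.
votes_by_dataset = {
    "d1": ["bar", "bar", "line"],
    "d2": ["scatter", "scatter", "scatter"],
    "d3": ["bar", "line"],  # tie, so no consensus
}
predictions = {"d1": "bar", "d2": "line", "d3": "bar"}

print(agreement_with_consensus(predictions, votes_by_dataset))
```

The same function can score a model, another recommender, or an individual human, which is what makes consensus a convenient common yardstick.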

The performance of our models shows that publicly available data provides the variety and quantity of training examples needed to train visualization recommenders at scale. Plotly dataset-visualization pairs are only the beginning. Imagine automated visualization systems trained on data from Many Eyes, Tableau Public, and domain-specific tools. These systems would enable rich interactions like a visualization autocomplete that interactively suggests design choices, or adaptive visualizations that change in response to different tasks and audiences.

No matter the form they take, we look forward to the day when recommender systems enable all people with data, not just the few with technical backgrounds, to create visualizations.

Actionable Conclusions

  1. The Plotly Community Feed is a source of the dataset-visualization pairs needed to train visualization recommender systems at scale.
  2. Machine learning models trained on dataset features accurately predict the design choices of Plotly users.
  3. Crowdsourced consensus is a means to evaluate the generalizability of recommender models.

Acknowledgments

Our research is supported in part by the MIT Media Lab consortium. We thank Alex Johnson for providing access to the Plotly API. We thank David Alvarez-Melis, Tommi Jaakkola, Çağatay Demiralp, Cristian Jara-Figueroa, Owais Khan, and Guru Mahendran for their feedback.

Further Reading

  1. “Agency plus automation” (paper) builds from research on intelligence augmentation to lay out a foundation for thinking and talking about automated visualization.
  2. Draco (paper, blog post) formalizes visualization design as solving constraints.
  3. DeepEye (paper) uses a rule-based recommender to generate visualizations that are classified as good or bad and ranked using models trained on human annotation.
  4. Data2Vis (paper, blog post) uses a neural machine translation model to map directly from JSON-encoded datasets to Vega-Lite specifications.
  5. VizRec (paper) presents two frameworks for controlling the false discovery rate of visualization recommenders. Visualization recommendation is often part of an exploratory data analysis workflow.