What the Chart?
Measuring visualization recommendation quality.
Creating new visualizations has become convenient enough that anyone with a computer and an internet connection can generate them, thanks to a wide array of recommendation algorithms. However, most users adopt these visualizations without considering whether they are better than those recommended by other algorithms.
Existing frameworks focus on generating new visualization recommendation algorithms without comparing them, which makes it hard to assess differences in performance. To enable such comparisons, researchers in a 2021 study designed a new standardized framework. It compares algorithms based on how they affect the user's performance during visual analysis and has three components:
- A network laid out in a design space, with visualizations as nodes and design changes between them as edges.
- A method used by a recommendation algorithm to identify the candidate visualizations.
- A method used to rank the visualizations that have been selected. This method is called an oracle.
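The three components above can be sketched as a small pipeline. This is a minimal illustration under assumed names (`Visualization`, `enumerate_candidates`, `oracle_rank` are hypothetical, not the paper's API); the toy oracle simply prefers views with more attributes.

```python
# A minimal sketch of the framework's three components (hypothetical names).
from dataclasses import dataclass


@dataclass(frozen=True)
class Visualization:
    attributes: tuple   # data attributes shown
    encoding: str       # e.g. "bar", "scatter"
    transform: str      # e.g. "none", "bin", "mean"


def enumerate_candidates(design_space, constraints):
    """Component 2: traverse the design space, keep nodes meeting constraints."""
    return [v for v in design_space if constraints(v)]


def oracle_rank(candidates, score):
    """Component 3: the oracle orders candidates by score, best first."""
    return sorted(candidates, key=score, reverse=True)


# Component 1: a tiny design space (nodes only; edges are implicit edits).
space = [
    Visualization(("price",), "bar", "mean"),
    Visualization(("price", "year"), "line", "none"),
    Visualization(("price", "year"), "scatter", "none"),
]

top = oracle_rank(
    enumerate_candidates(space, constraints=lambda v: "price" in v.attributes),
    score=lambda v: len(v.attributes),  # toy oracle: prefer richer views
)
```

In this sketch `top[0]` is the line chart of price by year, since it satisfies the constraint and ties for the highest score while appearing first in the space.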
Different Components of the Framework
The first component of the framework is the design space. For this framework, the researchers considered the full design space of all possible visualizations: every combination of attributes, encodings, and transformations that can be applied to a given dataset. This space is used to compare individual algorithms.
In the graph of the design space, the nodes represented individual visualizations and the edges represented the operations that transformed one node into another, such as adding an attribute or changing an attribute's data transformation or encoding channel. To traverse the visualization space more efficiently, the researchers defined sub-spaces in which multiple visualizations are merged into one node, reducing the number of edges that must be traversed.
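The edge operations described above can be sketched as a neighbor function: given one visualization, it yields every visualization that is a single design edit away. The representation (a tuple of attributes, encoding, and transformation) and the vocabularies are illustrative assumptions, not the paper's implementation.

```python
# Sketch: design-space edges as single design edits (illustrative encoding).

def neighbors(vis, all_attributes, encodings, transforms):
    """Yield visualizations one edit away from `vis`: add an attribute,
    or change the encoding channel / data transformation."""
    attrs, enc, tf = vis
    for a in all_attributes:            # edit 1: add one attribute
        if a not in attrs:
            yield (tuple(sorted(attrs + (a,))), enc, tf)
    for e in encodings:                 # edit 2: swap the encoding channel
        if e != enc:
            yield (attrs, e, tf)
    for t in transforms:                # edit 3: swap the transformation
        if t != tf:
            yield (attrs, enc, t)


start = (("price",), "bar", "none")
nbrs = list(neighbors(start, ["price", "year"], ["bar", "line"], ["none", "bin"]))
```

Here `start` has exactly three neighbors: add the `year` attribute, switch to a line encoding, or bin the data.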
The second component is the enumeration process, in which search algorithms traverse the space to identify candidates for the best visualization. These candidates are selected based on input requirements and are then passed to the ranking step. Here the algorithms must manage a trade-off between the extent of enumeration and execution cost: enumerating and ranking more candidates yields higher-quality results, but the added work increases execution time. To manage this trade-off, constraints are added to the node-selection process, limiting the parts of the design space that can be traversed. One of three enumeration approaches is then used to select the candidate nodes.
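One way to picture constrained enumeration under a cost budget is a breadth-first walk that stops after visiting a fixed number of nodes. This is a generic sketch, not one of the paper's three approaches; integers stand in for visualization nodes, and `budget` plays the role of the enumeration/cost trade-off.

```python
# Sketch: enumeration as a budgeted breadth-first walk of the design space.
from collections import deque


def enumerate_bfs(start, neighbors, is_valid, budget):
    """Visit at most `budget` nodes; return those satisfying the constraint."""
    seen, queue, candidates = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        if is_valid(node):              # constraint on candidate nodes
            candidates.append(node)
        for nxt in neighbors(node):
            if nxt not in seen and len(seen) < budget:
                seen.add(nxt)           # budget caps how far we enumerate
                queue.append(nxt)
    return candidates


cands = enumerate_bfs(
    start=1,
    neighbors=lambda n: (n + 1, n * 2),  # stand-in edges, not real design edits
    is_valid=lambda n: n % 2 == 0,       # stand-in constraint
    budget=6,                            # the enumeration-cost cap
)
```

Raising `budget` lets the walk reach more candidates (potentially better final rankings) at the price of more work, which is exactly the trade-off described above.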
The last component is the ranking process, in which the candidates chosen during enumeration are ordered by how closely they match predefined filters or criteria covering the quality and relevance of candidate visualizations. The part of the algorithm that applies these criteria is called the oracle. It relies on the user's recent history of created visualizations and interactions, along with statistics of the dataset, to assign ranks. To rank candidates, it uses three kinds of models: behavioral models, statistical models, and machine learning models, built from field observations, aggregate statistics, and machine learning, either individually or in combination.
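A toy oracle score might blend a behavioral term (overlap with attributes the user touched recently) and a statistical term (a precomputed interestingness score per attribute). The weights and features here are illustrative assumptions, not the paper's models.

```python
# Sketch: an oracle score mixing user history with dataset statistics.

def oracle_score(vis, history, stats, w_hist=0.5, w_stat=0.5):
    """Score a candidate by (a) overlap with recently explored attributes
    and (b) the mean interestingness statistic of its attributes."""
    attrs = vis["attributes"]
    hist_term = len(set(attrs) & set(history)) / max(len(attrs), 1)
    stat_term = sum(stats.get(a, 0.0) for a in attrs) / max(len(attrs), 1)
    return w_hist * hist_term + w_stat * stat_term


candidate = {"attributes": ["price", "year"], "encoding": "line"}
score = oracle_score(
    candidate,
    history=["price"],                  # user recently explored "price"
    stats={"price": 0.9, "year": 0.4},  # e.g. a variance-based statistic
)
```

With these inputs the behavioral term is 0.5 (one of two attributes was recently explored) and the statistical term is 0.65, giving a combined score of 0.575.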
Final Product
Using this new standardized framework, several preexisting recommendation algorithms were compared along different facets, one of which is the trade-off between enumeration and ranking cost mentioned earlier. The algorithms were then benchmarked, that is, ranked against a common reference point.
To standardize the benchmarking process, the researchers built an interface that makes the algorithms easy to explore and use. They then conducted a remote study with 72 participants, separated into 9 groups, testing 8 different combinations of recommendation algorithms and datasets. Based on their experience with the interface, participants filled out post-task questionnaires.
Conclusion
The study found no significant difference in participants' performance on specific tasks. In addition, the benchmark results suggested that analysts prefer different algorithms for different analysis tasks. As future work, participants asked for an interface with richer functionality, such as filtering and aggregation support.
References
- Zehua Zeng, Phoebe Moh, Fan Du, Jane Hoffswell, Tak Yeon Lee, Sana Malik, Eunyee Koh, Leilani Battle. An Evaluation-Focused Framework for Visualization Recommendation Algorithms. IEEE VIS 2021.