Visualization in Deep Learning
How interactive interfaces and visualizations help people use and understand neural networks
TL;DR: The democratization of AI is either near or already here—the barrier to developing and deploying neural networks is lower than ever before. But complex deep learning models are hard to train and hard to understand. Interactive interfaces and visualizations have been designed and developed to help people understand what models have learned and how they make predictions. In this post, we survey visualization for deep learning and list takeaways for this emerging field.
Whereas most machine learning approaches use a dataset with known features—for example, a collection of cars (dataset) with known makes, models, and colors (features)—deep learning is a specific set of techniques that when given a dataset, learns what features are important to the task. This is useful for datasets without explicit tabular features, like an album of images, a portfolio of text documents, or an audio library. The name deep learning stems from the go-to model architecture for these types of models: deep artificial neural networks.
First mentioned as early as the 1940s, artificial neural networks have a rich history and have recently seen a dominant resurgence in many research areas by producing state-of-the-art results on a number of diverse big data tasks. For example, premier machine learning (ML), deep learning, and artificial intelligence (AI) conferences have seen enormous growth in attendance and paper submissions since the early 2010s. Furthermore, open-source toolkits and programming libraries for building, training, and evaluating deep neural networks have become more robust and easy to use. As a result, the barrier to developing deep learning models is lower than ever before, and its influence on other domains has become pervasive.
So, what’s the problem?
While modern machine learning progress is impressive, it isn’t without unique challenges. For example, the lack of interpretability and transparency of neural networks, from the learned features to the underlying decision processes, is an important problem to address. Making sense of why a particular model misclassifies data or behaves poorly can be challenging for model developers. Similarly, end-users interacting with an application that relies on deep learning to make decisions may question its reliability if no explanation is given by the model, or may become confused if the explanation is convoluted.
While explaining neural network decisions is important, there are numerous other problems that arise from deep learning, such as AI safety and security (e.g., when using models that could affect a person’s social, financial, or legal wellbeing), and compromised trust due to bias in models and datasets, just to name a few. These challenges are often exacerbated due to the large datasets required to train most deep learning models. As worrisome as these problems are, they will likely become even more widespread as more AI-powered systems are deployed in the world. Therefore, a general sense of model understanding is not only beneficial, but often required to address the aforementioned issues.
Can data visualization help?
Data visualization and visual analytics excel at communicating information and discovering insights by using visual encodings to transform abstract data into meaningful representations. For visualization in deep learning, in the seminal work by Zeiler and Fergus, a technique called deconvolutional networks enabled projection from a model’s learned feature space back to the pixel space, or in other words, gave us a glimpse at what neural networks were seeing in large sets of images. Their technique and results give insight into what types of features deep neural networks are learning at specific layers in the model, and provide a debugging mechanism for improving a model. This work is often credited for popularizing visualization in the deep learning and computer vision communities in recent years, highlighting visualization as a powerful tool that helps people understand and improve deep models.
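Beyond deconvolutional networks, Zeiler and Fergus also probed models with occlusion: sliding a small gray patch over the input and recording how much the class score drops, which reveals the image regions the model relies on. The sketch below illustrates that idea on a toy numpy "model" (the `toy_score` function is a hypothetical stand-in for a real classifier's class score, not their actual implementation).

```python
import numpy as np

def occlusion_map(image, score_fn, patch=4, fill=0.0):
    """Slide an occluding patch over the image and record how much the
    model's class score drops at each position. Large drops mark regions
    the model relies on for its prediction."""
    h, w = image.shape
    base = score_fn(image)
    heat = np.zeros((h - patch + 1, w - patch + 1))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill
            heat[i, j] = base - score_fn(occluded)  # drop in score
    return heat

# Hypothetical stand-in "model": scores an image by the brightness
# of its center region.
def toy_score(img):
    return img[6:10, 6:10].mean()

img = np.zeros((16, 16))
img[6:10, 6:10] = 1.0  # bright center patch
heat = occlusion_map(img, toy_score)
# The largest score drop occurs where the patch covers the center.
```

Visualizing `heat` as a heatmap over the input is exactly the kind of static saliency view many surveyed systems build on.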
However, visualization research for neural networks started well before. Now, over just a handful of years, many different techniques have been introduced to help interpret what neural networks are learning. For example, many techniques generate static visualizations indicating which parts of an image are most important to a model’s classification. However, interaction has also been incorporated into visual analytics tools to help people understand a model’s decision process. This hybrid research area has grown in both academia and industry, forming the basis for new research and communities that wish to explain models clearly.
An interrogative survey of visual analytics in deep learning
To help make sense of the role of visualization in deep learning, we conducted a survey using a human-centered, interrogative framework. This method enables us to position research with respect to its Five W’s and How (why, who, what, how, when, and where)—a framework based on how we familiarize ourselves with new topics in everyday settings—and helps us quickly grasp important facets of this young and growing body of research.
- Why would one want to use visualization in deep learning?
- Who would use and benefit from visualizing deep learning?
- What data, features, and relationships in deep learning can be visualized?
- How can we visualize deep learning data, features, and relationships?
- When in the deep learning process is visualization used?
- Where has deep learning visualization been used?
This framing captures the needs, audience, and techniques of deep learning visualization, positions new work in the context of existing literature, and helps us reveal and organize the various facets of deep learning visualization research and their related topics.
Note that our derived categories are not exhaustive; rather, they reflect what we observe in the literature today.
The table with links to each visualization system surveyed can be found here: http://bit.ly/va-dl-survey.
We hope that this survey acts as a companion text for researchers and practitioners wishing to understand how visualization supports deep learning research and applications.
8 Survey Takeaways
When we presented this survey at the 2018 IEEE Visualization conference, we shared the following 8 takeaways:
1. Interpretability is the most popular reason for deep learning visualization.
Nearly all visual analytics systems support some notion of model interpretability. However, given this near-unanimous attention, a formal agreed-upon definition for interpretability remains elusive. Literature agrees that it refers to a human understanding, but a human understanding of what: model internals, model operations, mapping of data, or learned representation? It could be that interpretability never achieves a specific definition, but instead becomes an umbrella term—a suite of explanation techniques and conditions to satisfy to ensure the fair, accountable, and transparent use of a deep learning model.
2. Most tools are aimed at expert users.
By expert users, we typically mean machine learning model developers and others who often build, train, and ultimately iterate upon models to improve performance. Currently, there is approximately 3x as much visualization work for expert users as for non-expert users.
The few non-expert tools are often positioned as educational tools that help people learn deep learning concepts. This skewed emphasis on expert users leaves significant opportunity for designing and developing tools, techniques, and explanations for non-expert users, who are often the end-users of ML-powered technologies, systems, and products.
There’s also a growing trend of using explorable explanations (interactive articles that use explanatory text alongside interactive graphics, visualizations, simulations, and models to explain concepts via active learning) to educate broad audiences about machine learning. The VISxAI workshop encouraged visualization researchers to build explorables to help communicate current research progress and create visual narratives to bring new insight into the often obfuscated complexity of machine learning systems. Excellent explorable examples include visualizing fairness concerns in machine learning, understanding dimensionality reduction, explaining how neural networks are trained, and visualizing fundamental techniques in machine learning.
3. Most tools use instance-based analysis techniques.
Currently, neural networks lack natural and effective global explanations: explanations that roughly capture and explain the entire space learned by a model, favoring simplicity over completeness. Most visual analytics systems for deep learning use instance-based methods, in other words, observing the input-output relationship of known data points, to create local explanations that accurately explain a single data point’s prediction. In general, instance-based analysis is a common technique used in the broader machine learning community to test and debug models, where experts often use a curated set of known instances.
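In its simplest form, instance-based analysis means running a curated set of known instances through the model and surfacing the disagreements for inspection. A minimal sketch, assuming a hypothetical stand-in classifier (the weights and instances below are purely illustrative):

```python
import numpy as np

# Hypothetical stand-in classifier: logistic regression with fixed
# weights. Any predict function would work here.
W = np.array([1.5, -2.0])
b = 0.1

def predict(x):
    """Return the predicted class (0 or 1) for one instance."""
    return int(1 / (1 + np.exp(-(x @ W + b))) > 0.5)

# A curated set of known instances with expected labels, as a model
# developer might assemble for debugging.
curated = [
    (np.array([2.0, 0.0]), 1),
    (np.array([0.0, 2.0]), 0),
    (np.array([1.0, 1.0]), 1),  # expected 1 -- does the model agree?
]

# Instance-based analysis: compare each known input's prediction to
# its expected label and collect the disagreements.
misclassified = [(x.tolist(), want, predict(x))
                 for x, want in curated if predict(x) != want]
```

Visual analytics systems wrap exactly this input-output loop in interactive views, letting experts probe why a flagged instance was mispredicted.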
4. Deep learning visualization is a hybrid and fast-paced community.
Visual analytics systems targeted at supporting some deep learning task are published in the usual academic visualization and human-computer interaction conferences (TVCG, VIS, CHI), but others also appear in machine learning focused venues (NeurIPS, ICML, ICLR, CVPR), and many specialized workshops at both HCI and ML venues. The number of visualizations for deep learning has also increased quickly over only the past 5 years. Lastly, it should be noted that a handful of these visualization systems are open-sourced too.
5. Furthering interpretability requires collaboration between multiple communities.
Besides data visualization explanations for models, AI and ML researchers have created a number of algorithmic-based explanations (e.g., attention, saliency, and feature visualization), but these methods are often static and studied in isolation. This presents new opportunities for HCI and visualization researchers to combine these explanations within interactive interfaces to help people understand models better and faster. Distill, an online journal dedicated to clear explanations of machine learning, has recognized this problem and begun to build interfaces that use these techniques to show how neural networks “see” the world.
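To make the algorithmic side concrete, one of the simplest such explanations is vanilla gradient saliency: rank input features by the magnitude of the output's gradient with respect to each one. A minimal sketch on a toy logistic-regression "model" (the weights are illustrative assumptions, and for this model the gradient has a closed form):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy stand-in model: logistic regression with illustrative weights.
W = np.array([3.0, 0.1, -1.0])
b = 0.0

def saliency(x):
    """Vanilla gradient saliency: |d(output)/d(input)| per feature.
    For logistic regression the input gradient has a closed form:
    sigmoid'(z) * W, where z = W.x + b."""
    z = x @ W + b
    grad = sigmoid(z) * (1 - sigmoid(z)) * W
    return np.abs(grad)

x = np.array([1.0, 1.0, 1.0])
s = saliency(x)
ranking = np.argsort(-s)  # features ordered by influence on the output
```

Such static per-feature scores are exactly the raw material that interactive interfaces can let users explore, compare, and cross-filter.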
6. Interactive deep learning lacks actionability.
Interactive machine learning systems leverage a human-in-the-loop paradigm, where humans and machines work together in an iterative process to solve a task; for example, building a performant, unbiased model. More specifically, after producing a visualization, a user may change specific model hyperparameters or perform better feature engineering based on the visualization. However, given the current deep learning visualizations, actionability is not immediately clear and guidelines to iterate on models are not solidified. Does the model need to be retrained with different hyperparameters? Does the model lack sufficient data?
7. Evaluating model explanations is hard.
Evaluating visualizations is a hard problem and an open, vibrant area of research. But when the object to visualize is probabilistic, evaluation becomes even harder. It has been shown that qualitative evaluations of explanations for image-based models (e.g., a simple “eye test”) can be misleading, so it is important that researchers remain rigorous in evaluating new explanations and interpretable systems. Initial work has proposed a taxonomy of evaluation approaches to help other researchers determine how to evaluate interpretable systems.
8. State-of-the-art models are not robust.
Adversarial machine learning has shown the sensitivity of both image- and audio-based deep learning models. Models fail on attacked data instances that appear benign to the human eye or ear. Visualization could help identify when data and models are attacked, and how well defenses can protect against intended attacks (e.g., adversary wants to crash a self-driving car) and unintended attacks (e.g., incorrect facial recognition for people’s faces on bus ads or billboards).
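A classic illustration of this sensitivity is the fast gradient sign method (FGSM): nudge each input feature slightly in the direction that increases the loss, and a correctly classified input can flip class. The sketch below shows the idea on a toy logistic-regression "model" (weights, input, and the step size `eps` are all illustrative assumptions, not a real attacked system):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy stand-in "model": logistic regression with fixed weights.
W = np.array([2.0, -1.0])
b = 0.0

def predict_prob(x):
    return sigmoid(x @ W + b)

def fgsm(x, y, eps):
    """Fast gradient sign method: step the input in the direction that
    increases the loss, bounded by eps per feature. For logistic loss
    the input gradient is (p - y) * W."""
    grad = (predict_prob(x) - y) * W
    return x + eps * np.sign(grad)

x = np.array([0.5, 0.2])      # correctly classified as class 1
y = 1
x_adv = fgsm(x, y, eps=0.6)   # bounded per-feature perturbation
clean_pred = int(predict_prob(x) > 0.5)
adv_pred = int(predict_prob(x_adv) > 0.5)
# The perturbed input is classified differently from the original.
```

Visualizing where such perturbations land relative to a model's decision boundary is one way visualization could help identify attacked data and assess defenses.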
These are only a few of the highlights from our survey. Many more can be found in the research paper. We have created a website that contains our paper, a teaser video, and the table listing the surveyed systems, their categorization, and their corresponding project/paper links accessible at http://bit.ly/va-dl-survey.
Fred Hohman (@fredhohman) is a PhD student at Georgia Tech.
Minsuk Kahng (@minsukkahng) is a PhD candidate at Georgia Tech.
Robert Pienta is a Research Scientist at Symantec.
Polo Chau (@polochau) is an Associate Professor at Georgia Tech.
We thank the Georgia Tech Visualization Lab for feedback on this post.
This work was supported by a NASA Space Technology Research Fellowship, a Google PhD Fellowship, and NSF grants CNS-1704701, TWC-1526254, and IIS-1563816.