Interactive Visualization as Mediator Between Human and Machine Intelligence

Hendrik and Sebastian
Jan 29 · 9 min read

by Hendrik Strobelt and Sebastian Gehrmann

Machine learning models are only as unbiased and fair as the data they were trained on. This limitation moved into the focus of research as models were increasingly applied to real-world tasks and became more and more involved in our daily lives. In addition, we face the following dilemma: improved model performance seems to depend on more complex models whose decision process offers little transparency. The current fascination with deep learning methods is an excellent example of this dilemma. Deep learning describes a class of models that compose many high-dimensional, nonlinear functions, are trained to perform a given task, and can often achieve better performance than traditional models. The disadvantage of these models is that they operate as black boxes that provide little insight into why a prediction was made.

Since these models also learn hidden biases in the training data, following their predictions can harm the people affected by them. For example, in one study, a model predicted that patients with asthma had a smaller risk of dying in an intensive care unit. Upon further investigation, it was found that, at this hospital, patients with a history of asthma were always directly admitted to the ICU, even with small problems, resulting in more aggressive care and lower mortality. Thus, the model learned to associate asthma with lower severity, which leads to better predictive performance (at least according to the test data). However, if the doctors had blindly trusted the output of the ML model, asthma patients could have suffered severely as a result of being classified as being at lower risk.
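
The mechanism behind the asthma example can be reproduced with a few lines of code. The following is a minimal sketch with invented counts (they are not the numbers from the study): a "model" that only sees the asthma flag, and not the more aggressive care that came with it, ends up assigning asthma patients the lower risk.

```python
# Synthetic, invented counts purely for illustration; NOT data from the study.
# Each record is (has_asthma, got_aggressive_care, died).
def make_patients():
    patients = []
    # Asthma patients: all are admitted directly and receive aggressive care.
    patients += [(True, True, False)] * 95 + [(True, True, True)] * 5
    # Patients without asthma: standard care.
    patients += [(False, False, False)] * 880 + [(False, False, True)] * 120
    return patients


def mortality_rate(patients, has_asthma):
    """Observed share of deaths within one group."""
    group = [p for p in patients if p[0] == has_asthma]
    return sum(p[2] for p in group) / len(group)


def naive_risk_model(patients):
    """A 'model' that only sees the asthma flag and predicts the group rate.

    The care variable is hidden from it, so the benefit of aggressive care
    is attributed to asthma itself.
    """
    return {flag: mortality_rate(patients, flag) for flag in (True, False)}


patients = make_patients()
risk = naive_risk_model(patients)
print(risk)  # asthma (True) gets the lower predicted risk
```

The prediction matches the data perfectly, which is exactly why test accuracy alone does not reveal the problem.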

In light of these problems with bias and interpretability, many questions arise: How can we find model biases? How can we explain what models are doing? How can we detect and correct their mistakes? And how can we learn to trust these models as future companions? All these questions provide opportunities for research in interactive visualization. Visual and interactive approaches can help humans retain agency over the prediction process while still benefiting from the improvements in accuracy from machine learning models. Throughout this blog post, we will describe how visualization-based human-in-the-loop approaches could help make machine learning more transparent, more trusted, and finally more useful.

The human role(s)

Personas in visualization for machine learning and their view on the model. Parameters could either be model parameters or hyper-parameters. Architects have a white box view on the model architecture, trainers a gray box view, and end-users a black box view.

A common approach in visualization and human-computer interaction research is to think about virtual persons that represent prototypical users in the research context; we call them personas. The three personas we identified for visualization in machine learning are:

  • The architect is the creator of machine learning models. She knows all the algorithmic details and pushes the field forward by developing new methods and architectures.
  • The trainer is a domain scientist who applies machine learning to solve a particular task and has some insight into model architectures. She could be, e.g., a bioinformatician trying to classify cell images or an engineer training an English-Klingon translation system.
  • The end-user is the professional or non-professional general user of a machine learning model who utilizes the model as an assistant. He could be, e.g., a user of a social network with face recognition capabilities or an MD using a particular risk factor analysis.

While all three personas have in common that they are involved with machine learning models, they have different interests, needs, and goals. Trainers and architects are interested in interpreting the internals (i.e., hidden states) of models, whereas the end-user has a stronger interest in human-understandable explanations for specific decisions. In addition, trainers should care about de-biasing their models so that end-users are not adversely affected or discriminated against.

How could we interact with models?

We think that interactive visualization can act as the mediator between human reasoning and artificial intelligence on three levels of integration: the human as a passive observer, the human as an interactive observer, and the human as an interactive collaborator. Let’s explain what we mean by that and why we think that the idea of interactive collaboration is the most intriguing.

Types of human and machine intelligence interaction.

During the early stages of interpretable machine learning research, many visualizations were developed to observe performance or reveal insights about model internals. They often allow passive observation, comparable to a photo snapshot of the current state. One example is the performance curves shown during model training by systems like TensorBoard or Visdom. The trainer can observe anomalies, but she can often only intervene by stopping the training process; she cannot interact with the visualization as a proxy to communicate with the model. Other examples in this category are feature maps or dream images. Although these methods are not interactive, they fulfill the critical task of providing snapshots of model measures and model internals, and they are widely used to get a first glimpse.

In recent years, we have seen multiple approaches emerging from the visualization community that allow interactive observation. By changing the inputs of a model, these tools enable the observation of (internal and external) model reactions or the analysis of learned model patterns. Examples are approaches that allow user interactions such as filtering or faceting of input/output data for investigation, arranging data by projection methods, navigating the model execution graph, or selecting meaningful patterns to probe a model. All these methods consider the model as static, and the human is an active observer of a stable and slightly more transparent block. In the figure below, we show two screenshots of interactive systems we developed that let users investigate patterns in recurrent neural networks and help debug sequence-to-sequence models.

Screenshot of LSTMVis as an example of interactive observation for architects or trainers. The tool allows investigation of what could be captured in the hidden states of a recurrent neural network. Details:
Seq2Seq-Vis as an example of a hybrid between interactive observation and interactive collaboration for trainers and architects. The system helps debug complex sequence-to-sequence models by investigating internals of the model and what-if testing of alternative decisions. Details:

We advocate here for approaches that consider the human as an interactive collaborator. In this view, the model is not regarded as static and can take input from a user, which allows bilateral interaction. This enables an activity that is essential for human understanding: playing. By playing with models and model interventions, humans can start building better knowledge of and better intuition for model behavior and model boundaries. Going one step further, this collaborative integration can lead to shared human-model synergy and creativity. This means that humans can correct or modify internal decisions of deep learning models in a human-interpretable way to achieve better results. During training, a user can provide feedback using active learning approaches. For collaboration after training, we recently created an interactive demo in which humans and a machine learning model collaborate to generate an image by modifying activations of neurons in a generative adversarial network.
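
To make the feedback-during-training idea a bit more concrete, here is a minimal sketch of one round of uncertainty sampling, a common active learning strategy. It is only an illustration: the probability table and the `ask_human` callback are hypothetical stand-ins for a real model and a real labeling interface.

```python
def most_uncertain(probabilities):
    """Index of the binary prediction closest to 0.5 (uncertainty sampling)."""
    return min(range(len(probabilities)),
               key=lambda i: abs(probabilities[i] - 0.5))


def active_learning_round(unlabeled, predict_proba, ask_human):
    """Pick the example the model is least sure about and ask a human for its label.

    The chosen example is removed from the unlabeled pool and returned
    together with the human-provided label.
    """
    probs = [predict_proba(x) for x in unlabeled]
    chosen = unlabeled.pop(most_uncertain(probs))
    return chosen, ask_human(chosen)


# Hypothetical stand-ins: a lookup table instead of a model, and a
# function that always answers 1 instead of a labeling UI.
toy_probabilities = {"a": 0.95, "b": 0.52, "c": 0.10}
pool = ["a", "b", "c"]

example, label = active_learning_round(
    pool,
    predict_proba=toy_probabilities.get,
    ask_human=lambda x: 1,
)
print(example, label, pool)
```

In an interactive system, the visualization would play the role of `ask_human`: it presents the uncertain example in an interpretable form and sends the answer back to the training loop.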

GANpaint is an example of interactive collaboration aimed at the end-user persona. The human can modify activations of neurons in a GAN to steer the production of a visual feature (trees, doors, ...). Details & demo:

We hope that while reading this you already started thinking about crazy, playful, useful ideas for interactive human-model collaboration. But before you get started with implementation — check out the next section! It might actually provide some useful hints about dos and don’ts.

Lessons Learned — Recommendations, Pitfalls, Challenges

In this section, we will present a few (unordered) Dos and Don’ts we experienced when building interactive systems for ML. They should not be taken as dogmas or mantras — just as practical hints for getting started.

Don’t be the expert (unless you are the expert)
In visualization research, we often take joy in learning about new domains and their specific languages. We do not try to acquire the role of domain expert if the field is very distant from our own. Machine learning and visualization are closer than other pairings which might make us believe we could replace the domain expert role. Do not — ever.

Humans are your audience
One of the beautiful and creative challenges is how to represent machine learning model decisions and internals such that they can be interpreted by humans. While a continuous vector of many dimensions is readable — it is not interpretable. Thinking about good proxies for meaningful interpretation and navigation in these high-dimensional spaces is one of the first steps to work on model interaction.

Ad-hoc not post-hoc
Outside of textbook settings, work in visualization and interpretation often shares the fate of UI design: it is considered an add-on after the ‘major part’ of a problem is solved. As a visualization or interpretability expert, try to be proactive and become the ad-hoc partner during the early method creation stages. Creating small tools for your domain collaborators increases the chances of developing interactive, collaborative systems in later stages.

Iteration is your joy, not your burden
Throwing away your own work can hurt, but it is at the core of creating visualizations. A tool that is useful to the user can take many iterative improvements over your first prototype. To minimize the pain, we recommend treating early visualizations as initiators for discussions and enjoying their creation. And if you are lucky, you will find a collaborator who tells you how useful or useless they appear to her. We sing the same tune as courses taught in visualization and design when we emphasize that rapid prototyping is your friend.

Visualizations for hypothesis testing: Reject early, but do not confirm
Visualization tools can be applied to test hypotheses by showing examples or prototypes of typical predictions of relevant data. This allows users to quickly reject a hypothesis if the tool does not present the desired pattern. However, to evaluate global assumptions about the data or the model, the tool would need to demonstrate this pattern across the entire search space or data. Even when showing patterns across a whole data set, this does not inform a tester about what happens when the test distribution is different from the training distribution, a common problem in real-world applications. Thus, confirming hypotheses about a model is a challenging and unsolved problem.
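
The asymmetry between rejecting and confirming can be sketched in a few lines. The toy classifier and the hypothesis below are hypothetical and only serve as an illustration: a single counterexample is enough to reject, while the absence of one among the examples shown confirms nothing.

```python
def find_counterexample(predict, hypothesis, samples):
    """Return the first sample whose prediction violates the hypothesis, else None.

    None only means 'not rejected on these samples'; it is not a confirmation.
    """
    for x in samples:
        if not hypothesis(x, predict(x)):
            return x
    return None


# Hypothetical toy model: calls a text positive if it contains "good".
def toy_predict(text):
    return "positive" if "good" in text.lower() else "negative"


# Hypothesis under test: "every text containing the word 'not' is predicted negative".
def negation_is_negative(text, label):
    return "not" not in text.lower().split() or label == "negative"


shown_in_tool = ["not great", "not worth it"]
wider_sample = shown_in_tool + ["not good at all"]

on_shown = find_counterexample(toy_predict, negation_is_negative, shown_in_tool)
on_wider = find_counterexample(toy_predict, negation_is_negative, wider_sample)
print(on_shown)  # None: no violation among the examples shown
print(on_wider)  # "not good at all": one counterexample rejects the hypothesis
```

The first call finds nothing only because the violating input was not among the examples shown, which is why a clean view in a tool should be read as "not yet rejected".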

Future Challenge: How to provide feedback and fix models
It is currently not well understood what to do with the input provided through visual interfaces. In the future, we envision smart feedback mechanisms that can constrain the posterior inference of a model by defining rules about what a model is supposed to do. Visualization can help here by providing insights on how changes affect the model on a global and local level.

Future Challenge: How to solve Fairness/Bias issues and not only detect them
In the same realm as the feedback argument, there are constraints on what can be done about the ubiquitous bias and fairness issues underlying machine learning models. Since model fixing is not a solved problem, visualization tools can merely detect issues with a model, including fairness and bias issues. This problem is challenging, and current approaches only help to identify biases. We think that in the future, these problems can be alleviated through the co-design of machine learning methods and visualization approaches.

Why text is hard, but necessary
When thinking about machine learning visualization, the most common tools address computer vision. While the nature of images allows for neat visual overlays, text-based models face the same issue that their outputs can lead to harmful results. One example is that of Facebook mistakenly translating an Arabic “good morning” into “attack them,” which led to a wrongful arrest. However, since text-based models make a series of predictions that might take into account different parts of an input, the saliency of inputs is not as easily visualized as with a single image. Therefore, developing interactive visualizations for text-based models is of particular interest.
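
A small sketch can show why saliency for text is harder to display. With one prediction per output step, saliency becomes a matrix (output steps by input tokens) rather than a single overlay. The example below uses occlusion (masking one input token at a time) on a toy scoring function; the weights and token values are invented, and a real model would take the place of `step_score`.

```python
# Invented weights: how strongly each output step looks at each input position.
ATTENTION = [
    [0.7, 0.2, 0.1],  # output step 0
    [0.1, 0.8, 0.1],  # output step 1
    [0.1, 0.1, 0.8],  # output step 2
]
# Invented per-token values; "<mask>" contributes nothing.
TOKEN_VALUE = {"good": 1.0, "morning": 0.5, "friends": 0.8, "<mask>": 0.0}


def step_score(tokens, step):
    """Toy stand-in for the model's score at one output step."""
    return sum(w * TOKEN_VALUE[t] for w, t in zip(ATTENTION[step], tokens))


def occlusion_saliency(tokens, n_steps):
    """Row i, column j: drop in step i's score when input token j is masked."""
    matrix = []
    for step in range(n_steps):
        base = step_score(tokens, step)
        row = []
        for j in range(len(tokens)):
            masked = tokens[:j] + ["<mask>"] + tokens[j + 1:]
            row.append(base - step_score(masked, step))
        matrix.append(row)
    return matrix


tokens = ["good", "morning", "friends"]
saliency = occlusion_saliency(tokens, n_steps=3)
# For each output step, the input position with the largest score drop.
most_salient = [row.index(max(row)) for row in saliency]
print(most_salient)
```

Each row needs its own view of the input, so an interactive tool typically lets the user select an output step and then highlights the corresponding row.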

Open, open, open…
Open source projects are the de facto standard for machine learning approaches to show reproducibility and therefore credibility. Visualization for ML should not be an exception to this. The overhead of transforming prototype code into usable prototype code (not production code) is often marginal. Let users test whether they can use your system out of the box in the way it is documented; rewards come as stars and forks. Also consider publishing your work in open access journals or writing blog posts about it.

Call for action: open your models!

Talking about open access: this call goes to the trainers and architects. When you create new models and don’t have a visualization expert at hand, consider possible interactive interventions and make your models as open as possible. Allow read/write access to hidden states, allow modifications to beam search, allow overwriting attention values, or allow changing weights for temporary what-if testing. It really pays off.
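
As an illustration of what read/write access to hidden states could look like, here is a minimal sketch of a one-unit recurrent model with a hook list. The class `TinyRNN`, its weights, and the hook convention are hypothetical and not taken from any of the systems mentioned above; in a real framework, the same idea would be built on that framework's own hook mechanism.

```python
import math


class TinyRNN:
    """Hypothetical one-unit recurrent model that exposes its hidden state."""

    def __init__(self, w_in=0.5, w_rec=0.9):
        self.w_in = w_in
        self.w_rec = w_rec
        # Hooks are callables (step, hidden) -> new hidden value, or None to leave it.
        self.hooks = []

    def run(self, inputs):
        hidden = 0.0
        trace = []
        for step, x in enumerate(inputs):
            hidden = math.tanh(self.w_in * x + self.w_rec * hidden)
            for hook in self.hooks:
                replaced = hook(step, hidden)  # read access
                if replaced is not None:
                    hidden = replaced          # write access
            trace.append(hidden)
        return trace


model = TinyRNN()
inputs = [1.0, 1.0, 1.0, 1.0]
baseline = model.run(inputs)

recorded = []


def record(step, hidden):
    """Read-only hook: log the hidden state for a visualization."""
    recorded.append((step, hidden))


def reset_at_step_1(step, hidden):
    """What-if hook: erase the model's memory at step 1."""
    return 0.0 if step == 1 else None


model.hooks = [record, reset_at_step_1]
what_if = model.run(inputs)
model.hooks = []  # the intervention is temporary

print(baseline)
print(what_if)
```

Because the hooks are removed afterwards, the model itself is unchanged, which is what makes this kind of what-if testing cheap to offer.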


We hope that some small or great ideas came up while reading this post on how humans and models can create fruitful collaborations in the future. We can’t wait to see these approaches popping up! We think that visualization can take a significant role as mediator between both.


Hendrik Strobelt is a research scientist at IBM Research and the MIT-IBM Watson AI Lab. He is the co-creator of GANpaint.

Sebastian Gehrmann is a PhD candidate at Harvard SEAS with a focus on interpretability of NLP models.

Hendrik and Sebastian are co-creators of LSTMVis and Seq2Seq-Vis.

Multiple Views: Visualization Research Explained

A blog about visualization research, for anyone, by the people who do it. Edited by Jessica Hullman, Danielle Szafir, Robert Kosara, and Enrico Bertini
