Let’s Do Virtuous Data Visualization
The two hardest problems in Computer Science are: (i) people, and (ii) convincing computer scientists that the hardest problem in Computer Science is people.
-Jeff Bigham’s “Law of Computational Difficulty”
Why Ethics Is Our Problem
Visualization sits in an uncomfortable spot in the wider space of data science and communication. We’re stuck between the folks who collect and structure and store data, and the folks who analyze and use data. As such, there’s a temptation to think that we don’t have to worry as much about the ethical issues in data usage. Of course the people collecting the data have to worry about the ethical considerations of doing so (including privacy, security, and all the thorniness thereof) and the people using the data (the euphemistic “domain experts”) have to make sure they are doing the right thing, but what about us? After all, we just make the bar charts, right? So long as we’re not doing obviously deceptive things like blatantly misrepresenting the data with something out of our bag of dirty tricks, it’s tempting to think that we’re in the clear.
But, of course, we’re not. The charts we design are often the first and only contact that people have with a dataset, and we have tremendous power over what messages people take away (and, so, how those data are used). Even simple visualization design choices can reliably mislead. The choices we make as researchers and designers therefore have an ethical component. When we choose to, say, make a system that helps machine learning experts tweak their models rather than make a system that helps communicate the outputs of those models on the populations they impact, we’re making implicit or explicit choices about what we value. We don’t get to ignore the ethical implications of our work just because our y-axes start at 0 and our pie charts add up to 100%.
I think most people designing and researching in visualization already know that their work has moral implications. However, it’s not always clear what we ought to do differently. Part of the reason for this is that ethics is a deep and complex topic about which perfectly reasonable people can (and do) disagree. We want to do our work, after all, and not wait around for folks to argue back and forth interminably, especially for work where we think the ethical impact is clear and obviously not evil. There’s also a squeaky wheel problem. If you’re known as a person who cares about ethics, and the organization you’re working for thinks it’s doing something ethically dodgy, they might just shut you out of the decision-making process entirely rather than deal with you. If you refuse to work on a project due to ethical concerns, maybe the organization will just keep looking until they find somebody who will. And it’s always tempting to think that you should stay on a project that you feel is in a morally grey area just out of fear that if you don’t, somebody with fewer scruples than you will be in charge (let’s ignore, for the moment, that collaborators in some of the worst atrocities in human history have used this exact reasoning).
So a problem here is that we know we need to think about these ethical issues, but it’s not always clear what we should do about them once we have. I think that part of this reluctance and uncertainty comes from how a lot of people (and, informally, a lot of computer scientists and data scientists) view ethics: as either an afterthought (something you do after you’ve already done the “real” work on the project) or as a rabbit hole (something so hopelessly complex or convoluted that it’s best left to the experts). I think both of these perspectives are dangerous. However, I don’t have the time, space, expertise, or reader patience to go through an entire history of ethics and moral philosophy, so let’s do a totally irresponsible simplification.
An Incomplete and Unfair Overview of Ethics
There are three big schools of ethics (really there are many more, and a whole field of meta-ethics about deciding between them, but ignore that for now). If you are already up on your schools of ethics and want to get to the fun part, skip ahead to the calls to action in the next section. Otherwise:
- Deontological ethics. Here, you make up a list of rules, and it’s unethical to do things that violate those rules. The outcome of the action may not matter so long as the rules weren’t violated. E.g., you might have lied to your friend when he asked you if his new haircut looked good because you didn’t want to hurt his feelings, but it’s still unethical if you have a rule about telling the truth. You can see why this would appeal to CS folks: you get to reduce all of ethical decision-making to a constraints problem. Isaac Asimov’s “Three Laws of Robotics” in I, Robot are an example of deontological ethics explored in a technological setting.
- Consequentialist ethics. Here you decide upon a measure of utility and then take actions that maximize that utility. That is, your action’s morality is judged by the consequences that it had on the world. The actual action or even the intent behind the action may not matter so long as the outcome was good. E.g., maybe stealing that new TV from your neighbor’s house isn’t so bad if you can make a case for why you would enjoy it more than they would, and this gap makes up for the harm caused by taking it. This is appealing to CS folks as well because it treats ethical decision-making as an optimization problem. Many ML systems in the real world use some form of consequentialist ethics in their design, with cost and benefit functions that have real ethical impacts.
- Virtue ethics. This is listed last, so you can tell it’s going to be the one I’m rooting for. Here your objective is what Aristotle called eudaimonia, or flourishing. The key verb here is “cultivating” and the key noun here is “virtue.” You want to cultivate the various virtues that you care about and that will make you a better, more fulfilled person. So you’d wave “hello” to your neighbor not because “it’s the rule to always wave to one’s neighbor” or because “the cost of moving my wrist is less than the social benefit of being greeted” but because you want to cultivate your politeness or your empathy or even a virtue as fuzzy as your niceness. There are some issues with virtue ethics (for instance, it’s even more difficult than with the other two schools to decide what not to do in a dilemma; it’s hard to have a virtue ethics equivalent of an anathema), but I think it’s a more actionable way of thinking about ethics than the other two, and it maps well onto how a lot of people think about moral actions at the day-to-day level. I think it also captures the view of ethical behavior as a project rather than a procedure.
I like the virtue ethics framework because it sidesteps a lot of the excuses that we use to avoid tackling ethical issues in our work. It doesn’t divide the world into “ethical” or “unethical” action (so we don’t have the “well, I’m a good person, so what I’m doing must be good” excuse). It allows multiple praise-worthy actions for any given dilemma (so we don’t have the “well, we just don’t know what the right thing to do is, and by the time we’re done arguing it’ll be too late” problem). That virtue ethics just so happens to dovetail nicely with a lot of feminist epistemologies about promoting individual experiences and de-emphasizing the notion of an objective “view from nowhere” I view as another point in its favor.
In the paper I talk about lines of visualization research where I think there are big, unanswered ethical issues left to address, and where there are values that I feel are in conflict. But let’s cut to the chase and talk about some of the virtues that are often overlooked or insufficiently cultivated in visualization work. I should point out that the concerns that arise from these virtues are not new; some of these values show up in Catherine D’Ignazio and Lauren Klein’s aforementioned upcoming book Data Feminism. Others show up in Giorgia Lupi’s manifesto “Data Humanism.” Still others show up in Marian Dörk et al.’s alt.chi paper “Critical InfoVis.” I’m just trying to amplify the discussions that have been happening for decades now, and remove the excuses we have for avoiding these topics.
Make the Invisible Visible
Maciej Cegłowski said that “Machine learning is like money laundering for bias:” it’s a way of taking all of our human biases and preconceptions and running them through an allegedly “objective” algorithm to give them the appearance of truth. For instance, an algorithm designed to bypass the subjectivity and racial bias in parole decisions may just reproduce those biases. ML is not the only part of the data science ecosystem with this problem. A visualization can only show so much of the data, but it can act as a powerful persuasive force. And the data we collect, and the reality we are seeking to communicate, are not always in sync. This means that large parts of the world can be made invisible when they are visualized.
There are ethical implications in which parts of our data, and our data provenance, we choose to hide. This can be reflected in situations like hiding the uncertainty in our data (or our predictions), and so making people behave with much more certainty than is probably warranted (or be caught flat-footed by “black swan events” where our models were wrong). It can also be reflected in failing to acknowledge the time and labor that went into the collection of the data (for instance, including a big byline for the visualization designer, but not a link to the people who made the backing data source). Lastly, it can be as simple as not acknowledging how a particular visualization toolkit or dataset could be used (or abused) in the real world.
It is virtuous to tackle these invisible parts of data visualization, and make them visible when we can.
Some things to do right now:
- Write a post-mortem for a design that you made. What impact did it have (especially negative impacts that you didn’t envision when you started the project)? Share this post-mortem if you can!
- Make sure your visualizations have proper attribution on them, with clear links back to the designer and the data source.
- Consider making your design and analytical process transparent as well, through a detailed lab notebook of what you did to the data to get it into its final shape, and what different visual designs you considered but discarded. The LitVis framework provides a way of thinking about these sorts of issues.
Collect With Empathy
Alberto Cairo called visualization the “unempathic art.” It’s just very difficult to go from looking at a bar chart of, say, the homelessness crisis, to empathizing with real human beings. We are much better at empathy when we’re exposed to real people and hear real stories directly. Once these concerns are turned into numbers or charts, they tend to lose their immediacy and so potentially their moral impact. These problems become exacerbated when our data sets grow so large and complex that there’s just no way to draw a path back from a point in a graph to the suffering of an individual human being.
Nevertheless, the data we collect and visualize can also have a big impact on people. Companies, governments, and individuals may use visualizations to make data-driven decisions on who to hire or fire, what policies to adopt or discard, and what causes to support. We should be mindful that our data comes from (and is often about) real people.
It is virtuous to moderate the power that we have when visualizing data, and be mindful of the impact we have on other people. For instance, we might want to encourage the use of smaller data sets to reduce the impact on those whose data were collected and shared, and minimize the invasiveness of our work. We also might want to anthropomorphize our data, so that our audience can connect with the real people behind the numbers.
Some things to do right now:
- Consider rhetorical devices to promote empathy in your audiences (no, just putting pictures of people in with your graphs may not cut it).
- Include more human-centric data and experience in your current visualization project: for instance, qualitative data or even just pictures.
- Come up with a plan for removing or anonymizing the data you have in case issues of privacy arise.
Challenge Structures of Power
It’s very easy for governments, companies, and big organizations to collect data, and to hire people to analyze and visualize their results. It’s comparatively harder for communities and individuals to marshal those sorts of resources. This means that there’s almost always a power imbalance in whose data gets visualized, and for what ends.
Data-driven decision-making has infiltrated all aspects of our society, from deciding what colleges we get into, to what loans we qualify for, to how law enforcement treats us. As visualization researchers and designers, who specialize in communicating data to wide audiences, we have a great opportunity to act as part of “data due process” and ensure that these often biased and flawed algorithms are not unimpeachable arbiters of what happens to people. We can also use our skills with data to advocate for causes we believe in. Lastly, we can use our positions of relative power to surface decisions we disagree with, and encourage the responsible use of data within our wider organizations.
It is virtuous to oppose injustice, and to challenge unfairness and bias when we can.
Some things to do right now:
- Instead of the iris dataset or the Titanic dataset, use your next demo dataset as an excuse to advocate for something.
- Develop a contingency plan for being asked to work on a data science project that you suspect has unethical ends. What organizational structures are in place that might allow you to change unethical decision-making? How might you create such structures, if they don’t already exist?
- Consider how structural inequalities are impacting your current data project. Who will and won’t reap the benefits of your work? Who will or won’t get to decide how your data will be used?
So Now What?
Yet another reason that I like virtue ethics is that none of the virtues I described above are beyond criticism or suggest one unequivocal action. There’s wiggle room for nuance and complexity and the messy exceptions that come up when doing any sort of work in the real world (although I acknowledge that too much nuance can be a bad thing). Taking action to cultivate one might come at an unacceptable cost to another. All of the virtues that I proposed above have crucial caveats. Maybe visualizing invisible components increases the complexity and expert knowledge required to understand your visualization so much that it no longer helps the audience you intended to help. The way that we build empathy for other people often has nasty xenophobic and racist underpinnings; maybe we need a level of dispassion in how we present data in order to promote equal treatment. And, lastly, we’ve already seen that strong cultures of anti-intellectualism and poor faith arguments can undermine expertise; maybe certain power structures need more support, not less.
So this post isn’t meant to be a checklist (although, if you’re interested in adding an “ethics checklist” to your data project, check out the deontologically-oriented deon library, based in turn on Loukides et al.’s free book, “Ethics and Data Science”). Nor is this post meant to be an exhaustive (or even particularly authoritative) guide to all of the ethical considerations that might arise in visualization research. But I consider it a success if I can shoot down excuses for not thinking about the ethical implications of our work. Ethical considerations shouldn’t be an afterthought; we should be thinking about our impact over the entire lifetime of a visualization project. It’s a conversation we should be having before we write the first line of code, and after we’ve pushed out the last release.
Thanks to Marti Hearst and Jessica Hullman for their feedback on this post.