Counting, Collaborating, and Coexisting: Visualization and the Digital Humanities

Michael Correll
Sep 24, 2019
Patron Saint of counting, the muppet Count von Count

Do not knock — Technology is making gestures precise and brutal, and with them men. It expels from movements all hesitation, deliberation, civility.

— Theodor Adorno, “Do Not Knock,” Minima Moralia.

Context/tl;dr: I was asked to give a keynote for the upcoming Workshop on Visualization for the Digital Humanities. I take keynote duty as an opportunity to make as provocative a point as possible. The particular Hot Take I’ve decided to go with is that visualization is a bad neighbor to the digital humanities: it exacerbates the worst tendencies of DH scholarship and promotes parasitic, technocratic collaborations. I think we need to tear down and then reconstruct the power-balances, epistemologies, and scope of visualization’s connection with digital humanities.

The Curse of Counting

Josef Mengele, Auschwitz’s “Angel of Death,” spent an initial part of the post-war period hiding out on a farm under an assumed name to avoid the victorious Allies and any chance of accounting for his crimes. During the war, he was frequently responsible for “selection duty” in Auschwitz, dividing those who were to be gassed immediately from those who would be spared, however briefly, for work details or his own “experiments.” After the war, on the run, stripped of his rank and power, he could not entirely suppress this instinct to count, select, and classify:

One had to take a scientific approach to sort out the edible, fodder, and seed potatoes. The frequency of the various sizes follows the binomial distribution according to the Gauss diagram. The medium sizes therefore are the most plentiful, and the very small ones and the very big ones are much less frequent. But since they wanted more medium-sized potatoes I moved the border of the selection for the potatoes for consumption accordingly and in this way I obtained more potatoes for consumption than usual. In this way my mind was kept active.

— Josef Mengele, from his personal diaries

This absurd urge to count reminds me, a little, of the common folklore about thwarting the undead that shows up everywhere from the Romanian vampire to the Caribbean soucouyant: throw a little rice or sand on the ground in front of you, and the monsters will be obliged to stop and count every grain. If it were just counting that was the problem, that would be one thing. But as Catherine D’Ignazio and Lauren Klein point out in Data Feminism, who does the counting, and who are and are not counted, shape the political and ethical ramifications of the work we do.

Governments increasingly rely on alliances with tech to perform the (often dehumanizing) work of counting, sorting, collating, and modeling people. Data science, data visualization included, just is inherently about counting and sorting and modeling. And this counting and modeling and sorting is often done to people who would rather not be counted or modeled, thanks, or who are materially harmed by being sorted into the categories set up for them. Os Keyes defines data science in these terms in “Counting the Countless”: data science is “The inhumane reduction of humanity down to what can be counted.” Jacqueline Wernimont in Numbered Lives: Life and Death in Quantum Media similarly draws a conceptual line from early census and death records data down through to today’s fitness trackers: all are “racializing, gendering, and colonizing technologies” with the side effect (or, more often, the goal) of making “certain lives possible or impossible.” The bar chart is just the public relations office of an entire lurching bureaucratic technocracy.

And that’s all assuming that our counting-based solutions actually work. Frequently, they do not. Or they work for the simple cases but fail dramatically in the very cases where we need the most assistance or intervention. Cathy O’Neil’s Weapons of Math Destruction covers many instances where algorithmic and/or “data-driven” approaches result in misery, inefficiency, corruption, exclusion, and downright dysfunction. If counting is the hammer we have, there really aren’t that many nails we can use it on. Instead of responding to this fact by getting more metaphorical tools, we instead try to pound the real world into some sort of countable shape anyway. Anna Lauren Hoffmann goes as far as to call this uncritical forcing of people into the data-shaped holes that have been made for them a form of “data violence” with intersecting repercussions in race, gender, and class.

Even if we “just” quantize books or art rather than bodies and governments, the digital humanities is not immune to these critiques, and all too frequently interfaces with the existing technocracy. Some text analytics work reminds me not a little of Mengele and his potatoes: ways to keep the brain sharp and the mind active before returning to the “real” job of propping up the apparatuses of power, no matter how inhumane. How many text visualization papers that used literary examples only did so because they couldn’t publish with their intended corpora of intelligence cables or human intelligence documents? How many techniques designed for finding patterns in novels or poems were then picked up for use by repressive governments or their corporate allies to monitor and eventually imprison or kill their own citizens? Who is getting the most value out of digital text analytics work: MLA or Palantir? Are we getting distracted while our love of solving interesting problems is harnessed for nefarious ends?

Digital Humanists are largely aware of these criticisms, of course: Ted Underwood sees this line of thinking as practically cliché. Nan Z. Da’s much-debated critique of Computational Literary Studies (CLS) relies on critiques of counting for much of its structure:

CLS papers are more or less all organized the same way, detecting patterns based in word count…In CLS data work there are decisions made about which words or punctuations to count and decisions made about how to represent those counts. That is all.

– Nan Z. Da, “The Computational Case against Computational Literary Studies”

In her response to Da, Katherine Bode strongly disagrees that DH is inherently about counting, but does admit that the spotlight on counting and statistical analysis is both odd and limiting for DH in a literary context:

More broadly, both the focus on statistical modelling of word frequencies in found datasets, and the prominence accorded to such research in our discipline, puts literary studies out of step with digital research in other humanities fields.

Why, then, is counting such a common (or commonly publicized) part of DH? Why do textual analysis methods meant to contribute to our human understanding of how words create meaning so often end up used by tech giants or governments for nefarious ends? This is probably myopic, but I suspect it’s partially our fault over here in visualization: the way that visualization (and visualization research) interfaces with our collaborators in DH encourages the worst parts of DH and works to squander the promises and potentials of the field.

Visualization as it is ordinarily conceived of can’t address the problems with counting. If anything, it makes the problems worse, by solidifying the structures that we built when we did our counting in the first place. We can argue with a taxonomy, or with the methods section of a paper, but a visualization sweeps away these concerns and hides them behind a picture of “the data” as a reflection of reality. Visualizations reflect the schemas we use to generate our data, and those schemas can and do inherit the flaws, prejudices, and injustices of the people and systems that generate them. I’m reminded here of the “casta paintings” of Spanish colonial rule, where the various descendants of white colonists, American Indians, and black slaves were classified in painstaking detail according to complex racial categories. Casta paintings, the visual depictions of these categories, had prescriptive and rhetorical force, depicting not just the lineage of the various castes but the assumed color of their skin, the clothes they were likely to wear, and the professions they were assumed to undertake. Arbitrary visual representations of arbitrary categories that were used to structure and subjugate.

In other words, I think that an obsession with counting festers in digital scholarship, cheered on by the academic project of visualization which sees an end goal in “clearly” depicting data. It dehumanizes its sources, encourages the misuse or misapplication of its methods, and discourages debate and pluralism from precisely those problems where more, not fewer, voices are needed. This obsession is not native to the humanities, nor even to the subset of the humanities that is the digital humanities: I think it springs from data science, visualization, and the positivist projects that those fields bring along for the ride. And it’s not the only way that visualization research harms DH.

Visualization is a Bad Neighbor

While it may seem like an extreme statement, I think the ideology of almost all current information visualization is anathema to humanistic thought, antipathetic to its aims and values. The persuasive and seductive rhetorical force of visualization performs such a powerful reification of information that graphics such as Google Maps are taken to be simply a presentation of “what is,” as if all critical thought had been precipitously and completely jettisoned.

— Johanna Drucker, “Humanistic Theory and Digital Scholarship”

An XKCD comic by Randall Munroe in which a wide-eyed person “with algorithms!” tries to solve a thorny problem, and fails.
“Here to Help,” Randall Munroe

When we finally coalesce all of the outstanding concerns in visualization into a single list of our own Millennium Problems or Grand Challenges — fiendishly difficult unsolved issues where even a modicum of progress would fundamentally change the field — it will include things like accessibility, handling messy heterogeneous data (especially text, images, and audio), dealing with uncertainty, visualizing provenance, and adjusting to different epistemologies and skillsets. These are problems whose solution would impact data visualization everywhere, from business intelligence to newspaper reporting. People in the humanities have these problems right now and have, in fact, been thinking about these issues for decades if not centuries. If you’re a visualization researcher, and you want to go where the interesting and forward-thinking problems are, working with humanists seems like the way to go.

At the first Visualization for the Digital Humanities workshop back in 2016 I said as much: I called the humanities great “problem factories” for moving visualization research forward. It was certainly true for me that whenever I would wrap up a project with a collaborator in the humanities, I’d come away having learned new things and seeing new perspectives that would shape my research directions for years to come; here’s one such post mortem that has some of that wide-eyed bushy-tailed “look how interesting their problems are!” energy. I was a little surprised, then, when a reasonable chunk of the humanists in the room bristled at this characterization. I still stand by my optimism, but I understand now why I got some pushback: it’s treating the humanities as some sort of specimen, a butterfly to be placed in the technocracy’s killing jar and then pinned to a board with all the rest. Or the humanities as some sort of resource, a place to go and extract value and then move on. It certainly doesn’t sound like two groups of equal partners working together to synthesize their knowledge in a collaborative way. It’s a well-intentioned but perhaps poorly executed collaborative spirit.

A paper at this year’s CHI conference pointed to a similar unease. Bennett and Rosner’s The Promise of Empathy: Design, Disability, and Knowing the “Other” talks about the common distinction of designing for rather than designing with, but also talks about how the ways we try to build empathy and rapport for the people we’re allegedly helping are often more about centering our own experiences (and so tacitly erasing the experiences of the people with whom we’re supposed to be developing rapport).

In one of Bennett and Rosner’s motivating examples, designers trying to improve the accessibility of a voting booth tried an “empathy exercise” where they put on blindfolds to see what it felt like to be a blind user of a voting machine. The paper points out several problems with this approach. The first is that the experience of being blind is only superficially similar to putting on a blindfold (and, I’d argue, insultingly superficial); it’s like assuming that the answer to Nagel’s famous question “What is it Like to Be a Bat?” is to put on a pair of fake wings and hop around a bit. The second is that, by the very nature of the exercise, it divides the population into two groups: the designers (who are of course sighted and of course have to build up empathy with their “users”) and the users (who of course have no experiences in design and of course are there mostly to passively provide feedback on designs rather than exercise their own power). These sorts of exercises in designing for people all too often fail to center the people with the actual problems, and too frequently place the power in the hands of the people with tech savvy and design buzzwords.

For the past few years I’ve talked about the “Batman metaphor” of visualization research: that the ideal visualization researcher relaxes in their office, waits for some hard-up “domain expert” to turn on the visualization Bat-signal and then they swoop in, solve the problem, and return to the secret underground VisCave, job well done. Hopefully by now the problems with this metaphor are clear: how come it’s only the visualization person who is supposed to have these portable skills that are instantly valuable across domains? Without a strong commitment to learning with and from our collaborators, do we as visualization researchers have more than just a superficial understanding of the people we are “helping?”

There are pragmatic concerns with this model as well. We all too often disappear rather than stick around for the less “exciting” bits of collaboration: maintaining systems, teaching people how to use them, and creating iterative improvements. Who is left to deal with the maintenance and fallout once we’ve gotten our system designed and our visualization papers written? Since our research incentives are often focused on exploring the novel rather than the tried and true, are we pushing unneeded and unnecessary systems on people who could be more readily helped with existing systems?

The Batman metaphor is just one more facet of the trend of “tech solutionism.” Evgeny Morozov in his book To Save Everything, Click Here points to the often well-intentioned, but in practice disastrous, impulse from the technocracy that thorny, difficult, and perhaps intractable problems can be solved just by throwing enough technocrats at the problem: that we can code or data-collect or design-think our way out of inequality, climate change, or any of a number of pressing crises.

Like any other irrational belief, the belief that we can use data to compute our way out of any problem is almost impenetrable to concrete evidence to the contrary. Maciej Cegłowski talks about the fable of “investor storytime:” the promise that, well, your targeted ads and ad revenues aren’t very good now, but if you had just a little more data, or just a little more complexity, everything would work out just fine. It’s this impulse that drives the collection of more and more (and more and more personal) data despite the fragile and largely hallucinatory evidence of its efficacy.

All of these issues are why I think that visualization research is often a bad collaborator for the humanities. We begin from imbalanced positions and then try to stuff all of DH’s problems into rigid boxes. And since the real DH problems are very difficult and on the frontiers of our knowledge, we limit and reduce them down to ones that are easier to solve. For visualization and data science, the countable is easier to deal with than the uncountable. The known dataset is more tractable than the unknown dataset. The quantitative is easier to deal with than the qualitative. The certain is easier to visualize than the uncertain. From such a position, is it any wonder that the resulting collaborative scholarship would be over-reliant on counting, would be over-focused on the same easily-accessible corpora (say, English language texts written by the almost parodic collection of the same Dead White Male Authors), and would inflate the certainty of its conclusions?

So What Should We Do?

What the coming of the computer did, “just in time,” was to make it unnecessary to create social inventions, to change the system in any way. So in that sense, the computer has acted as fundamentally a conservative force, a force which kept power or even solidified power where it already existed.

— Joseph Weizenbaum, “Weizenbaum examines computers and society”

I originally intended to close out this piece with some action items, things like “cool it with the counting” and “build equitable collaborations.” But really there’s a single principle that underlies whatever action items I tried to put together:

We need to humanize visualization before we visualize the humanities.

I’m not the first person to say this by any means. Giorgia Lupi has an entire data humanism manifesto. Marian Dörk and others call for a “critical infovis.” Shannon Mattern challenges us to build “aspirational ontologies” to structure our information. But I think the inherent nature of visualization as an interdisciplinary act (after all, we usually aren’t the people with the data; we have to go to other people to get them) makes these calls for action more pressing. We’re not just getting our own house in order: we’re trying to stop the spread of the damage. Our work contributes to the way that other fields conceptualize, structure, and interpret their data. We bear a measure of responsibility for the inhuman ways that these data are collected and used. To make sure that we are proper stewards of this responsibility, we need to introspect, reflect, and listen.

This humanization is all the more necessary because of how powerful visualization is as a partner. Visualization just works: it helps people think in new and qualitatively different ways. And a close collaboration with the humanities will make visualization even better. I want to solve the fiendishly difficult problems that arise when we work together! So I (perhaps selfishly) don’t want us to cut ties entirely. But I want to recognize that the power of visualization is not intrinsically good (just as no transformative technology is intrinsically good). We must consider both the benefits and costs of doing DH work that is visualization-focused: the goal of making something “chart-able” can have repercussions all throughout the research process, from constraining how the data are conceptualized and collected all the way to influencing how (and how far) a particular work circulates.

Thanks to Heather Froehlich for comments, additional sources, and feedback on this post.
