Alice Tells a Story

Allison Bishop
Published in Proof Reading
Oct 25, 2023

It takes a certain amount of hubris to write a blog. Ideally, just a pinch. Enough to imagine that your words can matter or that “awareness” is still worth raising. But not so much that you ignore nuance and gloss over the challenges that would make your ideas less shiny.

I have to be honest, my hubris is hanging by a thread. I’ve sat down to write this post several times, and walked away from it each time to the convenient excuse of a dog barking or a Slack message pinging. There is something concretely satisfying in the minutiae of everyday life that I retreat to when my mind is troubled. The sound of kibble hitting a metal bowl. The smell of coffee brewing. These are the small joys I feel confident I can bring into being.

But as I walk my dog around the piles of fallen leaves or sip my warm coffee in the pleasantly chilled air, I am reminded that it is October, and I feel I have to write something.

October is domestic violence awareness month, and there are signs about it posted near the entrance of the building on the City College campus where I teach. This year there was a t-shirt decorating event, so disembodied shirts are hanging from a clothesline, displaying supportive slogans about the bravery of survivors who tell their stories.

And this is where my hubris creeps back in. Couldn’t we make it just a little easier for survivors to tell their stories?

Don’t get me wrong, it will never be easy for a human being to relive some of the worst moments of their life. But telling stories can have tremendous value, both for the storyteller and for the audience. For the storyteller, crafting the story can be cathartic or might yield new insight. For the audience, stories can foster empathy and lead to a more nuanced and effective understanding. This is a foundation that must be in place for true allyship and progress to occur.

But there are some particular layers of difficulty to telling stories about domestic violence — namely the fear of retribution from those involved and the level of social stigma around the topic — that seem more punishing than necessary. There is a painful irony in this, as the suppressed stories would help alleviate the social stigma. Anonymization is a tool that is supposed to address these kinds of problems. But on its own, it doesn’t feel like a very satisfying protection. Even without names, a story is likely to be recognized by those involved, leaving the storyteller vulnerable to retribution and prejudice.

Imagine the plight of a survivor sitting down to write an anonymous version of her story — let’s call her Alice. She might start by writing the raw truth:

And then he grabbed the phone out of my hand and threw it into the aquarium, yelling that if I was going to be texting other men, then I just shouldn’t have a phone.

Staring at this, Alice might fear that the aquarium detail is too recognizable, and end up crossing it out:

And then he grabbed the phone out of my hand and threw it i̵n̵t̵o̵ ̵t̵h̵e̵ ̵a̵q̵u̵a̵r̵i̵u̵m̵, yelling that if I was going to be texting other men, then I just shouldn’t have a phone.

But now is it clear that he actually destroyed the phone? Maybe she modifies it again so that the effective result is clearer:

And then he grabbed the phone out of my hand, threw it on the floor and stomped on it, yelling that if I was going to be texting other men, then I just shouldn’t have a phone.

Perhaps now she is satisfied with this particular sentence, reasoning that it conveys the important features of what really happened without identifying detail. But then she zooms out and considers the larger sequence of events in her story. Even if each individual scene feels generic enough to avoid identification, what about the combination? Perhaps someone would recognize it from the timeline or the ordering of events. How should she ultimately decide how “anonymized” is anonymized enough?

In many other domains, we have systematic tools for helping navigate these challenges. For quantitative and categorical data, there are now a couple of decades of research into ways to strengthen protections for individual privacy while allowing collective analyses. Such research started from a primitive baseline of “anonymization = removal of Personally Identifying Information (PII)”, where PII was generally understood to encompass only obviously identifying data fields, like names or social security numbers. But this was shown to be woefully inadequate, as combinations of less-obvious data fields — like zip codes, birthdates, and gender — are often sufficient to specify unique individuals (see here, for example).
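The re-identification risk from combined quasi-identifiers can be made concrete with a small sketch. The records, field names, and values below are invented for illustration; the point is only that once each combination of “harmless” fields maps to a single row, removing names protects nothing.

```python
from collections import Counter

# Hypothetical "anonymized" records: names stripped, but zip code,
# birthdate, and gender retained. All values are made up.
records = [
    {"zip": "10027", "birthdate": "1985-03-12", "gender": "F"},
    {"zip": "10027", "birthdate": "1985-03-12", "gender": "M"},
    {"zip": "10027", "birthdate": "1990-07-04", "gender": "F"},
    {"zip": "11201", "birthdate": "1985-03-12", "gender": "F"},
]

# Count how many records share each (zip, birthdate, gender) combination.
combos = Counter(
    (r["zip"], r["birthdate"], r["gender"]) for r in records
)

# Combinations held by exactly one record single out one individual:
# anyone who already knows a target's zip, birthdate, and gender can
# re-identify that person's row despite the missing name.
unique_combos = [c for c, n in combos.items() if n == 1]
```

In this toy dataset every combination is unique, so every “anonymized” row is trivially re-identifiable — the same failure mode demonstrated at scale in the research linked above.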

In recent years, many stronger tools have been developed to address the threat of de-anonymization of supposedly “anonymized” data. An emerging gold standard is differential privacy (introduced in this paper), a concept which has heavily inspired our work at Proof in how we design our protections for client privacy and our framework for defining information leakage in trading. The core idea of differential privacy is that the results of calculations on supposedly “aggregated” data should not qualitatively change when a single person’s contributions are removed. This provides a compelling privacy guarantee for those contributing their data, as they are assured that anything that happens as a result would have occurred similarly without their individual participation. Intuitively, this is also deeply related to robustness for statistical calculations, as we don’t want to rely on results that are overly swayed by single points while drawing conclusions that are intended to be representative of our data holistically. (See this paper, for example, for discussion of a more formalized connection between the concepts.)
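The classic construction realizing this guarantee is the Laplace mechanism: add noise calibrated to how much any one person can move the answer. The sketch below is a minimal illustration of that idea, not code from Proof’s systems; the function names, example data, and parameter choices are all invented here.

```python
import math
import random

def laplace_noise(scale, rng):
    # Sample Laplace(0, scale) via inverse-CDF transform of a uniform draw.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon, rng):
    """Differentially private count: true count plus Laplace noise.

    Adding or removing one person's record changes the true count by at
    most 1 (its "sensitivity"), so noise of scale 1/epsilon makes the
    output distribution nearly the same with or without any individual.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Illustrative query: how many respondents are 40 or older?
data = [23, 31, 45, 52, 67, 29, 38]
rng = random.Random(42)
noisy_count = dp_count(data, lambda age: age >= 40, epsilon=0.5, rng=rng)
```

Smaller `epsilon` means more noise and stronger privacy; the released number stays useful in aggregate while no single contributor’s presence or absence meaningfully changes what an observer sees.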

But there is a limit to how much we as humans can empathize with aggregate statistics. For domestic violence, the stats are grim. Approximately one in four American women and one in nine American men will experience domestic violence in their lifetimes. I looked that up so I could give you a link for it, but I didn’t have to look it up to remember it. The “1 in 4” part is basically tattooed on my shoulder — one of the four roses in my bourbon-themed tattoo is painted black for this reason. But even etched permanently into skin, a number can never really tell a story. It can never really “represent” Alice.

People who know me well would tell you: there is nothing I hate more than an over-hyped technical solution to a non-technical problem. And yet … if I allow myself — for just a moment — to borrow some of the unbridled techno-optimism that gets thrown around so indiscriminately, I might dream of a tool that helps Alice tell her story. A tool that provides her with more safety than guesswork, and more substance than a statistic.

In collaboration with a friend, I recently wrote an under-baked sketch of what such a tool could look like and what kind of parts it would need. Here are some of the parts we imagined:

· A scientific definition of privacy for storytellers that provides more satisfying protection than ad-hoc redaction of names and other obviously identifying details, but still allows effective storytelling. This might be formulated in an individual or collective context.

· A design of data structures or templates that can hold the crucial “bones” of stories

· An interface for storytellers to fill out their stories

· A way of sampling stories for readers based on the inputs that satisfies the chosen privacy definition
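To make the second component — templates that hold the “bones” of a story — slightly less abstract, here is one purely speculative sketch. The class names, fields, and category labels below are my own guesses for illustration; they are not taken from the sketch the author mentions.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative only: one guess at a structured "bones" template.
# A real design would be shaped by psychologists, language experts,
# and survivors, not invented by a programmer in an afternoon.
@dataclass
class SceneBones:
    action_type: str           # e.g. "destruction of property"
    target: str                # e.g. "communication device"
    stated_justification: str  # e.g. "accusation of infidelity"
    emotional_impact: str      # e.g. "fear", "isolation"

@dataclass
class StoryBones:
    scenes: List[SceneBones] = field(default_factory=list)

# Alice's revised phone scene, reduced to its bones: the identifying
# detail (the aquarium) is gone, but the essential shape survives.
story = StoryBones(scenes=[
    SceneBones(
        action_type="destruction of property",
        target="communication device",
        stated_justification="accusation of texting other men",
        emotional_impact="isolation",
    ),
])
```

A structure like this is what would let a sampling procedure recombine or paraphrase stories for readers while a formal privacy definition constrains what any single storyteller’s contribution can reveal.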

All of these pieces are interconnected, and such an effort overall would require several distinct kinds of expertise. Computer scientists, psychologists, literature and language experts, journalists, and storytelling stakeholders would all need to be heavily involved in crafting a tool like this in order for it to faithfully serve its intended purpose. I don’t honestly know how to put such an effort together. The traditional academic way would be to recruit senior researchers across these fields and write a joint grant. Then grad students would be allocated, papers would be written, and perhaps 10 years from now, a tool would exist that could really help Alice! That tool would be presented at a conference to much fanfare (or not), and then would languish unmarketed and neglected somewhere as a link on a university website.

The traditional industry way to do this would be to bypass most of the experts and stakeholders, build a prototype version that is basically ChatGPT hooked up to a Google Form, and advertise it as “private” and “secure” without ever bothering to define either of those things. It would get broken once it gained enough traction to be worth breaking, and all of the supposed secrets would spill out of it indiscriminately, needlessly saved in plaintext because “I dunno, maybe we’ll want that data for targeted ads someday.”

But I don’t want to race ahead now to either of these endings. That’s not what Alice wants either. She doesn’t want to think about the inevitable backlash that will come in the online comments or the pit of fear that rises in her stomach whenever her doorbell buzzes. For just a moment, she wants to sit with the words on the page and see her truth staring back at her. Existing. Acknowledged.

October is almost over now, and I write this as I sit at a table on campus. In a moment I will get up and go give a lecture about using adversarial examples to stress test software. But for now, I want to sit with the truth of the t-shirts hanging above:

I am not just a statistic, Alice tells me. I want to dream that she doesn’t have to be.
