Mapping Sexual Assault Beta (this model contains security vulnerabilities)

Addressing Under-Reporting & Perpetrator Correlation Problems With Confidential, Double-Blind, & Anonymous Cryptographic Surveying Techniques

Published in

praxis journal

12 min read · Dec 9, 2013

Please Note: The security architecture outlined in this paper has several flaws, which were pointed out to me post-publication. We have since published an entirely new, decentralized model designed to address those concerns. If you are interested in reviewing the latest paper describing this more robust model, you can find it here:

https://medium.com/praxis-journal/6323511aef59

Problem: Since sexual assaults are under-reported, people who commit sexual assault often escape contextually appropriate consequences due to the difficulty of discovering information about past assaults, the knowledge of which would influence the judgment of whatever body is responsible for individual accountability within an organization such as a university.

Context: University administrations, military hierarchies, and other non-judicial governing organizations that exercise coercive influence over both the public and private life of their members oftentimes frame cases of sexual assault as “teachable moments” for offenders. It is common for these institutions to frame an offender’s behavior as a “bad decision” made by an otherwise good person, or as an opportunity for growth, but rarely as a punishable and violent offense.

In the past ten years, it has become increasingly common for universities to collect statistics on sexual assault. We have since learned that most universities have a sexual assault rate against women of between twenty and twenty-five percent, meaning that roughly one in five female students at a typical American university will have experienced sexual assault before graduation. While this statistic provides a portrait of the damage caused by sexual assault, it does not give us a good view into just how many people are responsible for these assaults. We don’t have a clear graph of what the density of connections between survivors and perpetrators would look like. Are there many people, each of whom is responsible for one or two assaults? Is a small group of people responsible for a majority of the assaults? Or is it a hybrid structure, with a large group of people committing one or two bad acts each alongside a few predatory actors responsible for a disproportionate number of assaults? We don’t have this information.

This question is important because, without answering it, it is difficult to tailor the posture of an organization’s accountability process; should an administration be aggressive in pushing for harsh penalties, or is fostering behavioral change a more situationally appropriate avenue? Without knowing a perpetrator’s history or the makeup of the perpetrator population, it is difficult to make this determination.

Regardless of prosecutorial posture, any accountability process should be initiated by the survivor and offer the defendant and the plaintiff a forum in which to submit evidence and witnesses in support of their case before a jury of their student peers. While an open, adversarial process can be triggering and create a barrier to justice for a survivor of sexual assault, any judicial apparatus that provides for consequences at all proportionate to the seriousness of rape should offer an adversarial forum in which dispositive facts may be presented and questioned by those with the strongest incentive to advocate for themselves, the survivor and the perpetrator. The discovery of dispositive facts should be undertaken by the plaintiff and the defendant, respectively, or by student advocates acting on their behalf. Mediation or facilitated negotiation should be made available if both parties consent to enter into such a mediation.

Returning to the question of how to discover the ratio of survivors to perpetrators in a given population: if the population in question has a high number of people who commit one or two assaults, then the appropriate community-wide accountability response is to foster behavioral change and to attempt to subvert the culture that makes rape widely acceptable. If, on the other hand, a small group of people is responsible for a disproportionate number of assaults, then the rational option is to identify those individuals and remove them from the population altogether, in order to minimize probable future harm. If the offender demographic is a hybrid of people repeatedly committing rape and individuals who may have only committed one assault, then any accountability process would be better informed by a background on the subject’s history of assault.

Universities and militaries do not generally try to discover any statistics on the density of connections between survivors and perpetrators. Therefore, when they frame sexual assault as a “teachable moment” for a dispositively confirmed rapist, they have no idea how probable it is that this is the individual’s first offense, or what that individual’s actual history of assault might look like. Furthermore, when universities and militaries engage in community relations campaigns designed to address the issue of sexual assault, whether through raising awareness about the problem or by holding consent workshops, they have no idea how to target these messages appropriately. The underlying assumption of such campaigns is that the people who are committing sexual assault are making mistakes, that they are unconscious parts of a culture that promotes and condones a disposable understanding of women’s bodies, and that they are basically good people in a bad cultural situation who would be open to change if given the chance. This is probably true of many cases, but without an understanding of how many survivors share the same perpetrator(s), it is impossible to know how many people fall into this category, and hence how many people would be open to education and change. Without knowing the number of determined perpetrators who might be responsible for multiple or even many assaults, it’s impossible to know whether this approach is wholly or only partly appropriate.

Abstract: By constructing a double blind system for reporting the names of perpetrators, in which the collector does not know the identities of the respondents, and in which the collector does not know the names of the reported perpetrators, but only unique identifiers with which the collector might correlate multiple survivors with a common perpetrator, it would be possible to produce statistics showing the number of perpetrators in relation to survivors and to produce anonymized charts which would graph individual perpetrators in the population against the number of people they had assaulted.

Threat Model: The collection system must protect the anonymity of the respondent and conceal the real identity of the reported perpetrator from the collector, in order to minimize reporting bias. If the anonymity of the respondent is not protected, they may choose not to report out of a fear that their report might be made public, or that an accountability process might be initiated without their consent, or that they might face reprisals or threats by the perpetrator or their associates, or out of the general fear that their report could be traced back to them in any way. If the real identity of the reported perpetrator is not concealed from the collector, then the respondent would have no way to confirm that their report could not be used as the basis for initiating an accountability process against the perpetrator without their consent, even if the anonymity of that respondent is protected against discovery by the collector. Any action taken against the perpetrator by the collector, even if the collector does not know the identity of the respondent, could expose the respondent to the perpetrator and their associates, since the perpetrator might be able to deduce the likely source of the report themselves.

The ideal reporting system will provide no incentives to lie and no incentives to tell the truth. Any survey is vulnerable to a chaotic respondent, one who provides false information without motive. The goal is to minimize incentives to either fabricate information or to engage in self-censorship. Because the system protects the anonymity of respondents, they do not have to fear reprisal for their reports and are therefore less likely to engage in self-censorship. Because the system conceals the real identity of the reported perpetrator from the collector, it gives anonymous respondents no incentive to provide false information, since the information has no punitive potential that could be abused through false reports. The number of false sexual assault reports made in the real world is probably extraordinarily low, due to the high social cost of reporting for the claimant and the limited likelihood of any real action being taken against the accused. However, because this reporting system would be anonymous, the social cost of reporting would be removed and the risk of false claims would be elevated. For this reason, it’s especially important that the names of reported perpetrators be withheld from the collector and anonymized within the system.

Implementation: The survey system would run as a Chromium extension in the respondent’s browser; reported perpetrator names would be hashed locally using SHA-2; the Chromium extension would be packaged together with a custom installation of Tor, which would be reconfigured for use with the Chrome browser; SSL would be the back-end transport layer to protect against possible logging by exit nodes in the Tor network. To protect against unauthorized persons participating in the survey, each student participant will be emailed a unique disposable password, used to unlock the appropriate survey within the browser extension. After opening the survey, this password will be discarded by the browser extension. The user will then be prompted to enter a new password, which will be hashed using SHA-2 and exported over Tor with the respondent’s answers to the survey questions (why this would be done is covered below in “From Analysis To Resistance”). After the survey is completed, the browser extension will not permit any user on that computer to take that particular survey again, to protect against valid participants taking the survey multiple times.

Since the integrity of the system’s security is reliant on maintaining the anonymity of both the respondent and the reported perpetrator, the survey cannot be conducted as a form on a website. Even if the website were hosted as a Tor hidden service, which would protect the anonymity of the respondent, there would still be no way to conceal the name of the reported perpetrator from the collector. Of course, one could just deploy a bullshit solution, where the system compares the reported names and then produces statistics for the collector, without displaying the actual names. This doesn’t minimize reporting bias, however, since the collector would still have access to the plaintext names of the reported perpetrators on their system; they would just be “promising” not to look at them. The respondents would have to either trust them, or not trust them and choose to not report.

Due to this reporting bias vulnerability, the name of the perpetrator must be hashed locally, on the respondent’s machine. This would be done using SHA-2, a family of cryptographic hash functions, which would turn the perpetrator’s plaintext name into an effectively unique number, or hash. Since a hash function always turns identical plaintext into an identical hash number, these hash numbers could be compared with the hash numbers reported by other respondents, without the system being given the plaintext names of reported perpetrators. Any hash numbers that matched would indicate that the respondents who reported those hash numbers had reported the same alleged perpetrator.
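A minimal sketch of the comparison described above, written in Python for readability rather than in the extension’s own JavaScript, and assuming SHA-256 as the SHA-2 variant. The helper name `name_to_hash` and the normalization step (Unicode normalization, collapsing whitespace, case-folding) are assumptions that are not part of the original design; without some normalization, two respondents typing the same name with different capitalization or spacing would produce different hashes and fail to match. The names are placeholders.

```python
import hashlib
import unicodedata


def name_to_hash(name: str) -> str:
    """Return the SHA-256 hex digest of a normalized name (hypothetical helper)."""
    # Assumed normalization: Unicode NFKC, collapsed whitespace, case-folding.
    normalized = " ".join(unicodedata.normalize("NFKC", name).split()).casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two respondents enter the same placeholder name with different spacing and case.
report_a = name_to_hash("Alex Example")
report_b = name_to_hash("  alex   EXAMPLE ")
# A third respondent enters a different placeholder name.
report_c = name_to_hash("Sam Placeholder")

print(report_a == report_b)  # True: identical input after normalization
print(report_a == report_c)  # False: different names, different hashes
```

Only the hex digests would leave the respondent’s machine in this sketch; the comparison on the collector’s side is a plain equality check on those strings.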

The server set up to receive reports exported from respondents’ browsers would be configured as a Tor hidden service, in order to ensure that the collector could not map respondents’ IP addresses to their real-world locations in their respective dormitories or off-campus housing. The browser extension installed by respondents would be configured with a custom modification of the Tor browser bundle (which is normally configured with the Tor Project’s own version of Firefox, which does not permit the installation of additional extensions, in the interest of maximum user security). All of the information entered into the survey questionnaire would be exported from the browser extension via SSL, an encrypted transport layer for HTTP, in order to protect against the logging of confidential information by a malicious Tor exit node.

Analyzing the Data & Potential Reporting-Bias Concerns: The value offered by a system such as this lies in its unique deference to the factors that lead to the under-reporting of sexual assault, such as fear of perpetrator reprisal, the frequent and shocking community hostility towards the survivors of sexual assault, and a judicial system that is often apathetic towards or interrogative of sexual assault survivors. At the same time, this system architecture protects individuals from defamatory attacks by malicious actors by anonymizing alleged perpetrator identities. By separating data collection from investigation, the system respects the concerns that ordinarily lead to under-reporting, protects individuals from false accusations, and anonymously gathers data that can be used to map the social structure of rape in a given population. By mapping this information, a researcher could discover the prevalence of serial perpetrators, as well as the number of alleged assaults each of them might be responsible for, in any given population. That community would then be able to tailor its strategy to combat sexual assault, whether that be a pedagogic or a more aggressive, investigatory approach. It would not be possible to discover this information by only surveying the number of self-reporting survivors. However, this system will only address the under-reporting problem if respondents have some understanding of how the system works, and thus how it both protects them and provides no incentive to falsify anonymous reports. Without this understanding, any potential improvements in reporting accuracy will be lost.
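As a rough illustration of the mapping step, the sketch below tallies how many reports name each perpetrator hash and then summarizes the distribution. It is only one way such an analysis might look: the function names are hypothetical, and the short strings in `sample` are placeholders standing in for real SHA-2 digests.

```python
from collections import Counter


def distribution(reported_hashes):
    """Map 'reports per perpetrator hash' -> 'number of distinct hashes'."""
    per_perpetrator = Counter(reported_hashes)
    return dict(sorted(Counter(per_perpetrator.values()).items()))


def repeat_share(reported_hashes):
    """Fraction of all reports that name a hash reported more than once."""
    per_perpetrator = Counter(reported_hashes)
    total = sum(per_perpetrator.values())
    repeated = sum(count for count in per_perpetrator.values() if count > 1)
    return repeated / total if total else 0.0


# Placeholder identifiers standing in for reported perpetrator hashes.
sample = ["h1", "h2", "h2", "h3", "h3", "h3", "h3", "h4"]

print(distribution(sample))   # {1: 2, 2: 1, 4: 1}
print(repeat_share(sample))   # 0.75
```

In this made-up sample, two hashes appear once, one appears twice, and one appears four times, so three quarters of the reports point at a hash that was reported more than once. A real data set could just as easily show the opposite shape.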

From Analysis To Resistance: The system described above holds out the tantalizing prospect of identifying serial rapists hiding within a population, but by its very design denies all actors access to the identity of such serial rapists. Given the architecture of the system described thus far, the only way to discover that information would be for a system administrator with access to all of the respondents’ hashed data to select a hash number that was reported by multiple respondents and then execute a dictionary attack on that hash, by hashing common names and comparing the output to the hash number that had been reported by multiple respondents. (One could make it expensive for such an attacker to execute a dictionary attack against the entire list by concatenating a salt with the perpetrator name at the endpoint level, within the browser extension; however, this would not defend against a dictionary attack against a single salted hash number.) This, however, would be the sort of breach of trust that the hashing of the perpetrators’ names was designed to prevent in the first place; if respondents have a reason to fear that their responses could be publicized, they will be less likely to report. It would seem as if we are stuck with a sort of Heisenberg uncertainty principle of sexual assault; the less we know about a rapist’s identity, the more we know about the number of assaults they have committed; whereas the more we know about a rapist’s identity, the less we know about how many other assaults they may have committed.
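The weakness described in this section can be shown in a few lines. The sketch below assumes SHA-256 and a single survey-wide salt known to every copy of the extension, which is one reading of “at the endpoint level” and not something the design specifies; the salt value, function names, and candidate list are all hypothetical placeholders. It also shows why the salt would have to be shared: if each respondent generated a random salt, identical names would no longer produce matching hashes.

```python
import hashlib
import secrets


def salted_hash(name: str, salt: bytes) -> str:
    """SHA-256 over salt + name (hypothetical construction)."""
    return hashlib.sha256(salt + name.encode("utf-8")).hexdigest()


shared_salt = b"survey-wide-salt"  # assumed: the same value on every endpoint
name = "alex example"              # placeholder name

# Shared salt: two respondents produce the same hash, so reports still correlate.
matches_with_shared_salt = (
    salted_hash(name, shared_salt) == salted_hash(name, shared_salt)
)

# Random per-respondent salts: the hashes differ and correlation is lost.
matches_with_random_salts = (
    salted_hash(name, secrets.token_bytes(16))
    == salted_hash(name, secrets.token_bytes(16))
)


def dictionary_attack(target_hash, salt, candidates):
    """Return the candidate whose salted hash equals target_hash, if any."""
    for candidate in candidates:
        if salted_hash(candidate, salt) == target_hash:
            return candidate
    return None


# Anyone holding the salt and a list of likely names can test one hash directly.
target = salted_hash(name, shared_salt)
candidates = ["sam placeholder", "alex example", "jo sample"]
print(matches_with_shared_salt)                             # True
print(matches_with_random_salts)                            # False
print(dictionary_attack(target, shared_salt, candidates))   # alex example
```

Because the space of plausible names in a campus population is small, the loop finishes almost instantly for a single target hash, with or without the salt.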

There is a way out of this paradox, though, if we abandon the notion of a centralized administration of justice. In the first paragraph of the section “Implementation,” we described how the respondent would be prompted to choose a password after opening the survey in their browser extension, and how this password would then be hashed using SHA-2 and exported along with their survey responses over the Tor network. What is the purpose of this operation? This step does something special: there is now something that the respondent knows (their password) and something that the collector has (a hash of their password). The collector cannot link this hash, or its corresponding plaintext password, to the respondent, because they did not assign it to the respondent, and because it was delivered to them anonymously over the Tor network. This means that we can do something very subversive with our data set: we can introduce survivors who share the same perpetrator to one another, without ourselves knowing the identity of either the perpetrator in question or the identities of the survivors we are introducing to one another.
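To make the “something the respondent knows, something the collector has” relationship concrete, here is a minimal sketch of the records a collector might hold and the lookup that a returning respondent’s password would trigger. The record layout, function names, and sample passwords are assumptions for illustration, and the short perpetrator identifiers stand in for real SHA-2 digests.

```python
import hashlib
from collections import defaultdict


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# What the collector holds in this sketch: hashes only, no names or passwords.
reports = [
    {"password_hash": sha256_hex("correct horse"), "perpetrator_hash": "h2"},
    {"password_hash": sha256_hex("battery staple"), "perpetrator_hash": "h2"},
    {"password_hash": sha256_hex("lone report"), "perpetrator_hash": "h7"},
]


def shared_groups(reports):
    """Group password hashes by perpetrator hash, keeping groups of two or more."""
    groups = defaultdict(set)
    for report in reports:
        groups[report["perpetrator_hash"]].add(report["password_hash"])
    return {perp: members for perp, members in groups.items() if len(members) > 1}


def group_for_password(password, reports):
    """Return the shared perpetrator hash for a presented password, else None."""
    presented = sha256_hex(password)
    for perp_hash, members in shared_groups(reports).items():
        if presented in members:
            return perp_hash
    return None


print(group_for_password("correct horse", reports))  # h2
print(group_for_password("lone report", reports))    # None
```

The collector never learns who presented the password; it only learns that whoever did so holds the plaintext behind one of the stored hashes.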

We, as the system administrator, cannot deliver to a respondent a list of survivors who share the same perpetrator as they do, because all respondents are anonymous. However, we can do something else. If we establish a Tor hidden service webpage that prompts users to enter the password they had chosen before filling out the survey, then we can redirect each anonymous user into a private chat room corresponding to the hash number of the perpetrator they reported; that private chat room would be filled with whomever else reported having been assaulted by the same person, the person whose name corresponds to the hash number of the chat room. There, in the mathematically sequestered privacy of an anonymous cryptographic darknet, those survivors could decide exactly what they wanted to do with the person who had raped them. I’ve heard tell that a u-lock to the skull and drop point blade to the testes rarely fails to leave an impression. Sometimes justice is not so complicated, after all.

Written by praxis