Using Data to Determine Why Survivors Delay Reporting Sexual Assault

Ritti Bhogal
NYU Data Science Review
Oct 12, 2022

Content Warning: The following article discusses sexual assault, sexual violence, and rape in the context of research. It may be triggering or sensitive for some audiences. Reader discretion is advised.

Sexual assault is a traumatic event for those who experience it, yet survivors often wait days, weeks, months, or even years before opening up about their experience. In early 2021, shortly after former Australian Parliament House staffer Brittany Higgins went public with her own allegation of rape, a sitting federal cabinet minister was accused of raping a woman in 1988, more than 30 years earlier. While the accusation elicited a discussion surrounding historical rape and the nature of Australian politics, Prime Minister Scott Morrison determined that an accusation was not enough to ask the accused cabinet minister to step down and that a police investigation would be required for any further action (Lee 1).

Because the alleged assault was reported so long after it occurred, the accusation unfortunately lost recognition. However, the case raises an important question: why do sexual assault survivors delay or even refrain from reporting? According to an article written by Bri Lee, a PhD student at the University of Sydney, “It is common for victims to experience a ‘freeze response’ where they become immobilized when subjected to sexual violence.” Delay can be part of the psychological recovery from the attack. However, delays in reporting any crime make it more difficult to find evidence that is considered viable in court proceedings. Lee writes that while a delay should never prevent the report of sexual assault, “Physical evidence such as DNA or fingerprints are compelling to juries, and are rarely available when any significant amount of time has passed.” Delays in rape reporting may result in the survivor never receiving the justice they deserve. This leads to a further question: are there other factors, beyond the psychological impact of the assault, that influence delays in rape reporting?

Data analysis can offer helpful answers to the people who study rape reporting. Konstantin Klemmer, a recent PhD graduate in Computer Science from the University of Warwick and New York University, has developed a predictive machine learning framework that forecasts delays in rape reports (Klemmer et al., 1). He carried out this research at the Machine Learning for Good Laboratory alongside its director, Dr. Daniel Neill.

Klemmer’s first step was preparing the data. He took data from reports made to the NYPD and LAPD across a span of 5 years. When I interviewed Klemmer, he emphasized, “We want a large enough data set so that inference or any modeling that we do on it is meaningful. For all sorts of different models that we want to use, we just need a big enough sample size so that a learning model can actually learn.” For each reported rape, the time between the occurrence and the report was calculated, and every incident was categorized by whether the delay was at least a day, a week, or a month long. These three thresholds defined the binary indicators that served as the dependent variables for classifying the delay at the level of the event (Klemmer et al., 5). In other words, each incident was assigned a value of 0 or 1 for each of the three indicators.

Denoting day, week, and month binary indicators by Klemmer
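To make the thresholding concrete, here is a minimal sketch of how the three 0/1 indicators could be derived from an occurrence date and a report date. The function name `delay_indicators`, the example dates, and the choice to treat a month as 30 days are assumptions made for illustration; the article does not say how the study defined a month.

```python
from datetime import datetime, timedelta

# Thresholds named in the article: at least a day, a week, or a month.
# Treating "a month" as 30 days is an assumption made for this sketch.
THRESHOLDS = {
    "day": timedelta(days=1),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
}

def delay_indicators(occurred, reported):
    """Return the reporting delay and one 0/1 indicator per threshold."""
    delay = reported - occurred
    if delay < timedelta(0):
        raise ValueError("report date precedes occurrence date")
    # An indicator is 1 when the delay is at least as long as the threshold.
    flags = {name: int(delay >= limit) for name, limit in THRESHOLDS.items()}
    return delay, flags

# Hypothetical incident (not real data): occurred 1 March, reported 10 March.
delay, flags = delay_indicators(
    datetime(2020, 3, 1, 22, 0),
    datetime(2020, 3, 10, 9, 0),
)
print(delay.days, flags)
```

An incident reported a little over eight days after it occurred is flagged as delayed by at least a day and at least a week, but not by a month.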

The spatial distribution graphs below show rapes that were not reported within one day, one week, or one month. The top row aggregates them to the LAPD district level, and the bottom row plots each individual incident for the same three thresholds.

Spatial Distribution Graph of Rape Reporting Delays in Los Angeles created by Klemmer

The predictor variables in Klemmer’s modeling included demographics such as the survivor’s race, age, gender, and socioeconomic status, as well as the location of the incident. In statistical modeling, predictor variables are the independent variables (Salkind, 1078). While predictor variables can have a causal relationship with dependent variables, this study aimed only to determine whether the two were associated.

Klemmer built both spatial and non-spatial models so that he could examine how much of the variation in reporting delay is tied to location. To check whether the data showed spatial structure in the first place, he used the Moran’s I test, which takes a set of values and determines whether similar values cluster together in space and how strong that clustering is (Klemmer et al., 5–7). Spatial clustering can reveal similarities between reporting delays in certain locations, such as the Bronx or Chinatown, and whether the overall delay within these sublocations resembles that of New York City as a whole. As Klemmer said in the interview, “we’re trying to figure out if this observed spatial effect is just a representation of underlying social processes.” Location-specific metrics, such as residents’ average economic background, may be what correlates with notable reporting delays across regions, rather than the location itself.
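For readers unfamiliar with the statistic, the following is a minimal sketch of the global Moran’s I computation using NumPy. The four-area chain, its adjacency weights, and the delay values are invented for illustration and are not taken from the study, which may have used different weights and software.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    weights[i, j] > 0 marks locations i and j as neighbours.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    if w.shape != (n, n):
        raise ValueError("weights must be an n x n matrix")
    z = x - x.mean()                  # deviations from the mean
    denom = (z ** 2).sum()
    total_w = w.sum()
    if denom == 0 or total_w == 0:
        raise ValueError("values are constant or weights are all zero")
    # I = (n / sum of weights) * (sum_ij w_ij z_i z_j) / (sum_i z_i^2)
    return (n / total_w) * (z @ w @ z) / denom

# Hypothetical example: four areas in a row, each adjacent to its neighbour.
chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
clustered = morans_i([1, 1, 5, 5], chain)    # similar values sit together
alternating = morans_i([1, 5, 1, 5], chain)  # neighbours always differ
print(clustered, alternating)
```

Positive values indicate that neighbouring areas tend to have similar delays, negative values indicate that neighbours tend to differ, and values near -1/(n-1) are what one expects when there is no spatial pattern.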

Klemmer used supervised learning to train his prediction models. He frames the problem as both a classification and a regression task, since he and his colleagues “seek to predict whether a rape crime was reported within a certain time period (one day, one week, one month)” as well as “the log-transformed reporting delay.” The modeling pipeline created by Klemmer and his colleagues takes into account predictor variables “never used before in the context of rape reporting delays” (Klemmer et al., 7).
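The difference between the two tasks can be sketched on synthetic data. Everything in this example is an assumption: the two predictors, the way delays are generated, the use of `log1p` as the log transform, and the choice of plain linear and logistic regression. The article does not say which model family the study used, so this is only an illustration of classification versus regression targets, not the authors’ pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for incident-level data (NOT real crime data).
n = 500
X = rng.normal(size=(n, 2))                        # two hypothetical predictors
latent = 1.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)
delay_days = np.exp(latent)                        # strictly positive delays

y_week = (delay_days >= 7).astype(float)           # classification target (0/1)
y_log = np.log1p(delay_days)                       # regression target (log1p is an assumption)

# Add an intercept column and hold out the last 100 rows for evaluation.
A = np.column_stack([np.ones(n), X])
train, test = slice(0, 400), slice(400, n)

# Regression task: ordinary least squares on the log-transformed delay.
beta, *_ = np.linalg.lstsq(A[train], y_log[train], rcond=None)
pred_log = A[test] @ beta
ss_res = ((y_log[test] - pred_log) ** 2).sum()
ss_tot = ((y_log[test] - y_log[test].mean()) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot

# Classification task: logistic regression fitted by plain gradient descent.
w = np.zeros(A.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(A[train] @ w)))      # predicted probabilities
    w -= 0.5 * A[train].T @ (p - y_week[train]) / 400
accuracy = ((A[test] @ w > 0) == (y_week[test] == 1)).mean()

print(f"held-out R^2: {r2:.2f}, held-out accuracy: {accuracy:.2f}")
```

The classifier answers a yes/no question (was the delay at least a week?), while the regressor predicts how long the delay was on a log scale, which keeps a few very long delays from dominating the fit.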

Data processing pipeline and modeling framework for the conducted experiments created by Klemmer

The results of Klemmer’s data processing showed a clear difference in rape reporting delays between the two cities. In New York City, rape survivors waited an average of 4 days before reporting the crime, whereas those in Los Angeles waited roughly 3 days on average. This differs from other offenses, such as theft and assault, which are reported almost immediately after the incident (Klemmer et al., 9). Furthermore, Klemmer found a correlation between the time of year a rape occurred and a longer delay, noting “substantial spikes in median reporting delays on some federal holidays and widely celebrated festivals” (Klemmer et al., 10). He also emphasizes the correlation between delay and location in his paper, stating, “we show that rape reporting delays exhibit substantial spatial correlation, both at the event and area levels” (Klemmer et al., 2).

But what could be causing the disparities in reporting within each location? Klemmer suggests that the notable difference in delay between cities could reflect “manifestations of underlying social processes with spatial dependencies (such as segregation or low/high-income areas)” (Klemmer et al., 10). Further research into how the social makeup of New York and Los Angeles affects both the clustering of sexual assaults in certain locations and the distinct delay times in each city could reveal more.

Klemmer also found variability in rape reporting delays based on factors such as age, gender, and ethnicity. Survivors under the age of 18 and non-female survivors were less likely to report immediately, as were Latino and Asian survivors.

Image and caption taken from Klemmer: Predictor variable importance (permutation importance) of the 10 most important features for the different regression and classification tasks of the S3 model across both observed cities.
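Permutation importance, the measure named in the caption above, can be sketched in a few lines: shuffle one predictor at a time and record how much the model’s error grows. The synthetic data, the linear model, and the helper name `permutation_importance` below are assumptions for illustration and do not reproduce the S3 model or its features.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in squared error when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y but keeps its distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((y - predict(X_perm)) ** 2) - baseline)
        importances[j] = np.mean(increases)
    return importances

# Synthetic data (not from the study): feature 0 matters most,
# feature 1 matters a little, feature 2 has no effect on y.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Fit a simple linear model with an intercept via least squares.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(M):
    return np.column_stack([np.ones(len(M)), M]) @ beta

scores = permutation_importance(predict, X, y)
print(np.argsort(scores)[::-1])  # feature indices, most important first
```

A feature whose shuffling barely changes the error contributes little to the predictions, which is how a ranking of the most important predictors can be produced for any fitted model.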

Klemmer’s research is publicly available so that community organizations can benefit from its findings. In fact, unlike other studies on rape reporting delays, “our approach is based solely on openly available data” (Klemmer et al., 17), which makes it easier for the public to benefit from the research without concerns about the confidentiality of the data.

Klemmer’s predictive models can be an asset to community organizations that advocate for women who have experienced sexual violence. Daria Denti, a specialist in the economic aspects of cultural diversity and aggressive social behaviors, examined social support services across England and Wales. She found that local services addressing violence against women and girls (VAWG) were instrumental in raising rates of reporting sexual violence.

Organizations can use the predictions from Klemmer’s models to anticipate which locations (and therefore which communities) are more likely to have people who are struggling to report the sexual violence they have experienced.

Klemmer’s predictive models have highlighted useful information about rape reporting delays. While operationalizing the models will take much more research, as well as the help of survivors, they have the potential to assist community advocates and increase the likelihood that community members report sexual violence. Using the insights from Klemmer’s work, organizations that aid sexual assault survivors can better target communities where prolonged reporting delays occur, whether those delays are associated with location or with a survivor’s age, ethnicity, or gender.

Klemmer highlights at the end of his paper that “Behind every observation lies a human tragedy and unquantifiable suffering.” As we explore data-driven approaches to combat injustices in the world, it’s important to be respectful of every survivor’s bravery and request for anonymity. It is because of them that advances in rape reporting research can take place. If you would like to read more about Klemmer’s research, you can find the article in Royal Society Open Science at https://doi.org/10.1098/rsos.201795.

Works Cited

Klemmer, K., Neill, D. B., Jarvis, S. A. “Understanding Spatial Patterns in Rape Reporting Delays.” Royal Society Open Science, vol. 8, no. 2, Feb. 2021. EBSCOhost, https://doi.org/10.1098/rsos.201795.

Bhogal, Sukhleen. “Interview with Konstantin Klemmer.” 24 Mar. 2022.

Mendes, K., Ringrose, J., Keller, J. “#MeToo and the Promise and Pitfalls of Challenging Rape Culture through Digital Feminist Activism.” European Journal of Women’s Studies, vol. 25, no. 2, May 2018, pp. 236–246, doi:10.1177/1350506818765318.

Denti, Daria, and Simona Iammarino. “Coming Out of the Woods. Do Local Support Services Influence the Propensity to Report Sexual Violence?” Journal of Economic Behavior and Organization, vol. 193, Jan. 2022, pp. 334–352, https://doi.org/10.1016/j.jebo.2021.11.024.

Lee, Bri. “Delays in Reporting Alleged Rapes are Common — Even Years Later. This Isn’t a Barrier to Justice.” EveningReport.nz, Mar 02, 2021. ProQuest, http://proxy.library.nyu.edu/login?qurl=https%3A%2F%2Fwww.proquest.com%2Fnewspapers%2Fdelays-reporting-alleged-rapes-are-common-even%2Fdocview%2F2644663612%2Fse-2%3Faccountid%3D12768.

Salkind, Neil J. Encyclopedia of Research Design. Thousand Oaks, CA: SAGE Publications, Inc., 2010. SAGE Research Methods. Web. 24 Apr. 2022, doi:10.4135/9781412961288.
