Why academics shouldn’t move fast and break things
If scientists with PhDs don’t understand what Google results are and what they can tell us, what hope do the rest of us have?
by Alison Gerber, Ov Cristian Norocel, and Francesca Bolla Tripodi
The other day, Twitter feeds worldwide were flooded with the same horrific statistic: “how to hit a woman so no one knows” had been typed into Google over 160 million times during the pandemic. It raised alarm among journalists, advocates, and academics focused on intimate-partner and gendered violence. Perhaps COVID-19 quarantines and shutdowns had helped to curb the virus, but at what cost? According to a blue-check MSNBC columnist and the academic paper she cited, Americans’ search habits told a dire story. Victims seemed to be seeking help more often than before, and violent partners were apparently seeking strategies to avoid getting caught. The column contextualized the issue well, and the author’s handle had “feminist” in it. Why not retweet and amplify, horrified?
Well, because it turns out the paper's findings were catastrophically wrong. And when the subject matter is this important, we've got to get it right; otherwise we risk contributing directly to anti-science, anti-feminist backlash. It's no surprise that few readers clicked all the way through and read the study — even academics quickly promoted the story, assuming that the usual layers of gatekeepers had done their due diligence. The research was published in a peer-reviewed academic journal, not one of those predatory publishers run by some Johnny in his basement. It had passed through a double-blind review process — the gold standard of scientific legitimation. The paper is still out there, with the journal proudly displaying its climbing "altmetric" score measuring the number of times it's been picked up by news outlets, tweeted about, and linked on Reddit [**update below]. If you're lucky enough to be employed by an academic institution, you can probably even read the whole article — otherwise, you can pay just $45 for the privilege.
The MSNBC piece cherry-picked evidence out of step with the peer-reviewed article's own aims — the article's main argument urged academics to move quickly and to use tech tools like Google for "rapid response" research. Its conclusion suggested that, for researchers, "the greatest error is not to move, faced with yearly and global epidemics of suicide and femicide there can be no argument for inaction." But the empirical inquiry the author used to make that point had problems that should have been obvious to anyone with even a basic understanding of Google's functionality.
The author developed a set of search queries with no apparent links to the vast body of theoretical and empirical research on intimate gendered violence — they report using queries like "how to control your woman" and "he will kill me", which they appear to have invented from whole cloth. The validity of these queries as reflections of the concepts at play — "intentional male violence" and "indicative male violence" — is never discussed. They then entered the queries into the search bar (and attempts to recreate the findings show that they seem to have neglected to enclose the phrases in quotation marks). And then the paper got really wild.
Using date range commands, the author claimed to compare searches for five months in 2019 to the same five months in 2020 — pre-pandemic to mid-pandemic. They did so not by turning to Google Trends data — the method promoted by the scholars they cited — but by entering their search phrases, with date delimiters, into ordinary Google search. They then reported the estimated hit count Google displays at the top of the results page as the number of searches made for that search string. Those numbers were exceptionally high because, well, the search phrases were not enclosed in quotation marks. All this in an article arguing for the value of tech-enabled "rapid response" research. A culture of "move fast and break things" is common in Silicon Valley, but academics typically work a bit more slowly and carefully precisely to avoid these kinds of errors.
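To see why the missing quotation marks matter so much, compare the two forms of a query. The sketch below is a minimal illustration, not a reconstruction of the paper's method: it builds Google search URLs for an unquoted and a quoted version of one of the paper's phrases, using the before:/after: date operators Google introduced in 2019 (whether the author used these operators or the date-range tool in Google's interface, the same logic applies).

```python
from urllib.parse import quote_plus

# One of the paper's reported phrases, used here purely for illustration.
phrase = "how to control your woman"
dates = "after:2020-01-01 before:2020-06-01"

# Unquoted: matches any page containing these words, in any order, anywhere.
loose = f"{phrase} {dates}"

# Quoted: matches only pages containing the exact phrase.
exact = f'"{phrase}" {dates}'

for query in (loose, exact):
    print(f"https://www.google.com/search?q={quote_plus(query)}")
```

Even the quoted version only narrows the result count, and the "About N results" figure at the top of the page remains a rough estimate of matching pages; at no point does it say anything about how many people typed the phrase.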
Part of the problem here is rooted in a fundamental misunderstanding of how Google works. When we search for information, Google relies on algorithms to make sense of what we're searching for. We tend to think of algorithms as magic, but what happens is fairly simple: Google transforms inputs (e.g. keywords, geolocation, and the click-through data of other users) into outputs (the content Google believes best matches the query — directions, videos, news, and so on). From an information science perspective, the major error in this article is that it confuses outputs for inputs. The author claims that searches for the query "laid off" doubled, but what they actually show is that the number of results returned increased.
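The distinction is easy to make concrete. In the toy model below (every name and data point is invented purely for illustration), a query log records the inputs, meaning what people actually typed, while an index of pages determines the output count that a search engine displays. The paper reads the second number as if it were the first.

```python
# Toy model: inputs (queries people typed) vs. outputs (pages that match).
# All names and data here are invented purely for illustration.

query_log = ["laid off", "laid off benefits", "cheap flights", "laid off"]

indexed_pages = [
    "thousands laid off as factories close",
    "she laid the book down and turned off the light",   # the words, not the topic
    "how to file for unemployment after being laid off",
]

def searches_for(phrase: str) -> int:
    """Input side: how many times was this exact query issued?"""
    return sum(1 for q in query_log if q == phrase)

def pages_matching(phrase: str) -> int:
    """Output side: how many pages contain all the words, in any order
    (a crude substring check, mimicking an unquoted search)?"""
    return sum(1 for page in indexed_pages
               if all(word in page for word in phrase.split()))

print(searches_for("laid off"))    # 2 -- the search volume
print(pages_matching("laid off"))  # 3 -- the result count: a different quantity
```

The number of matching pages can rise for all sorts of reasons (more news coverage, more indexed content) without a single additional person having searched for the phrase.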
If they were interested in checking for increases in a particular search term over time, they could simply have used readily available tools like Google Trends. It's easy to check the paper's claims there, and they don't hold up. Sadly, the phrase "how to hit a woman" has ebbed and flowed through Google's data for the past five years.
But when we tried to replicate the result for the full query that snagged everyone's attention, we found it was so specific and unusual that Google Trends had no data associated with it at all.
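Replicating that check takes only a few lines. The sketch below uses pytrends, an unofficial third-party wrapper around Google Trends; the library and its method names are real, but the exact behavior for no-data queries is an assumption on our part (depending on the version, Trends may come back as an empty frame or as all zeros).

```python
# pip install pytrends  (an unofficial Google Trends wrapper)
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US")

# The short phrase has enough search volume for Trends to report on:
# relative interest on a 0-100 scale, ebbing and flowing across five years.
pytrends.build_payload(kw_list=["how to hit a woman"],
                       timeframe="2016-01-01 2021-01-01")
print(pytrends.interest_over_time().head())

# The full sentence-length query is too specific and too rare:
# Trends returns no usable data for it at all.
pytrends.build_payload(kw_list=["how to hit a woman so no one knows"],
                       timeframe="2016-01-01 2021-01-01")
full = pytrends.interest_over_time()
print(full.empty or (full.drop(columns="isPartial") == 0).all().all())  # True
```

Note that Trends reports relative search interest, not raw counts; even so, it is the input-side measure the paper needed, and it is free.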
How did the researcher develop their queries in the first place? In the paper, they do not draw on the many scholars who could have helped them develop a valid query set. They make sweeping generalizations — assuming intimate gendered violence occurs primarily within heterosexual couples, with a male aggressor — while spinning surprisingly specific queries such as "How to hit a woman so no one knows" or "I am going to kill her when she gets home". The article never explains whether these were derived from actually-used search terms, or tested for validity in any way. Every step in the research process, it seems, suffered from a fatal flaw.
After a day and a half of horrified fact-checking online, the researcher acknowledged her mistakes and says she is in touch with the journal that published her work. MSNBC excised discussion of the article from its column and appended a brief editor's note explaining the correction. The column's author deleted her own tweets pointing directly to the false statistics — though as of today, one of MSNBC's remains up. Snopes stepped in. But it's too late — the phrases highlighted in the original tweets and centered in the original column are still circulating on social media and in a world of second-tier outlets that quickly repackaged MSNBC's content and broadcast it as their own. Many of these outlets claim to speak as or on behalf of marginalized people, and the original framing from the MSNBC column — with the false findings at its center — is now beginning to spread in non-Western outlets. The false findings and their uptake in the media may be headed for a new life as an anti-science, anti-feminist meme. The manosphere points and says: "See? You can't trust academics!" "You can't trust journalists!"
Plenty of scholars have focused their attention on the impact of the pandemic in this space with rigorous, careful research — our own query on COVID-related studies in journals like Violence Against Women, Journal of Family Violence, Journal of Sexual Aggression, and Partner Abuse yielded 38 results, with more likely in the pipeline. There are folks out there doing this work, and doing it well. But with issues this important, you have to get things right the first time.
How did this article get through peer review? Like the author, the journal's reviewers and editors seem to have been glamoured by the shine, tech fetishism, and naive empiricism of even the most poorly executed digital methods — without the methodological humility to collaborate with colleagues from information science, or at least check in with someone familiar with the basic workings of tools like Google. If we want to catch scientific missteps like these, we must recognize that good science takes time. And this mishap shows how desperately we need more robust digital literacy education at all stages of life — because if PhDs don't understand what Google results are and what they can tell us, what hope do the rest of us have?
**Update: The article was retracted on May 19, 2021, a few weeks after this piece was first published. To our knowledge, no information about the journal's peer-review or editorial process has yet been forthcoming.