Are women evil? Hacking Google’s search results.

Eni Mustafaraj
Jul 19, 2017

And how Google can use its AI to make search better and empower its users.

Google search results for the phrase “are women evil”. Original image appears in this Guardian article.

Are women evil? When Google’s autocomplete suggested this search phrase to the journalist Carole Cadwalladr, the results were not something she had imagined. “Every woman has some degree of prostitute in her. Every woman has a little evil in her.” was displayed highlighted in a box, as Google’s direct answer.

Why did Google show this result in position zero? Cadwalladr talked to a few experts who told her about SEO (search engine optimization) techniques, the power of networks (sites linking to one another), and the information war being waged by the tech-savvy right wing. She blamed it on Google. It worked! Google cleaned up the results the next day. However, Google didn’t explain how that happened in the first place, except for reminding everyone that it is not responsible for the content. Nor did Google explain how it changed its ranking algorithm, so that this particular article is now relegated to the 4th page of results.

I was curious to know how Google’s algorithm was fooled and so I investigated. Indeed, whoever wrote that article certainly knows about SEO best practices. Here are the details:

  1. The URL of the article contains the query in its title.
  2. The title of the post, bolded in the <h1> HTML tag, is also “Are women evil”.
  3. The phrase “are women evil” is mentioned 8 times.
  4. The word women appears 189 times, and the word evil 88 times. Together with the word men, they are the top three words (once you remove the so-called stop words: ‘the’, ‘to’, ‘and’) in this 10,000-word article.
  5. Though written at the 8th grade level (according to the Flesch-Kincaid grade level score), the article also contains many high-information words: biological, unconscious, co-morbid, perception, cultural shift, etc.
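Signals like the ones listed above are easy to measure. Here is a minimal Python sketch, not the method used for the numbers in this post. The stop-word list, the vowel-group syllable counter, and all function names are assumptions made for illustration. Only the Flesch-Kincaid grade formula itself is the standard one.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "to", "and", "a", "of", "in", "is", "are", "it", "that"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def query_in_url(url, query):
    """Check whether the query appears in the URL once separators become spaces."""
    slug = re.sub(r"[^a-z0-9]+", " ", url.lower())
    return query.lower() in slug


def phrase_count(text, phrase):
    """Count non-overlapping, case-insensitive occurrences of a phrase."""
    return len(re.findall(re.escape(phrase.lower()), text.lower()))


def top_words(text, n=3):
    """Return the n most frequent words after dropping stop words."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    return Counter(words).most_common(n)


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level, using the naive syllable counter above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Running `top_words` and `phrase_count` on the text of a page, and `query_in_url` on its address, gives a rough picture of how closely the page was tailored to a query. The grade level will only be approximate, because counting vowel groups miscounts many English words.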

Taken all together, these signals make this article very relevant to the query. An algorithm that treats words as strings of characters, and is unaware that their meaning can cause offense, is doing the job it was designed to do: find the page whose content is the most relevant match to the query. And while Ms. Cadwalladr and many others wouldn’t think of asking “are women evil”, there are apparently plenty of people who do, and who then link to or share articles like this one. That is how the Web works and how a search engine learns what people want to see. And some people know more of these secrets than others. If Google is doing us a disservice, it is by keeping us in the dark about how certain groups are taking advantage of its algorithms, as well as by making it difficult for us to learn more about the source from which an answer or snippet is extracted.
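To see what “treating words as strings of characters” means in practice, consider this toy ranking function in Python. It is a deliberately naive sketch and certainly not Google’s algorithm. The `score` and `rank` names and the example documents are made up for illustration. The point is that nothing in it knows what any word means.

```python
import re


def score(query, document):
    """Fraction of document tokens that are query terms (a naive relevance score)."""
    doc_tokens = re.findall(r"[a-z]+", document.lower())
    query_terms = set(re.findall(r"[a-z]+", query.lower()))
    if not doc_tokens:
        return 0.0
    hits = sum(1 for token in doc_tokens if token in query_terms)
    return hits / len(doc_tokens)


def rank(query, documents):
    """Return documents sorted from most to least string-relevant."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)


# Hypothetical documents: the one that repeats the query words wins.
docs = ["The weather is nice today", "Bake bread at home: bread recipes"]
best = rank("bake bread", docs)[0]
```

A page that repeats the query words often enough rises to the top, whatever it actually says about them.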

How can Google do better?

What if Google’s snippets came with a “nutrition label” with facts about the web source and the particular page? Something like this image that I created:

A Google featured snippet that was augmented by the author with some facts about the webpage from which the content was extracted.

These pieces of information (I call them signals) are something that an AI algorithm can be trained to recognize. As humans, we can tell if a text is a personal opinion, especially if we see the pronoun “I” 87 times, as in this article. When we read that the author’s name is Razor Blade Kandy, we can easily recognize that it’s a pseudonym. Then there is the matter of what this blog is about. Its subtitle says “A new interpretation of masculinity”. A bit vague. Using a tool like Amazon’s Alexa to look up information about this site, one learns that its audience is similar to that of MGTOW (Men Going Their Own Way). There is a Wikipedia page for that. The acronym is also a category in this blog; thus, the topical affiliation is explicit, not something to be inferred.
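Some of these signals need no AI at all. The Python sketch below shows one way a label could be assembled from a page. The `SourceLabel` class, its fields, and the helper names are hypothetical, and the label in the image above was put together by hand, not by this code.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SourceLabel:
    """Hypothetical 'nutrition label' for one search result."""
    author: str
    first_person_count: int
    categories: list = field(default_factory=list)

    def render(self):
        """Format the label as short lines a reader can scan."""
        lines = [
            f"Author: {self.author}",
            f"First-person pronoun 'I': {self.first_person_count} times",
        ]
        if self.categories:
            lines.append("Categories: " + ", ".join(self.categories))
        return "\n".join(lines)


def count_first_person(text):
    """Count standalone uppercase 'I', including contractions such as I'm."""
    return len(re.findall(r"\bI\b", text))


def build_label(author, text, categories):
    """Assemble a label from the page text and its declared metadata."""
    return SourceLabel(author, count_first_person(text), list(categories))
```

Counting a pronoun is the easy part. Deciding that an author name is a pseudonym, or what community a site belongs to, is where a trained model would have to do the work.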

If Google showed such “nutrition labels” for all sources of information in its search results (maybe with different ingredients based on the nature of the source), we would be able to quickly scan what sources are being shown and make our own choices about what to consume and with what kind of skepticism. Just like we do in a supermarket aisle, picking up products and looking at the labels for the amount of added sugar or cholesterol.

Google might say, “but our AI is not capable of producing labels like that.” If Google says so, we can ask: why not? Why can’t your engineers, some of the most intelligent people on this planet, work on an AI technology that really helps humans make good decisions, by providing constant reminders that the Web is not a well-meaning publishing company, but a universe of voices, each with its own agenda? Transparency and contextual information are what we need, not spoon-fed answers. Because if AI is going to take over, as the fear-mongers keep reminding us every day, it will be because tech companies like Google are daily undermining our human agency.

Written by Eni Mustafaraj

Data and Web Science | Wellesley College | Immigrant | Feminist