“Sometimes, Google can be bad for learning history. Very bad.” This is how Peter Shulman, an associate professor of history at Case Western Reserve University in Ohio, started a Twitter thread on Feb 22, 2017. Professor Shulman was lecturing on the reemergence of the Ku Klux Klan in the 1920s when a student asked: Was President Warren Harding a member of the KKK?
This is what happened next, as told by the journalist Adrianne Jeffries in an article for The Outline, “Google’s featured snippets are worse than fake news”:
Shulman was taken aback. He confessed that he was not aware of that allegation, but that Harding had been in favor of anti-lynching legislation, so it seemed unlikely. But then a second student pulled out his phone and announced that yes, Harding had been a Klan member, and so had four other presidents. It was right there on Google, clearly emphasized inside a box at the top of the page.
Professor Shulman quite rightly asked publicly on Twitter: “how did such awful search results wind up featured on Google?” Does this question sound familiar? In my post “Are Women Evil? Hacking Google’s search results,” the journalist Carole Cadwalladr asked the same question. And the aftermath played out in exactly the same way. Once this new episode became public, Google once again changed something in its algorithms, though we don’t know what. Searching today for “presidents in the Klan” no longer shows a featured snippet; it shows a Wikipedia article instead.
For me, the interesting question is always: how do such dubious websites manage to fool Google’s intelligent algorithms?
SEO Techniques to fool Google
This particular website, www.thetrentonline.com, describes itself as “Nigeria’s leading Internet newspaper”. It is not a fake news website: it was registered in 2013 in Nigeria and seems to contain many stories about Nigeria and Africa, as well as stories picked up from around the Internet. These stories have catchy titles and lots of photos, a technique used by many clickbait websites that make a living through online advertising. Given the opacity of Google’s algorithms, it’s impossible to know for sure why this article was picked, but here are some of the Search Engine Optimization (SEO) techniques this particular website used.
- Repost content from others. I’m being generous and calling this technique reposting instead of stealing, because the article contains a reference to the original story. Reposting stories from other sources, often embellished with catchy titles and photos, is a common SEO technique (because it’s cheaper than generating new content, and Google’s algorithms value meaningful content). The article about the presidents in the Klan was originally posted on the blog I Love Black People, and thetrentonline.com provides a link to it: http://iloveblackpeople.net/2014/07/five-us-presidents-were-members-of-the-ku-klux-klan/. This link is now broken, but the story can be found at another link on the same website, where it was reposted two years later: http://iloveblackpeople.net/2016/03/five-us-presidents-were-members-of-the-ku-klux-klan/.
- Appearance of fresh content. Searching the Nigerian website for the phrase “ku klux klan” returns the article “REVEALED: 5 US Presidents Members Of Racist Cult Ku Klux Klan (PHOTOS)” nine times, each time with a new date, from July 19, 2014, to the most recent one, January 24, 2017. Republishing the same article under different dates to simulate fresh content is a typical SEO technique, and Google is not supposed to fall for this trick. Given that the last timestamp was only one month before the episode in Shulman’s class, it seems that this apparently “fresh” content did influence Google.
- Use of images with captions. The original post on the blog I Love Black People didn’t contain photos of the US presidents. The Nigerian website improved on the original article by adding photos of four presidents. To add the photos to the article, it used the dedicated HTML5 tags <figure> and <figcaption> (use of such tags indicates technical sophistication, which is also valued by Google’s algorithms). The text in these four figure captions was what Google extracted to display in its featured snippet. The inconsistency between the number in the title (five) and the number of presidents in the article (four) would have alerted a careful human editor to the questionable quality of the article, but an algorithm is not trained to make such judgements about facts.
- Tags for the article. The Nigerian website assigned several tags to the post: ku klux klan, ku klux klan members, racists US presidents, US presidents who were members of ku klux klan. Such tags reinforce the content of the article, but were not part of the original blog post, which tagged the article simply under news and politics.
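To make the point about captions concrete, here is a minimal sketch of how easily the text inside <figcaption> elements can be pulled out of a page. It uses only Python’s standard html.parser module. The class name, the function name, and the sample markup are hypothetical and written for illustration; they are not copied from the Nigerian website, and this is not a description of how Google actually builds its featured snippets, which remains unknown.

```python
from html.parser import HTMLParser


class FigcaptionExtractor(HTMLParser):
    """Collects the text found inside every <figcaption> element."""

    def __init__(self):
        super().__init__()
        self.captions = []   # finished caption strings, in page order
        self._depth = 0      # greater than 0 while inside a <figcaption>
        self._parts = []     # text fragments of the caption being read

    def handle_starttag(self, tag, attrs):
        if tag == "figcaption":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "figcaption" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join the fragments and collapse runs of whitespace.
                text = " ".join("".join(self._parts).split())
                self.captions.append(text)
                self._parts = []

    def handle_data(self, data):
        if self._depth > 0:
            self._parts.append(data)


def extract_captions(html):
    """Return the caption texts of an HTML document as a list of strings."""
    parser = FigcaptionExtractor()
    parser.feed(html)
    parser.close()
    return parser.captions


# Hypothetical markup, invented for this example.
SAMPLE_HTML = """
<figure>
  <img src="photo-1.jpg" alt="Portrait">
  <figcaption>Caption for the <em>first</em> photo</figcaption>
</figure>
<figure>
  <img src="photo-2.jpg" alt="Portrait">
  <figcaption>Caption for the second photo</figcaption>
</figure>
"""

if __name__ == "__main__":
    for caption in extract_captions(SAMPLE_HTML):
        print(caption)
```

Well-structured markup hands an extractor short, self-contained sentences, which is exactly the kind of text that fits in an answer box. Nothing in this step checks whether the captions are true.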
This analysis makes one thing clear: using SEO techniques to create the appearance of a “carefully crafted” website allows one to fool Google’s search algorithms and grab the coveted position zero as a featured snippet.
What should Google do?
The most desirable outcome would be for Google’s algorithms to capture the implicit topic of each query. For example, to a human, “presidents in the klan” sounds like a history question. Therefore, a human would value an answer that comes from a source that has some authority and expertise in history. One can argue that inferring the implicit topic of a query, given the ambiguity of natural language, is a difficult problem. I agree. Instead, my proposal is to provide the user with signals that would simplify the judgement of the source’s authority on this topic.
Concretely, if the featured snippet had been augmented with a few signals about the source of this article, the students in Shulman’s class wouldn’t have been so quick to accept Google’s answer. I have written about the “nutrition facts label” approach to augmenting search results in another post. In this particular case, it would have been helpful to show the following facts, because they would likely have put a student learning about the history of the USA on guard:
- Location of website: Nigeria
- Category of website: Internet News
- Author: Wires | The Trent
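As a rough sketch of what such a label could look like as data, the three facts above can be kept in a small record and appended to a snippet. Everything here is an assumption made for illustration: the SourceLabel class, its field names, and the annotate_snippet helper are hypothetical, and the snippet text is a placeholder. How a search engine would collect these facts reliably is a separate problem that this sketch does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLabel:
    """Judgement-free facts about the source of a search result."""

    location: str   # where the website is based
    category: str   # what kind of website it is
    author: str     # byline shown on the article

    def render(self):
        """Format the label as plain text, one fact per line."""
        return "\n".join([
            f"Location of website: {self.location}",
            f"Category of website: {self.category}",
            f"Author: {self.author}",
        ])


def annotate_snippet(snippet_text, label):
    """Append the source label below the snippet text."""
    return f"{snippet_text}\n---\n{label.render()}"


if __name__ == "__main__":
    label = SourceLabel(
        location="Nigeria",
        category="Internet News",
        author="Wires | The Trent",
    )
    # Placeholder text standing in for the featured snippet.
    print(annotate_snippet("(featured snippet text)", label))
```

The record deliberately has no field for quality or trustworthiness: it only states what is known about the source and leaves the judgement to the reader.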
Notice that these facts are judgement-free. If Google were to say that this content was stolen or that the website contains a lot of clickbait articles, that would be debatable, because that is how most websites function nowadays. However, users either don’t know or forget that Google doesn’t create or endorse the content that it displays, and that it often finds this content in the most bizarre corners of the Internet. This is why Google should be responsible for telling us what it knows about the source, so that we don’t put our trust blindly in its search results on the assumption that Google can vouch for the source.