Are Google’s Top Stories Politically Biased? It’s Complicated.

An Algorithm Audit of Top Stories During the Confirmation of Brett Kavanaugh

Figure 1: The “Top stories” results for the query “kavanaugh fbi investigation” on Nov 3, 2018, 8am. In the left-side image, the first two sources are right-leaning publications and the third is Russia Today (RT.com). Two hours later on the same day (right-side image), the Top stories panel had a new format, with two new sources and a different ranking order.

In August 2018, President Trump tweeted the following about Google:

Google search results for “Trump News” shows only the viewing/reporting of Fake News Media. In other words, they have it RIGGED, for me & others, so that almost all stories & news is BAD. Fake CNN is prominent. Republican/Conservative & Fair Media is shut out. Illegal? 96% of results on “Trump News” are from National Left-Wing Media, very dangerous. Google & others are suppressing voices of Conservatives and hiding information and news that is good. They are controlling what we can & cannot see. This is a very serious situation-will be addressed!

This tweet was informed by an article published on PJ Media, titled “96 Percent of Google Search Results for ‘Trump’ News are from Liberal Media Outlets”. After its publication, various sources, including the author, made it clear that it was not a scientific study with generalizable results, but a one-time snapshot observation. Although this particular inquiry was methodologically flawed, its premise is worth investigating: what sources does Google show when one searches for events or people in the news? Is there anti-conservative bias on Google?

With my students at Wellesley College, I am working on these questions from multiple perspectives. In this blog post, I will discuss one case study that focuses on the Top stories shown in Google search during Brett Kavanaugh’s confirmation hearings. Here is a summary of what we learned:

  1. Fox News was the source with the largest number of unique stories in our two-month-long data collection. It was also the only conservative news outlet among the 10 most prevalent sources.
  2. Fox News stories were more likely to be in the first position in the Top stories than any of the other media outlets.
  3. The competition for the 1st position in Top stories is “fierce”, with only 21% of sources being able to secure the position. Meanwhile, the 3rd position is up for grabs frequently: 59% of sources were found at least once in that position.

What do these results suggest? A simple interpretation is that Kavanaugh’s confirmation was the biggest news story of the season and Google users across the political spectrum were extremely engaged and thirsty for updates. While non-conservative readers have many options for up-to-date news, Fox News is the dominant conservative outlet, as a 2014 Pew study found. By choosing Fox News more often, Google’s algorithm was most likely reflecting the popularity and authority of this news source for this particular news event. However, a conspiratorial interpretation of the strong showing of Fox News would be that Google has been actively promoting conservative content to compensate for the higher number of center and liberal sources. Thus, one could claim that Google is displaying a conservative bias, the opposite of what many people, including President Trump, believe. Can Google display conservative bias in some stories and anti-conservative bias in others? Nothing is impossible. We believe that researchers need to follow several high-interest news stories over an extended period of time in order to better understand whether the bias comes from the algorithm or reflects publisher and reader choices.

If you are curious about how we came to these findings, please read on.

Auditing Algorithms

How can we know whether Google has a political bias? Given that we cannot look inside Google’s “black box” algorithms, researchers have proposed a method called algorithm auditing. We performed a “scraping audit” by sending queries to the Google search platform and recording the results. However, there are several important variables to consider when auditing Google search results:

The query phrase. Slight variations in the query phrase will alter the results, despite Google’s herculean efforts to “understand” the intent of the query. Concretely, as humans, we can agree that the queries “cruz beto” and “beto vs cruz” have the same intent (look up information about both candidates in the Texas 2018 US Senate race), and while the algorithm displays similar results, they are not identical (Figure 2).

Figure 2: Results for two similar queries: “cruz beto” and “beto vs cruz”. Two of the three stories are the same, but the order for the 1st and 2nd one differs.

The location. For queries such as “weather” or “movies”, Google uses our current location (inferred from the IP address of the device) to show the local weather or movies playing near us. This works for other queries as well. For example, my search for “what’s on the ballot” showed a link to my local government in third place, despite the lack of an explicit geographical location in the query (Figure 3).

Figure 3: A screenshot that indicates how Google search uses the device location to provide location-specific results.

Individual search history. Over time, Google’s algorithms learn our informational preferences and adjust themselves to better match them. This is known as personalization, but another name for it is the “filter bubble”, a term popularized by Eli Pariser in his 2011 TED talk, in which he shows side-by-side examples of two searches for Egypt (performed by two of his friends), with one page containing news about the protests in Egypt and the other with no mention of them.

Device type. Mobile phones and laptop/desktop computers are different (internet speed, page loading time, screen size, modes of interaction, etc.), thus when creating a list of results for different devices, Google takes these factors into account.

Google products. Search results about news events depend on the Google product being used. PJ Media’s author searched the News tab of Google search. Another Google product people use regularly is Google News. Meanwhile, Google’s traditional search also shows news stories in a carousel format at the top of the page, labeled as Top stories. This feature has become particularly popular on mobile phones, increasing referrals from Google search to publishers.

Researchers who audit Google search results typically recruit geographically diverse participants to study how location, personalization, or query differences influence the search results. However, they have found that personalization might not play as big a role as previously suspected. In our study, we are interested in the baseline behavior of the algorithm for ranking news sources (without the adjustments it makes for location and personalization), so we keep many of these variables constant. We use the same laptop computer, with the same IP address, and the same blank-state browser in incognito mode (one device, one location, and no search history personalization). We programmatically send the same queries repeatedly to Google search and then capture the composition of the Top stories panel. This method allows us to perform a temporal analysis of the Top stories ranking algorithm.
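To make the idea of an "observation" concrete, here is a minimal sketch of how one captured Top stories panel could be represented after it has been scraped. The `Story` and `Observation` classes, their field names, and the example.com URLs are assumptions made for illustration; they are not the data format or code used in this study.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Story:
    position: int  # 1-based rank inside the Top stories panel
    source: str    # publisher name as displayed in the panel
    url: str


@dataclass(frozen=True)
class Observation:
    query: str                  # query phrase sent to the search page
    captured_at: datetime       # when the panel was recorded
    stories: Tuple[Story, ...]  # panel contents, in ranked order


def unique_story_urls(observations: Iterable[Observation]) -> Set[str]:
    """Distinct story URLs across all observations."""
    return {s.url for obs in observations for s in obs.stories}


def observations_per_query(observations: Iterable[Observation]) -> Counter:
    """How many times each query phrase was observed."""
    return Counter(obs.query for obs in observations)


# Placeholder data: hypothetical outlets and URLs, not real results.
sample = [
    Observation("kavanaugh", datetime(2018, 11, 3, 8, 0), (
        Story(1, "Outlet A", "https://example.com/a"),
        Story(2, "Outlet B", "https://example.com/b"),
        Story(3, "Outlet C", "https://example.com/c"),
    )),
    Observation("kavanaugh", datetime(2018, 11, 3, 10, 0), (
        Story(1, "Outlet B", "https://example.com/b"),
        Story(2, "Outlet A", "https://example.com/a"),
        Story(3, "Outlet D", "https://example.com/d"),
    )),
]

print(len(unique_story_urls(sample)))               # 4
print(observations_per_query(sample)["kavanaugh"])  # 2
```

Keeping the timestamp and the rank on every record is what makes a temporal analysis possible: the same story URL can be followed as it moves between positions from one observation to the next.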

A case study: Kavanaugh’s Top stories

For the remainder of this post, I will focus on one of the biggest political events of 2018: Brett Kavanaugh’s confirmation to the U.S. Supreme Court. Over a period of 8 weeks (Sep 5 to Nov 3), we tracked the following queries multiple times a day and almost always observed the presence of the Top stories panel: brett kavanaugh (342 observations), kavanaugh hearing (342), kavanaugh confirmation (342), kavanaugh vote (227), kavanaugh (278), kavanaugh fbi investigation (157). Some queries have fewer observations because they were added later to our set of queries. Each observation consists of 3–10 stories that appeared in the Top stories panel: 3 for lower-ranked Top stories (such as the right-side example in Figure 1) and 10 for the carousel-style Top stories (such as the left-side example in Figure 1). We discovered a total of 3,708 news stories (with unique URLs) created by 332 news outlets¹. Here are some of the results of our analysis.

Top 10 Most Represented Sources (by number of unique stories)

This list is composed of the largest cable and TV networks, national newspapers, and online publishers that cover political issues. Together, these 10 sources contributed 48% of all stories shown during this period. Thus, publishers that produce more news stories appear more frequently on Top stories.

According to AllSides (a nonprofit that uses experts and the community to label bias in media), Fox News is the only right-leaning source among the sources in our list. Should more right-leaning sources be on this list? Outside the top 10, we found several right-leaning sources: The Washington Examiner (13th overall) with 57 stories, Wall Street Journal (16th) with 52 stories, and several other sources clumped together in 20th position: Breitbart (41 stories), New York Post (41 stories), and National Review (40 stories). What factors determine the order? What percentage of each publisher’s stories appear on Top stories? We’ll address these new questions in our future work.

Which publishers appear most often in the Top 3 positions in Top Stories?

Although the Top stories panel is a carousel of 10 stories, users initially see three of them. Thus, the first three visible positions constitute a “first impression” that likely influences readers’ perception of the evolving story. So, which news outlets are displayed in the top three positions?

  • 1st position: 70 sources (out of 332)²
  • 2nd position: 94 sources (out of 332)
  • 3rd position: 195 sources (out of 332)
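These counts translate directly into the percentages quoted in the summary. Below is a small sketch of that arithmetic, together with one way such per-position counts could be derived from raw (position, source) pairs. The function names and the placeholder pairs are illustrative assumptions, not our analysis code.

```python
from collections import defaultdict

TOTAL_SOURCES = 332
SOURCES_SEEN = {1: 70, 2: 94, 3: 195}  # counts reported in the list above


def share_of_sources(count, total=TOTAL_SOURCES):
    """Percentage of all sources, rounded to the nearest whole number."""
    return round(100 * count / total)


def distinct_sources_by_position(pairs):
    """Count distinct source names seen at each position."""
    seen = defaultdict(set)
    for position, source in pairs:
        seen[position].add(source)
    return {position: len(sources) for position, sources in seen.items()}


shares = {pos: share_of_sources(n) for pos, n in SOURCES_SEEN.items()}
print(shares)  # {1: 21, 2: 28, 3: 59}

# Hypothetical pairs, only to show the counting step.
placeholder_pairs = [(1, "Outlet A"), (1, "Outlet A"), (3, "Outlet B"), (3, "Outlet C")]
print(distinct_sources_by_position(placeholder_pairs))  # {1: 1, 3: 2}
```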

The major takeaway here is that in a crowded space of publishers, some have a better chance than others to “secure” the 1st position. Here is a look at the publishers who did so most often (in terms of the number of unique stories):

  1. Fox News — 34% (113 out of 330 stories)
  2. CNN.com — 29% (80 out of 282 stories)
  3. The New York Times — 28% (35 out of 126 stories)
  4. NBC News — 19% (25 out of 133 stories)
  5. Washington Post — 15% (30 out of 206 stories)

It seems that when it comes to placement in the 1st position, the number of stories published is not the most important factor. The New York Times and NBC News both had fewer stories than USA Today, The Hill, and Vox, but were chosen more often for the 1st position. Most likely, having a bigger audience (which is correlated with name recognition) matters more. Is the algorithm rewarding sites that are already popular? It seems that way. Kavanaugh’s confirmation was the biggest story in the U.S. for more than a month, so its coverage was non-stop, as was the public’s interest. Therefore, the algorithm might be rewarding popular sources, according to the “exploitation-exploration” tradeoff (see next section). However, this is only a hypothesis and we’ll need to study many events like this one to be able to generalize.

Position 3 is the “Exploration position”

As noted above, 195 out of 332 sources appeared at least once in the third position. Compared to the 1st position (where only 70 sources appeared), the sources in the third position are much more varied. One example was RT.com (the Russian propaganda publisher) in Figure 1, but here are a few more examples, from outlets that don’t usually publish political content, that appeared in the 3rd position:

  • CBR (Comic Book Resources): Samuel L. Jackson Reacts to Pulp Fiction/Brett Kavanaugh Mash-Up
  • Billboard (music charts): Sara Bareilles Reveals She Released Feminist Anthem ‘Armor’ Early Because of the Kavanaugh Hearings
  • TMZ.com (gossip tabloid): Marco Rubio Drowned Out by Protesters While Talking Brett Kavanaugh

In machine learning, this strategy is known as the exploitation-exploration tradeoff. Agents (humans or algorithms) typically exploit what they already know (for example, we always read news from the same 2–3 sources), but once in a while agents try something new (exploration). The data in this case study seems to suggest that Google’s algorithm is using the 3rd position in Top stories as the “exploration position” to 1) encourage users to engage with unfamiliar news sources or perspectives and to 2) learn from their behavior which sources are worth “promoting”.
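To make the tradeoff concrete, here is a toy epsilon-greedy rule, one of the simplest textbook ways to balance the two behaviors. It is emphatically not Google’s algorithm, which we cannot observe. The function name `pick_source`, the outlet names, the engagement scores, and the value of `epsilon` are all invented for illustration.

```python
import random


def pick_source(scores, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: give any source a chance, regardless of its score.
        return rng.choice(sorted(scores))
    # Exploitation: choose the source with the highest estimated score.
    return max(scores, key=scores.get)


# Invented engagement estimates for hypothetical outlets.
scores = {"Big Outlet": 0.9, "Mid Outlet": 0.5, "Niche Outlet": 0.1}
rng = random.Random(0)  # fixed seed so repeated runs give the same picks

picks = [pick_source(scores, epsilon=0.2, rng=rng) for _ in range(1000)]
print({name: picks.count(name) for name in scores})
```

With `epsilon` at 0.2, roughly four out of five picks go to the highest-scoring outlet, and the remaining picks are spread over all outlets. A slot that behaves this way would show a much wider variety of sources than a slot that always exploits, which is the pattern we see when comparing the 3rd position with the 1st.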

So, is Google biased?

We found that Fox News (a conservative publisher) had the largest number of stories and the largest number of 1st positions in Google’s search panel of Top stories. The other sources, while not conservative, were a mix of very popular center or left-leaning sources. This particular case study indicates that publishers and audiences play an important role in shaping the composition of the Top stories, maybe more so than any political bias. If anything, Google might be actively trying to burst the “filter bubble” by ensuring that a wide range of sources are displayed in the 3rd position.

We’ll continue monitoring Google’s Top stories and share more findings. If you have questions or requests, please leave a comment.

Footnotes

[1] Automatically, our program found 332 unique names in the source field of an article. However, an inspection of this list revealed that several sources are listed with different names. For example, we found both The Huffington Post and HuffPost, which are the same organization, or Hot Air and HotAir. As a result, the number 332 is an overestimation. What is the source of such errors? It is not clear to us whether the publishers or Google’s algorithms choose the names that appear in these stories. We will try to find out in future work.

[2] As mentioned in [1], these numbers are overestimations, because we didn’t collapse duplicates into a single source. We only did this for the top 10 sources. Concretely, Fox News also includes the stories from Fox News Insider and Fox Business. CNN and CNN.com were merged together, as were USA TODAY and USA Today, Huffington Post and HuffPost, and NBC News and NBCNews.com.
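A sketch of how such merging could be automated with a lookup table of known aliases. The choice of canonical name for each group, the helper names, and the inclusion of Hot Air (which we did not actually merge) are assumptions for illustration; this is not the code we used.

```python
def _key(name):
    """Comparison key: ignore letter case and all whitespace."""
    return "".join(name.casefold().split())


# Canonical name -> names observed in the source field (groups from [1] and [2]).
ALIAS_GROUPS = {
    "Fox News": ["Fox News", "Fox News Insider", "Fox Business"],
    "CNN.com": ["CNN", "CNN.com"],
    "USA Today": ["USA TODAY", "USA Today"],
    "HuffPost": ["Huffington Post", "The Huffington Post", "HuffPost"],
    "NBC News": ["NBC News", "NBCNews.com"],
    "Hot Air": ["Hot Air", "HotAir"],
}

CANONICAL = {
    _key(alias): canonical
    for canonical, aliases in ALIAS_GROUPS.items()
    for alias in aliases
}


def canonical_source(name):
    """Map a displayed source name to its canonical name, if one is known."""
    return CANONICAL.get(_key(name), name.strip())


raw_names = ["HuffPost", "The Huffington Post", "HotAir", "Hot Air", "Vox"]
print(sorted({canonical_source(n) for n in raw_names}))
# ['Hot Air', 'HuffPost', 'Vox']
```

Names that are not in the table pass through unchanged, so the table only needs to list the duplicates that have actually been spotted by inspection.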

Acknowledgement

I’m very grateful to my research collaborator Emma Lurie for great discussions and suggestions to make this post clearer.