Analyzing Google News: Introduction

Greg Coppola
Jul 25, 2019 · 2 min read

I have been suspended from my job at Google for saying in an interview that I believe News and Search results have a political bias. I want to explore this question in a series of posts, using data science, with only publicly available information and tools.

We begin by replicating and extending an experiment originally run by Paula Boylard. I scraped Google News for the query “donald trump” once a minute, 5000 times. Each scrape returned 105 stories on average.

Power-Law Distribution Over Sites

We begin by looking at the distribution of publications (or web-sites) that make up our new Google/Trump corpus. In particular, we look at the probability that a randomly selected story comes from each given news site. The results are depicted here:
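As a rough illustration of the calculation described above, here is a minimal sketch of how the per-site probability could be computed from a set of scrapes. It assumes each scrape has already been reduced to a list of site names, one per story; the function name `site_shares` and the toy data are hypothetical and are not the code or data behind this post.

```python
from collections import Counter

def site_shares(scrapes):
    """Return (site, share) pairs, largest share first.

    `scrapes` is assumed to be a list of scrapes, where each scrape is a
    list of site names, one entry per story returned in that scrape.
    """
    # Pool every story from every scrape and count stories per site.
    counts = Counter(site for scrape in scrapes for site in scrape)
    total = sum(counts.values())
    if total == 0:
        return []
    # The share is the probability that a randomly selected story
    # comes from that site.
    return [(site, n / total) for site, n in counts.most_common()]

# Toy input for illustration only, not the real corpus.
toy_scrapes = [
    ["cnn.com", "nytimes.com", "cnn.com"],
    ["cnn.com", "usatoday.com"],
]
shares = site_shares(toy_scrapes)
for site, share in shares:
    print(f"{site}: {share:.0%}")
```

Pooling all stories before dividing means a scrape with more stories carries more weight; averaging the per-scrape shares instead would be an equally reasonable choice.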

Note the power-law (or 80/20, or rich-get-richer) shape of the distribution. The most-used site, CNN, accounts for 20% of all stories! In other words, even with the millions of sites on the Internet, 1 out of every 5 stories about “donald trump” from Google News is from CNN.

Cumulative Distribution

In power-law style, 50% of all stories come from the top 5 sites (CNN, USA Today, NYT, Politico, Guardian), and 83% of all stories come from the top 20.
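The cumulative figures quoted above are running sums of the per-site shares, taken from the largest site down. A minimal sketch of that arithmetic follows, assuming the shares are already available as a list of fractions; the helper names and the toy numbers are hypothetical and are not the measured values.

```python
from itertools import accumulate

def cumulative_shares(shares):
    """Running total of per-site shares, largest site first."""
    ordered = sorted(shares, reverse=True)
    return list(accumulate(ordered))

def top_k_share(shares, k):
    """Fraction of all stories that come from the k largest sites."""
    return sum(sorted(shares, reverse=True)[:k])

# Toy shares for illustration only, not the real measurements.
toy_shares = [0.1, 0.4, 0.2, 0.3]
cumulative = cumulative_shares(toy_shares)
top_two = top_k_share(toy_shares, 2)
print([round(c, 2) for c in cumulative])
print(round(top_two, 2))
```

Reading the k-th entry of the cumulative list gives the share of stories covered by the top k sites, which is how a statement like "the top 5 sites account for 50%" would be checked.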

To be continued…

Does this list of web-sites look politically neutral to you? We’ll explore further in a future post!
