The Google Search Bubble

Keli
Published in The Startup
7 min read · Feb 14, 2021
Photo by Mitchell Luo on Unsplash

When a brand name is used as a verb, you know that brand has become ubiquitous in our lives. For many years now, “Googling” has been the primary way to retrieve information for many of us who live in this highly digitalized society, and the search results we consume shape our world views. This blog post explains how Google shapes those views in problematic ways.

Problem 1. Perpetuating Racial and Gender Bias

Safiya Noble (2018), the author of Algorithms of Oppression, spent six years studying Google’s search algorithm and demonstrated that Google search results can reinforce and perpetuate gender and racial bias. She presented disturbing case studies in which the keyword “black girl” returned results that included pornographic websites and images. Moreover, the search results consistently represent white people as the norm and standard, while people of colour are depicted as the “others”. For example, a search for “white teenagers” returned positive, wholesome images of white teenagers, whereas a search for “black teenagers” often returned images depicting them as gangsters.

While Google has improved a lot on the issues examined in the book, some deep-seated stereotypes can still be found in its search results today. When I entered the keywords “boss at work” into Google Image Search, most of the resulting images represented the “boss” as a man. The keyword “secretary”, on the other hand, not only returned female-only images, but some of them were explicitly sexual. These search results normalize two harmful narratives: men are dominant and authoritative; women are submissive and objectified.

Google Image Search results for “boss at work” (Feb 13, 2021)
Google Image Search results for “secretary” (Feb 13, 2021)

Problem 2. Google’s Filter Bubble

In the name of relevance and a personalized experience, Google manipulates the search results according to the personal data it has collected from your search history, browsing history, shopping history, location, and even your email.

Google Travel Help stating that emails about hotel reservations in a user’s Gmail account are used to tailor search results.

In other words, the information you are exposed to is what Google thinks you will prefer. As a result, points of view that differ from yours tend to be excluded from the search results, and we only see and hear what we like. Internet activist Eli Pariser calls this a filter bubble, an idea similar to “echo chambers”. The filter bubble’s effects are most apparent in political ideology: when people search for political topics, they are more likely to be shown results that conform to their existing political views. Those views are then reinforced, and even further polarized, by the information presented.

Problem 3. Anticompetitive Practices

Google has dominated the search engine market for the last decade. As of January 2021, Google held a 91.86 percent share of the search engine market worldwide (Statcounter, 2021). Although the majority of Google’s revenue comes from advertising, it has expanded into products and services that we also depend on heavily, such as Gmail, Google Drive, Google Maps and many more.

Google has around 92% of the search engine market share worldwide. Source: Statcounter

On October 20, 2020, the U.S. Justice Department filed an antitrust lawsuit against Google. Its lawyers accused Google of abusing its dominance in the search engine market to enrich its own business empire and maintain its monopoly status. The Markup has also shown that top Google search results are often Google products. Inspired by The Markup’s work, I conducted an experiment, searching for “flight Toronto” in four different search engines: Google, Bing, Yahoo and Duckduckgo.

Case Study: Searching “flight Toronto”

When I searched “flight Toronto” on Google Search, the first two results were ads from Air Canada and Flighthub. Right after the ads came the integrated Google Flights interface, and only then the organic search results. What’s wrong with offering another Google service within the Google Search platform?

Google Search results for “flight toronto” (Feb 13, 2021)

To see why this practice is problematic, consider two things: results presented first are more likely to be clicked, and a service that is put in front of you without requiring a click is more likely to be used. The Google Flights interface is placed before the first organic search result, and although it comes after the ads, it doesn’t require the user to click a link. This lets Google stifle competitors subtly but effectively. As The Markup stated, the effects of placing Google’s own products on the search page are stark:

In the nine years since Google Flights and Google Hotels launched, those sites have become market leaders. They garnered almost twice as many U.S. site visits last year as each of their largest competitors, Expedia.com and Booking.com, even though we found Google Flights doesn’t always show users all the options.

Results On Other Search Engines

I entered the same keywords into three other search engines (Bing, Yahoo and Duckduckgo) and scraped the search results (the method is described in the last section). All three engines rank Expedia as the first result, and none of them ranks Google Flights in the top 5. By placing its interface directly in Google Search, however, Google Flights establishes itself as the “default” service for Google’s massive user base and abruptly affects the online travel industry.

The first page of Bing search results: Google Flights is ranked at 6th and 10th (Feb 13, 2021)
The first page of Yahoo search results: Google Flights is ranked at 6th (Feb 13, 2021)
The first page of Duckduckgo search results: Google Flights is ranked at 9th and 10th (Feb 13, 2021)

The “Default” Bubble

It’s important to note that racial and gender bias exists not only in Google Search but in other search engines as well. The filter bubble is not unique to Google either; it also applies to Facebook and other platforms that rely on personalization algorithms.

Like Google Search, Duckduckgo’s image search results of “secretary” also show female-only representations and some are explicitly sexual.

What is unique to Google is its “default” status across not one but several essential services and products, such as Google Search, Gmail and Google Maps. As the U.S. Justice Department lawyers describe it, Google acts as the gatekeeper of the Internet, and that power is being abused to enrich its business empire by unfairly extending the sense of default to other Google products. Such conduct hurts not only Google’s competitors but also us as consumers, because our cognitive tendencies are exploited to favour choices from a single source.

Method: Search Results Scraping by Scrapy-Splash

Disclaimer: Since I am a beginner at web scraping, I might not be aware of more efficient ways to scrape search results. If you know a better way, please share!

Data Collection: Create Scrapy Spiders and Get Dynamic Content Through SplashRequest

To see the search results for “flight Toronto” on non-Google search engines, I created three Scrapy Spiders to scrape the first-page results for the same keywords on Bing, Yahoo, and Duckduckgo. These pages are generated dynamically by JavaScript that browsers execute. Because Scrapy is not a browser, it cannot run that JavaScript, and the scraped content would be empty. Splash is therefore used to handle JavaScript rendering.

A separate Spider is created for each of the three search engines. Only the code for the Duckduckgo Spider is shown here, since all Spiders share a similar logic.

The Spider gathers the title, domain, and index of each non-ad search result and saves them to a CSV file, searches_ddg.csv. The CSV file is then loaded into a Python notebook with Pandas.
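
The record-building step can be illustrated without Scrapy. The snippet below is only a rough sketch, not the Spider itself: it assumes the non-ad results have already been extracted as (title, URL) pairs, the helpers to_rows and write_rows are hypothetical, the column names title and domain are guesses (only result_index appears in this post), and the sample pairs are placeholders rather than real search results.

```python
import csv
import io
from urllib.parse import urlparse

# "title" and "domain" are assumed column names; only "result_index" is named in the post.
FIELDS = ["title", "domain", "result_index"]


def to_rows(results):
    """Turn (title, url) pairs into dicts with a zero-based result_index."""
    rows = []
    for index, (title, url) in enumerate(results):
        domain = urlparse(url).netloc  # host part of the URL
        rows.append({"title": title, "domain": domain, "result_index": index})
    return rows


def write_rows(rows, handle):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder pairs for illustration only, not real search results.
sample = [
    ("Example Flights", "https://www.example.com/flights/toronto"),
    ("Another Site", "https://travel.example.org/yyz"),
]

rows = to_rows(sample)
buffer = io.StringIO()  # stands in for an open file such as searches_ddg.csv
write_rows(rows, buffer)
csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the handle would be a file opened with newline="" as the csv module recommends.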

import pandas as pd
sr_ddg = pd.read_csv("searches_ddg.csv")

The data frame looks like this:

Data Preprocessing

The last column, result_index, is zero-based, which means the first result in the sequence is 0 rather than 1. To make it more intuitive to read, I convert it to a one-based number by adding 1 to each index, store the result in a new column called result_rank, and then drop the original column.

sr_ddg["result_rank"] = sr_ddg["result_index"] + 1
# drop() returns a new data frame, so assign the result back
sr_ddg = sr_ddg.drop(columns=["result_index"])
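
As a quick check of this step, the sketch below applies the same transformation to a tiny hand-made frame and then looks up the ranks at which a domain appears. The rows are placeholders, not the scraped data, and the helper ranks_for and the domain column name are assumptions made for illustration.

```python
import pandas as pd

# Placeholder rows for illustration only; not the scraped results.
demo = pd.DataFrame({
    "domain": ["a.example", "b.example", "a.example"],
    "result_index": [0, 1, 2],
})

# Convert the zero-based index to a one-based rank.
demo["result_rank"] = demo["result_index"] + 1
# drop() returns a new frame, so the result has to be assigned back.
demo = demo.drop(columns=["result_index"])


def ranks_for(frame, domain):
    """Return the one-based ranks at which `domain` appears."""
    return frame.loc[frame["domain"] == domain, "result_rank"].tolist()


a_ranks = ranks_for(demo, "a.example")
```

A lookup like this is one way to read off where a given service lands on each engine's first page once the three CSV files are loaded.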

The final data frame looks like this:

Other keywords will be explored in future work. I have tried “hotel Toronto” and observed that Google places an integrated Google Hotels interface before the organic search results as well. If you have a suggestion for keywords to explore, don’t hesitate to share it!


Writing the good, the bad, and the ugly about data. | kelichiu.com