The Google Search Bubble
When a brand name is used as a verb, you know that brand has become ubiquitous in our lives. For many years now, “Googling” has been the primary way to retrieve information for many of us who live in this highly digitalized society, and the search results we consume are shaping our world views. This blog post looks at how Google shapes those world views in problematic ways.
Problem 1. Perpetuating Racial and Gender Bias
The author of Algorithms of Oppression, Safiya Noble (2018), spent six years studying Google’s search algorithm and demonstrates that Google search results can reinforce and perpetuate gender and racial bias. She presents disturbing case studies in which the keyword “black girl” returned pornographic websites and images. Moreover, the search results consistently represent white people as the norm and standard, while people of colour are depicted as the “others”. For example, a search for “white teenagers” returned positive, wholesome images of white teenagers, whereas a search for “black teenagers” often returned images depicting them as gangsters.
While Google has improved considerably on the issues examined in the book, some deep-seated stereotypes still appear in its search results today. A Google Image Search for “boss at work” returns images that mostly represent the “boss” as a man. The keyword “secretary”, on the other hand, not only returns female-only images, but some of those images are explicitly sexual. These search results normalize two harmful narratives: men are dominant and authoritative; women are submissive and objectified.
Problem 2. Google’s Filter Bubble
In the name of relevance and a personalized experience, Google manipulates the search results according to the personal data it has collected from your search history, browsing history, shopping history, location, and even your email.
In other words, the information you are exposed to is what Google thinks you will prefer. As a result, points of view that differ from yours are excluded from the search results, and we only see and hear what we like. Internet activist Eli Pariser calls this a filter bubble, an idea similar to the “echo chamber”. The filter bubble’s effects are most apparent in political ideology: when people search for political topics, they are more likely to be presented with results that conform to their existing political views. As a result, those existing views are reinforced and become even more polarized.
Problem 3. Anticompetitive Practices
Google has dominated the search engine market for the last decade. As of January this year, Google holds a 91.86 percent share of the worldwide search engine market (Statcounter, 2021). Although the majority of Google’s revenue comes from advertising, it has expanded into products and services that we also depend on heavily, such as Gmail, Google Drive, Google Maps and many more.
On October 20, 2020, the U.S. Justice Department filed an antitrust lawsuit against Google. Its lawyers accused Google of abusing its dominance in the search engine market to enrich its own business empire and maintain its monopoly status. The Markup has also shown that the top Google search results are often Google products. Inspired by The Markup’s work, I conducted an experiment: searching for “flight Toronto” on four different search engines (Google, Bing, Yahoo and Duckduckgo).
Case Study: Searching “flight Toronto”
When I searched “flight Toronto” with Google Search, the first two results were ads from Air Canada and Flighthub. Right after the ads came the integrated interface of Google Flights, and only then the organic search results. What’s wrong with offering another Google service within the Google Search platform?
To see what is wrong with this practice, we have to consider a couple of things: results presented first have a higher chance of being clicked, and a service that is presented right in front of you without a “click” has a higher chance of being used. Google Flights’s integrated interface is placed before the first organic search result. And although it’s placed after the ads, it doesn’t require the user to click on a link. This is a way for Google to stifle competitors subtly but effectively. As The Markup stated, the effects of placing Google’s own products on the search page are stark:
In the nine years since Google Flights and Google Hotels launched, those sites have become market leaders. They garnered almost twice as many U.S. site visits last year as each of their largest competitors, Expedia.com and Booking.com, even though we found Google Flights doesn’t always show users all the options.
Results On Other Search Engines
I entered the same keywords into three other search engines (Bing, Yahoo and Duckduckgo) and scraped the search results (the method can be found in the last section). All three engines rank Expedia as the first result, and none of them ranks Google Flights in the top 5. However, by placing its interface directly in Google Search, Google Flights establishes itself as a “default” service for Google’s massive number of users and has an abrupt impact on the online travel industry.
The “Default” Bubble
It’s important to note that racial and gender bias exists not only in Google Search but in other search engines as well. The filter bubble is not unique to Google either; it also applies to Facebook and other platforms that operate on personalized algorithms.
What is unique to Google is the “default” status of not just one but several of its essential services and products, such as Google Search, Gmail and Google Maps. As the U.S. Justice Department lawyers describe it, Google plays the role of gatekeeper of the Internet, and that power is being abused to enrich its business empire by extending the sense of default to other Google products in unfair ways. It’s important to understand that such conduct hurts not only Google’s competitors but also us as consumers, because our cognitive mechanisms are exploited to favour choices from a single source.
Method: Search Results Scraping by Scrapy-Splash
Disclaimer: Since I am a beginner at web scraping, I might not be aware of more efficient ways to scrape search results. If you know a better way, please share!
Data Collection: Create Scrapy Spiders and Get Dynamic Content Through SplashRequest
To see the search results for “flight Toronto” on the non-Google search engines, three Scrapy Spiders were created to scrape the first-page search results for the same keywords on Bing, Yahoo, and Duckduckgo. These web pages are generated dynamically by JavaScript, which is normally executed by a browser. Because Scrapy is not a browser, it cannot execute JavaScript, and the scraped content would be empty. Therefore, Splash is used to handle JavaScript rendering.
Three separate Spiders were created, one for each search engine. Only the code for the Duckduckgo Spider is shown here, since all Spiders share similar logic.
The Spider gathers the title, domain, and index of each non-ad search result and saves them to a CSV file, searches_ddg.csv. The CSV file is then loaded into a Python notebook with Pandas.
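For readers who want to reproduce the setup, scrapy-splash is usually switched on in the Scrapy project's settings.py. The fragment below follows the plugin's documented configuration as I understand it; the Splash address is an assumption (a local instance listening on port 8050), and the settings actually used for this experiment are not shown in the post.

```python
# settings.py fragment for a Scrapy project using scrapy-splash.
# Assumption: a Splash instance is reachable locally on port 8050.
SPLASH_URL = "http://localhost:8050"

# Route requests through Splash so pages are rendered before parsing.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoid storing duplicate Splash arguments with every request.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make request de-duplication aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With these settings in place, a Spider can yield SplashRequest objects instead of plain Scrapy requests, and the response it receives contains the rendered HTML.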
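As a rough sketch of the record-building step only (not the Spider itself), the snippet below shows how title and URL pairs taken from a rendered results page could be turned into rows with a domain and a zero-based index, then serialized as CSV. The helper names, the title and domain column names, and the sample inputs are assumptions for illustration; only result_index is a name used in this post.

```python
import csv
import io
from urllib.parse import urlparse

# Assumed column names; only "result_index" appears in the post.
FIELDS = ["title", "domain", "result_index"]


def build_rows(results):
    """Turn (title, url) pairs into rows with a domain and a zero-based index."""
    rows = []
    for index, (title, url) in enumerate(results):
        rows.append({
            "title": title,
            "domain": urlparse(url).netloc,  # host part of the URL
            "result_index": index,
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder input, not real search results.
sample = [
    ("Example result A", "https://www.example.com/flights"),
    ("Example result B", "https://travel.example.org/toronto"),
]
rows = build_rows(sample)
csv_text = rows_to_csv(rows)
```

Writing the same text to a file on disk instead of a string buffer gives a CSV that Pandas can read back directly.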
import pandas as pd
sr_ddg = pd.read_csv("searches_ddg.csv")
The data frame looks like this:
Data Preprocessing
The last column, result_index, is zero-based, which means the first result in the sequence is 0 rather than 1. For more intuitive reading, I convert it to a one-based number by adding 1 to each index and storing the result in a new column called result_rank.
sr_ddg["result_rank"] = sr_ddg["result_index"] + 1
# drop() returns a new data frame, so assign the result back
sr_ddg = sr_ddg.drop(columns=["result_index"])
The final data frame looks like this:
Other keywords will be explored in future work. I have tried “hotel Toronto” and observed that Google places an integrated Google Hotels interface before the organic search results as well. If you have suggestions for keywords to explore, don’t hesitate to share them!
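When more keywords are added, comparing engines by hand gets tedious. One possible way to automate it, sketched here under assumptions, is a small helper that looks up the best rank of a given domain in each engine's data frame. The function name, the domain column name, and the sample frames are hypothetical placeholders rather than the scraped data; result_rank is the column created above.

```python
import pandas as pd


def rank_of_domain(frames, keyword):
    """Return the best (lowest) one-based rank per engine for domains
    containing the keyword, or None if the engine has no match."""
    ranks = {}
    for engine, frame in frames.items():
        # Plain substring match on the domain, ignoring case.
        mask = frame["domain"].str.contains(keyword, case=False, regex=False)
        matches = frame[mask]
        ranks[engine] = int(matches["result_rank"].min()) if not matches.empty else None
    return ranks


# Placeholder frames, not the scraped results.
frames = {
    "engine_a": pd.DataFrame(
        {"domain": ["www.example.com", "travel.example.org"], "result_rank": [1, 2]}
    ),
    "engine_b": pd.DataFrame(
        {"domain": ["travel.example.org", "www.example.net"], "result_rank": [1, 2]}
    ),
}
ranks = rank_of_domain(frames, "example.org")
```

Returning None for a missing domain keeps "not on the first page" distinct from any real rank.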