Is Google really as trustworthy as it is deemed to be?

Photo by Mitchell Luo on Unsplash

On October 20, 2020, the US Department of Justice, along with eleven state attorneys general, filed an antitrust lawsuit to stop Google from unlawfully maintaining a monopoly. Having been sued this way, Google has a long way to go to prove that it does not dominate the internet through unfair means. The lawsuit accuses Google of harming its competitors in internet search and search advertising through unlawful contracts and agreements. According to the complaint, Google pays billions of dollars to other companies to prioritize its search engine in their products, and imposes several other restrictions that put its search tool front and center whenever users access the web.

The filing of this lawsuit is just the first step in a long battle. Google has strongly denied the allegations, stating that the case is “deeply flawed” and that no one would gain anything from it.

This lawsuit is not quite the ambitious attempt to redraw the boundaries of existing antitrust law that many critics had long hoped for, and responses to it have spanned a broad spectrum. It comes decades after Google made its search engine the centerpiece of internet search, which raises the question of whether a single lawsuit can loosen Google’s grip on the internet.

Photo by Tingey Injury Law Firm on Unsplash

The case is one of the most aggressive actions the US Department of Justice has taken against any tech giant in decades. It is also the first lawsuit Google has faced from the US government, despite years of investigations into the company’s business practices. Based on this, we can say that Google is facing one of the biggest antitrust lawsuits of a generation.

Taking this into consideration, I chose to perform graph and content mining analysis on tweets related to this lawsuit. This analysis lets us see the reactions of ordinary people, as well as those of politicians and government officials, through the lens of an analyst.

With more than 500 million tweets posted per day, it is fair to call Twitter a rich source of information. The objective of this post is to gather and analyze event-related Twitter data to discover interesting information and hidden patterns. From minute-to-minute trends to general discussions and topics, Twitter is a great source of data for studying an event. Could we track an event and determine what people are thinking? Is it possible to run sentiment analysis on what the world is thinking as an event unfolds over time? Can we track Twitter data and see people’s reactions to an event? These are some of the questions that can help in analyzing an ongoing event. This post briefly explains the use of graph mining for the analysis of event-related tweets.

Data Cleaning and Preprocessing

The following are some of the preprocessing tasks that I performed:

1) Tokenization using the NLTK library

Tokenization is the process of breaking down streams of text into tokens such as words, phrases or symbols. Because tweets contain hashtags, @-mentions and emoticons that a general-purpose tokenizer would split apart, I used NLTK’s TweetTokenizer class to tokenize the Twitter content.
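
As a minimal sketch of this step (the tweet below is made up, and the tokenizer options are my assumption rather than the exact settings used for the collected data), tokenizing a tweet with TweetTokenizer might look like this:

```python
from nltk.tokenize import TweetTokenizer

# strip_handles drops @-mentions; reduce_len shortens long character runs
# such as "soooooo". Both options are assumptions for this sketch.
tokenizer = TweetTokenizer(preserve_case=True, strip_handles=True, reduce_len=True)

# Made-up example tweet, not taken from the collected data.
tweet = "@someuser The DOJ is suing Google #antitrust :)"

tokens = tokenizer.tokenize(tweet)
print(tokens)
```

Unlike a plain whitespace or word tokenizer, this keeps the hashtag and the emoticon as single tokens.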

2) Stop Word Removal

Stop words carry little meaning on their own, so we remove them. This category of words includes articles, adverbs, symbols, etc.

Data Cleaning to remove stop words from the tweets
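
The removal step can be sketched roughly as follows. The stop word list here is a small hand-picked set for illustration only; a real run would more likely use a fuller list, such as the English stop words that ship with NLTK. The function name is hypothetical.

```python
import string

# Tiny illustrative stop word list (an assumption, not the list used
# for the collected tweets).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "rt"}


def remove_stop_words(tokens):
    """Drop stop words and tokens made up only of punctuation."""
    kept = []
    for token in tokens:
        if token.lower() in STOP_WORDS:
            continue
        if all(ch in string.punctuation for ch in token):
            continue
        kept.append(token)
    return kept


tokens = ["The", "DOJ", "is", "suing", "Google", "!", "#antitrust"]
print(remove_stop_words(tokens))  # ['DOJ', 'suing', 'Google', '#antitrust']
```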

3) Normalization

Normalization is used to aggregate different forms of a term into a single unit. Here, converting the text to a common case helps match strings that differ only in casing, so that they are aggregated under the same category.
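
A small sketch of case normalization, assuming lowercasing is the only normalization applied (the token list is made up):

```python
from collections import Counter


def normalize(tokens):
    """Lowercase every token so that case variants collapse into one term."""
    return [token.lower() for token in tokens]


tokens = ["Google", "GOOGLE", "google", "Lawsuit", "lawsuit"]
counts = Counter(normalize(tokens))
print(counts)  # Counter({'google': 3, 'lawsuit': 2})
```

Without this step, "Google", "GOOGLE" and "google" would be counted as three separate terms.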

Now, in this section, I’ll discuss the approach used to determine the most frequently co-occurring words in the tweets related to the lawsuit:

Explore Co-occurring Words (Bigrams):

Some English words occur together more frequently than others, for example: sky high, do or die, best performance, heavy rain, etc. So, in a text document, we may need to identify such pairs of words, which helps in sentiment analysis. To identify the co-occurrence of words in the lawsuit-related tweets, I used bigrams from NLTK. The bigrams of a sentence are all the pairs of adjacent words in that sentence.
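
To make the definition concrete, here is a short sketch using NLTK’s bigrams helper on an already cleaned token list (the tokens are a made-up example):

```python
from nltk import bigrams

# Made-up token list standing in for one cleaned tweet.
tokens = ["google", "faces", "antitrust", "lawsuit"]

# bigrams() yields each pair of adjacent tokens, in order.
pairs = list(bigrams(tokens))
print(pairs)
# [('google', 'faces'), ('faces', 'antitrust'), ('antitrust', 'lawsuit')]
```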

The following image shows a list of bigrams contained in the collected tweets:

Most frequently co-occurring list of words

After cleaning the bigrams generated from the tweets, I compiled a list of the 20 most frequently occurring bigrams, as follows:

Top 20 most commonly occurring collection of bigrams
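
One way to arrive at such a list is to count bigrams across all tweets with a Counter. This is a sketch under my own assumptions: the function name and the sample tweets are hypothetical, and adjacent pairs are formed with zip, which gives the same pairs as NLTK’s bigrams helper.

```python
from collections import Counter


def top_bigrams(tokenized_tweets, n=20):
    """Count adjacent word pairs over all tweets and return the n most common."""
    counts = Counter()
    for tokens in tokenized_tweets:
        # zip(tokens, tokens[1:]) pairs each token with its right neighbor.
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


# Made-up, already cleaned tweets.
tweets = [
    ["google", "antitrust", "lawsuit"],
    ["doj", "files", "antitrust", "lawsuit"],
    ["google", "antitrust", "case"],
]

for pair, count in top_bigrams(tweets, n=2):
    print(pair, count)
```

The resulting list of (bigram, count) pairs can then be loaded into a Pandas DataFrame for display.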

Visualize the Network of Bigrams:

We can use the previously generated Pandas DataFrame to visualize the top 20 bigrams as a network, using the Python package NetworkX.

Network visualization showing top 20 bigrams
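
A rough sketch of how such a network can be built: each word becomes a node, and each bigram becomes an edge weighted by its count. The counts below are hypothetical placeholders, not the values from the collected tweets.

```python
import networkx as nx

# Hypothetical bigram counts standing in for the real top 20 table.
bigram_counts = {
    ("google", "antitrust"): 5,
    ("antitrust", "lawsuit"): 4,
    ("justice", "department"): 3,
}

G = nx.Graph()
for (first, second), count in bigram_counts.items():
    # Words become nodes; the bigram count is stored as the edge weight.
    G.add_edge(first, second, weight=count)

print(G.number_of_nodes(), G.number_of_edges())  # 5 3
```

With matplotlib installed, the graph can be rendered by computing a layout with `nx.spring_layout(G)` and passing it to `nx.draw_networkx`.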

Once the visualization of the 20 most common bigrams is obtained, we use the Python package TextBlob to calculate the polarity of individual tweets about the Google antitrust lawsuit. Polarity is a score between -1 (most negative) and 1 (most positive).

Polarity scores for individual tweets

These polarity values can be plotted in a histogram, which helps show whether the overall sentiment toward the subject leans positive or negative.

Histogram showing distribution of polarity scores

However, the above histogram contains a mixture of tweets whose polarity values are zero, positive or negative. To get a clearer idea of the sentiment around the lawsuit, I chose to remove the tweets whose polarity equals zero and plot a histogram of the remaining tweets, as shown below:

Distribution of non zero polarity scores
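
The filtering step can be sketched as below. The scores are illustrative values only, not the polarity scores of the collected tweets, and the function name is hypothetical.

```python
def split_nonzero(polarities):
    """Drop zero scores and count how many of the rest are negative or positive."""
    nonzero = [p for p in polarities if p != 0]
    negative = sum(1 for p in nonzero if p < 0)
    positive = len(nonzero) - negative
    return nonzero, negative, positive


# Illustrative values only.
scores = [0.0, -0.5, 0.25, 0.0, -0.1, -0.8, 0.0]

nonzero, negative, positive = split_nonzero(scores)
print(nonzero)             # [-0.5, 0.25, -0.1, -0.8]
print(negative, positive)  # 3 1
```

Passing the `nonzero` list to matplotlib’s `plt.hist` would draw the corresponding histogram.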

From the above histogram, we can infer that the polarity of the majority of these tweets falls in the negative range.

Final Comments

With so much ink spilled on the subject of monopoly and reining in Big Tech, and so many government bodies with so many complaints, the antitrust lawsuit filed by the US Department of Justice against Google leads us to ask: will this lawsuit lead to the holy grail of granting consumers rights to their data? For that, we look forward to a regulatory regime down the road, but keep your expectations in check until lawmakers announce something big!

References:

  1. https://www.justice.gov/opa/pr/justice-department-sues-monopolist-google-violating-antitrust-laws
  2. https://www.justice.gov/opa/pr/statement-attorney-general-announcement-civil-antitrust-lawsuit-filed-against-google
  3. https://www.tweepy.org/
  4. https://textblob.readthedocs.io/en/dev/
  5. https://networkx.org/documentation/stable/index.html
