An Actual Statistical Analysis of #GamerGate? UPDATED

Taylor Wofford of Newsweek recently published this article stating that GamerGate is a harassment campaign. (NOTE: I misspelled Wofford’s name as “Wafford.” This was unintentional; I do apologize for the error.)

NOTE: I have added a companion piece in which I refute some arguments against this piece and perform inferential analysis. It is now appended below as well.

The crux of this argument is as follows:

But an analysis by Newsweek found that Twitter users tweeting the hashtag #GamerGate direct negative tweets at critics of the gaming world more than they do at the journalists whose coverage they supposedly want scrutinized.

Let’s look at the methodology.

Newsweek asked BrandWatch, a social media analytics company, to dig through 25 percent of the more than 2 million tweets about GamerGate since September 1 to discover how often Twitter users tweeted at or about the major players in the debate, and whether those tweets were positive, negative or neutral.

The first issue here is that they asked the company to look at only 25% of the tweets. The sampling method is never identified. Which tweets were selected? Whose tweets were selected? At what frequency?

None of these questions are answered. This presents the first methodological problem: there is no way for anyone to replicate these findings. The reader is thus left to trust that Newsweek and BrandWatch did a quality job in their analytics. So let’s address trust.


It would appear that Newsweek asked BrandWatch to examine only tweets directed at particular people that also used the GamerGate tag. This can be seen in their bar graph, where only select people were examined (tweet counts in parentheses, as detailed in the source): Anita Sarkeesian (35,188), Zoe Quinn (10,700), Brianna Wu (38,952), Kotaku (23,500), Leigh Alexander (13,296), Nathan Grayson (732), Stephen Totilo (1,708).

Of note is that the article uses terms like “pummeled” and “bombarded.” These are the first signs that the article is written in bad faith, as it uses loaded words to engender emotional sympathy for the women involved.

Let’s look at these numbers in actual terms. 124,076 tweets were sent to these people from September 1 to an undisclosed date according to Wofford’s work.

From September 25 to October 25, there were 1.8 million uses of the #GamerGate tag according to Topsy. Assuming an average of 30,000 tweets per day, we can subtract approximately 700,000 tweets to shift that window back to September 1 through October 1. That leaves roughly 1.1 to 1.3 million tweets using the GamerGate tag over the period Newsweek examined.

This means that of the roughly 1.1 million tweets sent between September 1 and October 1, 124,076 were tweeted at these persons.

Fewer than approximately 10% of #GamerGate tweets were aimed at Sarkeesian, Quinn, Wu, Kotaku, Alexander, Grayson, or Totilo.

In the timeframe of September 25 to October 25, 1.8 million tweets were sent. According to Topsy, removing ethics, ethic, journalism, journal, journo, reform, fairness, ethical, notyourshield, and #notyourshield returns 1.56 million tweets.

This means that there were 240,000 tweets about these subjects in this time. If absolute counts indicate what a tag is about, then this tells us that #GamerGate as a tag is more concerned with ethics, journalism, reform, fairness, and minorities than it is with these identified people.

After all, 10% of tweets were to individuals. 20% were to concepts. The concepts should, logically, matter more than the individuals, right?
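As a quick back-of-envelope check, here is a minimal sketch of that arithmetic in Python. The 1.2 million base is my assumed midpoint of the 1.1 to 1.3 million estimate, and the 240,000 concept figure comes from the later September 25 to October 25 window:

```python
# Back-of-envelope shares using the figures cited above.
# ASSUMPTION: total_tweets is the midpoint of the 1.1-1.3 million estimate.
total_tweets = 1_200_000   # est. #GamerGate tweets, Sept 1 - Oct 1
to_individuals = 124_076   # tweets at the seven named people (Newsweek)
to_concepts = 240_000      # ethics/journalism/etc. tweets (1.8M - 1.56M, Topsy)

print(f"Share aimed at individuals: {to_individuals / total_tweets:.0%}")  # ~10%
print(f"Share about concepts:       {to_concepts / total_tweets:.0%}")     # ~20%
```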

There is simply no good-faith basis to trust that Wofford and Newsweek are representing their methods and their information in an objective, trustworthy manner.


The second problem is the nature of tweet sentiment. Newsweek links directly to the graph, where every subject receives less than 5% positive mentions and 5–10% negative mentions.

Source: Newsweek

This means that 85–90% of mentions are neutral: neither positive nor negative, not hateful or helpful. They are just neutral tweets.

However, Wofford waltzes right past the fact that most of these mentions, responses, or thoughts are neutral. This graphic alone provides major evidence that GamerGate is not anti-woman. It is very much neutral on gender, as persons of both genders are mentioned positively, neutrally, and negatively.

Wofford also completely ignores that Nathan Grayson receives almost no positive mentions and one of the highest negative mention rates among all participants. If the point is hatred of women, why does a man receive no positives and plenty of negatives? Why is the person with the next-highest negative rate, Totilo, also a man? Why does Leigh Alexander, a woman, have the highest positive mention rate?

Why are the lowest negative measurements Quinn, Sarkeesian, and Alexander?

All of these unexamined questions leave us, as readers, to wonder whether Wofford read his or her own data, as the data actively contradict the conclusion. The data show that the movement is not sexist and is not hostile. It is neutral and criticizes men and women alike.

Should Wofford ever stumble outside of his or her box, I’d like to know why he or she elected to ignore these points of data. Research of any nature should identify pros and cons, not just use loaded words and ignore data points. Especially not when the data on Grayson and Totilo belie the argument that sexism is aimed at women over men.

After all, sexism is hatred or discrimination against someone based purely on gender. If this is the case, and men receive more negative opinions proportionately than women, is this not a gendered distribution of negativity toward men?

Of note is that Wofford offers an absolute tweet measurement graph as well:

Source: Newsweek

Once again, the vast majority of tweets that any participant receives are neutral or positive. The largest negative count, directed at Wu, is just under 1,000 negative tweets out of nearly 9,000, or roughly 10%.

Readers are asked to believe that the roughly 10% of tweets that are negative matter more than the 90% that are positive or neutral. The data, however, show that every subject receives more positive or neutral tweets than negative ones. The largest number of positive tweets also goes to a woman, Brianna Wu. All of the women receive more absolute positive statements than any of the men.

If that’s the case, how can one conclude that GamerGate is about harassment more than praise and exchange of ideas?

It is also worth noting that these measurements are all sentiment measurements. Sentiment is “opinion of.” It never measures intent or motivation. Stating that a group of people hates or dislikes others based on negative sentiment is completely unfounded, as sentiment does not measure belief or attitude. It categorizes opinions about something based on the assumed connotations of the terms used.

Anyone who works with people knows that someone can hate something and speak positively of it. Politics are built upon this foundation.


Should you be presented with the Newsweek story, please make sure to ask about those neutral mentions that vastly outnumber every other category. Ask why we must ignore those neutral mentions in favor of the negative ones.

Why are we being told by Wofford and others to ignore 85–90% of 25% of tweets for 5–10% of 25% of tweets? Why should 5–10% of 25% of tweets be more important than the vast majority of 25% of positive or neutral tweets?

Why are we supposed to conclude that GamerGate as a whole is a movement of harassment based on 25% of tweets where 90–95% are not negative?

Why are we supposed to conclude that tweeting to women at all, in support or neutrally, is a sign of harassment? Are people not allowed to have discussions on social media with women because doing so is harassment? If so, why is that the case? Are we really suggesting that women cannot handle a larger amount of tweets than men?

These are important questions to ask. They’re critical questions, and they are necessary when a reporter and news outlet actively engage in ignoring their own data for a narrative.

UPDATED: Combined Companion Piece Below

A Statistical Analysis of #GamerGate Utilizing Newsweek Data


This Storify popped up. Let’s handle some of its issues here and now. Zennistrad states, “Skimming through the Medium post, a lot of it smelled like bullshit.”

I am not GamerGate. I am a person. Zennistrad wishes to pretend that I am an entire group. I am not. Association of me as a person to a group is a generalization fallacy. Zennistrad should stop doing so as it is dehumanizing and insulting.

Now, let’s first start with @App_of_Self who states:


“It’s not statistics. It’s using Topsy (which had nothing to do w/ the actual newspaper article) in whatever way they see fit. 1

Without givin their methodology or an explanation what Topsy has to do with the work that PROFESSIONALS did at BrandWatch 2"

BrandWatch and I both engaged in statistical analysis, that is, the organization and description of data. They employed data preparation by gathering and sorting the tweets. I did it by examining aggregated data from a website. Both are forms of statistical analysis.

As for what Topsy “has to do with the work,” it has to do with getting data from a source, just as BrandWatch is a data source for Newsweek.

Statistical analysis results from data analytics, wherein data is collected, described, explored, hypothesized about, and used in predictive manners. It is a process through which data is gathered, organized, described, tested, and modeled through hypothesis testing. However, not all analysis goes through every step, depending on the desired outcome and statistical need.

Statistics come in two forms: descriptive and inferential. Nothing done by BrandWatch falls under inferential statistics, which seek to draw conclusions from data. Their work is, in sum and total, descriptive. It describes a phenomenon; it does not establish the quality, kind, or significance of that phenomenon.

This conclusion is made by Newsweek’s writer without statistical testing, hypothesis, or even standardization. As such, I did not engage in this process either. Instead, I stuck to examination of the data on the descriptive level.

Companies do not always do quality, professional work either.

Claire Robsahm, a transgender developer, tweeted a series of tweets.

This guy isn’t very good at statistics. I can explain more in depth later.

Basically, it doesn’t matter that the majority were neutral. That’s not what the study was about.

Claire here essentially says that, when discussing whether people are harassing, it does not matter that the majority of the presented data show neutral opinions about the persons involved.

“It was about harassment and negativity. The neutrals just serve to passively support the negatives or positives in any group.”

Of course, do note that Claire has yet to even discuss statistics. That would be because Claire is abundantly aware that she has no foundation in statistics. Instead, Claire settles for discussing the descriptive argument rather than the author’s inferential failure of drawing conclusions without statistical testing.

So let’s employ some critical thinking skills as Claire basically told us:

It does not matter if people are favorable or neutral because the issue is about negativity and neutrality is passive support of negativity.

“Which this “analysis” doesn’t really mention. His bit about who receives the most harassment also makes little sense.

Like, my biggest problem here is that he talks about how Greyson received more negativity per capita than any of the women.

He neglects to mention that Greyson receives the smallest amount of mentions AT ALL.

Per capita can be a very important thing, but in this case, it’s not as relevant as this individual gives it credence.”

What Claire does not discuss is the problem with focusing on per capita information. This is where Claire, had she understood statistics, would know why we use proportional data alongside absolute data.

Absolute data is typically skewed by populational extremes. Take income, for example. Most people make about $50,000, give or take, while a small number make vastly more. What each person spends on will be different: those with smaller incomes must spend a larger share on necessities like food and water. So we look at both the absolute data and the proportional data.

Another way of thinking about it is crime and arrests. Whites are a larger population group and commit more crimes when looking at absolute crime data. However, when adjusting for population, Blacks tend to account for more crime in terms of rates. Which group commits more crime? The answer is, “It depends on the measure.”

Whites account for more absolute crime (6.5 million of 9.3 million arrests), but Blacks account for a higher rate of arrests and crime when taking into account their relatively smaller population.

A third way of considering it is the HIV/AIDS epidemic. Most absolute cases are in Sub-Saharan Africa. However, relative to population size, men who have sex with men are a smaller group and are therefore more heavily impacted by HIV/AIDS as a population.

Additionally, extremes skew the average. If a person receives more tweets, they are more likely to be exposed to both positive and negative tweets by the nature of the behavioral beast. You will have a few at the extremes and lots in the middle.
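To make the absolute-versus-rate distinction concrete, here is a small sketch. The populations and event counts are hypothetical, chosen only to show how the two measures can rank the same groups differently:

```python
# Hypothetical illustration: absolute counts and per-capita rates
# can point in opposite directions.
groups = {
    "large_group": {"population": 200_000, "events": 6_500},
    "small_group": {"population": 40_000, "events": 2_800},
}

for name, g in groups.items():
    rate = g["events"] / g["population"]
    print(f"{name}: {g['events']:,} events (absolute), {rate:.2%} per capita")

# large_group leads in absolute events (6,500 > 2,800), but
# small_group has the higher per-capita rate (7.00% > 3.25%).
```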

But we’re going to get into the “per capita” thing after the break.

So let’s bring this back to the data on the tweets. All are estimations of weighted mention volume; actual tweet counts are given in parentheses:

Anita Sarkeesian (35,188): 7,200 tweets, 500 negative, 250 positive

Brianna Wu (38,952): 8,000 tweets, 1,000 negative, 400 positive

Kotaku (23,500): 5,500 tweets, 500 negative, 250 positive

Leigh Alexander (13,296): 3,300 tweets, 200 negative, 100 positive

Nathan Grayson (732): 250 tweets, 50 negative, 0 positive

Stephen Totilo (1,708): 500 tweets, 50 negative, 0 positive

Zoe Quinn (10,700): 2,300 tweets, 200 negative, 50 positive
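Treating those eyeballed estimates as data for a moment, a short sketch can turn them into proportional sentiment rates per person. These figures are my readings of the graph, not BrandWatch’s published numbers:

```python
# Estimated weighted sentiment volumes read off the Newsweek graph:
# name: (total mentions, negative, positive). Eyeball estimates only.
sentiment_volume = {
    "Sarkeesian": (7_200, 500, 250),
    "Wu": (8_000, 1_000, 400),
    "Kotaku": (5_500, 500, 250),
    "Alexander": (3_300, 200, 100),
    "Grayson": (250, 50, 0),
    "Totilo": (500, 50, 0),
    "Quinn": (2_300, 200, 50),
}

for name, (total, neg, pos) in sentiment_volume.items():
    print(f"{name}: {neg / total:.1%} negative, {pos / total:.1%} positive")

# Grayson's negative rate (20.0%) is the highest of the group,
# despite his having by far the fewest mentions.
```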

Some of you may be confused: the sentiment volume differs from the actual tweet counts. That’s because sentiment volume is not the same as actual tweets. Sentiment volume is a weighting system wherein the sentiments within a profile are weighted and a user is then matched with them based upon positive sentiments.

The Mention Volume of the second graph is NOT the raw number of tweets. The maximum mention volume is 9,000; the highest mention volume belongs to Wu, followed by Sarkeesian and then Kotaku.

This makes the second graph completely worthless in trying to assess if GamerGate is actually about harassment of women. It was never meant to assess this information. It was meant to assess potential matching of brands to a particular person based upon a volume of tweets and comparative analysis to the person’s typical profile of favored brands.

Anyone who says the second graph is the absolute or raw tweet data is absolutely wrong.

Yes, Newsweek used a brand-maximization service to analyze the brands of Sarkeesian, Wu, Kotaku, Alexander, Grayson, Totilo, and Quinn.

They did not use a system which can analyze the intention of the tweets. They did not use a system which can examine the motivation for the reason of tweets. They did not use a system which can examine the fundamental mindset of those who are engaging in the tweets.

They used a system which can examine how favorable or unfavorable a brand is. Not a person. A brand.

They then looked at the information presented and concluded that GamerGate is about harassment. BrandWatch does not measure harassment. It does not state it does. This measure should have never been used to conclude harassment.

Especially when the conclusion is made without inferential analysis.


In statistics, research, and testing we have two very important concepts. These two concepts are called reliability and validity. Reliability is the ability of a certain measure to be consistent across examinations.

For example, when you take an IQ test, your score should always land in roughly the same range. That’s because IQ tests have high reliability: they test that which they set out to test consistently across settings. There are many kinds of reliability, including test-retest, parallel-forms, inter-rater, and internal consistency. In short, they all measure how consistently the test measures what it measures across multiple administrations.

Validity is how well the test measures what it sets out to test. If you’re testing IQ, you want the IQ test to test intelligence. You define intelligence based upon conceptualization and operationalization of measurement. Validity comes in many forms including face validity, criterion-related validity, formative validity, and sampling validity. All measure if the test measures what it should measure.

If a test is not reliable (consistent across measurements), it cannot be valid (measuring what it should measure). However, a test can be reliable (consistent) without being valid (measuring what it should), because it may be consistently measuring something else entirely.

In short, a test must be reliable to be valid, but reliability alone does not guarantee validity.

The problem here is that the measure (if a brand is popular) does NOT measure if people who tweet to PEOPLE are misogynist or harassers. It merely measures if the brand is popular with people.

Accordingly, the claim that GamerGate is about harassment and sexism, drawn from data that were never meant to measure harassment at all, is invalid. As the measurement here cannot be repeated, it also lacks reliability and therefore cannot be used to infer harassment. It can only be used to infer positive or negative mention.

In short, BrandWatch neither measures the stated conclusion nor does it measure it repeatedly. It is therefore neither reliable nor valid as a measure of harassment.


Let’s do some statistics. All of our brands here are different points of data. Along with this, let’s pull up the proportion graph as it is the most meaningful due to not being weighted as the second graph is:

As we found above, the mention volume graph is NOT the actual tweet volume, as Claire erroneously believes. It is mention volume. So we will have to reverse-engineer the actual raw data. That shouldn’t be too hard, as we have the number (N) of tweets.

I have plugged the image into Photoshop and found that none of the positive bars cross the 5% line. The following are the percentages of negative opinion for the brands. Note that we are not talking about people, because the service did not ASK about opinions on people. It aggregated responses to the brands as defined by Newsweek:

Which means that of the 124,076 tweets aggregated for Newsweek, the amount of tweets should break down roughly like so:

Totals: 3,163 positive, 112,760 neutral, 8,154 negative, for an N = 124,076. Of note is that, at the time, we can assume there were approximately 1.0 to 1.3 million uses of the tag. This means the sample was just 9–10% of tweets at the time and about 6% of tweets currently. There is no known information on how the tweets were selected, whether they are representative of the tweet population, or whether the tweeters were legitimate supporters.

As such, the sample must be considered in the context that it is now approaching less than 5% of tweets. However, the sample is still suitably large for low error rates. Representativeness of the population is still very much in question due to the lack of methodological disclosure.

Of the above tweets, 91% of tweets were neutral to the brand or person. 7% of them were negative. 2% of them were positive.
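Those percentages fall straight out of the reconstructed totals. A minimal check, using the counts derived above:

```python
# Reconstructed sentiment totals for the sampled tweets (derived above
# from the graph percentages, so approximate by construction).
N = 124_076
counts = {"positive": 3_163, "neutral": 112_760, "negative": 8_154}

for label, c in counts.items():
    print(f"{label}: {c / N:.1%}")
# positive ~2.5%, neutral ~90.9%, negative ~6.6% -- roughly 2/91/7.
```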

“BrandWatch found most tweets were neutral in sentiment. And tweets directed at Grayson and Totilo were, on average, more negative than those directed at Quinn, Wu or Sarkeesian.”

From the original article: my findings echo this completely. Grayson and Totilo, as brands, both received proportionately more negative brand tweets than any of the other persons. Nathan Grayson received zero positive tweets.

And of those tweets sent, over 90% of them were neutral to the brands.


Now let’s run an inferential statistical test on the numbers provided. We cannot assume normality in the distribution, which throws out all parametric tests. Therefore we must rely on nonparametric ones.

We are not dealing with whole populations, and we are not matching, as the groups are independent. Our measures are categorical counts rather than continuous values. This leaves us with chi-squared, Kolmogorov-Smirnov, and Kruskal-Wallis. We will use the chi-squared test.

As per our pie graph, we would expect each brand’s tweets to break down as 2% positive, 91% neutral, and 7% negative if it matched the overall brand population.

Newsweek puts forth the conclusion that this is due to harassment of women. Harassment, for this piece, is defined as behavior of malicious intent that tends to cause injury or harm to others. As such, we can all agree that harassment would easily be coded as negative. Not neutral. Not favorable. Negative.

Therefore, we will have demonstrated harassment of women in these cases if the proportion of negative tweets is greater for the female brands than for the male brands.

Arguments that “neutral is complicit in negative” have absolutely no logical justification. Harassment is not neutral. It is intentional and it is harmful.

Therefore, if GamerGate is harassing women, the negative tweets received by the female brands will be higher than the expected negative tweets.

To do this, we will take the observed frequency of the samples (brands) and compare them to the population of tweets given.

The secondary hypothesis we will test is that there is a difference between positive, negative, and neutral tweet counts independent of participant.

To do this, I utilized the chi-squared test and calculated the expected tweets from the observed tweets:


This table shows the amount of tweets we would expect the participants to receive. Here is the table of actual tweets.

So then I went ahead and limited the table to the genders male and female. This means that AS, BW, LA, and ZQ were collapsed into “female” and ST and NG were collapsed into “male.” Kotaku was removed.

This table shows the actual tweets these persons received. Men received 2,440 tweets. Women received 98,136 tweets. Of the tweets received, men got 91% neutral tweets and women got 91% neutral tweets.

Men received 0.69% of their tweets as positive. Women received 2.6% of their tweets as positive. Overall, women received 99.994% of positive tweets.

These are the expected tweets received by either gender:

And finally, here is a table to show you the difference between expected and actual tweets.
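Since the tables themselves are images, here is a minimal, reproducible sketch of the test using SciPy. The cell counts are back-calculated from the percentages stated in this piece (91% neutral for both genders, 0.69% versus 2.6% positive), so they are approximations of the derived data, not BrandWatch’s raw counts:

```python
# Chi-squared test of gender vs. tweet sentiment.
# Counts are back-calculated from the stated percentages and are
# therefore approximate, not BrandWatch's raw data.
from scipy.stats import chi2_contingency

#           positive  neutral  negative
men = [17, 2_220, 203]            # N = 2,440  (Grayson + Totilo)
women = [2_552, 89_304, 6_280]    # N = 98,136 (AS + BW + LA + ZQ)

chi2, p, dof, expected = chi2_contingency([men, women])
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
print("Expected counts:\n", expected.round(1))

# The test is highly significant (p < .001). Comparing observed with
# expected counts shows men over-represented in negative mentions and
# under-represented in positive ones.
```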

Men received more neutral and negative tweets than expected.

Women received more positive and neutral tweets than expected.

Men received far fewer positive tweets than expected.

Women received far fewer negative tweets than expected.

The chi-squared test returned a highly significant result (p < .001) concerning gender and negative tweets. The problem is that the differences show us that the targeted men are much more likely to receive more negative-to-their-brand tweets than expected.

Women are more likely to receive more positive or neutral tweets to their brand.


So where does this leave us? Well, it leaves us with a bunch of posts by people who do not support GamerGate that are completely ignorant of the reality before them.

Yes, women received more tweets, but receiving tweets does not constitute harassment. What constitutes harassment is receiving negative, harmful tweets. And that difference is not as clearly gendered as some would like.

From these numbers, both women AND men deviated from their expected tweet counts. If harassment or negativity were directed at women as a sign of misogyny, we would expect to see women receiving fewer positive and more negative mentions than expected.

However, the data show us that BW and LA received more positive mentions than expected, and AS and LA received fewer negative mentions than expected. NG and ST both received fewer positive and more negative mentions. This means there is no clear-cut relationship between gender and mentions, other than that both men received fewer positives and more negatives. The women returned a wide range of results.

Anita Sarkeesian received fewer positive and negative tweets than expected while receiving more neutral tweets.

Brianna Wu received more positive and negative tweets than expected while receiving fewer neutral tweets.

Kotaku received fewer positive and neutral tweets while receiving more negative tweets.

Leigh Alexander received more positive and fewer neutral and negative tweets.

Nathan Grayson received fewer positive tweets and more neutral or negative tweets.

Stephen Totilo received fewer positive and more neutral or negative tweets.

Zoe Quinn received fewer positive and negative tweets but more neutral tweets.

This means that the tweets sent are NOT exclusively targeting women with harassment, as there is no relationship between gender and negative tweets.

If anything, GamerGate is talking to women who are vocal on the issue with neutral messages, or with messages that do not register with an algorithm as “negative.” It’s hard to conclude that women are being harassed just because they’re being addressed neutrally.

All we can truly conclude from the given data is that the given brands are perceived VERY neutrally in #GamerGate with some positives and negatives.

We cannot conclude that GamerGate is harassing people simply because they tweet at them. BrandWatch doesn’t measure that. Stating they do is unethical as it generalizes the research further than it goes while simultaneously not disclosing the bias of the researcher and the methodological shortfalls.


In the end, this is the reality of #GamerGate that people like @Zennistrad, @App_of_Self, @chapien, and @gamerfortruth want you to forget.

GamerGate does not hate women.

GamerGate does not hate men.

GamerGate is pretty neutral in how they discuss matters.

91% of identified tweets from Newsweek are neutral to men and women.

And while Zennistrad may think a few tweets from his buddies saying, “This is bad statistics,” make for good statistics, he is wrong.

Because the only difference between how GamerGate talks to men and women is that they more often engage the women who are engaged with them.

We’re still supposed to believe that GamerGate hates women because they dare to talk to women like they do men.