Lies, Damn lies, and #GamerGate statistics

Mah Stick
Oct 25, 2014

A Newsweek article just came out, claiming that GamerGate is not about ethics but rather about harassment, with some graphs and numbers to back it up.

Before we refute that, a quick reminder:

  1. Most journalists are looking for sensations
  2. Most journalists don’t really understand statistics

The thing marketers love the most about big data is the manipulations you can pull on it, should you choose to tell either a negative or a positive story. Enthusiastic gamers following NPD videogame sales reports already know how it works: “The Big 3” (Sony, Nintendo and Microsoft) always put some kind of PR spin on a subset of the data to demonstrate why they had a good month, sometimes becoming very creative with the numbers.

The bad thing about pulling these kinds of stunts is how quickly experts can point out that you’re manipulating the data and talking out of your ass.

Now, I won’t go into the many failures in the Newsweek report, as Cainjw already did a fine job there. I will, however, note two things you should always look for when looking at graphs and charts in the media:

Are they telling the whole truth? Or just throwing some random facts around?

When doing actual research, you formulate a hypothesis and then choose the test you will carry out to accept or reject it. In this case Newsweek posed the question “is it about ethics or harassment?” and, to test it, counted the number of times some male and female GamerGate “celebs” were mentioned in tweets containing the hashtag, assuming that more mentions of women equals harassment.

But how is that even a proper test of the hypothesis? The fact that one person is mentioned more than another doesn’t show anything one way or another - except for perhaps reinforcing their celeb status in the debate and their own active involvement in it. A better question would be “how many of the #gamergate tweets even mention any one of these people in the first place?” If it’s about ethics and not about harassing people, we would expect most of the tweets not to mention any of these people. Luckily, we already have that piece of information:

This paints a pretty clear picture. When looking at the full dataset, mentions of Zoe and Anita are negligible at best.
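For the curious, here is a rough sketch of how that kind of check could be run on a pile of tweets. It is not the method Newsweek or anyone else actually used: the function name `mention_share`, the handles `@person_a` and `@person_b`, and the sample tweets are all made up for illustration.

```python
import re

def mention_share(tweets, handles):
    """Return the fraction of tweets that mention at least one tracked handle."""
    if not tweets:
        return 0.0
    mention_pattern = re.compile(r"@(\w+)")
    # Normalize the tracked handles: lowercase, no leading "@".
    wanted = {h.lower().lstrip("@") for h in handles}
    hits = 0
    for text in tweets:
        mentioned = {m.lower() for m in mention_pattern.findall(text)}
        if mentioned & wanted:
            hits += 1
    return hits / len(tweets)

# Hypothetical sample data, NOT taken from the real dataset.
sample_tweets = [
    "#GamerGate is about disclosure policies",
    "@person_a what do you think? #GamerGate",
    "New ethics policy at a gaming site #GamerGate",
    "#GamerGate @person_b @person_a",
]

share = mention_share(sample_tweets, ["@person_a", "@person_b"])
print(f"{share:.0%} of the sample tweets mention a tracked handle")
```

The point is the denominator: the share is measured against every tweet in the hashtag, not only against the tweets that already mention someone.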

Cause and effect

One of the most common “statistics comprehension” fails happens when people can’t tell the difference between correlation and causation. Even if Newsweek’s methodology were sound and a link were found between A and B, this still wouldn’t tell us whether A causes B or vice versa (or whether a hidden parameter C is responsible for both).
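If the hidden parameter C sounds abstract, a tiny simulation makes it concrete. This is a toy sketch with made-up random numbers, not real data: A and B never influence each other, yet they end up correlated because both depend on C. The names `pearson`, `a`, `b` and `c` are just illustrative.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# C is the hidden factor; A and B each equal C plus their own independent noise.
c = [rng.gauss(0, 1) for _ in range(5000)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

# A and B look clearly related, although neither causes the other.
r_ab = pearson(a, b)

# Subtract C from both and the apparent link disappears.
a_without_c = [ai - ci for ai, ci in zip(a, c)]
b_without_c = [bi - ci for bi, ci in zip(b, c)]
r_without_c = pearson(a_without_c, b_without_c)

print(f"correlation of A and B: {r_ab:.2f}")
print(f"correlation after removing C: {r_without_c:.2f}")
```

The first number should come out at roughly one half and the second close to zero, which is exactly the trap: a solid-looking correlation with no causal arrow between A and B at all.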

Newsweek is like a reporter digging up data showing a correlation between people’s weight and their dietary habits. The data shows that overweight people have attempted more diets throughout their lives than the average person. The reporter then comes out with the attention-grabbing headline and conclusion:

We now have proof: dieting makes you fat!

Sounds ridiculous? Indeed. And the GamerGate headline Newsweek came out with is just as ridiculous. They are wrongly assuming that A causes B just because there is some kind of correlation between A and B.

Just as a thought experiment, here are a few possible reasons why there are more tweets about Zoe than Nathan:

  1. Zoe Quinn is more active than Nathan Grayson on Twitter, or directly engages the #Gamergate hashtag more than Nathan Grayson.
  2. Zoe Quinn has more followers than Nathan Grayson on Twitter, who jump into the conversation and retweet her whenever #Gamergate is mentioned.
  3. Nathan Grayson’s ethical issues with Zoe were investigated, challenged and mostly put to rest at some point, before the #GamerGate hashtag even came to be. But Zoe was involved in many other issues: DMCA takedowns, disruption of the Fine Young Capitalists campaign, Patreon contributions from reporters and so on.
  4. Zoe received more media coverage than Nathan Grayson, leading to more tweets about her.
  5. The negative tweets towards Zoe are justified criticism.
  6. The negative tweets towards Zoe are not justified and so this hashtag is about harassing her.

If Newsweek wishes to prove that the claim in point 6 (which is at the base of their hypothesis) is correct, they first have to explain why they are ruling out points 1 to 5, and back that up with evidence.

In the meantime, the full dataset reveals a very interesting fact. Perhaps they should consider this piece of information for their headline:

Half a million #GamerGate tweets analyzed. Only 0.2% are negative tweets towards prominent female opponents.
