72 Hours of #Gamergate

Digging through 316,669 tweets from three days of Twitter’s two-month-old trainwreck

Two months ago today, actor Adam Baldwin was the first to use the #Gamergate hashtag on Twitter, solidifying a name for the movement that’s dominated all conversations in gaming since. Depending on where you sit on the issue, it’s either a widespread campaign of harassment against women or, actually, about ethics in videogames journalism.

Anyone who’s mentioned the #Gamergate hashtag in a critical light knows the feeling: a swarm of seemingly random, largely anonymous people descending to comment and criticize.

I’ve been using Twitter for eight years, but I’ve never seen behavior quite like this. The swarming is so prevalent that it got a new nickname, “sea lioning,” inspired by David Malki’s Wondermark comic.

I wanted to understand #Gamergate: how its proponents and critics behave, and who makes up each audience.

So I wrote a little Python script with the Twython wrapper for the Twitter streaming API, and started capturing every single tweet that mentioned the #Gamergate and #NotYourShield hashtags from October 21–23.
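If you want a feel for what a capture script like this involves, here is a minimal sketch. It is not the script used for this post. The credential arguments, the output filename, and the `to_record` helper are all placeholders invented for illustration, and it assumes Twython's `TwythonStreamer` class talking to Twitter's v1.1 streaming endpoint.

```python
import json

# Comma-separated terms are matched as "either one" by the streaming filter.
TRACK = "#Gamergate,#NotYourShield"


def to_record(tweet):
    """Keep a handful of fields from a decoded tweet (hypothetical selection)."""
    user = tweet.get("user", {})
    return {
        "id": tweet.get("id_str"),
        "text": tweet.get("text"),
        "created_at": tweet.get("created_at"),
        "user_id": user.get("id_str"),
        "user_created_at": user.get("created_at"),
        "is_retweet": "retweeted_status" in tweet,
    }


def capture(app_key, app_secret, oauth_token, oauth_token_secret,
            out_path="gamergate.jsonl"):
    """Append matching tweets to a JSON-lines file until interrupted."""
    from twython import TwythonStreamer  # third-party: pip install twython

    class HashtagStreamer(TwythonStreamer):
        def on_success(self, data):
            # The stream also carries non-tweet messages, such as delete notices.
            if "text" in data:
                with open(out_path, "a", encoding="utf-8") as f:
                    f.write(json.dumps(to_record(data)) + "\n")

        def on_error(self, status_code, data, headers=None):
            print("stream error:", status_code)

    stream = HashtagStreamer(app_key, app_secret,
                             oauth_token, oauth_token_secret)
    stream.statuses.filter(track=TRACK)
```

Writing one JSON object per line keeps the capture append-only, so an interrupted run loses at most the tweet being written.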

Three days later, I was sitting on 316,669 tweets, along with a bunch of metadata to help me understand the composition of both sides of the #Gamergate movement.

Why three days? It was a manageable and consistent slice of activity, taken at a time when the hashtag wasn’t trending and no major news was breaking, reducing the number of confused newcomers, bots, spammers, and other opportunists.

Hourly posting activity for #Gamergate/#NotYourShield. Times in UTC.

In the process of collecting the data, I posted a couple innocuous charts on Twitter and was predictably flooded with critical comments, many questioning my motives.

Without question, I have a strong anti-Gamergate bias. I co-organize a festival called XOXO that invited two frequent #Gamergate targets to speak, Anita Sarkeesian and Leigh Alexander. I backed Anita’s project, and I think they both do great work. I’m also friends or acquaintances with a few dozen independent game designers, developers, and journalists, most of whom have come out publicly against Gamergate. I think the whole thing’s pretty awful, and that it has critically wounded the public perception of videogames.

That said, I think the numbers below accurately and objectively reflect the data, and the analysis I’m doing is very straightforward. You could reproduce everything with a copy of Excel or Numbers.app. I included a dump of the complete dataset at the end of this post, and I encourage you to double-check my work.

The Breakdown

Most of the posting activity to #Gamergate and #NotYourShield is retweets. (From here on, I’ll refer to both hashtags as just “#Gamergate” for readability’s sake.)

Out of 316,669 total tweets, 217,384 of them (about 69%) were retweets.

The remaining 99,285 (31%) were original tweets: 46,826 weren’t directed to anyone, 39,622 replied directly to another user, and 12,837 publicly mentioned one or more users.
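If you want to redo this breakdown from raw tweet JSON, the four buckets can be approximated with a few field checks. The sketch below is one reasonable reading of Twitter's v1.1 tweet fields (`retweeted_status`, `in_reply_to_user_id_str`, `entities.user_mentions`), not necessarily the exact rules behind the counts above. The function names are made up for the example.

```python
from collections import Counter


def classify(tweet):
    """Assign a decoded tweet to one of four buckets (assumed rules)."""
    if "retweeted_status" in tweet:
        return "retweet"
    if tweet.get("in_reply_to_user_id_str"):
        return "reply"
    if tweet.get("entities", {}).get("user_mentions"):
        return "mention"
    return "undirected"


def breakdown(tweets):
    """Return {bucket: (count, share of all tweets)}."""
    counts = Counter(classify(t) for t in tweets)
    total = sum(counts.values())
    return {bucket: (n, n / total) for bucket, n in counts.items()}


# Sanity check on the totals reported in this post.
reported = {"retweet": 217_384, "undirected": 46_826,
            "reply": 39_622, "mention": 12_837}
total = sum(reported.values())
retweet_share = reported["retweet"] / total
print(total, round(retweet_share * 100))  # prints: 316669 69
```

The order of the checks matters: a retweet can also contain mentions, so it is tested first.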

In total, 38,630 user accounts posted to the two hashtags in those three days. Excluding retweets, that number drops down to 17,410 users.

With that out of the way, let’s look at who’s posting to #Gamergate.

Account Age

Gamergate is unusual in one respect: many of its proponents are using newly created, often pseudonymous, accounts.

The chart below shows every tweet, charted by the month its author signed up for Twitter.

#Gamergate tweets charted by month of account creation
Roughly 25% of all Gamergate activity is coming from accounts created in the last two months.
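A chart like this comes down to bucketing each tweet by its author's signup month. Here is a rough sketch of that step, assuming you have the user `created_at` string for every tweet in Twitter's v1.1 timestamp format. The helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Twitter v1.1 timestamp layout, e.g. "Wed Aug 27 13:08:45 +0000 2014"
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def signup_month(created_at):
    """Turn an account's created_at string into a 'YYYY-MM' bucket."""
    return datetime.strptime(created_at, TWITTER_TIME).strftime("%Y-%m")


def tweets_by_signup_month(user_created_ats):
    """Count tweets per account-creation month (one input string per tweet)."""
    return Counter(signup_month(s) for s in user_created_ats)


def share_since(counts, first_month):
    """Fraction of tweets from accounts created in first_month or later."""
    total = sum(counts.values())
    recent = sum(n for month, n in counts.items() if month >= first_month)
    return recent / total if total else 0.0
```

Because the buckets are zero-padded `YYYY-MM` strings, plain string comparison sorts them chronologically, which is all `share_since` relies on.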

To be clear, I’m not suggesting these accounts are bots or sockpuppets — one person controlling multiple accounts — but simply that these accounts are new to Twitter.

As Gamergate supporters were quick to point out, many of them joined Twitter simply because that’s where the debate was. Some created anonymous accounts to avoid being tracked and identified, while others joined only after being turned away from other forums.

Is this distribution unusual, though? For contrast, I tried another hashtag for a similar length of time, the #kashmirfloods hashtag used during last month’s tragic floods that ravaged northern India. The distribution is much closer to what you’d expect: evenly distributed, roughly following Twitter’s rise in popularity.

The Retweet Network

As you’d expect, there are two large communities contributing to the #Gamergate hashtag, and the accounts they choose to follow and retweet are very different, with little overlap.

There’s little overlap between communities.

For example, in this three-day period, 1,673 users retweeted Anita Sarkeesian, while 2,240 users retweeted Blocker (aka Mr. Fart), one of the most prolific Gamergate tweeters. (Yes, the most retweeted person in #Gamergate is named “Mr. Fart.”) But only 79 users retweeted messages from both accounts.

Contrast that with the 1,138 users who retweeted messages from both Blocker and Gamergate proponent Milo Yiannopoulos in the same time period.
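Overlap counts like these are set intersections over "who retweeted whom." A small sketch, assuming the retweets have been reduced to (retweeting user, original author) pairs; the function names are invented for the example.

```python
from collections import defaultdict


def retweeters_by_author(retweets):
    """Map each retweeted author to the set of users who retweeted them.

    `retweets` is an iterable of (retweeting_user, original_author) pairs.
    """
    fans = defaultdict(set)
    for user, author in retweets:
        fans[author].add(user)
    return fans


def overlap(fans, a, b):
    """Return (users who retweeted both a and b, Jaccard similarity)."""
    both = fans[a] & fans[b]
    either = fans[a] | fans[b]
    return len(both), (len(both) / len(either) if either else 0.0)


# Using the figures above: share of Blocker's 2,240 retweeters who also
# retweeted Anita Sarkeesian, versus Milo Yiannopoulos.
also_anita = round(79 / 2240 * 100, 1)
also_milo = round(1138 / 2240 * 100, 1)
print(also_anita, also_milo)  # prints: 3.5 50.8
```

Put that way, about 3.5% of Blocker's retweeters also retweeted Sarkeesian, while roughly half also retweeted Yiannopoulos.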

The top RTed users are pro-GG, the top RTed tweets are against.

The list of most retweeted users is dominated by Gamergate proponents, with only a couple critics in the top 20. Former NFL player and gamer Chris Kluwe pops up at #2 after a string of popular anti-Gamergate rants, but even Anita Sarkeesian only appears in 15th place.

The most retweeted tweets, however, look very different. The top 10 is entirely Gamergate critics and satire, with only five pro-Gamergate tweets in the top 20.

Why would that be? One obvious reason is the sheer number of #gamergate-tagged tweets being posted by supporters, while critics tend to post far fewer, possibly to avoid getting sea lioned.

Gamergate supporters use the #gamergate hashtag more often.

For example, the top five most-retweeted Gamergate critics collectively had 87 of their #gamergate-tagged tweets retweeted within the three-day period. The top five Gamergate proponents had 811 tweets, nearly ten times as many.

Averaging Gamergate

We can use retweet behavior as a rough proxy to group like-minded individuals together. As we’ve established, those who retweet Anita Sarkeesian, Brianna Wu, or Zoe Quinn tend to fall in the opposite camp from those who retweet Milo Yiannopoulos, Internet Aristocrat, or Christina H. Sommers.

I grouped together the 3,022 accounts that retweeted Milo Yiannopoulos, Internet Aristocrat, or Christina H. Sommers, and the 1,694 that retweeted Anita Sarkeesian, Brianna Wu, or Zoe Quinn. With that, we can draw some rough demographics for Twitter usage.

The median Gamergate supporter has 67 followers, follows 134 accounts, has posted 1,194 tweets, and joined Twitter a little over two years ago.

The median Gamergate critic has 144 followers, follows 234 accounts, has posted 3,765 tweets, and joined Twitter four years and three months ago.

Naturally, this is skewed by the large population of relatively newly created Gamergate accounts.
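The grouping and the medians are easy to reproduce once you have one profile record per account. A sketch under stated assumptions: retweets are (retweeting user, original author) pairs, profiles use Twitter's v1.1 user fields (`followers_count`, `friends_count`, `statuses_count`), and the helper names are hypothetical.

```python
from statistics import median

PROFILE_FIELDS = ("followers_count", "friends_count", "statuses_count")


def camp_members(retweets, seed_authors):
    """Users who retweeted at least one of the seed accounts."""
    seeds = set(seed_authors)
    return {user for user, author in retweets if author in seeds}


def median_profile(profiles, fields=PROFILE_FIELDS):
    """Median of each numeric field across a list of user profile dicts."""
    return {f: median(p[f] for p in profiles) for f in fields}
```

Medians are a better fit than means here: a handful of accounts with huge follower counts would drag an average far above what a typical account looks like. Note that this sketch does not remove users who retweeted seed accounts from both camps; whether to drop them is a judgment call.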

Gauging Sentiment

On Saturday, Newsweek partnered with a social media monitoring firm called Brandwatch to publish their own analysis of the Gamergate hashtag using half a million tweets sampled from September 1.

They ran sentiment analysis on tweets directed to several prominent Gamergate critics, and found across the board that around 90% of the tweets were “neutral.”

Newsweek interpreted this to mean the tweets were neither positive nor negative, but I’m fairly sure Brandwatch simply meant they couldn’t make an automated determination for 90% of tweets. Sentiment analysis on 140 characters or less can be challenging.

Newsweek’s sentiment analysis. Nope.

Digging into the actual text by hand, it’s clear that these tweets are anything but neutral.

In my three-day sample, there were 1,171 tweets that mentioned Anita Sarkeesian’s Twitter username, 485 for Brianna Wu, and 338 for Zoe Quinn. I put the text of all of those tweets, without user information, in this spreadsheet so you can see for yourself.

Roughly 90–95% take a clear side, either for or against Gamergate.

A quick manual classification of a sample shows the numbers to be closer to 75% negative, 15% positive, and 10% neutral or undetermined, very far from Newsweek’s automated attempt. I’ve reached out to them to see if they’ll publish a clarification about renaming “neutral” to “undetermined.”

Update: Mike Williams, a data scientist at Brandwatch, confirmed that “neutral” should be “undetermined.” This morning, October 29, Newsweek published a clarification, but left the charts as they were, despite missing sentiment data for 90% of their tweets.

Worlds Apart

With the help of Gilad Lotan, chief data scientist at Betaworks, we grabbed the social graph for everyone in the dataset and visualized it using a fantastic open-source package called Gephi.

We used that information to map the universe of people who contributed to #Gamergate, clustering them into groups based on their relationships.

While there are hundreds of small communities represented by this visualization, it’s clear they cluster into two major groups: on the left, pro-Gamergate. On the right, anti-Gamergate. In the middle, a handful of controversial people engaging both sides. And on the margins, a constellation of isolated people unrelated and disengaged.

Each point is a single person in the #Gamergate universe; the lines connect each person to the accounts they follow. See a larger version with labels.

This network visualization is as good a metaphor as any for #Gamergate. Two massive, impenetrable hairballs of people that want little to do with one another, only listening to their side and firing volleys across the chasm.

Is it over yet?

The Data

Update: Originally, I was hosting complete downloads of the data here for anyone to play with, make their own visualizations, or simply fact-check my work.

Unfortunately, as it turns out, distributing the contents and metadata surrounding tweets is a violation of section 6b of Twitter’s Developer Policy. Twitter politely asked me to remove the downloads without sending lawyers, and I very much appreciate that approach.

My guess? This policy exists to protect the privacy of their users. Any downloadable dataset could include information that was subsequently deleted or made private by its owners, or removed by Twitter.

Pursuant to their guidelines, I’ve replaced the original dataset with a much more limited one, containing only the tweet ID and user ID. You can download it as a 9MB CSV or a 3MB ZIP.

I know this is far from ideal, but you can use this information to reconstruct the original dataset by using Twitter’s statuses/lookup API method, 100 tweets at a time. With their API rate limits, you should be able to grab up to 10,800 tweets an hour. Reconstructing the entire dataset would take around 29 hours.
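To plan a rebuild, the arithmetic works out as follows. This sketch only does the batching and the time estimate using the figures above; it makes no network calls, and the `batches` and `lookup_param` helpers are invented for the example.

```python
import math

BATCH_SIZE = 100          # statuses/lookup accepts up to 100 IDs per request
TOTAL_TWEETS = 316_669    # size of the dataset in this post
TWEETS_PER_HOUR = 10_800  # throughput quoted above


def batches(ids, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` tweet IDs."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def lookup_param(batch):
    """Comma-separated ID list, the form the `id` parameter expects."""
    return ",".join(str(tweet_id) for tweet_id in batch)


requests_needed = math.ceil(TOTAL_TWEETS / BATCH_SIZE)
hours_needed = TOTAL_TWEETS / TWEETS_PER_HOUR
print(requests_needed, round(hours_needed, 1))  # prints: 3167 29.3
```

Expect to get back fewer tweets than you request: anything deleted or made private since the capture will not be returned.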

Sorry, everyone.