Measuring sports teams subreddits’ toxicity during losses (or: Sixers fans are not okay)

Tomer Kremerman
Nov 22, 2017

Google’s Perspective algorithm assigns short texts a toxicity score between 0 and 1. It took some justified fire for assigning high toxicity scores to some particularly triggering false positives, but it is still an interesting tool for gauging virulent language. Looking to try it out, I decided to run some experiments in an arena known for raw emotional responses: sports, and in particular sports fans after tough losses. And no franchise offers a more potent combination of savage, rancorous fans and prolonged misery than the Philadelphia 76ers.

Just this past Saturday, the newly rejuvenated Sixers gave up a 22-point half-time lead to lose to the Warriors 124–116. Their subreddit’s game thread had 1,439 comments, which seemed enough for a robust analysis. Now, Perspective's toxicity scores are noisy in a high-adrenaline context, since “fuck yes!” is just as toxic as “fuck no!” (0.94). Still, I expected the level of toxicity to be lower during the first half, gradually go up as GSW got back in the game, and spike when the lead was overturned or after the final whistle.

Indeed, preliminary analysis (after scraping the thread and running all comments through Perspective's API, regretfully deleting comments such as “Zaza pacheapshot” which the algorithm mistakenly identified as written in Finnish) seemed to match the hypothesis, with a substantial spike towards the end of the game:

Number of toxic (>0.5) comments per 2-min span, Sixers subreddit PHI-GSW
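For anyone who wants to try the scoring step, here is a minimal sketch of how a single comment might be sent to Perspective. It is not necessarily the code behind this chart. The endpoint and JSON shape reflect my understanding of the public Comment Analyzer API, so check them against the current docs. The function names and the `api_key` argument are my own placeholders.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint for Perspective's Comment Analyzer API; verify before use.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text):
    """Build the JSON body asking for a TOXICITY score for one comment."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }


def extract_toxicity(response):
    """Return the summary toxicity score (0 to 1), or None if it is missing."""
    try:
        return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    except (KeyError, TypeError):
        return None


def score_comment(text, api_key):
    """Score one comment; return None when the API rejects the request."""
    body = json.dumps(build_request(text)).encode("utf-8")
    request = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request) as reply:
            return extract_toxicity(json.load(reply))
    except urllib.error.HTTPError:
        # For example, a comment whose detected language is not supported.
        return None
```

Comments that come back as `None` can simply be dropped before counting, which is roughly what happened to the "Finnish" ones here.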

However, looking at raw numbers can be misleading, since there might just be more comments per minute towards the end of the game — both toxic and innocuous. Measuring the percent of toxic comments per time slot reveals that the toxicity of Sixers fans isn’t affected by trivial things such as losing, but is rather a constant cosmic fact:

Percent of toxic (>0.5) comments per 2-min span, Sixers subreddit PHI-GSW
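The difference between the two charts is just a change of denominator. As a rough sketch (the function name and the made-up sample data are mine, not the actual thread), counts and percentages per time span could be computed like this, assuming each comment is a `(timestamp, score)` pair:

```python
from collections import defaultdict
from datetime import datetime, timedelta

TOXIC_THRESHOLD = 0.5  # same cutoff as in the charts


def toxicity_by_span(comments, span_minutes=2):
    """Group (timestamp, score) pairs into fixed-width time spans.

    Returns {span_start: (toxic_count, total_count, toxic_percent)}.
    """
    span = timedelta(minutes=span_minutes)
    origin = datetime(1970, 1, 1)
    totals = defaultdict(int)
    toxic = defaultdict(int)
    for timestamp, score in comments:
        # Floor the timestamp to the start of its span.
        start = origin + ((timestamp - origin) // span) * span
        totals[start] += 1
        if score > TOXIC_THRESHOLD:
            toxic[start] += 1
    return {
        start: (toxic[start], totals[start], 100.0 * toxic[start] / totals[start])
        for start in sorted(totals)
    }


# Made-up data: a quiet span and a busy span with the same share of toxic comments.
t0 = datetime(2017, 11, 18, 21, 0)
quiet = [(t0, 0.9), (t0 + timedelta(seconds=30), 0.1)]
busy = [
    (t0 + timedelta(minutes=2, seconds=i), 0.9 if i % 2 == 0 else 0.1)
    for i in range(10)
]
result = toxicity_by_span(quiet + busy)
```

In this toy example the busy span has five toxic comments against one in the quiet span, yet both come out at 50 percent, which is the effect the raw-count chart hides.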

Seeking a more genteel fanbase (and more robust data), I turned to a less recent but far more heart-wrenching loss: the Falcons’ Super Bowl defeat at the hands of the Patriots. Their subreddit’s game thread, a 4-hour study in tragedy, had 5,962 comments ranging from the cheerful, innocent “THAT’S MY TEAM” after the first touchdown (toxicity<0.01) to the slightly less jolly “FUCK YOU BRADY YOU PERVERT LOOKING BITCH!!!” (toxicity>0.99).

Plotting out all comments’ toxicity reveals some preliminary patterns:

Toxicity of all comments over time, Falcons subreddit Super Bowl thread

There seem to be two spikes of toxicity: one at around 7:45pm, before half-time, and the other, as expected, towards the end of the game. Plotting the percent of toxic comments per 5-minute span gives a clearer picture:

Percent of toxic (>0.5) comments per 5-min span, Falcons subreddit Super Bowl thread

The first toxicity spike, at ~7:41pm, is fans responding to three defensive holding penalties against Atlanta (“STOP FUCKING HOLDING AGHGHGHGHGH”, toxicity>0.95). The second spike is the Pats’ comeback to tie the game (“oh my fucking god”; “This is the most Atlanta thing to happen in the fucking super bowl of all games”). The temporary lull, represented by two blue dots, is overtime; and the final spike of toxicity at 10:20pm, honestly too wretched to quote from, is the reaction to the loss.

Finally, aiming for robustness, I looked to the Spurs’ playoff loss to Golden State (them again; that franchise has caused many heartbreaks in recent history), after a controversial foul on Kawhi Leonard by GSW’s Zaza Pachulia. The r/nba game thread had 29,000 comments; though the forum is inhabited by fans of all teams, I assumed most users would object to the foul. And indeed, the spike in toxicity after the injury is hard to miss:

Percent of toxic (>0.5) comments per 5-min span, r/nba SAS-GSW

Sorting the comments by toxicity definitely reveals a pattern:

Coming up soon: ranking the most virulent NBA fanbases, as if we need data to know it’s the Knicks.

Tomer Kremerman

Public policy @ Princeton; data, basketball, self-deprecation