Quest for an Offensiveness Detector Part 1

It seems like so much of our online activity is tied to our identity. All manner of online shopping sites and social media platforms require you to hand over pieces of your identity before you can use their products or services. At the crux of this particular quest is this question:

What kinds of conversations are possible with social media that is completely anonymous?

I recently started doing some data science projects at Confesh, an anonymous social media platform that promises never to track you: no username, no email, no IP address. One of the interesting things we’re exploring is classifying user sentiment, i.e., what do people think or feel about a confession? Is most of it spam, trolling, and bigotry, or, maybe counterintuitively, can there be honest, substantive, or at least civil conversation?

The thing about sentiment analysis is that a sentiment classifier (i.e., “this post has a positive/negative sentiment”) only performs well if we have access to a lot of labeled data. Luckily, Confesh also has a mechanism for reporting spam. These reports are a potential source of labels because users can provide free text stating why they reported a confession or comment.
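As a rough illustration of how reports could become labels, here is a minimal sketch that turns a list of confessions and a list of reports into (text, label) pairs, where 1 means “reported at least once”. The `Confession` and `Report` record shapes and the `label_reported` helper are hypothetical; Confesh’s actual data schema isn’t described in this post.

```python
from dataclasses import dataclass


# Hypothetical record shapes, assumed for illustration only.
@dataclass
class Confession:
    id: int
    text: str


@dataclass
class Report:
    confession_id: int
    reason: str  # the free text the reporting user typed


def label_reported(confessions, reports):
    """Return (text, label) pairs: 1 if the confession was reported at least once, else 0."""
    reported_ids = {r.confession_id for r in reports}
    return [(c.text, int(c.id in reported_ids)) for c in confessions]


confessions = [
    Confession(1, "I love the library at night"),
    Confession(2, "buy cheap watches now"),
    Confession(3, "does anyone want to study together"),
]
reports = [Report(2, "spam"), Report(2, "obvious ad")]

for text, label in label_reported(confessions, reports):
    print(label, text)
```

Using a set of reported IDs keeps the lookup cheap even when a single confession is reported many times.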

Another limitation of sentiment analysis is that it is typically framed around simple binary outcomes, like “this review is positive or negative”. I can talk about this more in a future post, but generally, going for the simplest model is the most expedient choice when building these kinds of data pipelines. Luckily, the subset of the Confesh dataset that we’re going to look at might provide us with everything we need to create a rudimentary ‘offensiveness’ detector.

A word cloud of Mount Holyoke confessions.

In data science speak, I’d say we’re dealing with semi-structured data (as we often are). In this post, I report some of my findings through visualizations that hopefully provide a few descriptive insights into the Mount Holyoke Confesh posts.

I always like to have a working hypothesis to guide my explorations, so here goes:

There are statistical patterns in the word composition of confessions such that we can predict whether a confession is offensive or not offensive with some degree of accuracy using a simple classifier algorithm.
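To make “a simple classifier algorithm” concrete, here is a minimal sketch of one common choice: a multinomial naive Bayes model over bag-of-words counts with add-one smoothing. This is an assumed example of the kind of model the hypothesis has in mind, not the classifier used at Confesh, and the tiny training set is made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, text, c):
        # log P(c) + sum of log P(word | c), smoothed so unseen words don't zero out a class
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.total_words[c] + len(self.vocab)
        for word in tokenize(text):
            score += math.log((self.word_counts[c][word] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_posterior(text, c))


# Made-up toy data standing in for labeled confessions.
train_texts = [
    "you are all idiots and losers",
    "what a stupid idiot",
    "thanks for the study tips",
    "anyone want to grab coffee after class",
]
train_labels = ["offensive", "offensive", "not offensive", "not offensive"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("such an idiot"))  # offensive
print(model.predict("want to study"))  # not offensive
```

Working in log space avoids numeric underflow when multiplying many small word probabilities together.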

I won’t really be able to test this hypothesis in this post, but I think it’s a good enough motivation to get us started. I created an interactive graph using Plotly, a really nice open-source library for creating custom interactive visualizations, to answer the following question:

If we group confessions by those that were reported by the community and those that were not, how would the above word frequency distribution change?
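One way to frame this question in code is to compute each word’s relative frequency within each group and rank words by how over-represented they are among reported confessions. The sketch below is an assumed approach, not the code behind the Plotly graph; the `eps` term is a hypothetical smoothing constant that avoids dividing by zero for words never seen in unreported posts, and the example posts are made up.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Relative frequency of each word across a group of texts."""
    counts = Counter(w for t in texts for w in re.findall(r"[a-z']+", t.lower()))
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def most_overrepresented(reported, not_reported, k=5, eps=1e-6):
    """Words whose relative frequency is highest in reported vs. not-reported texts."""
    f_rep = word_frequencies(reported)
    f_not = word_frequencies(not_reported)
    ratio = {w: f / (f_not.get(w, 0.0) + eps) for w, f in f_rep.items()}
    return sorted(ratio, key=ratio.get, reverse=True)[:k]


# Made-up example posts.
reported = ["buy cheap pills buy now", "cheap cheap deals"]
not_reported = ["studying for finals now", "the dining hall is great now"]

print(most_overrepresented(reported, not_reported, k=2))
```

Comparing relative frequencies rather than raw counts keeps the comparison fair when one group has far more posts than the other.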

If you want to take a look at the code that I used to run my analysis, check out my technical blog post!


  • The n-word is used a lot in this community forum.
  • Preliminary analysis suggests that the n-word appears mostly in spam.
  • The Mount Holyoke community is moderating the hell out of posts that contain the n-word.

More Questions

As always, exploring data only leads to more questions. The next step on this quest is to see why the community is reporting a particular post. With these text data, we can start to label our confessions with something like “offensive” / “not offensive”, which will be the engine for our offensiveness classifier.
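As a first pass at that labeling step, a simple keyword rule over the free-text report reasons could map each confession to “offensive” or “not offensive”. This is only a sketch: the `OFFENSIVE_CUES` set and the `label_from_reasons` helper are hypothetical, and a real cue list would need to come from reading actual report reasons.

```python
import re

# Hypothetical cue words; a real list would be built from actual report reasons.
OFFENSIVE_CUES = {"racist", "offensive", "slur", "hate", "harassment", "bigot", "bigoted"}


def label_from_reasons(reasons):
    """Label a confession 'offensive' if any of its report reasons mentions a cue word."""
    for reason in reasons:
        words = set(re.findall(r"[a-z]+", reason.lower()))
        if words & OFFENSIVE_CUES:
            return "offensive"
    # Unreported posts and posts reported only for other reasons (e.g., spam) fall here.
    return "not offensive"


print(label_from_reasons(["this is spam"]))
print(label_from_reasons(["spam", "Racist garbage"]))
print(label_from_reasons([]))
```

Treating spam-only reports as “not offensive” is a design choice here; whether spam should be its own class is exactly the kind of question the report text can help answer.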
