Filtering Twitter: Reducing Toxic, Depressive, Profane, and Sexually Explicit Tweets through Adjustable Sliders
TL;DR: We built a Chrome extension that lets users filter their Twitter feed based on depressive, toxic, profane, and sexually explicit language. The functionality attempts to address problems endemic to social media today. For developers, check out our repo: https://github.com/andy-techen/better-social-media
Introduction
For the past half decade, many Americans have come to realize there’s a problem with social media applications. Social media sites have amplified the polarization of Americans, become embroiled in free-speech versus censorship debates, and been used by authoritarian governments to spread propaganda in countries such as Myanmar.
Social media applications are toxic. The recent release of the Facebook Papers and the Congressional testimony of Frances Haugen cemented this truth when it was revealed that internal leaders knew Facebook and Instagram were harmful to young girls and sat on this intelligence rather than intervene. If there was still some doubt about the need to regulate social media sites before, the Facebook Papers alerted many to the problems plaguing these sites today.
Along with two classmates, Te-Hsuan Chen and Saurabh Budholiya, I wondered whether a project within the scope of our Information Retrieval course at the University of Michigan’s School of Information could address these problems. The project needed to work with an established social media platform and draw on concepts we had been learning in class: indexing, retrieval models, text categorization, learning to rank, retrieval system design and implementation, and more. We wanted to build something real users could adopt, and that would be appreciated for tackling a problem many social media sites haven’t been able to solve.
We settled on a simple idea: let users filter out “negative content” on their feed. The end result is a full-stack application packaged as a Chrome extension, with filters for Toxicity, Profanity, Depressiveness, and Sexually Explicit content. This blog post details how we built the plugin and reports our findings, both about the features themselves and about the nature of the tool we set out to build.
Data and Methodology
As an overview, our project consists of machine learning models trained to detect these sentiments in tweets, packaged into APIs and deployed to Heroku, which our plugin calls to score tweets dynamically as the user scrolls.
The first step was collecting data. At first, we tried scraping our own Twitter data for all four sentiments. As we began modeling, though, we ran into what would become the most challenging part of this project: building strong detection models. Machines can vectorize text and look for patterns, but they are not very good at understanding hidden semantic meaning. For example, a tweet such as “I just came out of depression and am feeling better!” reads as generally optimistic, but because the text mentions the term depression, our model might infer from previous tweets that it too should be classified as depressive. We were forced to pivot for the depression model, scraping 10,000 new tweets and annotating all of them to capture the kind of nuance we hoped the model would learn. After annotating, we preprocessed each tweet with the following steps:
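Here is a minimal sketch of that preprocessing, assuming NLTK for tokenizing and lemmatization (the exact libraries may differ from our original pipeline):

```python
import re

import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import TweetTokenizer

nltk.download("wordnet", quiet=True)  # one-time download of lemmatizer data

tokenizer = TweetTokenizer()
lemmatizer = WordNetLemmatizer()

def preprocess(tweet: str) -> str:
    """Lowercase, strip noise, tokenize, and lemmatize a raw tweet."""
    text = tweet.lower()
    text = re.sub(r"@\w+", "", text)          # remove @usernames
    text = re.sub(r"https?://\S+", "", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)     # punctuation and special characters
    tokens = tokenizer.tokenize(text)
    return " ".join(lemmatizer.lemmatize(tok) for tok in tokens)
```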
The processing consists of punctuation removal, special character handling, username removal, tokenizing, and lemmatization. After evaluating four different classification models, XGBoost proved the most effective, with an accuracy of 86%.
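In outline, that winning pipeline pairs a TF-IDF vectorizer with an XGBoost classifier. A sketch with illustrative hyperparameters (not our tuned values), where `tweets` and `labels` stand in for the annotated dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# tweets: list of preprocessed strings; labels: 1 = depressive, 0 = not
X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.2, random_state=42
)

vectorizer = TfidfVectorizer(max_features=10_000, ngram_range=(1, 2))
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

model = XGBClassifier(n_estimators=300, max_depth=6, eval_metric="logloss")
model.fit(X_train_vec, y_train)
print(accuracy_score(y_test, model.predict(X_test_vec)))
```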
Toxicity, Profanity, and Sexual Explicitness were much more difficult. We obtained data from our professor after repeatedly failing to get our models to pass basic tests (e.g., feeding in an array of swear words and seeing obviously inappropriate terms score low). We built these models on about 80,000 rows each and used GPUs on the University of Michigan’s Great Lakes Slurm cluster to train them quickly. The best accuracy we could obtain was 52% for Toxicity and Profanity, and 54% for Sexual Explicitness. Given the difficulty and the number of trials at this stage of the project, we decided to deploy these models despite lower-than-desired accuracy.
The next step was packaging these models, along with their vectorizers, into an API. Using Flask, we built a RESTful API that our Chrome extension can call, and deployed it on Heroku. This lets our extension retrieve predictions anytime with a simple POST request.
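A minimal sketch of such an endpoint (the route name, filenames, and response fields here are illustrative; our actual API differs in details):

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the persisted vectorizer and model once at startup (filenames illustrative)
with open("vectorizer.pkl", "rb") as f:
    vectorizer = pickle.load(f)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # The extension preprocesses the tweet text before sending it
    text = request.get_json()["text"]
    features = vectorizer.transform([text])
    score = float(model.predict_proba(features)[0][1])
    return jsonify({"depressive": score})

if __name__ == "__main__":
    app.run()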
In Postman, a POST request with a tweet’s text returns the predicted scores as JSON.
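The same call from Python’s requests library (the Heroku URL and the response values below are placeholders, not our production endpoint):

```python
import requests

resp = requests.post(
    "https://our-app.herokuapp.com/predict",  # hypothetical deployment URL
    json={"text": "i just came out of depression and am feeling better"},
)
print(resp.json())  # e.g. {"depressive": 0.12}, an illustrative score
```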
Developing the plugin was the final step. The plugin extracts each tweet on the user’s Twitter feed, applies the same preprocessing used in modeling, and then calls the API to score the text. In Chrome’s developer console, you can watch the scores come back for each tweet as it is processed.
After fetching predictions for each tweet from our Flask API, we compare them with the user’s stored settings to determine which tweets should be filtered out. If a tweet’s predicted depressive, toxic, sexually explicit, or profane score exceeds the respective threshold, we white out the tweet.
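The decision itself is simple. It is sketched here in Python for consistency with the rest of this post (the extension implements the same logic in JavaScript, and the category keys and values below are illustrative):

```python
# Hypothetical API scores and slider thresholds, both on a 0-1 scale
scores = {"toxic": 0.81, "profane": 0.30, "depressive": 0.10, "sexually_explicit": 0.05}
thresholds = {"toxic": 0.70, "profane": 0.70, "depressive": 0.70, "sexually_explicit": 0.70}

def should_hide(scores: dict, thresholds: dict) -> bool:
    """White out the tweet if any predicted score exceeds its slider threshold."""
    return any(scores[category] > thresholds[category] for category in thresholds)

print(should_hide(scores, thresholds))  # True: the toxic score crosses 0.70
```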
Results
The results demonstrate some success! To test the plugin’s performance, we computed an NDCG (normalized discounted cumulative gain) score over the first five tweets returned for the query term “pornography.” The results are below:
As you can see, under “Tweet Results Before Filter,” the top five tweets are displayed in the order they were returned and assigned relevance scores. Our relevance scores were based on the degree of Sexual Explicitness observed in the text, adjusted for key terms, semantic meaning, and user intent. For example, the second tweet, “What the hell????????”, was rated a five because there was no explicit link or even implicit connection to pornography in the tweet’s text (although this tweet was accompanied by a photo, which our filter cannot recognize or respond to; a limitation of this evaluation metric’s ability to measure performance). Conversely, the tweet “I’m so sexy pornography rly did choose me” was rated a one, as the user made direct mention of the term “pornography” without adding any value in a news or informative sense. The right half of the table displays the top five tweets after applying the filter. Tweets that remained kept the relevance scores assigned before the filter, while newly surfaced tweets received higher scores based on the quality of their content. As a result, the NDCG score for the top five tweets after the filter is higher than before it (0.89 > 0.83).
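For readers unfamiliar with the metric, NDCG discounts each relevance judgment by its rank and normalizes against the ideal ordering. A sketch with hypothetical relevance lists (not our actual judgments, which produced 0.83 before the filter and 0.89 after):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: earlier ranks count more than later ones."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalize DCG by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# Hypothetical 1-5 relevance judgments for the top five tweets
before = [4, 5, 2, 3, 1]
after = [4, 5, 3, 4, 2]
print(round(ndcg(before), 2), round(ndcg(after), 2))
```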
Next Steps
The plugin still needs some work. For one, performance is inconsistent and often fails to keep up with the speed of a user’s scrolling. The other primary challenge is the complexity of our language detection models: the failure to capture nuance in certain query results shows up in every model we built. These models are extremely difficult to build, especially for students with limited experience in NLP methods.
For those interested in building similar Chrome extensions (or, more broadly, robust text analysis tools and effective information retrieval systems), we propose two takeaways:
- Focus on language detection models: collect diverse data, account for unequal representation of certain texts or themes, and move beyond term-based filtering (TF-IDF vectorizers).
- Think critically about Twitter’s architecture and how to design extensions that work seamlessly with user habits and system design (fast scrolling, skimming, etc.).