Twitter toxicity detector using TensorFlow.js

Faucet Consumer
Published in Analytics Vidhya
5 min read · Sep 11, 2020

In recent years, interaction on Twitter has become increasingly toxic, with a rise in hate speech, harassment and offensive content. In this article, I explain how to build a simple toxic tweet detector using the Twitter API and TensorFlow.js.

Introduction

In this post I show, step by step, how to build a web application that lists the most recent toxic tweets and retweets from a given user.

The application takes two parameters: the number of tweets to search and a username.

  • Tweets to search: the number of tweets to fetch from the Twitter API for analysis (10, 20, 50 or 100).
  • Username: the Twitter user (@…) whose tweets will be analyzed.

The more tweets are analyzed, the more expensive the toxicity prediction becomes, so I added a selector to fetch 10, 20, 50 or 100 tweets. Analyzing 50 to 100 tweets has a very high processing cost, so I don’t recommend trying it on low-performance computers or phone browsers.
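The two inputs can be checked before anything is sent to the server. The following is a minimal sketch, not the demo's actual code: the helper name `normalizeSearchParams` and the fallback to 10 tweets are assumptions made for illustration.

```javascript
// Tweet counts offered by the selector described above.
const ALLOWED_COUNTS = [10, 20, 50, 100];

// Hypothetical helper (not from the original project): cleans up the two
// inputs before they are used to query the API.
function normalizeSearchParams(username, count) {
  // Accept "@user" or "user", with surrounding whitespace.
  const cleanName = String(username).trim().replace(/^@/, "");

  // Twitter usernames are at most 15 characters: letters, digits, underscore.
  if (!/^[A-Za-z0-9_]{1,15}$/.test(cleanName)) {
    throw new Error(`Invalid Twitter username: "${username}"`);
  }

  // Assumption: fall back to the cheapest option when the count is not
  // one of the selector values.
  const requested = Number(count);
  const cleanCount = ALLOWED_COUNTS.includes(requested) ? requested : 10;

  return { username: cleanName, count: cleanCount };
}

// Yields username "TensorFlow" and count 50.
console.log(normalizeSearchParams(" @TensorFlow ", "50"));
```

Falling back to the smallest count keeps the processing cost low when the input is unexpected, which matters most on slow devices.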

As a result, the application shows a list of the toxic tweets and retweets it detected. You can try the live demo here: https://mcarlomagno.github.io/toxicity-classifier/

Note: Sometimes the Twitter API doesn’t return all the tweets on the first try, so you may need to press “SEARCH TWEETS” again. 🤷‍♂️

Demo screenshot

Twitter API

The Twitter API gives us programmatic access to Twitter content. It can be used to read, analyze and interact with tweets, direct messages, users and other key Twitter resources.

More info

TensorFlow.js

TensorFlow.js is a JavaScript library for training and deploying machine learning models in browsers and Node.js.

More info

Toxicity classifier model

The toxicity model is a TensorFlow machine learning model that detects whether a text contains toxic content such as insults, obscenities, hatred or sexually explicit language. It’s built on top of the Universal Sentence Encoder and was trained on the Civil Comments dataset, which contains ~2 million comments labeled for toxicity.
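To make the model's output concrete: for each requested label, `classify()` returns one result per input sentence, with a pair of probabilities and a `match` flag. The sketch below does not call the model. It uses a hand-written sample in that shape (the numbers are invented, assuming a threshold of 0.9), and `describeResult` is a hypothetical helper of mine.

```javascript
// Hand-written sample in the shape returned by model.classify() for the
// "toxicity" label and two input sentences. The numbers are invented.
// probabilities[0] = probability of NOT toxic, probabilities[1] = toxic.
const samplePredictions = [
  {
    label: "toxicity",
    results: [
      { probabilities: [0.03, 0.97], match: true }, // above the threshold
      { probabilities: [0.4, 0.6], match: null }, // neither side is confident
    ],
  },
];

// Hypothetical helper: turns one result into a readable verdict.
function describeResult(result) {
  const percent = Math.round(result.probabilities[1] * 100);
  if (result.match === true) return `toxic (${percent}%)`;
  if (result.match === false) return `not toxic (${percent}%)`;
  return `uncertain (${percent}%)`; // match === null
}

for (const { label, results } of samplePredictions) {
  results.forEach((result, i) => {
    console.log(`${label}, sentence ${i}: ${describeResult(result)}`);
  });
}
// toxicity, sentence 0: toxic (97%)
// toxicity, sentence 1: uncertain (60%)
```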

More info

Setup

Twitter API

The first thing to do is create a Twitter developer account to access the API. It takes a few simple steps if you follow this guide.

Once registered, we can access the developer portal to see usage statistics, endpoint URLs and the API access keys.

TensorFlow.js and Toxicity classifier

To use TensorFlow from the browser, we simply create a classic web app using HTML, CSS and JavaScript (if you don’t know how to do that, follow this guide), and then import the models and tools from TensorFlow using yarn, npm or script tags.

This GitHub repository explains very well how to do it.

Express.js environment

Unfortunately, Twitter doesn’t allow applications to call the API directly from the browser, so we need a server application that makes the requests on our behalf.

In my case, I built it with Express.js and deployed it to Heroku. At the time of writing, a free account lets you create and deploy up to 5 server applications.
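As a rough idea of what such a server does, here is a sketch of an Express-style route handler. It is not the code of my server: `createTweetsHandler` and the injected `fetchTweets(username, count)` function are hypothetical names, and the part that actually talks to the Twitter API with your keys is left out on purpose.

```javascript
// Builds an Express-style route handler. `fetchTweets(username, count)` is a
// hypothetical function that calls the Twitter API with your credentials and
// resolves to an array of tweets. It is injected so the keys stay on the server.
function createTweetsHandler(fetchTweets) {
  return async function tweetsHandler(req, res) {
    // Allow any origin to call this endpoint from a browser (CORS).
    res.set("Access-Control-Allow-Origin", "*");

    const { username, count } = req.params;
    const limit = Number.parseInt(count, 10);
    if (!username || !Number.isInteger(limit) || limit < 1 || limit > 100) {
      return res.status(400).json({ error: "Invalid username or tweet count" });
    }

    try {
      const tweets = await fetchTweets(username, limit);
      return res.json(tweets);
    } catch (err) {
      // Do not leak API details or keys to the client.
      return res.status(502).json({ error: "Could not fetch tweets" });
    }
  };
}
```

With Express installed, a handler like this could be registered with something like `app.get("/twits/:username/:count", createTweetsHandler(myFetchTweets))`, where `myFetchTweets` stands for whatever Twitter client code you write.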

If you would rather not write the server yourself, you can clone my project, set the parameters for your own Twitter API credentials and deploy it to your Heroku account.

If you don’t want to run a server at all, you can call my API directly from your web application using the URL of the server I built (I enabled CORS so that it is accessible to anyone).

URL format:

https://toxicity-classifier-server.herokuapp.com/twits/<username>/<number_of_tweets>
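Calling that URL from the browser could look roughly like this. The helper names are mine, and I am only assuming that the server answers with a JSON body, so inspect the response before relying on any particular field.

```javascript
// Base URL of the server, taken from the URL format above.
const API_BASE = "https://toxicity-classifier-server.herokuapp.com/twits";

// Builds the request URL for a user and a number of tweets.
function buildTweetsUrl(username, count) {
  return `${API_BASE}/${encodeURIComponent(username)}/${count}`;
}

// Fetches the tweets as parsed JSON. Assumption: the server answers with a
// JSON body whose exact shape should be checked before use.
async function fetchTweets(username, count) {
  const response = await fetch(buildTweetsUrl(username, count));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

console.log(buildTweetsUrl("TensorFlow", 20));
// https://toxicity-classifier-server.herokuapp.com/twits/TensorFlow/20
```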

Final architecture:

Once we have followed all these steps, the architecture is in place and we can use it however we want. In this case, we will just use it to get the recent tweets for a given user and analyze them in the browser.

Architecture flow diagram (diagramming this took me longer than programming the system)

Steps

In your web project, take the following steps.

1- Link a JavaScript file to the HTML template.

2- Import the previously installed TensorFlow toxicity library.

3- Load the model when the application starts (using only the toxicity label and a given threshold).

4- When the “search tweets” event is triggered, access the API to fetch the most recent tweets.

5- Once the tweets are fetched, analyze them and predict toxicity.

6- Show the tweets whose toxicity value is true or null (null means the model is not sure whether the tweet is toxic: neither the toxic nor the non-toxic probability exceeds the threshold).
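Steps 3, 5 and 6 can be sketched as follows. This is an illustration rather than the code from my repo: the threshold of 0.9 and the function names are assumptions, and the `toxicity` module and the loaded `model` are passed in as arguments so the sketch does not depend on how you imported the library.

```javascript
// Assumption: 0.9 is an illustrative threshold, not necessarily the demo's.
const THRESHOLD = 0.9;

// Step 3: load the model once, restricted to the "toxicity" label.
// `toxicity` is the @tensorflow-models/toxicity module (npm or script tag).
function loadToxicityModel(toxicity) {
  return toxicity.load(THRESHOLD, ["toxicity"]);
}

// Steps 5 and 6: classify all tweet texts in one call and keep those whose
// match is true (toxic) or null (uncertain).
async function findToxicTweets(model, tweetTexts) {
  if (tweetTexts.length === 0) return [];
  const predictions = await model.classify(tweetTexts);
  // Pick the entry for the only label we asked for.
  const toxicityEntry = predictions.find((p) => p.label === "toxicity");
  return tweetTexts
    .map((text, i) => ({ text, match: toxicityEntry.results[i].match }))
    .filter((tweet) => tweet.match !== false);
}
```

In the browser this would be used along the lines of `const model = await loadToxicityModel(toxicity)` at startup, then `await findToxicTweets(model, texts)` on each search, where `texts` is an array of tweet strings extracted from the server response.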

Example results

Conclusion

The amount of data produced in the world grows exponentially every year, and there are many open and free tools with which we can exploit this data in interesting ways, for example using machine learning models as shown in this article.

Other interesting things that can be built on top of this mini system are:

  • Use other features of the Twitter API.
  • Use other TensorFlow models to detect patterns in the text.
  • Extend the application to show more information.
  • Use this model to do data analysis (e.g. [toxic_tweets / total_tweets] over time).
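As an example of the last idea, here is a small sketch that computes toxic_tweets / total_tweets per day. The input shape (`createdAt` as an ISO date string and a boolean `toxic` flag) is a hypothetical format of my own, not the Twitter API's, and the sample data is invented.

```javascript
// Groups tweets by calendar day (UTC) and computes toxic_tweets / total_tweets.
// Assumption: each tweet is { createdAt: ISO date string, toxic: boolean }.
function toxicRatioByDay(tweets) {
  const counts = new Map();
  for (const { createdAt, toxic } of tweets) {
    const day = new Date(createdAt).toISOString().slice(0, 10); // "YYYY-MM-DD"
    const entry = counts.get(day) ?? { toxic: 0, total: 0 };
    entry.total += 1;
    if (toxic) entry.toxic += 1;
    counts.set(day, entry);
  }
  // Sort chronologically and turn the counters into ratios.
  return [...counts.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([day, { toxic, total }]) => ({ day, ratio: toxic / total }));
}

// Invented sample data for illustration.
const sample = [
  { createdAt: "2020-09-10T08:00:00Z", toxic: true },
  { createdAt: "2020-09-10T12:30:00Z", toxic: false },
  { createdAt: "2020-09-11T09:15:00Z", toxic: false },
];
console.log(toxicRatioByDay(sample));
// [ { day: '2020-09-10', ratio: 0.5 }, { day: '2020-09-11', ratio: 0 } ]
```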

I encourage you to try this by creating your own versions to exploit this data in other ways.

Github repo: https://github.com/MCarlomagno/toxicity-classifier

Demo app: https://mcarlomagno.github.io/toxicity-classifier/
