Text classification using TensorFlow.js: An example of detecting offensive language in the browser

Posted by Jeffrey Sorensen and Ann Yuan

Why detect toxicity?

Online communication platforms are increasingly overwhelmed by rude or offensive comments, which can make people give up on discussion altogether. In response, the Jigsaw team and Google’s Counter Abuse Technology team collaborated with sites that have shared their comments and moderation decisions to create the Perspective API. Perspective helps online communities host better conversations by, for example, enabling moderators to more quickly identify which comments might violate their community guidelines.

Several publishers have also worked on systems that provide feedback to users before they publish comments (e.g. the Coral Talk Plugin). Showing a pre-submit message that a comment might violate community guidelines, or that it will be held for review before publication, has already proven to be an effective tool to encourage users to think more carefully about how their comments might be received.

An example toxicity classifier

As part of our effort to bring language-based models to TensorFlow.js, we are releasing the Toxicity classifier as an open-source example of using a pre-trained model that detects whether text contains toxic content such as insults, threats, sexually explicit language and more.

This model is intended as an example you can build upon, and is not meant for use out-of-the-box in production environments without careful additional training and tuning for your domain. We’ve shared it for educational purposes, and to highlight how it’s possible to build entirely client-side text analysis models that run in close to real time in the browser.

This release is a collaboration between the Perspective API team within Jigsaw, and the TensorFlow.js team within Google Brain. The model is ~25MB and achieves inference times of ~30ms per sentence on a 2018 MacBook Pro.

Note that these models are highly experimental, and the TensorFlow.js versions have lower accuracy and greater unintended ML bias than the versions hosted on Perspective API (you can see our model card for the TensorFlow.js toxicity model and can find more information about known issues with this dataset in the unintended ML bias tutorial).

Check out our live demo here.

Getting started with the toxicity classifier

  1. Installation

You can start using the toxicity classifier in your projects today. The first step is to install the model, which is available through npm as the @tensorflow-models/toxicity package (npm install @tensorflow-models/toxicity).

The toxicity classifier requires TensorFlow.js version 1.0 as a peer dependency, which means you need to install it independently. If you don’t already have it installed, you can run npm install @tensorflow/tfjs.

Now you can import the toxicity model with a standard ES module import: import * as toxicity from '@tensorflow-models/toxicity'.

You can also include the bundled model via a script tag on your webpage, which will expose toxicity as a global variable.

That’s it — you’ve installed the model!

  2. Usage

To start labelling text, first load the model (fetch its weights and topology) and optionally adjust the model settings for your needs.

The threshold parameter is the minimum confidence level above which a prediction is considered valid (defaults to 0.85). Setting threshold to a higher value means that predictions are more likely to return null because they fall beneath the threshold.

The labelsToInclude parameter is an array of strings that indicates which of the seven toxicity labels you’d like predictions for. The labels are: toxicity | severe_toxicity | identity_attack | insult | threat | sexual_explicit | obscene. By default the model will return predictions for all seven labels.
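A minimal sketch of loading the model with both parameters and classifying a sentence might look like the following. The helper name classifyText and the example values (a threshold of 0.9 and three of the seven labels) are illustrative assumptions. The toxicity module is passed in as an argument so that the same function works with either the npm import or the script-tag global.

```javascript
// Hypothetical helper: loads the model with the given settings, then labels
// the sentences. `toxicity` is the module object (the npm import or the
// global exposed by the script tag).
async function classifyText(
  toxicity,
  sentences,
  threshold = 0.9, // example value; the library default is 0.85
  labelsToInclude = ['identity_attack', 'insult', 'threat'] // example subset
) {
  // load() fetches the weights and topology and resolves to a model object.
  const model = await toxicity.load(threshold, labelsToInclude);
  // classify() returns a Promise of predictions, one entry per label.
  return model.classify(sentences);
}

// Usage (in an environment where `toxicity` is available):
// classifyText(toxicity, ['you suck']).then(predictions => {
//   console.log(predictions);
// });
```

Using await here is equivalent to chaining .then() on the Promises returned by load() and classify().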

In the snippet above we used the model.classify() API to predict the toxicity of ‘you suck’. model.classify() returns a Promise, so we need to call .then() afterwards to retrieve the actual predictions.

The predictions are an array of objects, one for each toxicity label, that contain the raw probabilities for each input sentence along with the final prediction. The final prediction can be one of three values:

  • true if the probability of a match exceeds the confidence threshold,
  • false if the probability of *not* a match exceeds the confidence threshold, and
  • null if neither probability exceeds the threshold.
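The three-valued rule above can be sketched as a small function. This is an illustration of the rule as described, not the library's own code, and it assumes the two probabilities are ordered as [probability of no match, probability of a match]; resolveMatch is a hypothetical name.

```javascript
// Hypothetical helper illustrating the true / false / null decision.
// Assumption: `probabilities` is [P(not a match), P(match)].
function resolveMatch(probabilities, threshold = 0.85) {
  const [notMatch, match] = probabilities;
  if (match > threshold) return true; // confident that the label applies
  if (notMatch > threshold) return false; // confident that it does not apply
  return null; // neither probability clears the threshold
}
```

With the default threshold of 0.85, a pair such as [0.4, 0.6] yields null, because neither side is confident enough.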

Here’s what the predictions array might look like for our sample text:
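The sketch below shows the general shape of such an array, together with a small helper for reading it. The numbers are hand-written placeholders rather than real model output, the field names (label, results, probabilities, match) are assumptions about the output format, and flaggedLabels is a hypothetical helper.

```javascript
// Placeholder values for illustration only, NOT real model output.
// One object per label; one entry in `results` per input sentence.
const samplePredictions = [
  { label: 'identity_attack', results: [{ probabilities: [0.97, 0.03], match: false }] },
  { label: 'insult', results: [{ probabilities: [0.08, 0.92], match: true }] },
  { label: 'threat', results: [{ probabilities: [0.6, 0.4], match: null }] },
];

// Hypothetical helper: list the labels whose final prediction is `true`
// for the sentence at `sentenceIndex`.
function flaggedLabels(predictions, sentenceIndex = 0) {
  return predictions
    .filter(({ results }) => {
      const result = results[sentenceIndex];
      return result !== undefined && result.match === true;
    })
    .map(({ label }) => label);
}
```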

As you can see, each results array contains only one item, because our sample input is an array of just one sentence: ['you suck']. However, we can also batch inputs to model.classify() for faster per-sentence inference time. In practice we’ve found that a batch size of four works well, although the optimal number will depend on the length of each individual sentence.
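One way to apply batching is sketched below. It assumes a loaded model whose classify() accepts an array of sentences and resolves to per-label objects with label and results fields; chunk and classifyInBatches are hypothetical helper names, and the default batch size of four simply follows the suggestion above.

```javascript
// Split an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Classify sentences one batch at a time and stitch the per-label results
// back together so they line up with the original input order.
async function classifyInBatches(model, sentences, batchSize = 4) {
  const merged = new Map(); // label -> results for every sentence so far
  for (const batch of chunk(sentences, batchSize)) {
    const predictions = await model.classify(batch);
    for (const { label, results } of predictions) {
      if (!merged.has(label)) merged.set(label, []);
      merged.get(label).push(...results);
    }
  }
  return Array.from(merged, ([label, results]) => ({ label, results }));
}
```

Since the best batch size depends on sentence length, it is worth timing a few values on representative input before settling on one.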

Why do this in the browser?

We see the release of this example as a first step in advancing the development of client-side versions of these models. Client-side models open up many more use cases, such as forums where content isn’t being posted publicly, or situations where users would prefer that snippets be scored on-device.

For example, doing this type of evaluation client-side eliminates potential privacy concerns related to sending not-yet-published comments over the internet. It also removes the need for an additional web service endpoint, and allows full customization of the machine learning model to the moderation policies of any particular website.

Model details

The TensorFlow.js toxicity classifier is built on top of the Universal Sentence Encoder lite (USE) (Cer et al., 2018), which is a model that encodes text into a 512-dimensional embedding (or, in other words, a list of 512 floating point numbers). These embeddings can be used as starting points for language processing tasks such as sentiment classification and textual similarity analysis. The USE uses the Transformer (Vaswani et al., 2017) architecture, which is a state-of-the-art method for modelling language using deep neural networks. The USE can be found on the TensorFlow Hub, and is also available as a separate TensorFlow.js module.
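To make the textual similarity use case concrete, the sketch below compares two embeddings with cosine similarity, a common choice for this kind of vector. It assumes each embedding has already been extracted into a plain array of numbers; cosineSimilarity is a hypothetical helper and is not part of the toxicity model's API.

```javascript
// Cosine similarity between two equal-length vectors (for example, two
// 512-dimensional sentence embeddings held as plain arrays).
// Returns a value in [-1, 1]; higher means the vectors point the same way.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat an all-zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```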

We’ve also open-sourced the training code for the model, which was trained on a dataset from Civil Comments. We encourage you to reproduce our results, and to improve on our model and grow the publicly available datasets.

The code used to build the classifier lives in another GitHub project based on TensorFlow Estimators, specifically the tf_hub_tfjs sub-directory. There you will also find information about the performance characteristics of this model compared to both the one featured in the Perspective API and those created as part of the Kaggle Toxic Comments Challenge. The classifier is a simple extension of the USE model with just a couple of additional layers before the output heads for each tag.

If you have the TensorFlow.js converter installed, the trained model can be converted for use in the browser with the following command (this will return just the “toxicity” head):

Further applications

We are interested to see what people will build with client-side machine learning text models. We’ve started with toxicity tagging, but there are many natural language datasets available with which to build new applications. Content-based clustering, text highlighting, and emoji recommendation are just a few potential ideas worth exploring.