Moderating text with the Natural Language API

Laurent Picard
Google Cloud - Community
2 min read · Jun 16, 2023

Update (2023-09-12): text moderation became Generally Available (GA) over the summer; added a link to the September blog post.

The Natural Language API lets you extract information from unstructured text using Google machine learning and provides a solution to the following problems:

  • Sentiment analysis
  • Entity analysis
  • Entity sentiment analysis
  • Syntax analysis
  • Content classification
  • Text moderation

🔍 Moderation categories

Text moderation lets you detect sensitive or harmful content. The first moderation category that comes to mind is “toxicity”, but there can be many more topics of interest. A PaLM 2-based model powers the predictions and scores 16 categories:

|            |                       |                   |                |
| ---------- | --------------------- | ----------------- | -------------- |
| Toxic      | Insult                | Public Safety     | War & Conflict |
| Derogatory | Profanity             | Health            | Finance        |
| Violent    | Death, Harm & Tragedy | Religion & Belief | Politics       |
| Sexual     | Firearms & Weapons    | Illicit Drugs     | Legal          |

⚡ Moderating text

As always, you can call the API through the REST/RPC interfaces or with idiomatic client libraries.
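For the REST route, here is a minimal sketch that only builds the HTTP request, without sending it. It assumes the v1 `documents:moderateText` endpoint and an OAuth access token that you obtain separately; the helper name `build_moderation_request` and the placeholder token are hypothetical, and depending on your credentials you may need additional headers (such as a quota project).

```python
import json
import urllib.request

# Assumption: the v1 REST endpoint for text moderation.
MODERATE_TEXT_URL = "https://language.googleapis.com/v1/documents:moderateText"


def build_moderation_request(text: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the moderateText REST method."""
    # The JSON body mirrors the Document used by the client library.
    body = {"document": {"type": "PLAIN_TEXT", "content": text}}
    return urllib.request.Request(
        MODERATE_TEXT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# "YOUR_ACCESS_TOKEN" is a placeholder, not a working credential.
request = build_moderation_request("What a pile of garbage!", "YOUR_ACCESS_TOKEN")
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen(request)`) should return a JSON document listing the moderation categories and their confidence scores. The client library hides these details, which is why it is usually the more convenient option.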

Here is an example using the Python client library (google-cloud-language) and the moderate_text method:

from google.cloud import language


def moderate_text(text: str) -> language.ModerateTextResponse:
    # The client uses the credentials configured in your environment.
    client = language.LanguageServiceClient()
    # Wrap the raw string in a plain-text document.
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.moderate_text(document=document)


text = (
    "I have to read Ulysses by James Joyce.\n"
    "I'm a little over halfway through and I hate it.\n"
    "What a pile of garbage!"
)
response = moderate_text(text)

🚀 It’s fast! The model latency is very low, allowing real-time analyses.

The response contains confidence scores for each moderation category. Let’s sort them out:

import pandas as pd


def confidence(category: language.ClassificationCategory) -> float:
    return category.confidence


columns = ["category", "confidence"]
categories = sorted(
    response.moderation_categories,
    key=confidence,
    reverse=True,
)
data = ((category.name, category.confidence) for category in categories)
df = pd.DataFrame(columns=columns, data=data)

print(f"Text analyzed:\n{text}\n")
print(f"Moderation categories:\n{df}")

You may typically ignore scores below 50% and calibrate your solution by defining upper limits (or buckets) for the confidence scores. In this example, depending on your thresholds, you may flag the text as disrespectful (toxic) and insulting:

Text analyzed:
I have to read Ulysses by James Joyce.
I'm a little over halfway through and I hate it.
What a pile of garbage!

Moderation categories:
                 category  confidence
0                   Toxic    0.680873
1                  Insult    0.609475
2               Profanity    0.482516
3                 Violent    0.333333
4                Politics    0.237705
5   Death, Harm & Tragedy    0.189759
6                 Finance    0.176955
7       Religion & Belief    0.151079
8                   Legal    0.100946
9                  Health    0.096305
10          Illicit Drugs    0.083333
11     Firearms & Weapons    0.076923
12             Derogatory    0.073953
13         War & Conflict    0.052632
14          Public Safety    0.051813
15                 Sexual    0.028222
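
The thresholding idea can be sketched in a few lines of plain Python. The helper names (`flagged_categories`, `bucket`) and the 0.8 cutoff are hypothetical choices for illustration; only the 50% floor comes from the guidance above, and you would calibrate the values for your own use case.

```python
# Hypothetical thresholds: scores below 0.5 are ignored, 0.8 is an arbitrary upper bucket.
FLAG_THRESHOLD = 0.5
HIGH_THRESHOLD = 0.8


def flagged_categories(scores: dict[str, float], threshold: float = FLAG_THRESHOLD) -> list[str]:
    """Return the category names scoring at or above the threshold, highest first."""
    hits = [(name, score) for name, score in scores.items() if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hits]


def bucket(confidence: float) -> str:
    """Map a confidence score to a coarse bucket."""
    if confidence >= HIGH_THRESHOLD:
        return "high"
    if confidence >= FLAG_THRESHOLD:
        return "medium"
    return "ignore"


# Top four scores taken from the output above.
scores = {
    "Toxic": 0.680873,
    "Insult": 0.609475,
    "Profanity": 0.482516,
    "Violent": 0.333333,
}
print(flagged_categories(scores))  # ['Toxic', 'Insult']
print({name: bucket(score) for name, score in scores.items()})
```

With these example thresholds, only "Toxic" and "Insult" are flagged, both in the "medium" bucket, while "Profanity" falls just under the 50% floor.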

🖖 More
