Grooming Detection Part. 3: A BERT Multiclass Model Experiment

Roxane Bois
Besedo Engineering Blog
6 min read · Jan 12, 2023
Photo of a little girl holding hands with a robot, by Andy Kelly on Unsplash

In our last blog posts, we shared a Rule-Based Matching solution to help you detect grooming in texts. This blog post describes an experiment in grooming detection using a pre-trained deep learning model: BERT.

What is BERT?

BERT is an encoder model with an attention mechanism that lets it generalize over language data. Its architecture works in two stages:

(i) a pre-trained stage where the model learns language distribution on a large amount of data. This learning is based on two unsupervised tasks:

  • a Masked Language Model (MLM) task where some tokens are randomly masked into sentences and predicted by the model. For instance: “Artificial Intelligence aims to [MASK] human behaviour.”
  • a Next Sentence Prediction (NSP) task where the model predicts whether two sentences follow each other, as a binary response (yes/no). For instance, given the sentence “Artificial Intelligence aims to simulate human behaviour.”, the machine has to predict whether the following sentence comes next: “Example tasks in which this is done include speech recognition, computer vision, natural language processing.”

(ii) a fine-tuning stage where the pre-trained model is trained to classify texts on a specific supervised task. For grooming detection, an example would be to predict whether a text contains grooming based on X labels.
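To make the MLM objective from stage (i) concrete, here is a minimal, simplified sketch of random token masking in pure Python. It is an illustration of the idea only (real BERT pre-training uses an 80/10/10 mask/replace/keep split and subword tokens, neither of which is shown here); all function and variable names are our own.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Randomly replace a fraction of tokens with [MASK], as in BERT's
    MLM objective (simplified: no 80/10/10 mask/replace/keep split).
    Returns the masked sequence and the positions the model must predict."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok          # gold token the model must recover
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

tokens = "Artificial Intelligence aims to simulate human behaviour .".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=42)
print(masked)
print(targets)
```

During pre-training, the model only sees `masked` and is scored on how well it recovers the tokens stored in `targets`.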

To understand BERT processing and Transformers models in-depth, we suggest you check this excellent blog post.

Tons of BERT variants have already been developed by the community. The classic BERT Base and BERT Large models were trained on unpublished books and the English Wikipedia, but there are also models trained on other languages, like CamemBERT or Multilingual BERT, and lighter variants like MobileBERT, DistilBERT, and others. Beyond BERT, here is a blog post that can help you understand the differences between some well-known Transformer models (RoBERTa, DistilBERT, XLNet).

Grooming Detection Model with BERT

While our main solution for detecting grooming is currently rule-based, we wanted to test the performance of a pre-trained deep learning model on this task. We tested the simplest setup: a text-by-text prediction using only the content body and the labels to predict.

Data Description

We will use open-access data from PAN12 and PJZC (only the one-on-one conversations between groomers and non-groomers) plus additional client data. As a reminder, our client data contains social media profiles, dating website profiles, and a public in-game chat. The following table gathers information about each dataset we have.

Dataset                   Total texts
Client                    90,617
Open-access               151,605
Client + open-access      242,222

Number of texts in each dataset, split 70% train / 15% validation / 15% test

Our data was re-annotated for our specific needs, regarding both client data and our Rule-Based Matching solution. We broke grooming down into several topics that occur in texts and are easiest to detect in production across our various data types; the combination of those topics may signal a grooming risk. The annotation task was defined as follows:

👉 We define grooming as a process of approaching, persuading, and engaging a child/teen (a victim) in sexual activities using the Internet.

We have 6 labels: 5 labels for a risk of grooming (sexual; underage; contact information; approach; other) and 1 label for no risk of grooming (ok).

  • underage: texts where there might be a child or a teen involved in the conversation (ages below 21, age-related terms, age differences, about school or puberty stages)
  • sexual: texts about sex and sexual interests (sexual terms, body parts, sexual fantasies, sexual orientations, photo/nude sending, first-time discussions, sexual actions reframed as acceptable)
  • contact information: texts containing information about someone (full name, location, phone number) or contact requests (on other platforms, for instance)
  • approach: texts of people wanting or asking to meet someone in real life (haste of seeing someone, meeting organization)
  • other: another case, but the message is still suspicious and may contain grooming
  • ok: the text contains none of the above and doesn’t seem suspicious regarding grooming.

By analyzing the texts labeled other once a first annotation pass was done, we found out they contained more explicit grooming cases. A dedicated label should be added for those cases. For now, as we had very few texts with that label, we only used them to enrich our grooming Matcher (see our previous blog post).

As we started this project with those 6 labels, we did not design our classification task with the presence/absence of grooming in a binary way. Instead, we designed a multiclass prediction with the labels defined above.
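The multiclass setup above boils down to mapping each of the 6 labels to a class id before training. Here is a minimal sketch of that encoding step; the variable and function names are ours, not from any particular library.

```python
# Map each annotation label to an integer class id for multiclass training.
LABELS = ["ok", "sexual", "underage", "contact information", "approach", "other"]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

def encode(texts_with_labels):
    """Turn (text, label) pairs into (text, class-id) pairs for a classifier."""
    return [(text, label2id[label]) for text, label in texts_with_labels]

sample = [("kik me on my username x", "contact information"),
          ("hi, how are you?", "ok")]
print(encode(sample))
```

At prediction time, `id2label` maps the argmax of the model's 6 output logits back to one of the label names above.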

Note that annotation was a long, iterative process and still contains errors given the complexity of the task. Before model training, we ended up with a Cohen's kappa of 0.86 on open-access data and 0.82 on client data. These Cohen's kappa scores were computed on representative sub-samples of 1,544 texts from open-access data and 1,467 texts from client data.
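For reference, Cohen's kappa measures agreement between two annotators while correcting for chance agreement. A minimal pure-Python version (the toy annotations below are invented for illustration, not from our datasets):

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators over the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the chance agreement from each annotator's label marginals."""
    assert len(ann_a) == len(ann_b)
    n = len(ann_a)
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = ["ok", "ok", "sexual", "underage", "ok", "sexual"]
b = ["ok", "ok", "sexual", "ok", "ok", "underage"]
print(round(cohens_kappa(a, b), 3))
```

A kappa of 1.0 means perfect agreement, 0.0 means chance-level agreement; values above 0.8, like ours, are usually read as strong agreement.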

Configurations

We tried a simple BERT-base model to predict grooming with the above classes. As a reminder, we classify individual texts here, not whole conversations. We trained three different models on three datasets: the first contains only client data, the second only open-access data, and the third a mix of both (we call it “total” here). We fine-tuned each model against the macro-F1 score as well as the F1 scores of the non-ok classes. Each model is evaluated on the same test set, containing both client and open-access data.
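Since we select models on macro-F1, here is a short pure-Python sketch of how it is computed: F1 per class, then the unweighted mean, so a rare class like approach weighs as much as ok. The example labels below are invented for illustration.

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: per-class F1, then the unweighted mean over classes."""
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

y_true = ["ok", "ok", "ok", "sexual", "sexual", "underage"]
y_pred = ["ok", "ok", "sexual", "sexual", "ok", "underage"]
print(round(macro_f1(y_true, y_pred, ["ok", "sexual", "underage"]), 3))
```

In practice we used a standard metrics library for this; the sketch just makes the averaging explicit.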

Results

The results of each model can be found in the table below. The model trained on both client and open-access data achieves the best scores and predicts each label better. However, the client model would still be more interesting for predictions on client data only.

Label           CLIENT                   OPEN-ACCESS   TOTAL
ok              0.96                     0.95          0.97
sexual          0.62                     0.69          0.73
underage        0.30                     0.38          0.56
contact info    0.57                     0.39          0.74
approach        0.00 (8 texts in train)  0.33          0.43
macro-avg F1    0.61*                    0.55          0.69*

BERT F1 results on our client data and open-access data

*Without approach. This label was underrepresented in the validation (N=1) and test (N=2) sets, so we decided it could be dropped from client data. The original F1 score was 0.49.

By checking the results on each dataset (client or open-access) separately, we see that the client-data model has difficulty generalizing to open-access data. Surprisingly, the reverse is not true: the open-access model generalizes better, judging by both the scores and the mismatch review.

To better understand each model, we checked 100 mismatches from each model's predictions and found that about 50% of the texts were annotation errors rather than prediction errors. This is encouraging, as it means the machine is learning the classes reasonably well. Here are some examples of correct predictions with wrong gold labels:

Open-access model: y_true = ok, y_pred = contact info

  • plz send me contact no
  • kik me on my username x

Client model: y_true = ok, y_pred = contact info

  • I’m japanese but speak english and turkish <youtube link>
  • <facebook link>

Total model: y_true = ok, y_pred = sexual

  • you are hot
  • a bit nsa action with a hot lady

Total model: y_true = ok, y_pred = underage

  • 19/1.82/ …
  • … 16 years, teenwolf, witchcraft, rock, queen, nirvana

The scores show that every model has the least difficulty with the ok label: the remaining annotation errors sit mostly in this class, and since it contains a bit of every other class, it is easily predicted over the others. The ok scores are therefore not as representative of model performance as those of the other labels.

Conclusion

We shared with you some experimental yet encouraging results of a BERT model for grooming risk detection. A BERT model appears to generalize well on topics related to grooming.

For future work, a better way to define the classification task would be multilabel classification: for instance, predict the presence/absence of underage, sexual, contact information, and approach independently, and predict ok when every other label is absent.
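The multilabel decision rule just described can be sketched in a few lines. This is a hypothetical illustration, not our implementation: `scores` stands for per-label sigmoid outputs of some future multilabel model, and the 0.5 threshold is an assumed default.

```python
def multilabel_predict(scores, threshold=0.5):
    """Multilabel rule from the text: keep every risk label whose
    (hypothetical) sigmoid score passes the threshold; fall back to
    'ok' when no risk label fires."""
    risk_labels = ["underage", "sexual", "contact information", "approach"]
    predicted = [label for label in risk_labels
                 if scores.get(label, 0.0) >= threshold]
    return predicted or ["ok"]

print(multilabel_predict({"underage": 0.81, "sexual": 0.67, "approach": 0.12}))
print(multilabel_predict({"underage": 0.10, "sexual": 0.05}))
```

Unlike the multiclass setup, this lets a single text carry several risk labels at once (e.g. both underage and sexual), which better matches how grooming topics co-occur.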

Another idea would be to predict grooming/non-grooming in a binary way by merging the existing labels into the 2 classes defined in our annotation task. Unfortunately, we lack important context to do so, because we process very short chat texts at the message level rather than full conversations. Moreover, it is difficult to say whether a single message contains grooming, even for a human (explicit cases are very rare). Detecting the major steps of the grooming process (age-related conversations, sexual approaches, …) is an adaptive way to overcome this difficulty of automation.

To push predictions further, consider adding lots of likely false positives to your non-grooming texts, as in PAN12 and PJZC. Namely, you can add locations that are not approaches, sexual interests unrelated to grooming, endearments between BFFs, and age-related conversations unrelated to grooming to your negative label.
