Kids just have their own language

a blog about data, machine learning and comment moderation to keep Jeugdjournaal a safe platform for kids

Felix van Deelen
NOS Digital
8 min read · Jan 26, 2022


In this blog post we evaluate machine learning techniques and a rule-based system to help moderate comments and keep Jeugdjournaal, our news service aimed at children, a safe place.

NOS and Jeugdjournaal
NOS is an independent public media organisation in the Netherlands reporting on news and sports. We are a broadcaster by origin, and over the last few decades we have witnessed news become a digital and mobile service. We have dedicated teams of professionals creating digital services for several brands.

Jeugdjournaal is our news brand aimed at children aged 8 to 12. The daily broadcast informs hundreds of thousands of children in classrooms and at home. The Jeugdjournaal website and app are playful products where kids read, watch and engage with the daily news.

Example of a poll on the Jeugdjournaal website

The Jeugdjournaal editorial team publishes a poll about the news every day. Kids can have their say by voting and commenting on the poll. About 1500 comments are submitted every day, and the editorial team currently moderates each of them manually to keep Jeugdjournaal a safe place. Moderation is an important and time-consuming process, and as the Jeugdjournaal platform grows, it becomes ever more labour-intensive.

The research project
To support the editorial team, we researched which technologies could assist in the moderation process. By keeping track of posted comments along with the label assigned by the editorial team, we built a data set that can be used to model editorial content moderation. We considered three approaches. First, a rule-based system was tested, in which rules inspired by the manual moderation process decide whether a comment should be removed. Second, several machine learning (ML) models were trained by formulating the task as a classification problem. Finally, a separate ML model was trained to detect gibberish in comments.

The data set is a representation of the playful language kids use

A comment in our data set consists of a name (a pseudonym), a text and a label, where the label indicates whether the editorial team approved or removed the comment. We collected 2550 removed comments over one month, while 4 million approved comments were collected over a period of 10 years. For training our models, a balanced data set was constructed containing an equal number of approved and removed comments. A second data set containing all 4 million approved comments was used to identify where our models might make inaccurate predictions.

This is quite an interesting data set, because the way the comments are written deviates from regular text, as illustrated in Fig. 1. The figure shows the 30 most frequent words that do not appear in the standard Dutch vocabulary [1] on the x-axis, and their frequency within the 4 million comments data set on the y-axis. One point of interest is the 6000 occurrences of ‘mischien’ and 4000 occurrences of ‘meschien’, which would be correctly spelled as ‘misschien’. Abbreviations such as ‘gwn’, in full form ‘gewoon’, are also abundant in the data. Other interesting cases are ‘jaa’ and ‘jaaaa’, which would ordinarily be written as ‘ja’.

These typos, spelling errors, typical child-like ways of writing, abbreviations and elongations of words complicate the classification task, because the performance of a machine learning model depends on the frequency of the words it encounters in the data set. If many words are infrequent, accuracy will be lower, because the model cannot learn good word representations. The example abbreviations, elongations and typos in Fig. 1 occur frequently, so these might not cause too much trouble, but one can expect many other, even more unusual examples to occur in the data with low frequencies. Another complication caused by alternative spelling arises when using pre-trained word embeddings, which are often used in ML tasks to initialise word representations. Embeddings are trained on very large text corpora, which most likely will not contain the alternatively spelled words, so no pre-trained embedding will be available for such words.
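Such a frequency analysis can be sketched in a few lines. The tiny vocabulary and comments below are stand-ins for the OpenTaal wordlist.txt from [1] and our comment data set:

```python
from collections import Counter
import re

def oov_frequencies(comments, vocabulary):
    """Count how often words outside the standard vocabulary occur."""
    counts = Counter()
    for comment in comments:
        # Lowercase and keep only (accented) word-like tokens; rough but enough here.
        for word in re.findall(r"[a-zà-ÿ]+", comment.lower()):
            if word not in vocabulary:
                counts[word] += 1
    return counts

# Toy stand-in for the standard Dutch vocabulary.
vocabulary = {"ja", "misschien", "gewoon", "ik", "vind", "het", "leuk"}
comments = ["Jaa ik vind het leuk", "mischien gwn ja", "jaaaa mischien"]
print(oov_frequencies(comments, vocabulary).most_common(3))
# [('mischien', 2), ('jaa', 1), ('gwn', 1)]
```

Plotting the counts of the top 30 words found this way against their frequency yields a figure like Fig. 1.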

Fig. 1: Most frequent words used in comments which are not in the standard Dutch vocabulary

Apart from alternative spellings, completely random combinations of characters are abundant in the data set. These gibberish comments add noise to our data, complicating and slowing down the training of the ML models. We have tested an approach that estimates the probability of a comment being gibberish, described in a later section.

Rule-Based System for tangible editorial guidelines

In the rule-based approach we aimed for a low number of false negatives rather than a high accuracy, marking comments as inappropriate only when we are certain; the remaining inappropriate comments should be caught by the machine learning models. After talking to the editorial team to get an idea of their guidelines on content moderation, we first implemented simple limits on the text length. Next, we formulated a rule that comments should not contain any form of contact details, implemented using regular expressions that recognize email addresses and mobile phone numbers. One could extend this to also remove references to social media usernames. Another regular-expression-based rule removes all comments containing URLs, as it is too time-consuming to check the contents of every URL.
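A minimal sketch of these rules might look as follows. The patterns and length limits are illustrative simplifications, not the exact expressions we use in production:

```python
import re

# Illustrative patterns only; real-world contact details are messier.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Dutch mobile numbers start with 06 (or +31 6), followed by 8 digits.
PHONE_RE = re.compile(r"(\+31\s?6|06)[\s-]?\d{8}")
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)

def violates_contact_rules(comment, min_len=2, max_len=500):
    """Return True when a comment breaks one of the simple rules."""
    if not (min_len <= len(comment) <= max_len):
        return True
    return any(p.search(comment) for p in (EMAIL_RE, PHONE_RE, URL_RE))
```

A comment flagged by any of these rules can be marked inappropriate with high confidence, which is exactly the low-false-negative behaviour we aimed for.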

We noticed that removed comments quite often contain HTML or code. HTML can be recognized relatively easily using the Python library BeautifulSoup [2]. Methods for separating natural language from programmatic text were not tested in this work, but one could consider training a model to recognize programming languages on the basis of a corpus of scripts in many different languages.
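With BeautifulSoup one can check whether `BeautifulSoup(text, "html.parser").find()` returns a tag. A dependency-free sketch of the same idea, using the standard library's `html.parser` instead, could look like this (a simplification, not our production rule):

```python
from html.parser import HTMLParser

class _TagSpotter(HTMLParser):
    """Sets a flag as soon as any well-formed opening HTML tag is parsed."""
    def __init__(self):
        super().__init__()
        self.saw_tag = False

    def handle_starttag(self, tag, attrs):
        self.saw_tag = True

def contains_html(text):
    spotter = _TagSpotter()
    spotter.feed(text)
    return spotter.saw_tag
```

Note that a stray `<` in ordinary text (e.g. "3 < 5") does not form a valid tag and is not flagged, which keeps this rule conservative.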

Finally, a rule was implemented using a list of inappropriate words taken from [3], removing all comments that contain one or more of these words.
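As a sketch of this rule (with a placeholder word standing in for the list from [3]), matching on whole tokens avoids flagging harmless words that merely contain a bad substring:

```python
import re

def contains_bad_word(comment, bad_words):
    """Match whole tokens only, so inflected or longer words do not trigger."""
    tokens = re.findall(r"[a-zà-ÿ]+", comment.lower())
    return any(token in bad_words for token in tokens)

# 'sukkel' is a placeholder entry, not taken from the actual list.
bad_words = {"sukkel"}
```

As discussed later, whole-token matching alone is not enough: some listed words are only inappropriate in certain contexts, which is why the list needs careful curation.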

Gibberish Detector for things that don’t make sense

In addition to the rule system, an attempt was made to identify and remove gibberish. One can estimate the probability of a comment being gibberish from the product of the probabilities of its character combinations. As an example, consider the character sequence ‘qxz’: the probability of it being gibberish is high, because the probability of an x following a q is low. These character-sequence probabilities can be obtained by counting frequencies in a large corpus; we obtained them from our 4 million comment data set.
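This idea can be sketched as a character-bigram model: estimate transition probabilities from known-good text, then score a new comment by its average log-probability per transition. The corpus below is a toy stand-in for the 4 million comment data set, and the flagging threshold would need tuning on held-out data:

```python
import math
from collections import defaultdict

def train_bigrams(corpus):
    """Estimate P(next char | char) from a corpus of known-good text."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in corpus:
        padded = "^" + text.lower() + "$"  # mark start and end of text
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    model = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        model[a] = {b: n / total for b, n in nxt.items()}
    return model

def avg_log_prob(text, model, floor=1e-6):
    """Average log-probability per transition; low values suggest gibberish."""
    padded = "^" + text.lower() + "$"
    pairs = list(zip(padded, padded[1:]))
    return sum(math.log(model.get(a, {}).get(b, floor)) for a, b in pairs) / len(pairs)

# Toy corpus standing in for millions of approved comments.
model = train_bigrams(["misschien", "gewoon", "school", "nieuws", "kinderen", "leuk"])
```

Averaging (rather than taking the raw product) keeps the score comparable across comments of different lengths.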

Machine Learning Models to learn from the editorial team

The list of inappropriate words used in the rule-based system could be curated and even extended to phrases, resolving word ambiguity through context. This would, however, be a labour-intensive method. Instead, machine learning can automatically learn word and sentence representations and learn how these map to editorial moderation on the basis of our data set. Such a model could be retrained periodically, learning representations for newly arisen slang words. Models like the Long Short-Term Memory (LSTM) network might also capture more complex inappropriateness in the comments, such as bullying, by taking word order into account. We tried several methods: a Naive Bayes classifier, (Deep Continuous) Bag of Words (Deep CBOW) and an LSTM.
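To illustrate the simplest of these models, here is a minimal multinomial Naive Bayes classifier written from scratch, with Laplace smoothing. The training comments and labels below are invented examples, not our actual data:

```python
import math
import re
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with Laplace smoothing."""

    def fit(self, comments, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.label_counts}
        self.vocab = set()
        for comment, label in zip(comments, labels):
            words = re.findall(r"[a-zà-ÿ]+", comment.lower())
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, comment):
        words = re.findall(r"[a-zà-ÿ]+", comment.lower())
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                # Add-one smoothing so unseen words never zero out a class.
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

clf = NaiveBayes().fit(
    ["ik vind het leuk", "wat een leuk nieuws", "jij bent stom", "stom gedoe"],
    ["approved", "approved", "removed", "removed"],
)
```

A bag-of-words model like this ignores word order entirely, which is precisely the limitation that motivates trying Deep CBOW and LSTM architectures.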

We now automatically identify appropriate comments with 76% accuracy

The rule-based system was applied to the data set of 5100 comments, split evenly between removed and approved. The system captured 336 true negatives, while the number of false negatives was 8. Fig. 2 shows a bar plot of the frequency of true negatives per rule, illustrating the effectiveness of each rule separately. The gibberish detector and the inappropriate word list (bad_word) are the most effective at identifying inappropriate comments. The rules removing comments with links or HTML also find a fair number of inappropriate comments.

To identify where our system's predictions contradict editorial decisions, we used a subset of 1 million approved comments and counted, for each rule, how often it was applied to remove a comment, as shown in Fig. 3. Out of the 1,000,000 comments, our system marked 17,411 as inappropriate even though they were originally approved by the editorial team. Fig. 3 shows the inappropriate word list is one of the rules on which our system disagrees with the editorial team. A likely explanation is that some entries of the list are only inappropriate in certain contexts. To avoid incorrect removal of comments, the list should be carefully curated so that it contains only words that are inappropriate in any context. The gibberish detector also introduces a relatively large portion of the false negatives. One reason might be that the editorial team does not always remove gibberish, because such comments are not harmful. We therefore decided to use the gibberish detector to mark potentially inappropriate comments based on the probability of a comment being gibberish, rather than applying it as a strict rule. Fig. 3 shows that link and HTML detection produced very few false negatives, so those rules could be applied automatically as strict rules.

Fig. 2: Frequency counts of true negatives found by the rule-based system when applied to the balanced data set (left).
Fig. 3: Frequency counts of false negatives in the 1 million approved comments data set (right).
Fig. 4: Confusion matrix for the Deep CBOW model

Out of all ML models tested, the Deep CBOW model gave the highest accuracy, 0.74. The confusion matrix for this model is shown in Fig. 4. Combining this model with a Naive Bayes classifier increased the accuracy to 0.76. The data set used to train the models is quite small for deep learning architectures, and the accuracy would possibly improve with a larger data set.

Conclusion and recommendations

We have experimented with several methods for automated content moderation on a website for children. A rule-based system was tested that produces a low number of false negatives, allowing us to directly mark 13% of the inappropriate comments. In addition, we tested a gibberish detector and machine learning models, which can assist our editorial team by marking potentially inappropriate comments.

We aim to integrate this technology into our moderation tooling to assist the editorial team in moderating the comments submitted on the Jeugdjournaal platforms. More specifically, we aim to flag suspicious comments for the editorial team; the final call to approve or decline a comment remains a decision of the Jeugdjournaal editorial team.

We hope you enjoyed reading this post. If you have any questions or feedback, please let us know in the comments!

References
[1] https://github.com/OpenTaal/opentaal-wordlist (used wordlist.txt)
[2] https://beautiful-soup-4.readthedocs.io/
[3] https://data.world/wordlists/dirty-naughty-obscene-and-otherwise-bad-words-in-dutch


Felix van Deelen
NOS Digital

Data Scientist at the Dutch public news organisation, live coding musician and AI-art-enthusiast