Why Is Offensive Message Identification Hard?

Sanjaya Wijeratne
Published in Holler Developers
10 min read · Jan 19, 2022

AI Research at Holler Technologies

Image Source — https://www.psychologicalscience.org/observer/the-science-of-swearing

Today, we live in a society where the Internet and online communication have become a major part of our everyday lives. Especially with the recent COVID-19 pandemic, the number of hours we spend in front of a screen has increased dramatically. Early surveys reported a nearly 70% increase in Internet use during the pandemic, with half of that time spent on social media [Beech, 2020].

People rely on social media to share and receive information, including the latest news, communicate with their networks of friends and family, learn new things, develop their interests, and be entertained, among other things. In 2019, it was reported that more than 70% of Americans use social media sites [Pew Research, 2019], with YouTube and Facebook being the most popular among them [Pew Research, 2021]. The majority of users access these sites on a daily basis; however, the sentiment associated with social media use still tends to be negative [Pew Research, 2020]. Research suggests that the sharing of misinformation and online hate and harassment are the two main reasons people feel negatively about social media websites. Recent studies show that roughly four in ten Americans have experienced online harassment on these sites [Bertazzo, 2021], and 79% of social media users believe that social media companies are doing only a fair or poor job of addressing online harassment on their platforms [Pew Research, 2021]. In this article, we will look at the challenges in offensive language identification, offensive language being the kind of language often used in online harassment.

Offensive Language and Offensive Language Identification

Image Source — https://tinyurl.com/27w8cydv

Offensive language, or profanity, is the use of certain words and phrases that are considered by some to be rude, impolite, offensive, obscene, or insulting [Wang et al., 2014]. People also refer to offensive language use as swearing, cursing, or using bad language. Offensive language occurs more often than we might think in our day-to-day conversations. [Mehl and Pennebaker, 2003] reported that 0.5% to 0.7% of all words we speak are curse words; for comparison, first-person plural pronouns (e.g., we, us, our) make up about 1% of the words we speak. In another study, [Jay, 2009] found 70 curse words in 11,609 words of tape-recorded conversations of elementary school and college students. The use of offensive language is common across social media platforms too. By analyzing 51 million tweets posted by 14 million Twitter users, [Wang et al., 2014] reported that curse words occurred at a rate of 1.15% on Twitter and that 7.73% of all tweets in their dataset contained curse words. Similar studies analyzing the language used in public and private Facebook posts report that the most common offensive words appear millions of times in Facebook posts every day [WeRSM, 2013].

Offensive language identification in social media is the process of detecting user-generated offensive content, i.e., comments that are hateful in nature and target an individual or a group [Zampieri et al., 2019]. Recent research studies the problem via a three-way schema that looks at (i) Offensive Language Detection, where the focus is on detecting whether a social media post contains any form of offensive content (aggressive, sexist, racist, etc.); (ii) Categorization of Offensive Language, which focuses on detecting whether an offensive message contains a targeted insult (towards an individual or a group) or not; and (iii) Offensive Language Target Identification, where the focus is on detecting the targeted individual, group, or entity (a business organization, an event, etc.) present in an offensive message [Zampieri et al., 2019a]. Online offensive messages can result from many social media activities linked to online harassment, cyberbullying, hate, aggression, racism, and sexism, and research has been conducted on many such activities, with the identification of offensive language or words lying at the heart of the solution [Hate Speech Data]. However, problems such as detecting the relationship between the sender and the receiver of an offensive message exchange, and identifying offensive messages associated with non-textual elements such as emoji or stickers, have not been studied extensively [Kirk et al., 2021]. We will go over some of those issues later in this article.
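To make this three-way schema concrete, here is a minimal sketch of how the hierarchical labels could be represented in code. The label names (OFF/NOT, TIN/UNT, IND/GRP/OTH) follow the OLID-style annotation scheme described by [Zampieri et al., 2019a]; the data structure and helper below are purely illustrative and not part of any published system.

```python
from dataclasses import dataclass
from typing import Optional

# Level A: is the post offensive at all?
# Level B: if offensive, is the insult targeted?
# Level C: if targeted, who or what is the target?
@dataclass
class OffensiveLabel:
    offensive: str            # "OFF" or "NOT"
    targeted: Optional[str]   # "TIN" (targeted insult) or "UNT" (untargeted); None if not offensive
    target: Optional[str]     # "IND", "GRP", or "OTH"; None if untargeted

    def validate(self) -> bool:
        """Check that the three annotation levels are mutually consistent."""
        if self.offensive == "NOT":
            return self.targeted is None and self.target is None
        if self.targeted == "UNT":
            return self.target is None
        if self.targeted == "TIN":
            return self.target in {"IND", "GRP", "OTH"}
        return False

# Example: an offensive post containing a targeted insult aimed at an individual.
label = OffensiveLabel(offensive="OFF", targeted="TIN", target="IND")
assert label.validate()
```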

Types of Offensive Messages

In this section, we will briefly go over some of the common offensive message types found on social media platforms and in the major datasets available to researchers who work on offensive message identification and filtering. The following is a subset of the offensive message types identified by Founta et al. in their work [Founta et al., 2018].

  1. Offensive Language — Profanity, strongly impolite, rude, or vulgar language expressed with fighting or hurtful words in order to insult a targeted individual or group.
  2. Abusive Language — Any strongly impolite, rude, or hurtful language using profanity that can show a debasement of someone or something, or show intense emotion.
  3. Hate Speech — Language used to express hatred towards a targeted individual or group, or is intended to be derogatory, to humiliate, or to insult the members of the group, on the basis of attributes such as race, religion, ethnic origin, sexual orientation, disability, or gender.
  4. Aggressive Speech — Overt, angry and often violent social interaction delivered via electronic means, with the intention of inflicting damage or other unpleasantness upon another individual or group of people, who perceive such acts as derogatory, harmful, or unwanted.

As you can see, the definitions of these common offensive message types overlap in terms of the language and offensive words used, which makes it difficult to build computer models that clearly distinguish such messages. In the next section, we will look at these challenges.

What Makes it Difficult to Build Computer Models to Identify Offensive Messages?

Offensive message identification and filtering is a difficult task for many reasons. In this section, we briefly go over challenges related to training data collection, overlapping definitions of offensive message types, issues arising from the relationship between the sender and the receiver of an offensive message exchange, and offensive messages associated with non-textual elements such as emoji and stickers, which can change the semantics of the message text and make it offensive.

  1. Training Data Collection — To train any supervised learning model, we need access to labeled training data. In the context of offensive message filtering, that means having offensive messages posted on social media websites in our labeled training dataset. Most social media websites use human moderators to weed out offensive messages posted on their platforms, and most also let their users report offensive content to the moderators so that it can be removed. Even though this may seem like an inefficient way of removing offensive messages, they do eventually get removed. Therefore, searching social media websites for offensive messages can be harder than one would think. Moreover, many social media websites don't allow the raw message text to be shared with others for research purposes, only the message IDs. More often than not, retrieving labeled datasets published by other researchers using these message IDs results in missing data, because site moderators have already removed many of the offensive messages by the time of retrieval (a small sketch of this ID-rehydration step follows this list). Thus, collecting a labeled offensive message training dataset can be a major challenge for researchers. For a list of published offensive message datasets commonly used in research studies, please visit [Hate Speech Data].
  2. Class Imbalance and the Language Used in Training Data — Even though 0.5% to 0.7% of all words we speak are curse words [Mehl and Pennebaker, 2003], non-offensive text still vastly outnumbers offensive text in labeled datasets. For example, if we used the dataset collected by [Wang et al., 2014] to learn a binary classifier that labels whether a message is offensive or not, offensive messages would make up only 7.73% of all tweets in the dataset. This means a little more than 92% of our training data would be non-offensive text, which leaves us with an imbalanced dataset (see the class-weighting sketch after this list for one common mitigation). The problem becomes even worse if we introduce fine-grained offensive message types such as racist, sexist, or aggressive messages into our classifier instead of a single umbrella class for all offensive messages. For example, if we wanted to develop a multi-class classifier that labels a message as Racist, Sexist, Aggressive, or Non-offensive using [Wang et al., 2014]'s dataset, the 7.73% of offensive tweets would be further divided into the Racist, Sexist, and Aggressive classes, making the dataset even more imbalanced. Moreover, the same set of curse words is used across these fine-grained offensive message types, making it challenging to develop multi-class classifiers that rely only on word-level linguistic features. Such simple multi-class models can still be attractive when one wants to deploy an offensive message classifier on resource-constrained devices such as mobile phones, where the classification techniques and linguistic features available in a server-side environment might not be available on the client side (e.g., the TensorFlow Lite deployment platform for resource-constrained devices doesn't support all TensorFlow functionality [see here]).
  3. Sender-receiver Relationship of an Offensive Message Exchange — Not all messages that contain offensive words are hateful [Jay, 1992]. For example, close friends often use curse words in social media messages aimed at each other without any intention of harassment. The majority of offensive message identification models based on word-level features either rely on curse word lists [Xu, 2010] or expect the classifier to automatically learn the relationships between curse words and the other words that appear with them. Therefore, not knowing the relationship between the sender and the receiver of an offensive message exchange can result in classifying messages that were never meant to harass the receiver as offensive. This problem is very difficult to address, especially with social media datasets, because learning the relationship between two random individuals who exchange offensive messages on a social media website is itself a very difficult problem. For this reason, little research has been conducted in this area, and it remains an open challenge for offensive message identification.
  4. Issues with the Presence of Visual Communication Elements in Textual Data — Visual communication is a powerful tool for expressing non-verbal cues such as gestures and emotions, which are otherwise difficult to express using words alone in online messaging [Holler, 2021]. Ranging from emoticons [Wikipedia] to Facebook's latest addition to the spectrum of visual elements, Soundmoji [Majcher, 2021], these pictorial characters give great expressive power to the users of online social media websites and messaging platforms. The presence of visual elements in social media messages has increased dramatically in recent years, and they have become an important part of text classification systems [Emoji Workshop, 2022]. Thus, it is important to consider visual communication elements when building offensive message identification systems that process the latest messages posted on social media websites. Past research has shown that visual communication elements such as emoji can take on different meanings based on the context of their use [Wijeratne, 2017]. Certain emoji can also be associated with offensive meanings, making them a tool for social network users to express hate towards others [Kirk et al., 2021]. A recent example of using emoji to express hate is the racist social media comments received by Bukayo Saka, Jadon Sancho, and Marcus Rashford following England's defeat in the Euro 2020 soccer final [Jamieson, 2020]. Angry users across all major social media platforms used emoji such as 🐒, 🐵, 🍌, and 🍉 to throw racial insults at the players of color on the English football team. Handling nuances such as emoji meanings in offensive message identification is still in its infancy. A recent work by [Kirk et al., 2021] was the first to investigate this issue; the authors created a labeled training dataset of offensive messages with emoji and showed that current offensive message identification systems can be improved by using the emoji present in the context of a social media message (a small emoji-preprocessing sketch appears at the end of this section). Even though their initial results are promising, much more work is needed in this area. For example, other non-verbal communicative elements such as stickers and GIFs can cause similar issues: an image search tool developed by Google once incorrectly labeled photos of black people as gorillas [BBC, 2015], and references like those used in the emoji case above could just as easily be made with GIFs and stickers to throw racist messages at people of color. Thus, more research is needed to learn the relationships between the targets involved in an offensive message exchange and the visual elements used.
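To make the ID-rehydration step mentioned in point 1 concrete, here is a minimal sketch of how a shared list of labeled message IDs might be turned back into text while recording the messages that have since been removed. The fetch_message function is a hypothetical placeholder for whatever platform API you use, not a real client.

```python
# A minimal sketch of rehydrating a labeled dataset that was shared as message IDs.
# fetch_message() is a hypothetical placeholder for a real platform API call;
# messages removed by moderators are assumed to come back as None.
from typing import Dict, List, Optional, Tuple

def fetch_message(message_id: str) -> Optional[str]:
    """Placeholder: look up a message's text by ID via the platform's API.
    Returns None when the message has been deleted or made private."""
    raise NotImplementedError("replace with a real API call")

def rehydrate(labeled_ids: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Given {message_id: label}, return ([(text, label), ...], missing_ids)."""
    dataset, missing = [], []
    for message_id, label in labeled_ids.items():
        text = fetch_message(message_id)
        if text is None:
            # The message was removed after the dataset was published.
            missing.append(message_id)
        else:
            dataset.append((text, label))
    return dataset, missing
```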
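For the class imbalance discussed in point 2, one standard mitigation is to re-weight training examples inversely to class frequency. The sketch below shows this with scikit-learn's class_weight="balanced" option on a simple bag-of-words classifier; the tiny inline dataset is made up purely for illustration, and in practice you would plug in a labeled corpus such as those listed at [Hate Speech Data].

```python
# A minimal sketch of class-weighted training for an imbalanced
# offensive/non-offensive dataset, using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, made-up examples: the offensive class is deliberately the minority.
texts = [
    "have a great day everyone",
    "thanks for sharing this article",
    "see you at the game tonight",
    "looking forward to the weekend",
    "you are a worthless idiot",        # offensive (minority class)
]
labels = ["NOT", "NOT", "NOT", "NOT", "OFF"]

# class_weight="balanced" re-weights the loss inversely to class frequency,
# so the few "OFF" examples count as much as the many "NOT" examples.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(texts, labels)

print(model.predict(["what an idiot you are"]))  # predicted label for a new message
```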
Image Source — https://tinyurl.com/mr4b35sp
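Finally, one simple way to make emoji visible to a text classifier, in the spirit of point 4 above, is to rewrite each emoji as a word-like token before feature extraction. The sketch below assumes the third-party emoji package is installed; any emoji-to-name mapping would serve the same purpose, and this is only one of many possible ways to incorporate emoji signals.

```python
# A minimal sketch of emoji-aware preprocessing, assuming the third-party
# `emoji` package (pip install emoji) is available.
import emoji

def emoji_to_tokens(text: str) -> str:
    """Rewrite each emoji as a word-like token (e.g., 'emoji_monkey_face')
    so that ordinary word tokenizers keep it as a separate feature."""
    return emoji.demojize(text, delimiters=(" emoji_", " "))

print(emoji_to_tokens("go back home 🐵🍌"))
# roughly: "go back home emoji_monkey_face emoji_banana" (exact spacing may vary)
```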

Conclusion

In this article, we looked at why offensive message identification is a difficult problem to solve with computer models. We discussed the negative sentiment associated with social media in general due to the presence of hate on social media websites; defined what an offensive message is and what offensive message identification means by explaining a three-way schema proposed in recent research; looked at different types of offensive messages found on social media sites; and discussed why it is difficult to build computer models that correctly distinguish them. We also shed light on research areas related to offensive message filtering that have not been studied extensively. We hope this article motivates the reader to take on these less explored research areas in offensive message identification and filtering.

