Detecting Harassment in SocialVR with Natural Language Processing

The vast majority of people who regularly engage with SocialVR platforms will have encountered a harasser. Even among the biggest enthusiasts, myself included, there's widespread acknowledgment that the troubling phenomenon exists, and the main debate seems to be whether platforms are doing enough to counter harassing behavior. My goal in writing is not to answer that exact question, but to propose that SocialVR platforms work within their means to curate a safe and positive experience for their visitors. One resource immediately available to them is data: on their users, their patterns of behavior, and even the words that come out of their mouths.

My early experiences in SocialVR, conversations with platform staff, and contact with We Make Realities led me to make harassment in SocialVR the topic of my recently completed thesis at Bosphorus University, entitled A Framework for Understanding and Detecting Harassment in SocialVR. It represents both my great affection for SocialVR and a set of proposals for ensuring it remains a viable, enduring medium in the long term. Natural Language Processing as a means of targeting verbal harassment is only one segment of the completed project, but it is the topic of this article. Some of the following text is pulled directly from the thesis, while the rest is a kind of running commentary.

Summer 2017 was the data collection period for the thesis work, during which I recorded my sessions in SocialVR across multiple popular platforms that will remain anonymous. Likewise, the usernames of the speakers are aliases. During this time, I collected 107 incidents of harassment that became part of the qualitative data. These sometimes came in sequences, so one conversation may include a few independent incidents of harassment. The fact that I witnessed so many incidents over a three-month span does not reflect the prevalence of harassment in SocialVR, since there was selection bias involved: in session, I stayed in populated areas with extroverted men, and I arranged to be in SocialVR most often late at night on weekends (PST) since, to my ears, that was the peak time for harassment. Here is a summary of a few cases of verbal harassment (Note: These cases aren't necessarily harassment in every context, but they are considered harassment given the context in which they occurred):

1) Justin harasses Sam by remarking that his height is ideal for performing oral sex. When Sam rejects participation in the taboo act and moves away, Justin demands that he return to the spot so that Justin can receive oral sex from him.

2) Chris attempts hip thrusts towards Frank, miming a sexual act. Frank had already withdrawn his consent both verbally and physically.

3) Jay discusses putting sexual excretions on Skylar's body. He is prompted by the innocuous use of the word 'come' in Skylar's conversation, but he responds using the sexual homophone.

4) John openly discusses alterations to his own genitalia and describes the acts that could be performed with that alteration.

5) Cory ridicules David for his Latino accent.

6) Within the dialogue, Biff advocates two forms of violence: the first instance targets a racial group and the second targets a gender. The particular form of violence is irrelevant, since the forms could be interchanged and the statements would remain harassing. The approval of the listeners is also irrelevant, because these harassing statements were made in a public forum and belong to the sub-category of hate speech.

7) Mac harasses Billy by attempting sexual contact with Billy after Billy had warned Mac that the contact was unwanted. The fact that Billy is laughing and seems to think it is funny is irrelevant, since Billy does not reciprocate the sexual activity through his words or actions. It is possible that Billy laughed as a means of conflict avoidance or smoothing over social awkwardness.

8) The harasser introduces a sexual topic among people with whom he lacks adequate familiarity. The harasser suggests that the two people near him engage in sex. When the suggestion goes unanswered, he persists in the questioning. Since the pair of harassed people leave without speaking soon after hearing the harassing questions, it is evident that the questions were unwanted.

The transcripts for these incidents were not included in the thesis due to privacy concerns, but the transcribed words and behaviors were used to create strategies and tools for detecting harassment. The data from the transcriptions were combined with known forms of verbal harassment taken from the relevant literature on verbal abuse in social media platforms. From this, a dictionary of lexical items and sentence constructions was created within an NLP program in Python that analyzes users' speech for harassing content. Here is an example of a few categories (Note: Some categories refer to a scoring device. While explained in the thesis, it isn't explained completely in this article):

1) Singular Lexical Items: The first level of analysis is the quality of individual words in the discourse, across four categories: swear words, controversial topics, abusive terms, and taboo words. The discussion of controversial subjects and swearing are not explicitly discouraged, but their overuse may be a sign of harassing behavior, especially in combination with other categories. Abusive terms are always problematic when used sincerely against another user, and taboo words are always considered unacceptable in any public setting, such as a common area in SocialVR. The use of any of these categories adds to the total harassment score, and the use of three or more in combination can lead to an automatic harassment classification (see figure). The only reason taboo words do not result in an immediate harassment classification is the chance of incorrect transcription by the speech recognizer; in particular, some pronunciations of the word can't can be misheard as the taboo word c***.

Example Diagram of NLP for Lexical Items (Note: Numbers are placeholders representing possible platform values, not the most effective scores for harassment detection)
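To make the mechanics concrete, here is a minimal Python sketch of how this single-word pass could work. The word lists, point values, and the wiring of the three-category rule are placeholders of my own; the thesis lexicon is much larger and its scores differ.

```python
# Minimal sketch of the single-word scoring pass. The word lists and
# point values are illustrative placeholders, not the thesis lexicon.
SWEAR_WORDS = {"damn", "hell"}           # placeholder entries
CONTROVERSIAL = {"politics", "religion"}
ABUSIVE_TERMS = {"idiot", "loser"}
TABOO_WORDS = {"<redacted>"}             # withheld for obvious reasons

CATEGORY_SCORES = {"swear": 1, "controversial": 1, "abusive": 3, "taboo": 5}

def score_tokens(tokens):
    """Return (score, categories_hit, auto_flag) for one utterance."""
    score, hit = 0, set()
    for tok in tokens:
        tok = tok.lower()
        for name, words in [("swear", SWEAR_WORDS),
                            ("controversial", CONTROVERSIAL),
                            ("abusive", ABUSIVE_TERMS),
                            ("taboo", TABOO_WORDS)]:
            if tok in words:
                score += CATEGORY_SCORES[name]
                hit.add(name)
    # Three or more categories in combination triggers an automatic flag.
    auto_flag = len(hit) >= 3
    return score, hit, auto_flag
```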

2) Harassing Imperatives and Abusive N-Grams: There are common bi-grams and tri-grams that may be used to insult someone or to demand a sexual act. Some of these n-grams contain swear words, but in other cases the components of an n-gram, such as blow or jerk, may be completely inoffensive when used individually. When a higher number of lexical items is used, the evidence of a propensity towards harassment becomes stronger, which justifies adding higher scores to the total. There are also high-frequency, abusive n-grams which, as dictated by the micro-score, may lead to a positive harassment classification if repeated too many times in too short a time span.
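A hedged sketch of how the n-gram pass and its micro-score could be wired, assuming a sliding time window; the phrases, scores, window length, and threshold below are stand-ins, not the thesis values.

```python
# Sketch of the n-gram pass with a repetition-based micro-score.
from collections import deque
import time

ABUSIVE_NGRAMS = {("blow", "me"): 4, ("jerk", "it"): 4}  # placeholder n-grams
MICRO_THRESHOLD = 3       # abusive n-grams tolerated inside the window
WINDOW_SECONDS = 60

recent_hits = deque()     # timestamps of recent abusive n-grams, per user

def score_ngrams(tokens, now=None):
    now = now or time.time()
    score = 0
    for n in (2, 3):
        for i in range(len(tokens) - n + 1):
            gram = tuple(t.lower() for t in tokens[i:i + n])
            if gram in ABUSIVE_NGRAMS:
                score += ABUSIVE_NGRAMS[gram]
                recent_hits.append(now)
    # Drop hits that fall outside the sliding window.
    while recent_hits and now - recent_hits[0] > WINDOW_SECONDS:
        recent_hits.popleft()
    # Too many repetitions in too short a span trips the micro-score.
    auto_flag = len(recent_hits) >= MICRO_THRESHOLD
    return score, auto_flag
```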

3) Hate Speech in Participle Constructions: This is also a violently themed section, but the target shifts from the individual to people grouped by race, gender, nationality, political affiliation, sexuality, or religion. The harasser does not explicitly say they will perform the violent act, only that the group, for example, 'should be' or 'must be' harmed in a particular way. Since this is hate speech and, furthermore, extremely detrimental to the life of a SocialVR platform, repeated infractions of this type are quickly classified as harassment through the section's micro-score.
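The construction is regular enough that a pattern-based check captures it. Below is an illustrative regex version of the idea; the group terms and violent participles are placeholder examples only, not the thesis lists.

```python
# Rough sketch of the participle-construction check.
import re

GROUP_TERMS = r"(?:women|men|immigrants|foreigners)"       # placeholders
VIOLENT_PARTICIPLES = r"(?:banned|deported|hurt|harmed)"   # placeholders

HATE_PATTERN = re.compile(
    rf"\b{GROUP_TERMS}\b.{{0,30}}\b(?:should|must)\s+(?:all\s+)?be\s+"
    rf"{VIOLENT_PARTICIPLES}\b",
    re.IGNORECASE,
)

def hate_speech_hits(utterance):
    """Return the matched 'group ... should/must be <harmed>' spans."""
    return HATE_PATTERN.findall(utterance)
```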

4) Promoting Self-Directed Harm: This section is an extension of the previous few sections in its attempt to detect harassment against a group. The lexical cues include a reflexive pronoun preceded proximally by one of the aforementioned groups. This section also serves to detect directives towards self-harm and suicide where the target is an individual user, which is a grave problem in social media that SocialVR platforms would undoubtedly not wish to see replicated in their domain (Mukhra, 2017). Having a user harm themselves after an unfortunate encounter with another user would be greatly detrimental for both the targeted user and the reputation of the platform.
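As a small sketch of the reflexive-pronoun cue: the group list, harm verbs, scores, and the five-token proximity window are all assumptions of mine, not values from the thesis.

```python
# Illustrative check for self-directed-harm cues.
import re

SELF_HARM_PATTERN = re.compile(
    r"\b(?:women|men|immigrants)\b(?:\W+\w+){0,5}?\W+"   # group term...
    r"(?:themselves|yourself|yourselves)\b",             # ...near a reflexive
    re.IGNORECASE,
)
DIRECTIVE_PATTERN = re.compile(
    r"\b(?:kill|hurt)\s+yourself\b", re.IGNORECASE,      # directive at an individual
)

def self_harm_score(utterance):
    score = 0
    if SELF_HARM_PATTERN.search(utterance):
        score += 4   # placeholder value for group-directed statements
    if DIRECTIVE_PATTERN.search(utterance):
        score += 8   # graver offense, higher placeholder value
    return score
```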

5) Ejaculation and Prepositional Phrases: Some sex acts are considered especially vulgar and are lexically particular, requiring added detail to detect in conversation. Synonyms for the verb ejaculate are covered in this section of the program, along with the prepositional phrases they are used with. The challenge of this section is the highly sexual verb c** and its frequently used homophone come, so the program seeks to limit false positives by targeting prepositional phrases that include parts of the body. Using the non-sexual verb come with such prepositional phrases is semantically incongruous, allowing the program to assign a harassment score with some confidence. Scores may be higher or lower depending on how sexually suggestive the body part in the prepositional phrase is. A micro-score is included for this section, and exceeding its threshold results in a positive harassment classification.
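Here is a compressed sketch of that disambiguation; the verb list, body parts, and weights stand in for the thesis lexicon.

```python
# Sketch of the homophone disambiguation via prepositional phrases.
import re

EJACULATE_VERBS = r"(?:come|cum|finish)"                 # placeholder synonyms
BODY_PART_WEIGHTS = {"face": 5, "chest": 4, "back": 2}   # placeholder weights

PHRASE = re.compile(
    rf"\b{EJACULATE_VERBS}\b\s+(?:on|in|over)\s+(?:your|his|her|their)\s+(\w+)",
    re.IGNORECASE,
)

def ejaculation_score(utterance):
    score = 0
    for body_part in PHRASE.findall(utterance):
        # 'come to your house' scores nothing; 'come on your face' does,
        # because the body part makes the non-sexual reading incongruous.
        score += BODY_PART_WEIGHTS.get(body_part.lower(), 0)
    return score
```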

6) Protestations: This section is sourced from users who are potentially being harassed, and the resulting score from their protestation may be added to either the nearest users or all users presently in the room. Demanding that someone stop what they are doing could be a sign of harassment, but it could also be a normal part of conversation. For this reason, the score is kept low, which prevents a high number of false positives while still pushing true positives whose scores are near the threshold over it. The current program takes a second, subsequent recording for the sake of including protestations, but this would be done differently in a SocialVR platform.
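As a sketch, the protestation pass might look like the following; the phrases and the deliberately low score are my assumptions, and whether the score goes to the nearest users or the whole room is a platform decision left open here.

```python
# Sketch of the protestation pass: a small score is spread to users
# near a protesting speaker. Phrases and score are placeholders.
PROTEST_PHRASES = ("stop it", "leave me alone", "get away from me")
PROTEST_SCORE = 1   # kept deliberately low to avoid false positives

def protestation_scores(utterance, nearby_users):
    """Return a {user: score} mapping for users near the protester."""
    text = utterance.lower()
    if any(phrase in text for phrase in PROTEST_PHRASES):
        return {user: PROTEST_SCORE for user in nearby_users}
    return {}
```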

The lexically based harassment program was written with the expectation that the lists of applicable lexical items would be expanded as more data is analyzed. Since user data was still not widely available from the platforms, the testing data had to be collected using methods similar to those of the qualitative analysis: I again entered SocialVR platforms and kept a record of the conversations I overheard. The main difference is the absence of context in the testing data, as opposed to the qualitative data. Most, but not all, of the expressions taken from conversation during the collection of testing data were not harassment, but they are still included because they might have been considered harassment had they appeared in a less familiar and consenting social environment.

Each expression that could be considered harassing in the wrong context was transcribed and given a score by the NLP program written for this project. Additionally, each user was given a harassment score for their behavior during a session. Since some scores grow exponentially with each new harassing statement, the individual statements could not simply be summed, but had to be read together (a sketch of this kind of aggregation follows the table). Of the 50 harassing statements, only 13 were given a harassment score, for a total of 27 points. Using the testing data, new lexical items were added to lists in the program, new noun phrases were added, and causative structures were included for sexually themed statements. The phrases and users were tested again, and the adjusted sum was 45 points from the 20 harassing phrases that were given a score, increasing the true positive rate from 26% to 40%. This means that more than half of the tested phrases still went unscored, but those were too ambiguous, context-specific, or euphemistic to be captured; any attempt to include them might easily lead to false positives in further tests. The table below gives the most relevant examples from the testing data, showing the original NLP score, the resulting additions to the lexicon and sentence structures, the test scores for the phrases after changes to the program, and the classification status before and after the changes.

NLP Program Test: Expressions that are scored are considered True Positives (TP). There are no False Positives. Non-scored incidents of harassment are False Negatives (FN), and non-scored non-harassment items are True Negatives (TN). The resulting actions represent additions to the NLP code.
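To make the "read together" point concrete, one way the session-level aggregation could work is a repeat-offense multiplier, so later harassing statements weigh more than earlier ones. This is a minimal sketch; the growth factor of 1.5 is my placeholder, not a value from the thesis.

```python
# Sketch of aggregating per-statement scores into a session score,
# where each new harassing statement weighs more than the last.
def session_score(statement_scores, growth=1.5):
    total, multiplier = 0.0, 1.0
    for s in statement_scores:
        if s > 0:
            total += s * multiplier
            multiplier *= growth   # repeat offenses compound
    return total

# e.g. session_score([2, 0, 3]) -> 2*1.0 + 3*1.5 = 6.5
```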

Before enacting an automated process of harassment classification that also takes unsupervised action against harassers, the program should be tested by running it on current users. Their discourse may be recorded in text files, highlighting the language thought to be vulgar or hateful. The surrounding context in their conversations may also reveal keywords and structures determinative in detecting harassment, which may be added to the NLP program. Data on the offending users may also be used to improve user profiling scores, if those are used. Scoring simulations may also be run on users, allowing the platform to modify its scoring in an informed way to eliminate false positives and false negatives. When the program has proven its ability to run unsupervised, it may be implemented in the platform.

For the sake of data protection, SocialVR platforms should strongly consider anonymizing text data, either by assigning a user number or by encrypting the names of users. They might also consider deleting the text files after a pre-determined period. The text files should be searchable by time and user in the eventuality that someone lodges a harassment complaint against another person and a review of the case is required. The SocialVR platform's right to keep a record of events taking place on its platform should be expressed in the terms of service, but it should not be explicitly mentioned under any other circumstances.
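As one sketch of the anonymization step, usernames could be replaced with keyed hashes so the logs remain searchable by user without storing names in the clear. The key handling shown is simplified; a real deployment would need proper key management and rotation.

```python
# Minimal sketch of username anonymization via keyed hashing.
import hashlib
import hmac

SECRET_KEY = b"rotate-and-store-me-securely"   # placeholder key

def anonymize(username: str) -> str:
    """Return a stable pseudonym usable as a search index in logs."""
    digest = hmac.new(SECRET_KEY, username.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]
```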

There are challenges in this program familiar to all natural language processing, especially sentiment analysis (Srivastava, 2017). From the text alone, it may be difficult to determine whether users are being combative or joking. Users who are playing a game may engage in trash talk, in which players insult each other and their abilities, but the decision to do so might be mutual and lighthearted in that context. It is important to raise or remove thresholds between friends, since they are more likely to speak with a familiarity that could be mistaken for harassment. Video game culture presents a similar problem, since many participants, speaking in the first person, will talk about violent acts that would be horrific outside of that context.

Another challenge comes from mimicry or repetition of harassing statements. The qualitative data shows that harassed users will sometimes repeat a harassing statement to express shock, to direct the statement back at the harasser, or to report it to a neutral third-party user. A false positive classification would be especially undesirable in these cases, since the harassed user would be wronged by both the harassing user and the SocialVR platform. This makes an especially strong case for keeping the program supervised until adequate data has been collected.

The absence of user-introduced error gives speech-to-text processing one advantage over text processing. Harassers who do not wish their speech to be filtered out can easily manipulate text to make it difficult for natural language processors to comprehend (Srivastava, 2017). They may do this by using phonetic misspellings of vulgar language, approximate spellings, or swapping similar-looking characters. The same techniques cannot as easily be applied when users' speech is faithfully transcribed.


This represents one section of my thesis work, and a few additional, narrowly focused articles from this content are planned. I included a few of the in-text citations in the hope that people will check them out. If you wish to follow my ongoing work in SocialVR, Twitter isn't bad, same username. Or contact me directly if you have questions. I love SocialVR and it's one of my favorite things to talk about :)