Voice-based Communities and Why It’s So Hard To Moderate Them

Aaron Jiang
6 min read · Aug 29, 2019


This blog post summarizes a paper about the challenges of moderating voice-based online communities that will be presented at the 22nd ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2019).

What comes to your mind when you think about online community moderation?

Probably something like this: Someone posts something inappropriate (say, hate speech). Then either a human moderator sees it and removes it, or an automated moderator instantly catches it and removes it.

Simple and straightforward, right? This process holds true for many online communities, and has long been part of people’s mental models. But how does a moderator, whether human or automated, moderate that hate speech when it is spoken, in a real-time voice chat rather than text that can be erased?

Many people, like you, have a good sense of how online community moderation works: moderators locate the problematic content, remove it, and sometimes also punish the poster. Real-time voice, however, exposes a new problem for this model: there is no persistent, written record. The ephemeral nature of voice raises a number of new questions: How do moderators locate the content? How do they remove it? How do they know who the speaker is? How do they know whether the rule breaking happened at all?

To answer these questions, my collaborators and I talked to 25 moderators from 16 different communities on Discord, a social platform where voice (in “voice channels”) is a major mode of communication. We heard about new ways to break rules in voice, as well as many tactics for, and challenges of, moderating real-time voice channels.

Discord is a social VoIP platform that hosts millions of communities.

How Do People Break Rules in Voice Channels?

Moderators told us that rule breaking was common in voice channels. While common text-based violations such as racial slurs do exist in voice, voice has also enabled new ways to disrupt communities. One of these ways, perhaps unsurprisingly, is to create disruptive noises:

I’ve had to step in because someone’s told me “Oh there’s a kid literally screaming down in the channel” … So I’ll hop in, and of course the kid will be screaming and he will be muted.

After years of calling all-caps typing “yelling,” we now see people returning to actual yelling, enabled by voice. While speaking too loudly is certainly unwelcome in many communities, volume is not the sole factor that makes something disruptive; sometimes it’s also about the content:

There is one time I had my Discord on speaker and I was just talking to a group of friends. … [Some random people] joined and they started playing loud porn. So my brother was in the house … and he heard the porn blasting out of my speakers and he was like, “Yo dude, why are you listening to that at full blast on speaker?”

Many Discord communities use voice channels not only for communication, but also as community jukeboxes that automatically play from crowdsourced playlists (called “music queues”), and moderators told us how some people would disrupt music queues:

Literally the most recent thing that happened. … Someone put [something disruptive] in the music queue and it was for like two hours of just extremely loud music.

In addition to these noises by individuals, we also heard stories of organized rule violations that moderators called “raids.” If you think one person sharing porn in voice channels is bad enough, well, it also happens on a larger scale:

There was one a few months ago where they were spamming porn over their mics and they all had profile pictures of the same girl in a pornographic pose. And there were maybe like 15 of them in the same voice chat.

The raids that moderators told us about involved not just 15 people, but up to thousands of bot accounts. However, moderators could only deal with these accounts one by one; there is currently no way to act on multiple accounts at once. This restriction means not only that moderators must take on a significant amount of work managing raids, but also that there is no way to stop the remaining raiders from scattering and evading punishment once they see one of their number banned.

How Are Moderators Dealing With Rule Breaking?

With all these new ways to break rules that are not possible in text-based communities, what are moderators doing about them? Moderators told us they always give out warnings before punishment, and when they do punish, it is mostly based on hearsay and their first impressions of particular community members. While this doesn’t sound like the best way to moderate communities, moderators have no choice but to rely on these unreliable signals, because there is no evidence of someone actually breaking the rules in real-time voice.

Voice channels just basically can’t be moderated. … The thing is in voice, there’s no record of it. So unless you actually heard it yourself, there’s no way to know if they really said it.

Moderators came up with different strategies for gathering evidence. The most straightforward one, of course, is to go into the voice channel themselves when they receive a report. This sounds like a reliable way of moderating voice channels, but many online communities operate 24/7 worldwide, and we can’t expect volunteer moderators to be online around the clock to catch rule breakers in the act. Even a moderator who is present may not be able to identify the rule breaker when multiple people are talking at once.

Some moderators take witnesses at their word, relying solely on their reports to gather evidence and identify rule breakers, but this approach has one caveat: the witnesses have to be trustworthy. Without concrete evidence, there is no way to tell whether a witness is lying. While many moderators require multiple witnesses before deeming a report credible, this simplistic test of credibility could itself facilitate a planned brigade (“dogpiling”) against a community member.

Finally, some moderators record voice channels to keep evidence. While this seems to be the best solution of all, it has one critical problem: recording without all parties’ consent is illegal in eleven U.S. states and in some other countries. Yet there is currently no consent process for voice channel recording, either in the rules of the communities we looked at or in Discord’s Terms of Service (beyond a clause requiring users to comply with all local rules and laws). One moderator confidently stated that recording is definitely not against Discord’s Terms of Service, but given the actual complexity of the laws involved, this may or may not be true. Recording could therefore unknowingly expose volunteer moderators to legal risk, as they cannot possibly know every regulation in the world.

What Lessons Can We Learn?

Our findings reveal critical challenges that voice brings to online communities, as well as tensions between the law and the real need to moderate. They also point to design recommendations that could make voice-based communities better. For example:

  • Designers can implement systems that detect volumes that may be uncomfortable for humans, and temporarily mute these loud accounts while also reminding them to check their hardware settings (as loud noises sometimes might not be intentional).
  • For moderators to be able to record voice channels, platforms like Discord could either explicitly acquire consent at the platform level (like in Terms of Service, or through a popup dialog when a user joins a voice channel for the first time), or advise individual community moderators to explicitly acquire consent within their communities.
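To make the first recommendation concrete, a minimal loudness detector might measure the RMS level of incoming audio frames in dBFS and flag speakers who stay above a threshold for several consecutive frames. This is only a sketch, not Discord’s implementation; the frame format (signed 16-bit PCM), the threshold, and the frame count are all assumed parameters a platform would tune:

```python
import math

# Hypothetical parameters a platform would tune per community.
LOUDNESS_THRESHOLD_DBFS = -10.0   # levels above this are treated as uncomfortably loud
MIN_LOUD_FRAMES = 5               # require sustained loudness, not a single spike

def frame_dbfs(samples):
    """RMS level of one frame of signed 16-bit PCM samples, in dBFS (0 = full scale)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20.0 * math.log10(rms / 32768.0)  # 32768 = full scale for 16-bit audio

def should_auto_mute(recent_frames,
                     threshold=LOUDNESS_THRESHOLD_DBFS,
                     min_loud_frames=MIN_LOUD_FRAMES):
    """True when enough recent frames exceed the loudness threshold to warrant a
    temporary mute (paired with a reminder to check hardware settings)."""
    loud = sum(1 for frame in recent_frames if frame_dbfs(frame) >= threshold)
    return loud >= min_loud_frames
```

A screaming voice on a hot microphone produces frames near full scale (close to 0 dBFS) and would trip the sustained-loudness check, while a single accidental pop or normal speech well below the threshold would not; the temporary mute then distinguishes a bad hardware setup from deliberate disruption.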

The need to moderate voice in online communities is only a recent phenomenon, and online communities are likely to keep developing beyond text and voice. If another new technology comes around, what should moderators do then? For example, we are already seeing emerging social VR communities, along with stories of users sexually harassing other users “physically” in VR. How would moderators acquire evidence of this type of VR sexual harassment? There are no easy answers. It’s difficult to predict how people will abuse new technology, or how rules may need to change to prevent such abuse. It is therefore important that moderators be willing to change rules, or make new ones, as their communities adopt new technologies.

Some Final Notes…

This work is a collaboration with Charles Kiene from the University of Washington. During this collaboration, we also observed communities that exist on both Discord and other platforms (such as Reddit), and the interesting ways their moderation teams handle the technological differences. That observation resulted in another paper, also published at CSCW 2019 and well summarized in this blog post. I believe you will enjoy it as well.


Jialun “Aaron” Jiang, Charles Kiene, Skyler Middler, Jed R. Brubaker, and Casey Fiesler. 2019. Moderation Challenges in Voice-based Online Communities on Discord. Proc. ACM Hum.-Comput. Interact. 3, CSCW, Article 55 (November 2019), 23 pages. https://doi.org/10.1145/3359157

If you have questions or comments about this study, email Aaron Jiang at aaron [dot] jiang [at] colorado [dot] edu.




Computational social scientist. Gamer. Have some opinions about online content moderation. https://aaronjiang.me