Screenshot from Facebook’s metaverse announcement video: a robot avatar wearing a poker hat plays cards with two human avatars, while two people from the real world are projected onto a screen.

Challenges of Moderating the Metaverse

On Oct. 28, Mark Zuckerberg announced that Facebook would rename itself Meta to emphasize the company’s shift toward the “metaverse,” a VR and AR social media environment. Soon after, the Financial Times reported on an internal memo in which Meta’s CTO, Andrew Bosworth, says the company aspires to “almost Disney levels of safety” in its metaverse. However, he also notes that moderating behavior “at any meaningful scale is practically impossible,” echoing an adage from Techdirt founder Mike Masnick.

In The Atlantic, professor Ethan Zuckerman asks of Meta:

“How will a company that can block only 6 percent of Arabic-language hate content deal with dangerous speech when it’s worn on an avatar’s T-shirt or revealed at the end of a virtual fireworks display?”

Likewise, in an NPR interview, Audie Cornish put a version of Zuckerman’s rhetorical question to Meta’s Vice President of Metaverse, Vishal Shah:

“What I’m saying is if you can’t handle the comments on Instagram, how can you handle the T-shirt that has hate speech on it in the metaverse? How can you handle the hate rally that might happen in the metaverse?”

Shah deflected the question, primarily discussing the policy challenges of balancing “freedom of speech and freedom of expression” while briefly mentioning “the ability to detect some of those things, the technology to find them — we’ve invested for years.” Facebook has been widely criticized for policy decisions, such as its definition of hate speech and its special treatment of celebrities and politicians, all of which will continue to be issues in the metaverse. However, through the “hate speech on a t-shirt” example, both Zuckerman and Cornish point out that content will be more technologically challenging to moderate in the metaverse than on Facebook or Instagram because of the nature of the platform itself. By diverting Cornish’s moderation question toward policy, Shah avoided criticism of the technological challenges of moderating the metaverse, challenges that undermine the metaverse’s foundation. The best policy in the world can’t be enforced in the metaverse if its content can’t be understood.

Harm in the Metaverse

As alluded to by Zuckerman and Cornish, Meta currently struggles to moderate its social media platforms. Meta admitted in 2018 that its platform was used to “incite offline violence” during the Rohingya genocide, and internal documents leaked by Frances Haugen suggest Facebook is currently being used to incite violence in Ethiopia. The company also doesn’t have a good track record of moderating live content, having allowed the 2019 Christchurch massacre to be streamed on Facebook Live for over 17 minutes.

Meta also relies heavily on AI systems to moderate hate speech. For instance, Facebook boasts that over 94% of the hate speech it takes down in text, images, and videos is detected proactively (before anyone reports the content) by AI systems. Although internal communications leaked by Frances Haugen suggest the hate speech taken down represents only a small fraction of the total hate speech on Facebook, moderating Facebook posts is much easier than moderating speech in the metaverse. In addition to hate speech and harassment in text, images, and videos, hate can be communicated in the metaverse through mediums like voice, avatar actions, and the design of elements in the environment. This content will be more difficult for AI systems to proactively understand than Facebook posts, and automated systems will have less time to react.

Prior research from Oculus on harassment in social VR identified three types of harassment: verbal, physical, and environmental. The authors note verbal harassment includes things like hate speech and sexual language. Physical harassment includes unwanted touching, obstructing movement, and making sexual gestures. Lastly, environmental harassment can include displaying sexual or violent content, drawing sexual images, or throwing objects. Similarly, in a report on hate in social VR, the Anti-Defamation League (ADL) documented reports of virtual groping, drawings of penises, and avatars designed to look like Adolf Hitler.

While the verbal, physical, and environmental typology helps us understand harassment, it doesn’t directly map to what makes moderating the metaverse different from moderating Facebook or Instagram. Instead, we need to focus on the medium (e.g., voice chat, avatar movement, objects in the environment) through which harm can take place and how one is exposed to this content (live or asynchronously).

Voice

Voice is notoriously difficult to moderate due to both the ephemeral nature of the medium and the myriad ways one can be harmed through audio. An AI system can take down a Facebook post containing hate speech minutes or hours after it is created, and it is possible no one will have seen the post. Audio chats, on the other hand, are live, and research suggests that for a conversation to feel natural there can be only a few hundred milliseconds between when something is said and when it is heard. This makes complex algorithmic voice moderation intractable. For instance, Microsoft is still unable to filter keywords from live audio in Xbox Live’s party chat. Even if one could proactively filter keywords from voice chat, systems that rely on translating speech to text won’t handle other audio harms, such as sexual noises, racist accents, or other upsetting sounds. Moreover, keyword filtering systems may harm the marginalized communities they are intended to help because, for example, there isn’t a way to distinguish between an LGBTQ+ person using the word “queer” and a homophobic troll using the word “queer.” Harm through voice is more complex to understand than harm through text, and it’s unclear how one could algorithmically moderate a live conversation.
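
To make the speech-to-text limitation concrete, here is a minimal sketch of a keyword filter over live audio. The transcribe stand-in and the blocklist are hypothetical, not any platform’s real pipeline; the point is what such a filter structurally cannot see.

```python
# A rough sketch of a speech-to-text keyword filter. `transcribe` stands in
# for a real speech-recognition model; this is not any platform's actual system.

BLOCKLIST = {"queer"}  # an example term that is both reclaimed and weaponized

def flag_audio(audio_chunk: bytes, transcribe) -> bool:
    """Flag a chunk of live audio if its transcript contains a blocked word."""
    transcript = transcribe(audio_chunk).lower()
    return any(token.strip(".,!?") in BLOCKLIST for token in transcript.split())

# Two blind spots this design can't escape:
# 1. An LGBTQ+ person self-identifying and a troll hurling the same word are
#    flagged identically -- a transcript carries no speaker intent.
# 2. Sexual noises, mocking accents, and other upsetting sounds never appear
#    in a transcript, so the filter never sees them.
# And everything above has to finish within the few-hundred-millisecond budget
# of a natural conversation.
```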

Avatar Actions

It’s also unlikely that Meta will be able to use AI to moderate avatar actions in real time. As pointed out by the Anti-Defamation League, hate communicated via avatar actions isn’t a new problem. Starting in 2006, 4chan users raided Habbo Hotel, a two-dimensional virtual world aimed at children. During the raid and subsequent recreations, people used Black avatars, resembling a form of digital Blackface, to block people from entering the pool and announced “Pool’s closed due to AIDS.” Raiders also arranged their avatars to create a swastika.

Voice + Avatar Actions

To see how voice and avatar actions can combine to create new forms of harm, one can look at the racist meme “Ugandan Knuckles,” which took VRChat by storm in 2018. This meme involved groups with avatars resembling Knuckles, a character from Sonic the Hedgehog, swarming other users and speaking in a fake, racist “Ugandan” accent. It seems highly unlikely that an AI system would be able to proactively moderate this complex combination of avatar actions and voice.

Objects

The types of harm one may experience through objects can be divided into two subtypes: inherently harmful objects and those that aren’t inherently harmful but can be configured to create harm.

Inherently Harmful Objects. One could think of a t-shirt containing hate speech or an avatar that resembles Hitler as inherently harmful. Likewise, according to Vishal Shah, users in the metaverse will be able to “display [NFTs] in their digital spaces,” and these NFTs could be inherently harmful if they contain violent imagery or hate speech. Unlike voice or avatar actions, these elements could be moderated proactively. For instance, a newly created NFT, t-shirt, avatar design, or other element could undergo a review process before it can be shared, during which a combination of human and AI moderators validate it.

Object Interaction or Configuration. On the other hand, seemingly benign objects can cause harm through avatar actions, such as throwing an object. These objects can also cause harm through their configuration. For example, researchers have found recreations of Nazi concentration camps on Roblox and Minecraft, and Anti-Defamation League researcher Daniel Kelley has found multiple recreations of mass shootings on Roblox. One can also imagine combining multiple seemingly benign articles of clothing in a way that communicates hate speech. Although Meta can, and should, moderate inherently harmful objects proactively, moderating object interactions or configurations poses challenges similar to moderating avatar actions.

Augmented Reality

Thus far, our discussion of harm has focused on well-studied moderation issues in VR and similar online worlds. However, Meta’s announcement video shows a metaverse that blends elements of VR with elements of the “real world” via AR, introducing even more avenues for harm. The announcement video alone demonstrates multiple ways in which combining AR and VR will be difficult to moderate.

Live Video Streaming. In the announcement video, Mark Zuckerberg introduces AR in the metaverse by video-calling people into a VR poker game. Although this interaction resembles a typical video call, it’s unclear how large an audience one will be able to video call. If one can broadcast publicly, then, much as people have used Facebook to livestream mass murder, this feature could allow one to broadcast harmful offline events into the metaverse. According to leaked internal documents, at the time of the Christchurch massacre, Facebook could only detect violent violations of its terms of service “after 5 minutes of broadcast” but can now do so after about 12 seconds. While this response time is certainly an improvement, it also demonstrates the challenges of moderating live video streams, which Meta will continue to grapple with in the metaverse.

Objects. The announcement video also shows the person Mark video-calls sending AR art into VR and, according to Zuckerberg in the video, “you will be able to take your [items from VR] and project them into the physical world as holograms in augmented reality.” As with our earlier discussion of objects in VR, Meta may be able to moderate inherently harmful objects. However, AR adds an additional layer of complexity: harm can come from the configuration of virtual objects combined with the physical world. For instance, the same people recreating concentration camps or mass shootings in virtual worlds today may, through augmented reality, also try to desecrate the physical sites where these events took place. While geofencing may handle this extreme case, one could still place AR hate speech in physical locations to harm communities, such as virtually drawing a swastika on a synagogue or arranging benign virtual objects like a cross and a fire emoji to create an AR cross burning in someone’s yard. This type of physically situated harm is particularly challenging to proactively moderate with AI because understanding it requires vast knowledge of hyperlocal contexts.
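
Geofencing itself is easy to sketch; what it cannot capture is the hyperlocal context described above. Below is a minimal illustration, with a made-up list of protected coordinates and a plain radius check, not a description of anything Meta has built.

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical protected sites as (latitude, longitude, radius in meters).
# These values are placeholders, not real coordinates of any memorial.
PROTECTED_SITES = [
    (0.0, 0.0, 500.0),
]

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6_371_000 * 2 * asin(sqrt(a))

def placement_allowed(lat: float, lon: float) -> bool:
    """Reject AR object placement inside any protected geofence."""
    return all(haversine_m(lat, lon, s_lat, s_lon) > radius
               for s_lat, s_lon, radius in PROTECTED_SITES)
```

Even a perfect geofence only covers the extreme, enumerable cases; it says nothing about whose yard a cross and a fire emoji have been arranged in.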

All Hope Lost?

Although the metaverse is going to be a particularly challenging moderation landscape, hope (probably?) isn’t lost. There are a number of ways social media companies could try to mitigate harms in the metaverse.

Support Retroactive Moderation. As mentioned earlier, Meta relies heavily on AI for proactive content moderation, but one likely can’t proactively moderate harm through audio, avatar actions, or configurations of benign objects. Instead, users will need to report these harms retroactively. Meta has taken some steps to support retroactive moderation by recording a sliding window of time in Horizon Worlds locally on each user’s device, which only leaves the device when a report is made. However, Meta should also invest in tools that support VR users in moderating their own spaces.
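
For a sense of what that on-device recording could look like, here is a minimal sketch of a rolling buffer. The class, window length, and export method are my own invention for illustration, not Meta’s implementation.

```python
import time
from collections import deque

class RollingEvidenceBuffer:
    """Keep only the last `window_seconds` of activity on the user's device.

    Nothing leaves the device unless the user files a report, at which point
    the buffered window is exported to accompany the report.
    """

    def __init__(self, window_seconds: float = 120.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, event) pairs, oldest first

    def record(self, event: bytes) -> None:
        now = time.monotonic()
        self._events.append((now, event))
        # Drop anything older than the window so storage stays bounded.
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()

    def export_for_report(self) -> list:
        """Called only when the user reports; returns the buffered window."""
        return [event for _, event in self._events]
```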

Proactive VR and AR Object Moderation. Configurations of benign objects into harmful arrangements are difficult to moderate proactively. However, Meta should be able to proactively moderate inherently harmful objects, such as a t-shirt with hate speech. Before users can display a new object to others for the first time, it could undergo a series of reviews by a combination of AI and human moderators. This review process would slow down the design and creation of new objects in the metaverse, but the delay would give AI and human moderators time to ensure an object isn’t inherently harmful before it enters the metaverse.
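
As a rough sketch of what such a review gate could look like (the names and thresholds below are hypothetical, not a description of any real system), a new object could stay invisible to others until an automated classifier confidently approves it, confidently rejects it, or hands it to a human moderator.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable

class ReviewStatus(Enum):
    PENDING = auto()
    APPROVED = auto()
    REJECTED = auto()
    NEEDS_HUMAN = auto()

@dataclass
class NewObject:
    object_id: str
    creator_id: str
    asset: bytes                                # e.g., a texture, mesh, or NFT image
    status: ReviewStatus = ReviewStatus.PENDING

def review_new_object(obj: NewObject,
                      harm_score: Callable[[bytes], float],
                      approve_below: float = 0.1,
                      reject_above: float = 0.9) -> ReviewStatus:
    """Gate a newly created object before it can be shared with anyone else.

    `harm_score` stands in for an automated classifier returning a value in
    [0, 1]; anything it isn't sure about is queued for a human moderator.
    """
    score = harm_score(obj.asset)
    if score < approve_below:
        obj.status = ReviewStatus.APPROVED      # confidently benign
    elif score > reject_above:
        obj.status = ReviewStatus.REJECTED      # confidently harmful
    else:
        obj.status = ReviewStatus.NEEDS_HUMAN   # ambiguous: a person decides
    return obj.status
```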

Broadcast Delays. Meta could partially address the technological challenges of moderating live audio and video by introducing broadcast delays when streaming to audiences greater than a certain size. This intervention would allow real-time video chatting in small groups while giving AI moderation systems time to react to larger broadcasts. Such delays would limit the interactions one can have; for example, influencers may not be able to interact with audience members the way they currently do on Instagram Live. However, broadcast delays may be justified if they prevent someone from live-streaming a mass murder or broadcasting a neo-Nazi rally. The strategy is also not new: broadcast delays have been used to moderate live events like award shows and sporting events for years.
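
Here is a minimal sketch of that idea, assuming made-up thresholds and a placeholder moderation check: small audiences stay real-time, while larger broadcasts pass through a short delay buffer that an automated check can inspect before segments reach viewers.

```python
from collections import deque
from typing import Callable, List

class DelayedBroadcast:
    """Hold stream segments behind a short delay once the audience is large."""

    def __init__(self, audience_threshold: int = 50, delay_segments: int = 10):
        self.audience_threshold = audience_threshold
        self.delay_segments = delay_segments
        self._buffer = deque()

    def push(self, segment: bytes, audience_size: int,
             is_violating: Callable[[bytes], bool]) -> List[bytes]:
        """Return the segments that may be released to viewers right now.

        `is_violating` is a placeholder for an automated review of a segment.
        """
        if audience_size <= self.audience_threshold:
            return [segment]  # small groups stay real-time
        self._buffer.append(segment)
        released = []
        while len(self._buffer) > self.delay_segments:
            candidate = self._buffer.popleft()
            if is_violating(candidate):
                self._buffer.clear()  # cut the stream instead of airing the violation
                break
            released.append(candidate)
        return released
```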

Hate finds a way. In 2016, neo-Nazis began surrounding the names of those they believed to be Jewish on Twitter in triple parentheses, or “((( ))).” In turn, some Jewish Twitter users reclaimed the triple parentheses as a way to declare their Jewish identity. It wouldn’t make sense to simply ban triple parentheses. There will always be a way to express hate, but social media companies have a social obligation to moderate the most egregious cases. The interventions I’ve proposed won’t prevent the VR equivalent of the triple parentheses, but that’s not their purpose. Hate will find a way in the metaverse, but social media companies can still take steps to diminish it.

Caveats. These interventions are merely the musings of one graduate student who studies social media. Before any of these are implemented, one should engage directly with marginalized communities to understand how these interventions could both help and harm the communities they are meant to support. For instance, blocklists meant to mitigate harm toward the LGBTQ+ community can also limit LGBTQ+ people’s speech. Researchers should also work with these communities to understand moderation goals and design moderation strategies.

Better Futures

Although I’ve primarily focused on Meta, it is certainly not the only company that needs to carefully consider how to moderate the metaverse: Microsoft and Apple are investing in the metaverse as well. Social AR and VR technology also has great potential to be used for good. Researchers have extolled virtual reality’s potential to support identity work since the 1990s, but people have also raised concerns about content moderation in virtual worlds since the early ’90s. These tensions are neither new nor going away anytime soon. Tech companies can address some of these challenges, but doing so requires thoughtfully engaging with and centering marginalized communities in designing these technologies and seriously considering harm reduction at the earliest stages of the design process, even if doing so impedes Big Tech’s fetishization of virality, scale, and exponential growth. In other words, move slow and don’t break things.

Other Resources on Moderation

For deeper discussions of moderation from people who spend far more time thinking about these things than I do, please take a look at work from scholars like Joseph Seering, Jialun Aaron Jiang, and Shagun Jhaver.

Disclaimer

I’m currently a PhD student at Carnegie Mellon, where I research social media and marginalized communities. However, I interned at Facebook in the summers of 2020 and 2021. Despite my prior employment by Facebook, this article only discusses publicly available information and represents my opinions as a social media researcher and not as a former intern.