Helping Authors Understand Toxicity, One Comment at a Time

Jigsaw · Sep 21, 2020

Our team at Jigsaw uses artificial intelligence to spot toxicity online, and part of our work focuses on how to make that information more useful to the platforms and publishers that need it to host better conversations. Sometimes that means helping platforms moderate conversations more effectively, but we’ve also been exploring how we can help the users — the people actually writing the comment or post — better understand the impact of their words.

We all understand how toxicity online makes the internet a less pleasant place. But the truth is, many toxic comments are not the work of professional trolls or even people deliberately trying to derail a conversation. Independent research also suggests that some people regret posting toxic comments in hindsight. A study we did with Wikipedia in 2018 suggested that a significant portion of toxicity comes from people who do not have a history of posting offensive content.

If a significant portion of toxic content comes from people just having a bad day or a moment of tactlessness, we wanted to know whether we could harness the power of Perspective to provide real-time feedback to people as they write a comment: just a little nudge to consider framing their words in a less harmful way. Would that extra moment of consideration make any difference?

Several of our partners using Perspective API added what we call “authorship feedback” into their comment systems, and we worked together to study how this feature affects conversations on their platforms. The idea is simple: the platform uses Perspective’s machine learning models to spot potentially toxic contributions and shows a signal to the author right as they’re writing the comment. (This required carefully crafting the feedback message; for example, less-than-encouraging messages can have completely the opposite effect.) By suggesting that a comment might violate community guidelines, the platform gives the author an extra few seconds to consider adjusting their language.

Authorship feedback message shown in red below a toxic comment on one of the websites supported by Coral.
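To make the mechanics concrete, here is a minimal sketch of how a platform might wire up this kind of feedback. The endpoint and request/response shape follow the public Perspective API documentation; the 0.7 threshold, the helper names, and the wording of the nudge are illustrative assumptions rather than the values any of the partners in this post actually use.

```python
# Minimal sketch of authorship feedback backed by Perspective API.
# Assumes a valid API key with access to the Comment Analyzer API.
import requests

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def toxicity_score(text: str, api_key: str) -> float:
    """Ask Perspective API for a TOXICITY score between 0 and 1."""
    payload = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(
        PERSPECTIVE_URL, params={"key": api_key}, json=payload, timeout=10
    )
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def authorship_feedback(draft: str, api_key: str, threshold: float = 0.7):
    """Return a gentle nudge if the draft looks likely to be toxic, else None."""
    score = toxicity_score(draft, api_key)
    if score >= threshold:
        # Keep the message focused on the language, not the person.
        return "Your comment may violate our community guidelines. Want to rephrase?"
    return None  # No feedback needed; the comment flows through as usual.
```

A platform would call something like `authorship_feedback()` while the commenter is still in the text box, giving them the chance to revise before submitting; the threshold and message are the pieces each partner tunes for their own community.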

Here’s what we learned from those studies.

Coral by Vox Media, a prominent commenting platform used by media organizations around the world, has been integrating authorship feedback into its “Talk” platform since 2017. Commenters submit their comment, and if they use toxic language or make personal attacks, a message appears asking if they want to rephrase (shown above). The message was carefully designed to avoid direct accusations and to focus on the language, not the person. The feature is also designed to reduce the moderation load by encouraging commenters to improve their behavior without the need for human intervention. Coral partnered with McClatchy to run this experiment on two of their websites to assess whether the feedback prompted commenters to change what they wrote.

Analysis shows that in a six-week experiment on one of the McClatchy websites, 36% of those who received the feedback edited their comment to reduce its toxicity. That number rose to 40% in a 12-week experiment on another site. In both cases, about 20% of commenters abandoned the comment after seeing the feedback, and about 40% ignored the nudge and submitted a toxic comment anyway.

These encouraging results were supported by another authorship feedback study we conducted with OpenWeb, a leading audience engagement platform that hosts 100 million active users per month. After analyzing 400,000 comments from 50,000 users, we found that 34% of users who received feedback powered by Perspective API chose to edit their comment, and 54% of those revised comments met the community standards and were published.

Cynics might argue that these results don’t show that authorship feedback reduces toxicity, only that it helps people write comments that evade the machine learning models designed to detect it. But when OpenWeb studied their results more closely and compared comments before and after receiving feedback, it turned out that among commenters who edited their language, 44.7% replaced or removed offensive words and 7.6% elected to rewrite their comment entirely.

On a smaller but no less instructive scale, the Southeast Missourian newspaper included authorship feedback when it redesigned its website’s commenting system in 2018 to include Perspective API. Just by providing this light-touch feedback to people writing comments, the percentage of submitted comments that were “very likely” to be toxic declined by 96%, and the percentage of all potentially toxic comments declined by 45%. Twenty-four percent of commenters revised their comments to be less toxic after receiving a message. That figure grew to more than 34% of users in the last year, with each of those users’ comments declining in toxicity by an average of 21%.

It is worth noting that across multiple platforms and experiments of varying length, approximately 35% of commenters do change their behavior for the better. This consistency suggests that while providing commenters feedback on their language won’t eliminate all toxicity, early results from platforms that have integrated the feature are encouraging. Providing subtle feedback, even just asking people to reconsider their language, can measurably reduce toxicity on a platform over time.

Equally important, authorship feedback is another impactful way for platforms to harness the power of Perspective API. Platforms and publishers around the world are using Perspective to make it easier to host better conversations and moderate discussions more efficiently. We’re excited to build on this success and find even more ways for this technology to help improve conversations online.


Jigsaw is a unit within Google that explores threats to open societies, and builds technology that inspires scalable solutions.