We created Perspective for publishers and developers to explore how machine learning can help them improve the participation, quality, and empathy of online conversations. One of our first steps was to release an Alpha version of a toxicity model to help communities identify comments that may make people leave a conversation. But like all of us, machine learning makes mistakes.
Perspective’s accuracy is limited both by the specific comments it has learned from and by the structure of the underlying models. With millions of example comments to train on, we can at least start on the problem, but even models that have seen tens of millions of comments have a long way to go. In fact, we email every new Perspective API user to remind them that responses will contain errors and shouldn’t be used to remove or block comments without human review. Imperfect models will misclassify some fraction of innocent comments as toxic, and will miss some forms of personal attack.
We improve models by working with communities to audit their output, report examples, and retrain the models on the corrections. In future False Positive posts we’ll explore individual errors and mitigation strategies in detail, and share tools that can help others improve similar initiatives. But what about the imperfect models that already exist?
Can imperfect models be useful to conversations?
The key to leveraging early-stage machine learning is not to use it as a standalone solution, but as an assistant that can help people work more efficiently in their efforts to expand and improve community discussions.
A use case: Wikipedia discussions. Let’s say you’re a volunteer working to find violations of Wikipedia’s civility policy. This is a challenging and important problem for the Wikipedia community: a harassment survey conducted in 2015 found that 38% of Wikipedia editors surveyed had personally experienced harassment, and of those who had witnessed harassment, 44% reduced their involvement on Wikipedia as a result. The challenge is that in a sea of 300,000 “talk page” contributions added every month (the forums where editors discuss edits), fewer than 1% are toxic. To find and respond to the toxic comments, you’re essentially looking for needles in a haystack. How long do you think it would take to find them amidst all that hay?
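The scale of that haystack can be sketched with a quick back-of-envelope calculation. The numbers below are taken from the figures quoted above, and the 1% figure is an upper bound; the true rate varies month to month.

```python
# Back-of-envelope: how many comments would a reviewer read, on average,
# to find 10 toxic ones if they review in random (or chronological) order?
toxic_rate = 0.01   # upper bound: "fewer than 1% are toxic"
target = 10         # toxic comments we want to find

# With a toxic comment roughly every 1/toxic_rate comments,
# finding `target` of them takes about target / toxic_rate reviews.
expected_reviews = target / toxic_rate
print(f"Expected comments reviewed: ~{expected_reviews:.0f}")
```

Roughly a thousand comments read just to surface ten problems, which is why chronological review alone doesn’t scale.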
Try manually reviewing for toxic language without ML
This list contains the 11,365 comments made on September 4th, 2017, sorted chronologically (technically, these are edits to Wikipedia’s User Talk and Article Talk pages). Please be warned: the linked spreadsheet contains very offensive language, and it can take a little while to load all those comments. Buried in it are comments that may be driving people out of the community. See how long it takes you to find 10 rows that are toxic.
Still looking? You may be more patient than most: in a joint study we conducted with the Wikimedia Foundation, we estimated that only 17.9% of personal attacks prompted a block or warning. This hints at the fundamental challenge every community manager faces: how to facilitate a conversation at scale.
Now try reviewing with the help of ML
Here is that same list, ordered by Perspective API’s Toxicity score (again, please note that this spreadsheet contains very offensive language and will take a while to load). Try again, and see how long it takes to find 10 toxic rows.
Machine learning can make it much easier to focus attention on the comments most similar to those previously tagged as toxic. Perspective’s models are far from perfect, but, as this quick demo shows, what they do well is help people reduce massive haystacks into smaller handfuls of hay.
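The workflow behind the sorted spreadsheet — score every comment, then review from the top of the ranking down — can be sketched as follows. The `keyword_score` function here is a deliberately crude stand-in so the example runs offline; in practice the score would come from a model such as Perspective, which returns a toxicity value between 0 and 1. All names and the blocklist are illustrative, not part of any real API.

```python
def keyword_score(text: str) -> float:
    """Toy stand-in for a model score in [0, 1]: the fraction of words
    matching a tiny blocklist. Illustrative only, NOT the Perspective API."""
    blocklist = {"idiot", "stupid", "hate"}
    words = [w.strip(".,!?") for w in text.lower().split()]
    if not words:
        return 0.0
    return sum(w in blocklist for w in words) / len(words)

def triage(comments: list[str], score_fn=keyword_score, top_n: int = 10):
    """Rank comments by descending score and return the top_n
    most likely problems for human review."""
    scored = sorted(((score_fn(c), c) for c in comments), reverse=True)
    return scored[:top_n]

comments = [
    "Thanks for fixing the citation formatting.",
    "You are a stupid idiot.",
    "I disagree with this edit; see the talk page archive.",
]
for score, text in triage(comments, top_n=2):
    print(f"{score:.2f}  {text}")
```

The key design point is that the model never removes anything: it only reorders the queue, so a human still makes every moderation decision, just starting where problems are most likely.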