By Marie Pellat & Patricia Georgiou
Today Jigsaw is pleased to announce that EL PAÍS, the most-read newspaper in Spanish, is using our AI-powered technology to improve conversations on its site. This represents the first time that Perspective, technology that uses machine learning to spot abuse, is being used to analyze comments in Spanish.
OUR PARTNERSHIP WITH EL PAÍS
Perspective can help human moderators work more effectively by sorting comments according to their “toxicity,” but it can help in other ways too: it can show people in real time whether their comment is likely to be perceived as toxic, sort of like a spell check for toxicity. This is how EL PAÍS is using it. Beginning today, commenters will see a small notification as they write if the AI detects that their comment might be perceived as toxic and potentially violate the site’s community guidelines. Perspective gives a “score” (a number from 1 to 100) that indicates how confident the algorithm is that a comment is similar to toxic comments it has seen in the past. That lets commenters see how the public might perceive the comment before they publish.
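As a rough illustration of the real-time feedback described above, the sketch below decides whether to show a commenter a notification based on a toxicity score. The function name, the wording of the notice, and the threshold of 70 are hypothetical choices made for this example; they are not EL PAÍS’s actual settings or part of Perspective itself.

```python
from typing import Optional


def toxicity_notice(score: int, threshold: int = 70) -> Optional[str]:
    """Return a notice to show the commenter, or None if none is needed.

    `score` is assumed to be a 1-100 confidence that the comment resembles
    toxic comments. `threshold` is a hypothetical cut-off chosen by the site.
    """
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    if score >= threshold:
        return (
            f"Your comment may be perceived as toxic ({score}/100). "
            "Consider rephrasing before you publish."
        )
    return None


# Made-up scores, for illustration only.
print(toxicity_notice(20))
print(toxicity_notice(85))
```

A site could tune the threshold to trade off between nudging too many commenters and missing comments that readers would find toxic.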
We’re excited about EL PAÍS’s announcement for two reasons. First, it highlights another use for the information that Perspective provides (a measurement of toxicity in language) beyond helping moderators sort comments or letting readers select which comments they see. Studies have shown that when people receive real-time feedback that their comments might be perceived as toxic, they often opt to rephrase them.
Second, and equally important, is the launch of Perspective in Spanish. Perspective “learns” to understand the nuances of human language by reading millions, even billions of comments and detecting patterns. Just like a human, the more comments the machine sees, the better it understands the language. The process of using large data sets to create machine learning models is called “training.”
The process for training Perspective to work in new languages is identical to training it in English, but training machine learning models requires substantial data sets — in this case, lots of public online comments in Spanish. Beginning last year, Jigsaw partnered with EL PAÍS to analyze millions of past public comments from their website.
Those Spanish-language comments helped us train our machine learning models to spot toxicity in Spanish and to capture the linguistic nuances of that language. Simply having the models translate comments from one language to another is not always sufficient: you have to teach the AI the unique characteristics of each language so it can be more accurate and precise.
HOW PERSPECTIVE WORKS
In February 2017 we launched Perspective, a new tool that uses machine learning to spot abusive comments online. By analyzing billions of public online comments from across the internet, our artificial intelligence algorithms have learned to identify complex nuances of language. They’ve become especially good at detecting “toxicity,” which we measure as the likelihood that a comment will make someone leave a conversation.
Publishers and platforms around the world are already using Perspective to host better conversations. The New York Times uses Perspective to help its human moderators sort comments more quickly, allowing them to increase the number of articles where they offer comments by more than 300 percent. That means more people can engage on their platform with news that matters to them.
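To make the moderator workflow concrete, here is a minimal sketch of sorting a comment queue by toxicity score so that the comments most likely to need attention are reviewed first. The `ScoredComment` type, the `moderation_queue` function, and the scores are all invented for illustration; this is not how The New York Times or any other publisher actually implements its tooling.

```python
from typing import List, NamedTuple


class ScoredComment(NamedTuple):
    text: str
    toxicity: int  # hypothetical 1-100 score; higher means more likely toxic


def moderation_queue(comments: List[ScoredComment]) -> List[ScoredComment]:
    """Order comments so the likeliest-toxic ones reach moderators first."""
    return sorted(comments, key=lambda c: c.toxicity, reverse=True)


# Made-up comments and scores, for illustration only.
pending = [
    ScoredComment("Thanks for the thoughtful article.", 3),
    ScoredComment("You are an idiot.", 92),
    ScoredComment("I disagree with the second point.", 11),
]

for comment in moderation_queue(pending):
    print(comment.toxicity, comment.text)
```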
Other publishers use Perspective’s technology in different ways. In fact, three of the top six commenting platforms in the world are using Perspective to help increase engagement and enlarge the public space where people can discuss important issues.
LEARNING FROM MISTAKES
Even though our machine learning models have improved tremendously since we first launched, becoming more accurate, more subtle, and better at detecting nuance in human language, they still make mistakes. Sometimes the models return “false positives,” suggesting that a comment might be toxic when it obviously isn’t. And of course they sometimes miss comments that are clearly toxic (“false negatives”). Under the careful supervision of Jigsaw’s engineers and research scientists, our ML models learn from their mistakes. Each error and correction helps our models get smarter.
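The two kinds of mistakes described above can be counted by comparing model scores against human judgments. The sketch below is only an illustration: the `count_errors` function, the threshold of 50, and the sample data are assumptions made for this example, not a description of how Jigsaw evaluates its models.

```python
from typing import Sequence, Tuple


def count_errors(
    scores: Sequence[int],
    human_labels: Sequence[bool],
    threshold: int = 50,
) -> Tuple[int, int]:
    """Count comments wrongly flagged and toxic comments missed.

    `scores` are hypothetical 1-100 toxicity scores; `human_labels` are True
    where a human reviewer judged the comment toxic. A comment counts as
    flagged when its score is at or above the hypothetical `threshold`.
    Returns (false_positives, false_negatives).
    """
    if len(scores) != len(human_labels):
        raise ValueError("scores and human_labels must have the same length")
    false_positives = 0
    false_negatives = 0
    for score, is_toxic in zip(scores, human_labels):
        flagged = score >= threshold
        if flagged and not is_toxic:
            false_positives += 1  # flagged, but a human says it is fine
        elif not flagged and is_toxic:
            false_negatives += 1  # missed, but a human says it is toxic
    return false_positives, false_negatives


# Made-up scores and labels, for illustration only.
print(count_errors([90, 20, 80, 10], [True, True, False, False]))
```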
Another source of mistakes is what’s called “unintended bias.” Identity terms for more frequently targeted groups (words like “black”, “muslim”, “feminist”, “woman”, and “gay”) often have higher toxicity scores because those groups are disproportionately the subject of abusive and toxic comments. Unfortunately, the data we used to train Perspective exhibits that same trend: the names of targeted groups appear far more often in abusive comments, so the models can learn to associate the terms themselves with toxicity. We are constantly working to find and mitigate these sources of bias, and we still have work to do. Check out our blog, The False Positive, to learn more about some of our research methods to reduce unintended bias.
WHAT’S NEXT FOR PERSPECTIVE
We’re excited about the milestone of launching Perspective in Spanish, and we’re working to launch it in additional languages over the next year. In the next few months, we’ll make Perspective’s Spanish-language machine learning models available for developers to use and experiment with, so they can continue to find novel ways to use the technology to improve conversations at scale.
As publishers and platforms struggle to keep discussions civil at scale, Perspective gives them the power of AI to help open more articles for comments, give readers choices about the comments they want to see, and keep comment sections free of abuse and harassment.
Because that’s what Perspective is all about — harnessing the power of machine learning to enlarge and improve the internet’s global public square, the spaces where people come to exchange ideas and opinions. Our partnership with EL PAÍS is a perfect example of how Perspective puts that power into the hands of publishers to host better conversations.
Marie Pellat is a software engineer at Jigsaw, and Patricia Georgiou is Jigsaw’s head of partnerships and business development.