Lucas, I mean this with deep respect for you and your work, especially what you and Jigsaw are doing to understand harassment and, specifically, to study comment sections. It's incredibly necessary and much-needed work.
BUT, as a product page and a product rollout, there wasn't enough language, or rather enough warnings, about how much of a beta this was. I heard from and saw a lot of people testing Perspective with specific kinds of use cases, particularly Arabic-language words and words related to Islam, and those words were rated as some degree of toxic with no clear explanation of why.
When I was at Watson, I noticed that if we created any kind of public demo using NLP or machine learning APIs, and the demo was wrapped in something like a search bar, a chatbot interface, or a text-input UI, people assumed it was a QA'd product, because it looked like a product they had seen before.

UX design within machine learning is still very new, and the example above matters: the page didn't say enough (in big bold letters, new font, everywhere and repeated), and because it was a standalone page, it looked and felt like a product rollout, regardless of whether it was one. And if you're representing the API as such, people will test it as such. You could have given examples, right on this Writing Experiments page, of the kinds of comments you are looking at. I also think that, for this kind of demo, certain words could have been blacklisted, or, instead of being called toxic, flagged as unsure or highlighted as "not enough data." 34% is a really low score in machine learning, but do users know that?
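To make that last point concrete, here is a minimal sketch of what I mean by labeling instead of rating. It assumes a Perspective-style score between 0 and 1; the cutoff values are my own illustrative assumptions, not Jigsaw's.

```python
# A minimal sketch: turning a raw Perspective-style toxicity score (0.0-1.0)
# into a plain-language label a demo visitor can actually interpret.
# The cutoffs below are illustrative assumptions, not Jigsaw's.

def label_for_score(score: float) -> str:
    """Map a model probability to a user-facing label for a beta demo."""
    if score >= 0.85:
        return "likely toxic"
    if score >= 0.60:
        return "possibly toxic"
    if score >= 0.30:
        return "unsure - not enough data to say"
    return "unlikely to be toxic"

print(label_for_score(0.34))  # "unsure - not enough data to say", not "34% toxic"
```

A label like that does the interpretive work for the user instead of asking them to know what a 34% means to a model.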
This is where it's on the company to explain the results, or perhaps to show "this is how we are analyzing the word" instead of just giving it a rating.
How can users help make the rating of "arabs" less toxic if they don't know how you arrive at the rating, and you aren't showing that in the interface?
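Here's a hedged sketch of what "showing the analysis" could look like, using the public CommentAnalyzer endpoint that Perspective exposes. The request and response shape are taken from the docs as I understand them, and API_KEY is a placeholder you would supply yourself.

```python
# A sketch of surfacing *how* the text was scored, not just the final number.
# Endpoint, request body, and response fields are assumed from Perspective's
# public CommentAnalyzer documentation; API_KEY is a placeholder.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def explain_score(text: str) -> None:
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=body).json()
    toxicity = resp["attributeScores"]["TOXICITY"]

    # Show the overall probability *and* which spans of the input drove it,
    # so a user can see what the model is reacting to.
    print(f"overall: {toxicity['summaryScore']['value']:.0%}")
    for span in toxicity.get("spanScores", []):
        snippet = text[span["begin"]:span["end"]]
        print(f"  '{snippet}' -> {span['score']['value']:.0%}")

explain_score("arabs")
```

Even just exposing the per-span scores in the demo UI would go a long way toward telling users what they're actually correcting when they give feedback.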
