Community management: the Wikipedia experience

Enrique Dans

Feb 20, 2017

I’ve just read a fascinating study using Wikipedia’s community management data called “Ex Machina: personal attacks seen at scale”, in which a very large set of manually or automatically labeled comments was studied to determine patterns of abuse. Wikipedia is one of the largest online communities in the world, and the proper functioning of the site depends to a large extent on a healthy and balanced community. From the workings of the Wikipedia community and, above all, from their abuse dynamics, some extremely interesting conclusions can be drawn for community managers of all kinds.

The first conclusion seems clear: a small number of highly toxic users are responsible for a high percentage of abuse on the page. This coincides with my experience with Wikipedia, where on several occasions I have been able to witness up close how a commentator went from making some occasional toxic or insulting commentary to a dynamic in which he or she intervened in absolutely all threads, using insulting language, whatever the subject matter. Logically, when everything you say on any subject is received with hostility and insults, the tolerance barrier and freedom of expression go out of the window: such people are simply an obstacle to the proper functioning of the community, a kind of annoying “background noise”, and should be expelled.

In the case of Wikipedia, 34 users rated “highly toxic” were responsible for 9% of personal attacks on the page: isolating those users and expelling them from the community would be a major improvement. Around 80% of personal attacks, however, were carried out by a group of about 9,000 users who had each made fewer than five insulting comments. That may well be a case of “we all get angry sometimes”, and it should not necessarily result in disqualification from threads; within reason, it should be considered part of the dynamics of participation.

In my experience, the key is finding the right balance, something similar to what we knew on the early BBSs as the user ratio. That ratio compared the amount of data contributed with the amount downloaded, but the same idea could be applied to the percentage of a user’s comments that are insulting, leading to disqualification beyond a certain threshold. Someone who habitually contributes quality comments, but who is sometimes irritated by certain attitudes or issues and resorts to insults, is completely different from those whose only motive is to insult others and who seem to have made it their mission in life. I have known a few of the latter over the course of fourteen years of managing my own page. It is better to expel such users and do everything possible to prevent them from re-entering.
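The ratio idea can be sketched in a few lines of Python. This is only an illustration: the class and function names, the 50% threshold and the minimum of ten comments are all assumptions of mine, not figures from the study, and any real community would have to tune them.

```python
from dataclasses import dataclass


@dataclass
class CommenterStats:
    """Hypothetical per-user counters kept by the comment system."""
    total_comments: int
    insulting_comments: int


def insult_ratio(stats: CommenterStats) -> float:
    """Share of a user's comments that were labeled insulting."""
    if stats.total_comments == 0:
        return 0.0
    return stats.insulting_comments / stats.total_comments


def should_expel(stats: CommenterStats,
                 max_ratio: float = 0.5,      # assumed threshold
                 min_comments: int = 10) -> bool:  # assumed minimum history
    """Flag a user only once there is enough history to judge the ratio."""
    if stats.total_comments < min_comments:
        return False
    return insult_ratio(stats) > max_ratio


# A regular who occasionally loses their temper vs. someone who mostly insults.
regular = CommenterStats(total_comments=400, insulting_comments=4)
troll = CommenterStats(total_comments=60, insulting_comments=55)
```

With these made-up numbers, `should_expel(regular)` is `False` and `should_expel(troll)` is `True`: the occasional outburst is tolerated, the habitual insulter is not.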

Another obvious problem in community management is anonymity. In the case of Wikipedia, 43% of all commenters are anonymous users, although many of them had commented only once, and their total number of comments was twenty times lower than that contributed by registered users. Anonymous users ended up being six times more active in terms of insulting behavior than registered users but, given the difference in volume, contributed less than half of such attacks, implying that most of the attacks came from users with a profile registered on the page. Ending anonymity, therefore, is one way to put an end to some insulting comments, but being able to insult others with impunity is not the main reason most people opt for anonymity.
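The arithmetic behind that “less than half” is worth spelling out. The sketch below uses only the two rounded figures quoted above (twenty times fewer comments, six times more insulting) and assumes that “six times more active” means six times the attack rate per comment; the function name is mine, and the study’s exact percentages may differ.

```python
def anonymous_attack_share(volume_ratio: float, rate_ratio: float) -> float:
    """Fraction of all attacks that come from anonymous users.

    volume_ratio: registered comments per anonymous comment (20 in the post).
    rate_ratio:   how many times likelier an anonymous comment is to be an
                  attack than a registered one (6 in the post, by assumption).
    """
    # Take one anonymous comment as the unit and the registered attack
    # rate as 1: the base rate cancels out of the final fraction.
    anonymous_attacks = 1.0 * rate_ratio
    registered_attacks = volume_ratio * 1.0
    return anonymous_attacks / (anonymous_attacks + registered_attacks)


share = anonymous_attack_share(volume_ratio=20, rate_ratio=6)
print(f"{share:.0%}")  # prints 23%
```

Under those assumptions, anonymous users account for 6 out of every 26 attacks, which is well under half: the higher rate is swamped by the much lower volume.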

What’s more, anybody who just wants to play the troll can open a profile on the site (in the case of Wikipedia, anyone who lacks a profile is considered anonymous anyway, and it is also possible to open a profile under a pseudonym).

There is a very clear correlation between habitual participation and profile creation, which in my view supports the use of white list mechanisms: regular commentators deserve better treatment than those who simply appear one day and drop a comment.

While the management of the former can be done on an exceptional basis, the latter should be dealt with manually or by partially automated systems, depending on volume.
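As a rough sketch of that kind of triage, assuming a simple set of known regulars plus a set of expelled users (the names `Route` and `route_comment` are hypothetical, and this is not how any particular platform does it):

```python
from enum import Enum


class Route(Enum):
    PUBLISH = "publish"   # goes live at once; dealt with only by exception
    REVIEW = "review"     # held for a moderator or an automated filter
    REJECT = "reject"     # author has been expelled


def route_comment(author: str, whitelist: set[str], blacklist: set[str]) -> Route:
    """Decide what happens to a new comment based on who wrote it."""
    if author in blacklist:
        return Route.REJECT
    if author in whitelist:
        return Route.PUBLISH
    # Unknown or first-time commenters get a closer look.
    return Route.REVIEW


whitelist = {"longtime_reader"}
blacklist = {"known_troll"}
```

Checking the black list first means that a user who was once trusted and later expelled stays out, even if their name was never removed from the white list.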

The study includes a design for a methodology to avoid personal attacks and arguments through crowdsourcing (user alerts through evaluation mechanisms) and machine learning systems that are trained to recognize insults or spam, and that can be overridden by moderators if necessary.
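To make the division of labor concrete, here is a minimal sketch of how the three signals might be combined. It is not the study’s design: the function name, both thresholds and the idea of a classifier score between 0 and 1 are assumptions for illustration only.

```python
from typing import Optional


def is_hidden(model_score: float,
              user_flags: int,
              moderator_decision: Optional[bool] = None,
              score_threshold: float = 0.8,   # assumed classifier cut-off
              flag_threshold: int = 3) -> bool:  # assumed number of user alerts
    """Return True if the comment should be hidden from the thread.

    model_score:        classifier's estimate that the comment is an attack.
    user_flags:         how many readers reported the comment.
    moderator_decision: True (hide), False (keep) or None (not reviewed yet).
    """
    if moderator_decision is not None:
        # A human moderator always has the last word.
        return moderator_decision
    return model_score >= score_threshold or user_flags >= flag_threshold
```

So `is_hidden(0.9, 0)` hides a comment the classifier is confident about, while `is_hidden(0.9, 0, moderator_decision=False)` lets a moderator restore it.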

This is undoubtedly a promising area of study. In my case, I solved this problem a long time ago by setting up black and white lists, while allowing anonymity because it seemed to me that it did more good than harm. My page works well, mainly because it is relatively small, and also because I review all comments personally as they are posted. From the moment that task is beyond the remit of one person, as indeed happens in most media, the development of methodologies based on crowdsourcing or machine learning can offer solutions to community managers. The possibility of generating communities around a theme is one of the best contributions of the internet.

Getting these communities to work properly, instead of becoming troll grottos, is fundamental to the proper functioning of a site or social network, as is shown by the problems experienced by Twitter, which has been unable to control this issue throughout its history. The Wikipedia community study’s approach to the problem is the best I’ve seen: the conclusions are sound, and it is always good to see a serious, quantifiable analysis. Let’s see if it inspires others to do the same…
