Proactive Policy Assurance of Post Quality for Online Communities

Published in ACM CSCW

Oct 18, 2018

This post summarizes a research paper, Data-Driven Proactive Policy Assurance of Post Quality in Community Q&A Sites, co-authored by Chunyang Chen, Xi Chen, Jiamou Sun, Zhenchang Xing, and Guoqiang Li. The paper will be presented at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) on Monday, November 5th.

Online question and answer (Q&A) sites are platforms where participants ask and answer questions. Recent years have seen the rise of both general Q&A sites like Quora and domain-specific Q&A sites like Stack Overflow. Over time, these sites have accumulated a large pool of valuable knowledge that serves millions of users around the world. However, the dramatic growth in posts and users poses a severe challenge to quality assurance. For example, some users post questions and answers with misspellings, grammatical errors, or inappropriate code and text formatting, or they leave out important information.

To improve the readability and value of posts, Stack Overflow has established community norms and encourages users to follow its policies. In practice, however, many users violate these rules carelessly. New users in particular may not even be aware that the policies exist, and they easily make mistakes in their posts. Quality assurance policies are a kind of tacit knowledge and implicit community norm on a Q&A site, specifying the behaviors the community expects. Community norms are usually context-sensitive, so different contexts call for different rules, such as using Markdown list syntax to highlight a list structure or applying proper indentation to a code snippet.

Stack Overflow allows users to edit posts created by other users in order to reduce the number of low-quality posts. We found that 39.6% of posts had been edited at least once as of August 2017, and posts on popular topics commonly have dozens of edits. Some of these edits correct mistakes made by novice or careless users. Others clarify posts by adding hyperlinks, images, explanations, or examples. In addition, a scoring system steers users toward high-quality answers.

However, all of these mechanisms require considerable manual effort and time from post viewers. Some mistakes are only pointed out in comments, where they wait for the post creators themselves to correct them. Until they are fixed, problematic posts may confuse or mislead ordinary users. Even the edits themselves may not be correct or meaningful. Some violations of community norms, especially relatively complicated ones such as whether a post needs code or hyperlinks, require a good understanding of the question and answer content. A post policy assurance janitor is therefore needed, both to improve post quality on Q&A sites and to save community users' time.

The aim of our work is to develop a proactive policy assurance mechanism that complements the collaborative editing mechanism. New users would be prompted to revise potential errors, while experienced users would learn earlier whether they need to provide more information. If the post owner ignores the proactive warnings and publishes the post anyway, the policy checker could also inform post viewers of potential problems, so that they can either edit the post directly or seek help from the post owner. Our work addresses two main questions:

1) What kinds of community norms are manifested in collaborative edits?

We conduct an empirical study of historical collaborative editing data on Stack Overflow to investigate the need for proactive quality assurance on Q&A sites. Using a document analysis technique, we group the post edits into several common types: format revision, link and image modification, and minor problem correction (spelling and grammar). Other edits explain domain knowledge or add supplementary material. We focus on predicting code and text formatting edits and link and image edits, because they account for a large proportion of all post edits and are tied to community norms, which makes them predictable. For example, images in posts must follow Markdown syntax, which makes image edits easy to detect across different versions of a post.
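Because images, links, and code in a post follow fixed Markdown syntax, a change of one of these kinds can be spotted by comparing two revisions of the same post. The sketch below illustrates that observation with simple regular expressions. It is a hypothetical heuristic, not the document analysis procedure used in the paper, and the helper names `markup_features` and `changed_markup` are invented for illustration. It only recognizes inline images and links, inline code spans, and indented code lines (reference-style links and fenced code blocks are ignored), and it reports which element types changed, not whether they were added or removed.

```python
import re
from typing import Dict, Set

# Hypothetical heuristic for illustration; not the paper's edit classifier.

# Markdown inline image: ![alt](url "optional title"); captures the URL.
IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# Markdown inline link: [text](url), skipped when preceded by "!" (an image).
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# Inline code spans, plus lines indented by 4 spaces or a tab (code blocks).
CODE_RE = re.compile(r"`[^`\n]+`|^(?: {4}|\t).+$", re.MULTILINE)


def markup_features(body: str) -> Dict[str, Set[str]]:
    """Collect the Markdown elements of one post revision, grouped by type."""
    return {
        "image": set(IMAGE_RE.findall(body)),
        "link": set(LINK_RE.findall(body)),
        "code": {m.group(0).strip() for m in CODE_RE.finditer(body)},
    }


def changed_markup(before: str, after: str) -> Set[str]:
    """Return the element types whose content differs between two revisions."""
    old, new = markup_features(before), markup_features(after)
    return {kind for kind in old if old[kind] != new[kind]}


if __name__ == "__main__":
    v1 = "My app crashes on startup, see the screenshot."
    v2 = v1 + "\n\n![crash](https://i.example.com/crash.png)"
    print(changed_markup(v1, v2))  # {'image'}
```

Comparing sets of URLs and code fragments, rather than raw text, means that edits touching only prose (such as spelling fixes) do not register as formatting, link, or image edits.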

2) What techniques can effectively learn collaborative editing patterns by human users to assist users in following community norms proactively?

We use machine-learning-based techniques to provide two functions: edit prediction, and locating the key phrases in a post that explain a predicted edit. Given an original post, the mechanism predicts whether the post needs a particular kind of edit. The detected key phrases usually explain the prediction and indicate where the post needs to be edited. For instance, newly introduced terms tend to be flagged as needing a hyperlink. Post owners are informed not only of which types of edits are recommended but also of the key phrases that contribute most to each recommendation, as a hint for editing.
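To make the pairing of a prediction with explanatory key phrases concrete, here is a minimal sketch that uses a linear bag-of-words scorer. This is a swapped-in technique chosen for simplicity, not the model from the paper. The values in `LINK_EDIT_WEIGHTS` and `BIAS` are invented for illustration (a real system would learn them from historical edits for one edit type, such as adding a hyperlink), and `predict_edit` is a hypothetical helper. Because the model is linear, each token's contribution to the score is just its weight, so the tokens that raise the score most can serve directly as key phrases.

```python
import math
import re
from typing import Dict, List, Tuple

# Hypothetical weights for a single edit type ("add a hyperlink").
# These numbers are invented for illustration; a real system would learn
# them from historical Stack Overflow edits.
LINK_EDIT_WEIGHTS: Dict[str, float] = {
    "documentation": 1.5,
    "library": 1.2,
    "retrofit": 0.8,
    "thanks": -0.5,
}
BIAS = -1.0

# Lowercase word tokens; keeps names like "c++", "c#" and "node.js" intact.
TOKEN_RE = re.compile(r"[a-z0-9_+#]+(?:\.[a-z0-9_+#]+)*")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def predict_edit(
    text: str,
    weights: Dict[str, float],
    bias: float,
    top_k: int = 3,
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return (probability that the post needs this edit, top key phrases)."""
    # Binary bag of words: each distinct token counts once.
    tokens = set(tokenize(text))
    contributions = {t: weights[t] for t in tokens if t in weights}
    score = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-score))  # logistic function
    # Tokens that push the score up the most act as the editing hint.
    key_phrases = sorted(
        ((t, c) for t, c in contributions.items() if c > 0),
        key=lambda item: item[1],
        reverse=True,
    )[:top_k]
    return probability, key_phrases


if __name__ == "__main__":
    post = "You could try the Retrofit library, see its documentation."
    prob, hints = predict_edit(post, LINK_EDIT_WEIGHTS, BIAS)
    print(f"link edit probability: {prob:.2f}")
    print("key phrases:", [t for t, _ in hints])
```

With these made-up weights, the example post scores about 0.92 for a link edit, and the hint lists "documentation", "library", and "retrofit" in that order. A nonlinear model would need a separate attribution method to recover key phrases. The linear form is used here only because its explanation falls out of the score directly.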

We also discuss the potential benefits of our post-edit recommendation approach for post owners, post editors, and novice users. However, deploying the approach on Stack Overflow may have complicated effects on social processes and collaborative editing, which deserve further study.

Paper citation:

Chunyang Chen, Xi Chen, Jiamou Sun, Zhenchang Xing, and Guoqiang Li. 2018. Data-Driven Proactive Policy Assurance of Post Quality in Community Q&A Sites. Proc. ACM Hum.-Comput. Interact. 2, CSCW, Article 33 (November 2018), 22 pages. https://doi.org/10.1145/3274302


Written by Xi Chen