There is only one rule on one of my favorite subreddits for cute animals, r/catsinbusinessattire: “post pictures of cats in business attire.” In contrast, r/cats (which has nearly 1 million subscribers) begins with the #1 rule “Cats! Post pictures of your cats, talk about cats, ask questions, get advice” but then goes on to describe additional rules, such as disallowing NSFW content, name-calling, and posting of personal information. Similarly, one of the most popular subreddits, r/aww (with almost 20 million subscribers), has rules against harassing comments, asking for upvotes, and bots. Some rules set specific boundaries around unacceptable behavior (for example, the prevalent rules about harassment), while others make explicit a community's less obvious norms: r/aww disallows “sad” content, for example, but r/cats explicitly allows “mourning” posts.
In recent years we have seen a huge amount of scrutiny leveled at the regulatory structures of social platforms as they deal with issues like harassment, hate speech, and misinformation. My own prior work about the role of copyright in online communities revealed that one of the reasons people have trouble knowing what rules to follow is that rules come from different sources, at different levels, and sometimes conflict. For example, what if the law suggests one action, the platform’s Terms of Service suggests another, and the social norms of the platform’s community suggest yet another? This can be particularly difficult for topics like online harassment, which is poorly defined and for which rules are inconsistently enforced.
In ongoing work in my research lab, I am very interested in how these different sources of rules work together in an online environment — forming an ecosystem of governance that includes as stakeholders users, communities, platforms, and policymakers. Reddit is a wonderful platform for considering these issues because on the platform itself explicit rules come from multiple formal sources (the official user agreement and content policy, “reddiquette” as articulated by users on the platform as a whole, and rules for individual subreddits typically created by each sub-community’s moderators). The existence of these separate sets of rules for so many small communities bound together by a larger platform creates a great opportunity to consider how rules are similar and different, and how they all work together. I started with an overarching question: What even are the rules across all these subreddits anyway? And we went from there into more complex analyses.
Around the time I was spinning up data collection for this project (which included the help of an amazing CU undergrad, Josh McCann, who began collecting rules from hundreds and hundreds of subreddits by hand), my first PhD student Aaron Jiang joined my lab. Aaron and his co-advisor, my colleague and frequent collaborator Jed Brubaker, are mixed methods ninjas who can do amazing things with big datasets. Suddenly we could think much bigger for the scope of this project: What if instead of the rules from a random sample of 1,000 subreddits, we were able to look at 100,000?
Over the course of about a year, our team conducted a qualitative analysis of over 3,000 subreddit rules, and Aaron led the creation of classifiers for rule types that we applied to the most popular 100,000 subreddits. The results of our analysis, which included types and frequencies of rules across reddit as well as how these interact with the content policy and reddiquette, became a paper titled “Reddit Rules! Characterizing an Ecosystem of Governance” that was published and presented at the AAAI International Conference on Web and Social Media (ICWSM) this past summer. There are a lot of data, findings, and nuance in that paper, but here are a few selected findings and big picture thoughts.
First, what kinds of rules are on reddit? It’s always tricky to choose the right level of granularity for these kinds of analyses, but in the end we coded for 24 categories of rules, including topics like “harassment,” “low-quality content,” and “spam.” The table below illustrates these categories and their frequencies across our smaller dataset (hand coded, N = 523 subreddits) and our larger dataset (determined by classifiers, N = 23,752 subreddits that likely had rules out of 100,000).
Note that these classifiers (which performed pretty well, between 59% and 97% accuracy depending on the rule type) were based on the language in the sidebar of each subreddit. We also considered meta-characteristics of subreddits (such as age, number of subscribers, number of moderators, etc.), and whether these might be predictive of the presence or absence of a particular type of rule (or whether there were rules at all). We hypothesized that there might be some patterns here (for example, that a subreddit with more subscribers is more likely to have a rule about harassment). But the answer is… not really. The classifiers built on these meta-characteristics performed at 60% accuracy at best, barely better than chance.
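To give a feel for what classifying rules by their language can look like, here is a minimal sketch of a text classifier. It is not the approach from the paper: the tiny Naive Bayes model, the `NaiveBayes` class, the two labels, and the training sentences are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical training examples (invented for illustration, not from our dataset).
TRAIN_TEXTS = [
    "No harassment or name-calling",
    "Be civil; personal attacks will be removed",
    "No spam or self-promotion",
    "Do not post referral links or advertisements",
]
TRAIN_LABELS = ["harassment", "harassment", "spam", "spam"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """A toy multinomial Naive Bayes classifier with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


classifier = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(classifier.predict("No name-calling or personal attacks"))
```

A real version would need far more labeled rules per category and a held-out set to measure accuracy, which is where numbers like the 59% to 97% range come from.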
We also looked at the relationships between rules, having hypothesized that perhaps some subreddits simply copy rules from each other, particularly smaller or newer subreddits trying to bootstrap. Again, this did not really seem to be the case. There were very few verbatim copies of rules across our datasets. We also found that individual subreddit rules do not frequently reference other sources of rules such as reddiquette or the content policy. Of course, the qualitative analysis that produced the categories above shows that the general topics of the rules across the site share similarities, but the ways they are implemented in these sub-communities appear to be quite individualized.
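Checking for verbatim copies is conceptually simple. The sketch below shows one way it could be done, assuming rules have already been collected per subreddit; the `normalize` and `find_verbatim_copies` helpers, the normalization choices, and the example subreddit names and rules are hypothetical, not the exact procedure from the paper.

```python
import re
from collections import defaultdict

# Hypothetical input: a mapping from subreddit name to its list of rule texts.
EXAMPLE_RULES = {
    "r/example_a": ["No spam or self-promotion.", "Be kind."],
    "r/example_b": ["Spam will be removed.", "Be kind to each other."],
    "r/example_c": ["no spam  or\nself-promotion.", "Cats only."],
}


def normalize(rule_text):
    """Lowercase and collapse whitespace so trivial formatting differences
    do not hide an otherwise identical rule."""
    return re.sub(r"\s+", " ", rule_text.strip().lower())


def find_verbatim_copies(rules_by_subreddit):
    """Return each normalized rule that appears in more than one subreddit,
    mapped to the sorted list of subreddits that share it."""
    seen = defaultdict(set)
    for subreddit, rules in rules_by_subreddit.items():
        for rule in rules:
            seen[normalize(rule)].add(subreddit)
    return {text: sorted(subs) for text, subs in seen.items() if len(subs) > 1}


for text, subreddits in find_verbatim_copies(EXAMPLE_RULES).items():
    print(f"{text!r} appears in {', '.join(subreddits)}")
```

Note that this only catches exact matches after normalization. Rules that share a topic but are worded differently, like the two "be kind" rules in the example, are not flagged, which is the kind of individualized wording we kept running into.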
To sum up some of the big picture takeaways of our analysis:
- We now know what kinds of rules communities are creating on reddit. This suggests the kinds of things that are important for many different kinds of communities to regulate.
- But it turns out that these rules are very difficult to predict based on meta-characteristics you might expect to be important, like size and age.
- Instead, rules seem to be produced in a highly context-specific way. They are not one-size-fits-all, but tailored to specific communities. However, rules still have relationships to each other, to sitewide reddiquette, and to platform policy. It’s an entire ecosystem of rules.
With a lot more scholarly interest in Reddit, we hope that this work is useful in understanding an essential component of that platform’s functioning, particularly given a lot of great recent work around moderation practices on Reddit. Aaron is continuing work in this space, leading research about how rules are formed, how they relate to community norms, and how we might think about algorithmic enforcement of community-created rules.
As a final note, because I like talking about process and I think it’s important to surface some of the sausage-making behind published papers, this one was rejected twice before its acceptance at ICWSM (where, at least, it was quite well received!). Part of the challenge was that the findings from our analyses just weren’t as splashy as they might have been — revealing, for example, some interesting patterns of what kinds of subreddits have particular kinds of rules. Instead, we saw that these patterns don’t really seem to exist, which suggests something potentially important (that rules are context-specific), but in the end results in a paper that is largely descriptive. I think that there is a place for this kind of work, but it can be tricky when the “so what?” is in part “because we should care how this thing works!” But for us at least, this work opened up a slew of new research questions that we’re actively investigating. Because, after all, it’s really important that people know things like when they can post sad cat content, and when they can’t.
I would also like to call out this project as what I think was a wonderful example of mixed methods work, possible only because of my fantastic collaborators. This is something I’d like to see a lot more of as well!
For a lot more information about this, have a look at the published paper!
Fiesler, Casey, Jialun “Aaron” Jiang, Joshua McCann, Kyle Frye, and Jed R. Brubaker. “Reddit Rules! Characterizing an Ecosystem of Governance.” In ICWSM, pp. 72–81. 2018.